Research Project

Explainable AI for Intrusion Detection

Multi-dataset evaluation using SHAP, LIME, and a novel XAI Confidence Score across CIC-IDS-2017, UNSW-NB15, and CSE-CIC-IDS-2018.

3 Datasets · 4 Models · 99.7% Best Accuracy · 0.216 Cross-Dataset Jaccard · 10M+ Records Evaluated

Benchmark Datasets

Three intrusion detection datasets with varying complexity and attack types.

Multi-class

CIC-IDS-2017

14 classes including BENIGN, DoS, DDoS, Bot, Web attacks, and infiltration. Generated by CICFlowMeter from pcap captures.

78 features · 2.8M records · 14 classes
Multi-class

UNSW-NB15

10 classes with diverse attack types including Analysis, Backdoor, Exploits, Fuzzers, Reconnaissance, and Worms.

45 features · 2.5M records · 10 classes
Binary

CSE-CIC-IDS-2018

Binary classification of normal traffic vs. DDoS attacks generated with the LOIC-HTTP tool. An extension of the 2017 dataset.

80 features · 10M records · 2 classes

Why XAI Matters in Cybersecurity

Black-box models aren't enough for security operations.

🔍

Trust Decisions

Security analysts need to understand WHY traffic is flagged before taking action. XAI provides transparent reasoning.

⚖️

Reduce False Positives

Understanding feature contributions helps analysts quickly dismiss false alarms and focus on real threats.

📊

Regulatory Compliance

Many regulations require explainable AI decisions. XCS provides quantifiable explanation reliability scores.

🛡️

Adversarial Defense

Understanding model decisions helps identify adversarial attacks that target the IDS.

Architecture

End-to-end pipeline from raw traffic to explainable predictions.

XAI-IDS Architecture Diagram
01

Data Ingestion

Raw PCAP files processed by CICFlowMeter to extract 78 network flow features.

02

Feature Selection

Top 20 features selected by XGBoost importance; RobustScaler fit on the training split only to avoid data leakage.

03

Model Training

XGBoost, Random Forest, LightGBM trained with SMOTE. VotingEnsemble combines all three.

04

Explainability

SHAP for global importance, LIME for local explanations. XCS measures reliability.
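Global importance in the SHAP sense can be sketched as the mean absolute SHAP value per feature, given a precomputed (n_samples, n_features) attribution matrix; the function name is illustrative:

```python
import numpy as np

def global_importance(shap_matrix, feature_names):
    """Rank features by mean |SHAP value| across all explained samples."""
    imp = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)
    order = np.argsort(imp)[::-1]  # most important first
    return [(feature_names[i], float(imp[i])) for i in order]
```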

How It Works

From raw network traffic to explainable predictions.

01

Data Collection

Network flows captured from CIC-IDS-2017, UNSW-NB15, and CSE-CIC-IDS-2018 datasets with 78+ features per flow.

02

Model Training

XGBoost, Random Forest, LightGBM, and VotingEnsemble trained on 20 selected features with SMOTE for class balance.

03

SHAP + LIME

SHAP provides global feature importance. LIME explains individual predictions. Jaccard similarity measures agreement.

04

XCS Scoring

XCS = 0.4×Confidence + 0.3×SHAP Stability + 0.3×Jaccard. Scores > 0.3 indicate reliable explanations.
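The SHAP-LIME agreement in step 03 can be sketched as the Jaccard similarity of the two explainers' top-k feature sets, ranked by absolute attribution (function name illustrative):

```python
import numpy as np

def topk_jaccard(shap_values, lime_weights, feature_names, k=10):
    """Jaccard similarity of the top-k features ranked by |attribution|."""
    def top_set(values):
        idx = np.argsort(np.abs(np.asarray(values, dtype=float)))[::-1][:k]
        return {feature_names[i] for i in idx}
    a, b = top_set(shap_values), top_set(lime_weights)
    return len(a & b) / len(a | b)
```

A score of 1.0 means both explainers agree on the same top-k features; 0.0 means no overlap.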

Model Comparison

XGBoost, Random Forest, LightGBM, and VotingEnsemble across all datasets.

Model           Accuracy  Precision  Recall  F1-Score
XGBoost (best)  0.9966    0.9964     0.9966  0.9964
VotingEnsemble  0.9886    0.9945     0.9886  0.9911
RandomForest    0.9857    0.9940     0.9857  0.9893
LightGBM        0.9744    0.9930     0.9744  0.9828
Confusion Matrix: CIC-IDS-2017 classification results

SHAP Beeswarm: feature impact on predictions

Model           Accuracy  Precision  Recall  F1-Score
XGBoost (best)  0.8004    0.8120     0.8004  0.7982
VotingEnsemble  0.7867    0.8240     0.7867  0.8002
RandomForest    0.7635    0.8202     0.7635  0.7848
LightGBM        0.7630    0.8291     0.7630  0.7863
Confusion Matrix: UNSW-NB15 classification results

SHAP Beeswarm: feature impact on predictions

Model                Accuracy  Precision  Recall  F1-Score
RandomForest (best)  0.9993    0.9993     0.9993  0.9992
VotingEnsemble       0.9993    0.9993     0.9993  0.9992
XGBoost              0.9990    0.9990     0.9990  0.9990
LightGBM             0.9990    0.9990     0.9990  0.9990
Confusion Matrix: CSE-CIC-IDS-2018 classification results

SHAP Beeswarm: feature impact on predictions

Model Comparison: performance across all 3 datasets

Cross-Dataset Comparison: feature importance overlap analysis

Explainability

Understanding model decisions with SHAP and LIME.

SHAP Global Importance: top features driving predictions

SHAP vs LIME Agreement: Jaccard similarity of top-k features

LIME (Benign Traffic): local explanation for normal traffic

LIME (DDoS Attack): local explanation for DDoS detection

XCS Distribution: XAI Confidence Score across predictions

XCS vs Confidence: correlation analysis

XAI Confidence Score

Measuring the reliability of individual explanations.

XCS = 0.4 × Confidence + 0.3 × (1 − SHAP_Instability) + 0.3 × Jaccard(SHAP, LIME)
Weights: 40% Model Confidence · 30% SHAP Stability · 30% SHAP-LIME Agreement

Cross-dataset Jaccard is 0.216, indicating largely dataset-specific feature patterns. An XCS above 0.3 indicates acceptable explanation reliability.
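The XCS formula above, as a minimal sketch assuming all three inputs are already normalized to [0, 1]:

```python
def xcs(confidence, shap_instability, jaccard):
    """XCS = 0.4*confidence + 0.3*(1 - SHAP instability) + 0.3*SHAP-LIME Jaccard."""
    return 0.4 * confidence + 0.3 * (1.0 - shap_instability) + 0.3 * jaccard

def is_reliable(score, threshold=0.3):
    """Scores above 0.3 are treated as acceptably reliable explanations."""
    return score > threshold
```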

Reproduce Results

Get the same results on your machine.

01

Clone & Install

git clone the repository, then install dependencies with pip install -r requirements.txt.

02

Run Pipeline

Run python run_pipeline.py for synthetic data, or add --download to fetch the real datasets.

03

Kaggle Notebook

Open xai_ids_multidataset.ipynb on Kaggle with GPU (Tesla T4) for full multi-dataset evaluation.

04

View Results

Check RESULTS.md for full tables and model_metadata.json for raw metrics.

Live Demo

Select a traffic scenario to see XGBoost predictions with SHAP explanations and XCS scoring.

xai_ids_inference.py (real XGBoost model) · live_prediction (ONNX Runtime Web)

The demo loads the actual trained XGBoost model (2.6 MB) and runs it in your browser via ONNX Runtime Web.

Team

Mohammad Thabet Hassan, Lead Developer (@MohammadThabetHassan)

Fahad Sadek, Contributor (@fahad6789123)

Ahmed Sami, Contributor (@AhmedSamiAlameri)

Dr. Mehak Khurana, Supervisor

Limitations

✓ Multi-Dataset Evaluation

Evaluated on 3 real IDS datasets. XGBoost achieves 99.66% accuracy on CIC-IDS-2017, 80.04% on UNSW-NB15, and 99.90% on CSE-CIC-IDS-2018.

⚠ Class Imbalance

UNSW-NB15 minority classes (Analysis, Backdoor, Shellcode, Worms) have lower detection rates (~40-50% recall).

◈ Cross-Dataset Generalization

Cross-dataset Jaccard = 0.216. Models trained on one dataset may not generalize well to others.

✓ XCS Implementation

Novel XCS metric measures explanation reliability. XCS > 0.3 indicates acceptable confidence.