Multi-dataset evaluation using SHAP, LIME, and a novel XAI Confidence Score across CIC-IDS-2017, UNSW-NB15, and CSE-CIC-IDS-2018.
Three intrusion detection datasets with varying complexity and attack types.
14 classes including BENIGN, DoS, DDoS, Bot, Web attacks, and Infiltration. Generated by CICFlowMeter from pcap captures.
10 classes with diverse attack types including Analysis, Backdoor, Exploits, Fuzzers, Reconnaissance, and Worms.
Binary classification of normal traffic vs. DDoS attacks generated with the LOIC-HTTP tool. An extension of the 2017 dataset.
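A minimal sketch of loading CICFlowMeter-style flow output with pandas. The column names and the inline sample are illustrative stand-ins for the real dataset files; the cleaning steps (stripping header whitespace, replacing infinities) reflect common quirks of CICFlowMeter CSVs.

```python
# Sketch: loading and cleaning CICFlowMeter-style output.
# The inline sample and column names are illustrative, not the real files.
import io

import numpy as np
import pandas as pd

raw = io.StringIO(
    " Flow Duration, Total Fwd Packets, Label\n"
    "120,3,BENIGN\n"
    "45,1,DDoS\n"
    "inf,2,DDoS\n"
)

df = pd.read_csv(raw)
df.columns = df.columns.str.strip()          # CICFlowMeter headers often carry stray spaces
df = df.replace([np.inf, -np.inf], np.nan)   # rate-style columns can contain inf
df = df.dropna()                             # drop unusable rows

print(df["Label"].value_counts().to_dict())
```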
Black-box models aren't enough for security operations.
Security analysts need to understand WHY traffic is flagged before taking action. XAI provides transparent reasoning.
Understanding feature contributions helps analysts quickly dismiss false alarms and focus on real threats.
Many regulations require explainable AI decisions. XCS provides quantifiable explanation reliability scores.
Understanding model decisions helps analysts identify potential adversarial attacks targeting the IDS itself.
End-to-end pipeline from raw traffic to explainable predictions.
Raw PCAP files processed by CICFlowMeter to extract 78 network flow features.
Top 20 features selected by XGBoost importance. RobustScaler fit on the training split only (no data leakage).
XGBoost, Random Forest, LightGBM trained with SMOTE. VotingEnsemble combines all three.
SHAP for global importance, LIME for local explanations. XCS measures reliability.
From raw network traffic to explainable predictions.
Network flows captured from CIC-IDS-2017, UNSW-NB15, and CSE-CIC-IDS-2018 datasets with 78+ features per flow.
XGBoost, Random Forest, LightGBM, and VotingEnsemble trained on 20 selected features with SMOTE for class balance.
SHAP provides global feature importance. LIME explains individual predictions. Jaccard similarity measures agreement.
XCS = 0.4×Confidence + 0.3×SHAP Stability + 0.3×Jaccard. Scores > 0.3 indicate reliable explanations.
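The XCS formula above reduces to a weighted sum; a minimal sketch, assuming the three components are pre-computed and normalized to [0, 1] (how stability and Jaccard agreement are measured lives in the pipeline itself):

```python
# Minimal sketch of the XCS formula as stated above. Inputs are assumed
# pre-computed and normalized to [0, 1].
def xai_confidence_score(confidence: float, shap_stability: float,
                         jaccard: float) -> float:
    """XCS = 0.4*confidence + 0.3*SHAP stability + 0.3*Jaccard agreement."""
    return 0.4 * confidence + 0.3 * shap_stability + 0.3 * jaccard

score = xai_confidence_score(confidence=0.95, shap_stability=0.6, jaccard=0.4)
print(round(score, 2))  # 0.68
print(score > 0.3)      # True -> explanation considered reliable
```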
XGBoost, Random Forest, LightGBM, and VotingEnsemble across all datasets.
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| XGBoost Best | 0.9966 | 0.9964 | 0.9966 | 0.9964 |
| VotingEnsemble | 0.9886 | 0.9945 | 0.9886 | 0.9911 |
| RandomForest | 0.9857 | 0.9940 | 0.9857 | 0.9893 |
| LightGBM | 0.9744 | 0.9930 | 0.9744 | 0.9828 |
CIC-IDS-2017 classification results
Feature impact on predictions
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| XGBoost Best | 0.8004 | 0.8120 | 0.8004 | 0.7982 |
| VotingEnsemble | 0.7867 | 0.8240 | 0.7867 | 0.8002 |
| RandomForest | 0.7635 | 0.8202 | 0.7635 | 0.7848 |
| LightGBM | 0.7630 | 0.8291 | 0.7630 | 0.7863 |
UNSW-NB15 classification results
Feature impact on predictions
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| RandomForest Best | 0.9993 | 0.9993 | 0.9993 | 0.9992 |
| VotingEnsemble | 0.9993 | 0.9993 | 0.9993 | 0.9992 |
| XGBoost | 0.9990 | 0.9990 | 0.9990 | 0.9990 |
| LightGBM | 0.9990 | 0.9990 | 0.9990 | 0.9990 |
CSE-CIC-IDS-2018 classification results
Feature impact on predictions
Performance across all 3 datasets
Feature importance overlap analysis
Understanding model decisions with SHAP and LIME.
Top features driving predictions
Jaccard similarity of top-k features
Local explanation for normal traffic
Local explanation for DDoS detection
XAI Confidence Score across predictions
Correlation analysis
Measuring the reliability of individual explanations.
Cross-dataset Jaccard similarity: 0.216, indicating largely dataset-specific feature patterns. XCS > 0.3 indicates acceptable explanation reliability.
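The top-k Jaccard agreement used above can be sketched as follows; the feature lists are illustrative examples, not the actual per-dataset rankings.

```python
# Sketch of top-k Jaccard agreement between two feature rankings.
# The feature lists below are illustrative, not the real rankings.
def jaccard_top_k(features_a, features_b, k=10):
    """Jaccard similarity |A & B| / |A | B| of the two top-k feature sets."""
    a, b = set(features_a[:k]), set(features_b[:k])
    return len(a & b) / len(a | b)

cic  = ["Flow Duration", "Fwd Packet Length Max",
        "Bwd Packet Length Mean", "Flow IAT Mean"]
unsw = ["sbytes", "dbytes", "Flow Duration", "rate"]

print(round(jaccard_top_k(cic, unsw, k=4), 3))  # 1 shared of 7 total -> 0.143
```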
Get the same results on your machine.
Clone the repository and run `pip install -r requirements.txt`.
Run `python run_pipeline.py` to train on synthetic data, or add `--download` to fetch the real datasets.
Open `xai_ids_multidataset.ipynb` on Kaggle with a GPU (Tesla T4) for the full multi-dataset evaluation.
See `RESULTS.md` for full tables and `model_metadata.json` for raw metrics.
Select a traffic scenario to see XGBoost predictions with SHAP explanations and XCS scoring.
This runs the actual trained model in your browser
Evaluated on 3 real IDS datasets. XGBoost achieves 99.66% accuracy (CIC-IDS-2017), 80.0% (UNSW-NB15), and 99.9% (CSE-CIC-IDS-2018).
UNSW-NB15 minority classes (Analysis, Backdoor, Shellcode, Worms) have lower detection rates (~40-50% recall).
Cross-dataset Jaccard = 0.216. Models trained on one dataset may not generalize well to others.
Novel XCS metric measures explanation reliability. XCS > 0.3 indicates acceptable confidence.