🔬 IEEE Research Implementation

ML-Based Symptom Pattern Classification System

A machine learning system for primary health screening. It evaluates symptom patterns and proposes potential health conditions using NLP preprocessing and ensemble classification, achieving 91.7% top-1 (M1) diagnostic accuracy on 400 clinical vignettes.

91.7% M1 Accuracy
0.93 NDCG Score
65.3% Improvement
40+ Diseases

⚙️ Architecture

How It Works

Our 4-stage diagnostic pipeline transforms raw symptom descriptions into ranked differential diagnoses using NLP preprocessing and an ensemble of three ML classifiers.

Text Input

Raw unstructured patient symptom descriptions are fed into the system as clinical vignettes or free-text input.

NLP Preprocessing

Tokenization, stopword removal, lemmatization, and TF-IDF extraction convert text into mathematical feature vectors.
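The preprocessing steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the stopword set and the lemma lookup table are tiny hypothetical stand-ins for a full stopword list and a real lemmatizer, and the smoothed IDF formula is one common variant, not necessarily the one used by the system.

```python
import math
import re
from collections import Counter

# Assumption: toy tables for illustration only. A real pipeline would use a
# complete stopword list and a proper lemmatizer.
STOPWORDS = {"i", "a", "an", "and", "the", "have", "with", "for", "my", "of"}
LEMMAS = {"headaches": "headache", "coughing": "cough", "days": "day"}


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and map tokens to lemmas."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def tfidf_vectors(documents):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [preprocess(d) for d in documents]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF, so a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical symptom descriptions.
vocab, vectors = tfidf_vectors([
    "I have a fever and headaches",
    "Coughing with fever for three days",
])
```

Terms that appear in fewer documents (here "headache") receive a higher weight than terms shared by all documents (here "fever"), which is what lets the classifier focus on discriminating symptoms.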

ML Classification

Ensemble model (Random Forest + SVM + Gradient Boosting) cross-references symptoms against known disease patterns.
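The page does not say how the three base models are combined. As a minimal sketch, assuming soft voting (averaging the per-condition probabilities of each model), the combination step might look like the following. The function name `soft_vote`, the condition names, and all probabilities are hypothetical.

```python
from collections import defaultdict


def soft_vote(model_probs, weights=None):
    """Average per-condition probabilities across models and rank them.

    model_probs: list of dicts mapping condition name -> probability,
    one dict per base model.
    Returns [(condition, score), ...] sorted by descending score.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total_weight = sum(weights)
    combined = defaultdict(float)
    for probs, weight in zip(model_probs, weights):
        for condition, p in probs.items():
            combined[condition] += weight * p / total_weight
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outputs of the three base models for one symptom description.
random_forest = {"influenza": 0.6, "common cold": 0.3, "pneumonia": 0.1}
svm = {"influenza": 0.5, "common cold": 0.4, "pneumonia": 0.1}
gradient_boosting = {"influenza": 0.7, "common cold": 0.1, "pneumonia": 0.2}

ranking = soft_vote([random_forest, svm, gradient_boosting])
```

The sorted output doubles as a ranked differential with a combined score per condition; the optional `weights` argument would let a stronger base model count for more.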

Diagnosis Output

Generates ranked differential diagnoses with confidence scores, severity levels, and recommended next steps.

🩺 Interactive Tool

Symptom Checker

Describe your symptoms in natural language or select common symptoms below. Our ML model will analyze the patterns and suggest potential conditions.

Describe Your Symptoms

🏥 Patient Context (Optional, EHR Integration)

Age
Biological Sex
Medical History: Diabetes, Hypertension, Heart Disease, Asthma, Obesity, Smoker, Immunocompromised, Pregnancy

Common symptoms: 🌡 Fever, 🤕 Headache, 😷 Cough, 💔 Chest Pain, 😮‍💨 Breathing Difficulty, 😴 Fatigue, 🤢 Nausea, 😵 Dizziness, 🦴 Joint Pain, 🗣 Sore Throat, 🔴 Rash, 🤰 Abdominal Pain, ⚖️ Weight Loss, 🌙 Night Sweats, 🤮 Vomiting, 🦵 Swelling

📊 Research Data

Comparative Performance Results

Benchmarked against rule-based tools and three primary care physicians (16.6 years of experience on average) using 400 medically validated clinical vignettes.

91.7% M1 Accuracy: Top-1 diagnostic accuracy of our ML system
65.3% Improvement: Over lowest-performing rule-based tool
0.93 NDCG Score: Surpassing physicians (0.82) in ranking

[Charts: M1 Accuracy & F1-Score Comparison; Overall Performance Radar]

F1-Score Formula (Eq. 1)

F₁ = 2 × (Precision × Recall) / (Precision + Recall)

The F1-score is the harmonic mean of precision and recall, penalizing systems that disproportionately output false positives or false negatives.
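Eq. 1 can be worked through with a small helper. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 as the harmonic mean of precision and recall (Eq. 1)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical counts: 80 correct positives, 20 false alarms, 10 misses.
# precision = 80/100 = 0.80, recall = 80/90 ≈ 0.889
score = f1_score(80, 20, 10)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F1 by maximizing recall alone while flooding the output with false positives, or the reverse.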

🔬 Methodology

Evaluation Parameters

Our testing protocol employed a rigorous, standardized methodology to ensure objectivity and clinical relevance.

S.No	Parameter	Description	Remarks
1.	Test Dataset	400 clinical vignettes	Peer-reviewed
2.	Human Benchmark	3 primary care physicians	16.6 years avg. experience
3.	System Comparison	ML vs existing checkers	Identifies gaps
4.	Accuracy Metrics	M1, F1-score, NDCG	Standardized
5.	Overall Objective	Improve disease triage accuracy	Highly effective

Comparative Performance Results (Table II)

Evaluation Entity	M1 Accuracy	F1-Score	NDCG
Lowest Rule-Based Tool	32.4%	0.41	0.45
Human Physicians	88.2%	0.89	0.82
Proposed ML System	91.7%	0.87	0.93
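The M1 column reports top-1 accuracy: the share of vignettes for which the first-ranked diagnosis matches the reference diagnosis. A minimal sketch of that computation follows; the function name `m1_accuracy` and the four example vignettes are hypothetical.

```python
def m1_accuracy(ranked_predictions, gold_diagnoses):
    """Fraction of cases whose top-ranked prediction equals the gold label."""
    if len(ranked_predictions) != len(gold_diagnoses):
        raise ValueError("one ranked list is required per gold diagnosis")
    if not gold_diagnoses:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_diagnoses)
        if ranked and ranked[0] == gold  # empty rankings count as misses
    )
    return hits / len(gold_diagnoses)


# Hypothetical ranked outputs for four vignettes.
predictions = [
    ["influenza", "common cold"],
    ["migraine", "tension headache"],
    ["asthma", "bronchitis"],
    ["gastritis", "appendicitis"],
]
gold = ["influenza", "migraine", "bronchitis", "gastritis"]

accuracy = m1_accuracy(predictions, gold)
```

In the third vignette the correct condition appears at rank 2, so it counts as a miss for M1 even though a ranking metric such as NDCG would still give it partial credit.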

📐 Ranking Quality

NDCG Evaluation Dashboard

Normalized Discounted Cumulative Gain measures how well our system ranks the differential diagnoses — the most critical metric for clinical utility.

Live Model NDCG (NDCG@5 Score)

NDCG Comparison

0.9–1.0 Excellent

0.7–0.9 Good

0.5–0.7 Fair

<0.5 Poor

NDCG Formula

NDCG = DCG / IDCG, where DCG = Σᵢ relᵢ / log₂(i + 1), summed over rank positions i = 1…k

DCG sums relevance scores discounted by rank position. IDCG is the ideal DCG (perfect ranking). Their ratio gives a normalized score between 0 and 1.
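The formula translates directly into code. The sketch below assumes binary relevance (1 for the correct diagnosis, 0 otherwise), which the page does not specify, and takes an optional cutoff `k` to match the NDCG@5 figure shown on the dashboard.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain over the first k ranked items."""
    items = relevances if k is None else relevances[:k]
    # Rank positions start at 1, so the top item is divided by log2(2) = 1.
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(items, start=1))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Assumed binary relevance: the correct diagnosis was ranked second of five.
score_at_5 = ndcg([0, 1, 0, 0, 0], k=5)
```

Ranking the correct diagnosis second instead of first drops the score from 1.0 to 1 / log₂(3), roughly 0.63, which shows how sharply the logarithmic discount penalizes even a one-position slip at the top of the list.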

⚖️ Fairness Analysis

Bias Detection Panel

Analyzing model accuracy across disease categories to identify potential data biases — a key focus area from the paper's future work recommendations.
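One simple way to surface such imbalances is to group predictions by disease category and compare per-category accuracy. The sketch below is an assumption about how this could be done, not the panel's actual implementation; the category names, records, and the max-minus-min gap measure are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """records: iterable of (category, predicted, actual) tuples.

    Returns a dict mapping each category to its accuracy.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, predicted, actual in records:
        totals[category] += 1
        hits[category] += int(predicted == actual)
    return {category: hits[category] / totals[category] for category in totals}


def accuracy_gap(per_category):
    """Difference between the best and worst category accuracy."""
    if not per_category:
        return 0.0
    return max(per_category.values()) - min(per_category.values())


# Hypothetical evaluation records.
records = [
    ("respiratory", "influenza", "influenza"),
    ("respiratory", "asthma", "asthma"),
    ("respiratory", "bronchitis", "pneumonia"),
    ("respiratory", "common cold", "common cold"),
    ("dermatological", "eczema", "psoriasis"),
    ("dermatological", "eczema", "eczema"),
]

per_category = accuracy_by_category(records)
gap = accuracy_gap(per_category)
```

A large gap between categories suggests that some disease groups are under-represented or harder to separate in the training data and deserve a closer look.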

Accuracy by Disease Category

Bias Summary

👥 Team

Research Team

Symbiosis Institute of Technology, Hyderabad Campus — Symbiosis International University

Name	Role	Affiliation
Adnan Mohammed	Researcher	SIT Hyderabad
Narala Lakshman Reddy	Researcher	SIT Hyderabad
Pulegari Shashi Kiran Reddy	Researcher	SIT Hyderabad
Kiran Siripuri	Faculty Advisor	SIT Hyderabad
Sai Prashanth Mallellu	Corresponding Author	SIT Hyderabad
Rajanikanth Aluvalu	Faculty Advisor	SIT Hyderabad
