A machine learning system for primary health screening that evaluates symptom patterns and proposes potential health conditions using NLP preprocessing and ensemble classification, achieving 91.7% M1 accuracy on 400 clinical vignettes.
Our 4-stage diagnostic pipeline transforms raw symptom descriptions into ranked differential diagnoses using NLP preprocessing and an ensemble of machine learning classifiers.
Raw, unstructured patient symptom descriptions enter the system as clinical vignettes or free-text input.
Tokenization, stopword removal, lemmatization, and TF-IDF extraction convert text into mathematical feature vectors.
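The preprocessing stage can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's implementation: the `STOPWORDS` set is a tiny hypothetical list, `crude_lemma` is a suffix-stripping stand-in for real lemmatization, and the smoothed IDF formula is one common choice that the source does not specify.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much larger.
STOPWORDS = {"i", "have", "a", "an", "the", "and", "with", "for", "my", "is", "of"}


def crude_lemma(token: str) -> str:
    """Stand-in for real lemmatization: strips a couple of common suffixes."""
    if token.endswith("ss"):
        return token
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stopwords, and reduce each token to a crude base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_lemma(t) for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    # Smoothed IDF (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


texts = [
    "I have a fever and coughing for three days",
    "Severe headaches with fever",
]
docs = [preprocess(t) for t in texts]
vocab, vectors = tfidf_vectors(docs)
```

Terms that appear in every description (here `fever`) receive the lowest IDF weight, while terms unique to one description are weighted higher. A production system would more likely rely on an established NLP library for tokenization and lemmatization.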
An ensemble model (Random Forest, SVM, and Gradient Boosting) matches the extracted features against known disease patterns.
The system generates ranked differential diagnoses with confidence scores, severity levels, and recommended next steps.
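How the three models are combined is not stated here. One plausible reading is equal-weight soft voting, where each model's class probabilities are averaged and the conditions are sorted by the averaged score. The sketch below assumes that scheme; the function names and the probability values are hypothetical placeholders, not outputs of the actual models, and severity levels and next steps are left out.

```python
def soft_vote(model_probs: list[dict[str, float]]) -> dict[str, float]:
    """Average class probabilities across models (equal-weight soft voting)."""
    conditions = set().union(*model_probs)
    n_models = len(model_probs)
    return {
        c: sum(p.get(c, 0.0) for p in model_probs) / n_models
        for c in conditions
    }


def rank_differential(
    model_probs: list[dict[str, float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k conditions with their averaged confidence, best first."""
    averaged = soft_vote(model_probs)
    ranked = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Placeholder outputs standing in for the Random Forest, SVM,
# and Gradient Boosting models; the numbers are made up.
rf = {"influenza": 0.6, "common cold": 0.3, "migraine": 0.1}
svm = {"influenza": 0.5, "common cold": 0.4, "migraine": 0.1}
gb = {"influenza": 0.7, "common cold": 0.2, "migraine": 0.1}

differential = rank_differential([rf, svm, gb], top_k=2)
```

Averaging probabilities rather than counting votes keeps a confidence score attached to every condition, which is what a ranked differential needs.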
Describe your symptoms in natural language or select common symptoms below. Our ML model will analyze the patterns and suggest potential conditions.
The system was benchmarked against rule-based symptom checkers and three primary care physicians on 400 medically validated clinical vignettes.
The F1-score is the harmonic mean of precision and recall, penalizing systems that disproportionately output false positives or false negatives.
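As a small worked example of the harmonic mean, the sketch below computes F1 from raw counts. The counts are illustrative only, and how the evaluation aggregates F1 across multiple conditions is not stated in the source.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # No true positives means precision and recall are both 0.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts, not taken from the evaluation.
balanced = f1_score(tp=80, fp=20, fn=20)  # precision 0.8, recall 0.8
skewed = f1_score(tp=80, fp=120, fn=0)    # precision 0.4, recall 1.0
```

In the skewed case the arithmetic mean of precision and recall would be 0.7, but F1 comes out near 0.57, which shows how the harmonic mean penalizes a system that buys recall with many false positives.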
Our testing protocol used a standardized methodology, summarized in the table below, to keep the evaluation objective and clinically relevant.
| S.No | Parameter | Description | Remarks |
|---|---|---|---|
| 1. | Test Dataset | 400 clinical vignettes | Peer-reviewed |
| 2. | Human Benchmark | 3 primary care physicians | 16.6 years avg. experience |
| 3. | System Comparison | ML vs existing checkers | Identifies gaps |
| 4. | Accuracy Metrics | M1, F1-score, NDCG | Standardized |
| 5. | Overall Objective | Improve disease triage accuracy | Highly effective |

| Evaluation Entity | M1 Accuracy | F1-Score | NDCG |
|---|---|---|---|
| Lowest Rule-Based Tool | 32.4% | 0.41 | 0.45 |
| Human Physicians | 88.2% | 0.89 | 0.82 |
| Proposed ML System | 91.7% | 0.87 | 0.93 |

Normalized Discounted Cumulative Gain (NDCG) measures how well our system ranks the differential diagnoses, which makes it the most critical metric for clinical utility.
DCG sums relevance scores discounted by rank position. IDCG is the ideal DCG (perfect ranking). Their ratio gives a normalized score between 0 and 1.
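The computation can be sketched as follows. The source does not give the exact discount, so this assumes the common form in which the relevance at rank i is divided by log2(i + 1). It also assumes the ideal ranking is a reordering of the same list, meaning every relevant diagnosis appears somewhere in the output.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Sum of relevance scores, each discounted by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: list[float]) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance 1.0 marks the correct diagnosis in a three-item differential.
perfect = ndcg([1.0, 0.0, 0.0])  # correct diagnosis ranked first
second = ndcg([0.0, 1.0, 0.0])   # correct diagnosis ranked second
```

Ranking the correct diagnosis first gives a score of 1, while placing it second drops the score to about 0.63, so the metric rewards putting the right condition near the top of the list.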
This analysis breaks down model accuracy by disease category to identify potential data biases, a key focus area in the paper's future work recommendations.
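A per-category breakdown of this kind can be sketched as a simple grouped accuracy count. The category names and records below are made up for illustration and are not results from the evaluation; `accuracy_by_category` is a hypothetical helper.

```python
from collections import defaultdict


def accuracy_by_category(
    records: list[tuple[str, str, str]],
) -> dict[str, float]:
    """records holds (category, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, truth, predicted in records:
        total[category] += 1
        correct[category] += int(truth == predicted)
    return {c: correct[c] / total[c] for c in total}


# Made-up records for illustration only.
records = [
    ("respiratory", "influenza", "influenza"),
    ("respiratory", "asthma", "asthma"),
    ("neurological", "migraine", "tension headache"),
    ("neurological", "migraine", "migraine"),
]
per_category = accuracy_by_category(records)
```

A category whose accuracy falls well below the others is a candidate for closer inspection, for example to check whether it is underrepresented in the training data.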
Symbiosis Institute of Technology, Hyderabad Campus — Symbiosis International University