- DualMedBERT: Dual-Teacher Distilled Biomedical Classifier
- Model Architecture
- Training Method
- Confidence Calibration (XGBoost)
- Results
- Training Details
- Supported Disease Classes (27)
- Repository Structure
- Important Notes & Limitations
- Dataset
- Citation
- Summary
Our Goal & Future Use Cases
DualMedBERT is an early-stage text classification model built to categorize patient-reported health and drug conditions. It was trained on the UCI Drug Review Dataset, which contains patient reviews sourced directly from drugs.com.
The internet is full of unstructured patient data, whether on health forums, review sites like drugs.com, or in clinic intake forms. Our goal with DualMedBERT is to process this messy, patient-written text to help researchers and analysts:
- Analyze Adverse Drug Effects: Quickly sort patient reviews to figure out how different demographics are reacting to certain medications.
- Track Disease Trends: Automatically categorize thousands of forum posts or clinic notes to see what conditions are trending in a specific region or dataset.
Note: These specific analytical use cases are part of our future roadmap. The model is currently under active development.
Important Limitations
- Not for Diagnosis: This model is strictly an analytical tool designed for research and data structuring. It is never to be used for direct medical diagnosis, advice, or patient treatment.
- Limited Scope: The current version is only trained to recognize 27 specific diseases/conditions. If a text describes a condition outside this list, the model cannot predict it accurately. We plan to expand its capacity in future iterations.
DualMedBERT: Dual-Teacher Distilled Biomedical Classifier
Testing Phase Notice: DualMedBERT is currently in an active testing phase. The present experiments cover 27 disease classes from the UCI Drug Review dataset, which represents a relatively small and focused slice of the clinical NLP landscape. We intend to extend testing across many more disease categories and significantly larger sample sizes in future iterations. The current dataset size is limited, which contributes to mild overfitting observed in the student model. At scale, with more diverse classes and substantially more training data, we expect the model's generalization ability and real-world reliability to improve considerably. Results reported here reflect this early-stage evaluation.
We present DualMedBERT, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.
The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.
Experiments on a 27-class disease classification task using patient-reported health data show that DualMedBERT achieves a Macro F1 of 0.8432 and an accuracy of 84.4%, exceeding BERT-base (Macro F1 0.8333, accuracy 83.5%), while reducing inference latency by ~1.6–1.8× (encoder: 10.13 ms, end-to-end: 11.06 ms). The calibration module achieves an AUROC of 0.8847 with a calibration accuracy of 83.33%, providing a separate reliability estimate without affecting classification performance.
These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.
Use Case / Applications
DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.
Potential Applications
- Clinical decision support (assistive, not diagnostic): Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.
- Telemedicine and triage systems: Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.
- Health forums and patient platforms: Automatically categorizing user-reported conditions for better organization and information retrieval.
- Public health monitoring: Aggregating and analyzing trends in reported symptoms across populations.
Important Note
This model is intended for research and assistive purposes only and should not be used for medical diagnosis or treatment decisions without professional oversight.
Why This Matters
Patient-reported health data differs from clinical text:
- Informal language
- Symptom descriptions instead of diagnoses
- Ambiguity and overlap across conditions
DualMedBERT addresses this by combining:
- General language understanding (BERT)
- Biomedical domain knowledge (PubMedBERT)
- Efficient deployment (DistilBERT + LoRA)
- Reliability estimation (XGBoost calibration)
Model Architecture
Student Model
| Component | Detail |
|---|---|
| Backbone | distilbert-base-uncased |
| LoRA Rank | r = 8 |
| LoRA Alpha | α = 32 |
| LoRA Dropout | 0.05 |
| LoRA Applied To | Layers 2–5 |
| Layer 1 | Partially unfrozen |
| Pooling | CLS token + attention pooling |
| Classifier Head | Dense → 27 disease classes |
| Max Sequence Len | 256 tokens |
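
As a rough illustration of what the LoRA settings in the table imply, the sketch below computes the LoRA scaling factor (alpha divided by rank) and the number of trainable parameters the adapters would add. It assumes DistilBERT's hidden size of 768 and that "Layers 2–5" means four transformer layers. The table does not say which projection matrices are adapted, so `ADAPTED_MATRICES_PER_LAYER` (query and value) is a hypothetical choice, not the documented configuration.

```python
# LoRA settings taken from the table above.
LORA_RANK = 8
LORA_ALPHA = 32
ADAPTED_LAYERS = [2, 3, 4, 5]

# Assumptions (not stated in the model card).
HIDDEN_SIZE = 768                # hidden dimension of distilbert-base-uncased
ADAPTED_MATRICES_PER_LAYER = 2   # hypothetical: query and value projections


def lora_scaling(alpha: int, rank: int) -> float:
    """LoRA scales the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """A is (rank x d_in) and B is (d_out x rank): rank * (d_in + d_out) weights."""
    return rank * (d_in + d_out)


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
per_matrix = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
total = per_matrix * ADAPTED_MATRICES_PER_LAYER * len(ADAPTED_LAYERS)

print(f"scaling factor: {scaling}")
print(f"LoRA parameters per adapted matrix: {per_matrix}")
print(f"total LoRA parameters under these assumptions: {total}")
```

If more projections per layer are adapted, the total grows linearly with `ADAPTED_MATRICES_PER_LAYER`; the scaling factor depends only on alpha and rank.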
Teachers
| Teacher | Checkpoint | Role |
|---|---|---|
| BERT-base | google-bert/bert-base-uncased | General language understanding |
| PubMedBERT | microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext | Biomedical domain knowledge |
Training Method
Dual-Teacher Knowledge Distillation
The total training loss combines knowledge distillation from two teachers with focal classification loss:
$$L = \alpha \cdot L_{\mathrm{KD,BERT}} + \beta \cdot L_{\mathrm{KD,PubMed}} + (1 - \alpha - \beta) \cdot L_{\mathrm{Focal}}$$
Where:
- KD uses two teachers in parallel
- Teacher weights are determined via entropy-based confidence (adaptive weighting)
- α (KD weight, BERT teacher): 0.4
- β (KD weight, PubMedBERT teacher): 0.5
- KD Temperature (T): 3.5
- The remaining weight (1 - 0.4 - 0.5 = 0.1) is applied to the focal loss
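
To make the combined loss concrete, here is a minimal single-example sketch in plain Python. It is not the training code from this repository. It assumes the usual Hinton-style distillation term (KL divergence between temperature-softened teacher and student distributions, multiplied by T²) and a standard focal loss with `FOCAL_GAMMA = 2.0`; neither the T² factor nor the focal gamma is stated above, so both are assumptions.

```python
import math

ALPHA, BETA, TEMPERATURE = 0.4, 0.5, 3.5  # values from the list above
FOCAL_GAMMA = 2.0                          # assumption: gamma is not given in this card


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2


def focal_loss(student_logits, label, gamma=FOCAL_GAMMA):
    """Focal loss for one example: -(1 - p_true)^gamma * log(p_true)."""
    p_true = softmax(student_logits)[label]
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def total_loss(student_logits, bert_logits, pubmed_logits, label):
    """alpha * KD(BERT) + beta * KD(PubMedBERT) + (1 - alpha - beta) * focal."""
    return (
        ALPHA * kd_loss(student_logits, bert_logits, TEMPERATURE)
        + BETA * kd_loss(student_logits, pubmed_logits, TEMPERATURE)
        + (1.0 - ALPHA - BETA) * focal_loss(student_logits, label)
    )


# Toy 3-class logits (illustrative only; the real model has 27 classes).
student = [2.0, 0.5, -1.0]
bert = [2.5, 0.2, -0.8]
pubmed = [3.0, 0.1, -1.2]
loss = total_loss(student, bert, pubmed, label=0)
print(f"total loss: {loss:.4f}")
```

With fixed weights the focal term contributes only 10% of the total, so most of the gradient signal comes from matching the two teachers.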
Confidence Calibration (XGBoost)
A post-hoc calibrator predicts whether each student prediction is likely to be correct, so that uncertain predictions can be flagged.
Features Used (31 total):
| Feature Group | Details |
|---|---|
| Softmax probabilities | All 27 class-wise softmax outputs |
| Max probability | max(softmax), i.e. confidence in the top prediction |
| Entropy | Shannon entropy over softmax distribution |
| Top-2 gap | Difference between top-1 and top-2 softmax values |
| Top-3 sum | Sum of top-3 softmax probabilities |
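
The feature table can be sketched as a small function that turns one 27-way softmax vector into the 31 calibrator inputs. The feature order and the use of natural-log entropy are assumptions; the table lists the feature groups but not their layout.

```python
import math


def calibration_features(probs):
    """Build calibrator inputs from one softmax vector (27 values -> 31 features)."""
    ranked = sorted(probs, reverse=True)
    max_prob = ranked[0]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)  # Shannon entropy in nats
    top2_gap = ranked[0] - ranked[1]
    top3_sum = sum(ranked[:3])
    # Assumed order: 27 class probabilities, then the four summary statistics.
    return list(probs) + [max_prob, entropy, top2_gap, top3_sum]


# Toy softmax output: one confident class, the rest uniform (illustrative only).
NUM_CLASSES = 27
probs = [0.74] + [0.01] * (NUM_CLASSES - 1)
features = calibration_features(probs)
print(len(features))
```

In a full pipeline, vectors like this would form the rows of the matrix passed to an XGBoost binary classifier whose target is whether the student's top prediction was correct.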
Results
Note: These results are from the current testing phase on the UCI Drug Review dataset (27 disease classes). Results may improve significantly with more data and broader disease coverage. Mild overfitting is observed due to limited dataset size.
Classification Performance
| Model | Macro F1 | Accuracy | Latency (Encoder) | Latency (End-to-End) |
|---|---|---|---|---|
| BERT-base | 0.8333 | 0.835 | ~16–18 ms | ~16–18 ms |
| PubMedBERT | 0.8553 | 0.855 | ~16–18 ms | ~16–18 ms |
| DualMedBERT | 0.8432 | 0.8440 | 10.13 ms | 11.06 ms |
DualMedBERT achieves a higher Macro F1 than BERT-base (0.8432 vs. 0.8333), while remaining slightly below PubMedBERT (0.8553), and runs at roughly 1.6× lower latency than the teacher models.
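
The speedup figure can be checked directly from the latencies in the table. The sketch below simply divides the teacher latency range by the student latency; treating the "~16–18 ms" entries as a hard range of 16.0 to 18.0 ms is an assumption made for the arithmetic.

```python
# Latencies from the table above, in milliseconds.
TEACHER_LATENCY_RANGE_MS = (16.0, 18.0)  # assumption: "~16–18 ms" read as a range
STUDENT_ENCODER_MS = 10.13
STUDENT_END_TO_END_MS = 11.06


def speedup_range(teacher_range_ms, student_ms):
    """Return (low, high) speedup as teacher latency divided by student latency."""
    low, high = teacher_range_ms
    return low / student_ms, high / student_ms


encoder_speedup = speedup_range(TEACHER_LATENCY_RANGE_MS, STUDENT_ENCODER_MS)
end_to_end_speedup = speedup_range(TEACHER_LATENCY_RANGE_MS, STUDENT_END_TO_END_MS)

print("encoder speedup: %.2fx to %.2fx" % encoder_speedup)
print("end-to-end speedup: %.2fx to %.2fx" % end_to_end_speedup)
```

Under these numbers the encoder-only ratio comes out at about 1.58× to 1.78×, and the end-to-end ratio at about 1.45× to 1.63×, so the quoted ~1.6–1.8× range corresponds most closely to the encoder measurement.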
Calibration Performance
| Metric | Value |
|---|---|
| Calibration AUROC | 0.8847 |
| Calibration Accuracy | 83.33% |
The XGBoost calibrator separates correct from incorrect student predictions with an AUROC of 0.8847, which lets downstream systems flag low-confidence outputs for human review.
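
For readers unfamiliar with the metric, the sketch below shows what a calibration AUROC measures and how calibrator scores could drive a review queue. The scores and labels are made up for illustration, and the `threshold=0.5` cut-off is a hypothetical value, not one reported for this model.

```python
def auroc(scores, labels):
    """Probability that a random correct prediction outscores a random incorrect one.

    Ties count as half. `labels` holds 1 for correct predictions, 0 for incorrect.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one correct and one incorrect example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def flag_for_review(scores, threshold=0.5):
    """Indices whose calibrator score falls below a (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy calibrator scores and correctness labels (made up for illustration).
scores = [0.95, 0.80, 0.30, 0.65, 0.20, 0.55]
labels = [1, 1, 0, 0, 0, 1]
print(auroc(scores, labels))
print(flag_for_review(scores))
```

The pairwise definition is quadratic in the number of examples; it is used here for clarity rather than speed.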
Training Details
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate (student) | 1.5e-4 |
| Weight Decay | 0.1 |
| Epochs | 12 |
| Early Stopping Patience | 3 |
| KD Temperature (T) | 3.5 |
| KD Alpha (BERT weight) | 0.4 |
| KD Beta (PubMedBERT weight) | 0.5 |
| LoRA Dropout | 0.05 |
| Max Sequence Length | 256 |
Supported Disease Classes (27)
| ID | Disease |
|---|---|
| 0 | Abnormal Uterine Bleeding |
| 1 | Allergic Rhinitis |
| 2 | Bacterial Infection |
| 3 | Benign Prostatic Hyperplasia |
| 4 | Constipation |
| 5 | Diabetes, Type 2 |
| 6 | Endometriosis |
| 7 | Erectile Dysfunction |
| 8 | GERD |
| 9 | Hepatitis C |
| 10 | High Blood Pressure |
| 11 | High Cholesterol |
| 12 | HIV Infection |
| 13 | Hyperhidrosis |
| 14 | Fibromyalgia |
| 15 | Irritable Bowel Syndrome |
| 16 | Migraine |
| 17 | Migraine Prevention |
| 18 | Multiple Sclerosis |
| 19 | Osteoarthritis |
| 20 | Overactive Bladder |
| 21 | Psoriasis |
| 22 | Restless Legs Syndrome |
| 23 | Rheumatoid Arthritis |
| 24 | Sinusitis |
| 25 | Urinary Tract Infection |
| 26 | Vaginal Yeast Infection |
Repository Structure
DualMedBert/
├── README.md                   # This file
├── config.json                 # Full model and training configuration
├── label_map.json              # Class ID → disease name mapping
├── student_weights.pt          # Trained student model weights
├── tokenizer.json              # Student tokenizer
├── tokenizer_config.json       # Tokenizer configuration
├── vocab.txt                   # Vocabulary file
├── special_tokens_map.json     # Special token definitions
├── xgb_calibrator.json         # Trained XGBoost calibration model
├── temperature_scaler.joblib   # Temperature scaling object (post-hoc)
├── bert_teacher/               # Fine-tuned BERT-base teacher
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   └── ...
├── pubmed_teacher/             # Fine-tuned PubMedBERT teacher
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   └── ...
└── plots/                      # Evaluation and analysis figures
    ├── fig1_kd_training_dynamics.png
    ├── fig2_model_comparison.png
    ├── fig3_per_class_f1.png
    ├── fig4_confusion_matrix.png
    ├── fig5_calibrator_analysis.png
    ├── fig6_loss_decomposition.png
    └── fig_shap_importance.png
Important Notes & Limitations
- Current testing phase: Results are based on a single dataset (UCI Drug Reviews, 27 classes) with limited samples. The model shows mild overfitting attributable to the small dataset size.
- Planned expansion: We intend to test DualMedBERT across many more disease classes and with significantly larger datasets. Broader data is expected to unlock better generalization and substantially stronger real-world performance.
- Adaptive teacher weights: Teacher confidence weights showed limited dynamic variation (~0.45 / 0.55) during training, suggesting both teachers contribute fairly consistently across the dataset.
- Speed–accuracy tradeoff: The model is designed to prioritize speed and reliability while maintaining competitive classification accuracy relative to its teachers.
- Not for diagnosis: This model is for research and assistive purposes only. It should not be used as a substitute for professional medical judgment.
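
The adaptive teacher weighting mentioned in the list above can be pictured with the following sketch. The exact formula is not given in this card, so the code assumes one plausible scheme: each teacher's confidence is the negative Shannon entropy of its softmax output, and the two confidences are normalized with a softmax so that the weights sum to 1. The function name `teacher_weights` and the toy logits are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def teacher_weights(bert_logits, pubmed_logits):
    """Hypothetical scheme: the lower-entropy (more confident) teacher gets more weight."""
    h_bert = entropy(softmax(bert_logits))
    h_pubmed = entropy(softmax(pubmed_logits))
    w_bert, w_pubmed = softmax([-h_bert, -h_pubmed])
    return w_bert, w_pubmed


# Toy 3-class teacher logits (illustrative only; the real task has 27 classes).
bert_logits = [2.0, 0.5, -1.0]
pubmed_logits = [3.0, 0.1, -1.2]
w_bert, w_pubmed = teacher_weights(bert_logits, pubmed_logits)
print(f"BERT weight: {w_bert:.3f}, PubMedBERT weight: {w_pubmed:.3f}")
```

When both teachers produce similarly peaked distributions, a scheme like this yields weights close to an even split, which would be consistent with the limited variation reported above.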
Dataset
UCI Drug Review Dataset (Gräßer et al., 2018)
Patient-written drug reviews paired with condition labels. Reviews are informal, symptom-rich, and linguistically diverse, making this an appropriate benchmark for patient-reported health classification.
Citation
If you use DualMedBERT, please cite the following relevant works:
- Hinton et al., 2015: Distilling the Knowledge in a Neural Network (knowledge distillation)
- Hu et al., 2022: LoRA: Low-Rank Adaptation of Large Language Models
- Sanh et al., 2019: DistilBERT, a distilled version of BERT
- Devlin et al., 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Gu et al., 2021: Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing (PubMedBERT)
- Lin et al., 2017: Focal Loss for Dense Object Detection
- Chen & Guestrin, 2016: XGBoost: A Scalable Tree Boosting System
- Gräßer et al., 2018: Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning (UCI Drug Review Dataset)
Summary
DualMedBERT demonstrates that a carefully designed dual-teacher distillation pipeline can:
- Outperform BERT-base in Macro F1 (0.8432 vs. 0.8333) on the current test set
- Achieve ~1.6–1.8× lower inference latency (10.13 ms encoder / 11.06 ms end-to-end)
- Provide reliable confidence estimation via XGBoost calibration (AUROC: 0.8847, Accuracy: 83.33%)
The model is under active expansion: future work will cover more disease classes and larger datasets for improved generalization.