Indonesian Complaint Classification Model (IndoBERT)
A text classification model for public complaints written in Indonesian (Bahasa Indonesia), fine-tuned from IndoBERT (indobenchmark/indobert-base-p1).
The model sorts complaints into 5 categories and reaches 96.10% accuracy on the validation set.
Classification Categories
| Label | Description | Example inputs |
|---|---|---|
| PINALTI | Content containing profanity, SARA (ethnic, religious, racial, or inter-group) attacks, pornography, hate speech, or other violations of social norms | "Kampret pejabat koruptor!", "Konten porno beredar", "Rasis banget pemerintah" |
| DARURAT | Emergencies that need an immediate response (fire, accident, disaster, threat to life) | "Ada kebakaran besar di pasar!", "Kecelakaan beruntun di tol", "Banjir bandang melanda desa" |
| PRIORITAS | Problems that need prompt handling (damaged infrastructure, sanitation, public services) | "Jalan berlubang berbahaya", "Sampah menumpuk seminggu", "Lampu jalan mati semua" |
| UMUM | Requests for information, suggestions, or non-urgent complaints | "Bagaimana cara mengurus KTP?", "Kapan jadwal posyandu?", "Saran untuk program desa" |
| LAINNYA | Messages that do not fit any category above | "Terima kasih atas pelayanannya", "Hanya ingin menyampaikan apresiasi" |
Model Performance
Overall Metrics
- Validation Accuracy: 96.10%
- Macro F1-Score: 0.9608
- Weighted F1-Score: 0.9610
- Average Confidence: 93.90%
Per-Class Performance
| Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Pinalti | 0.9588 | 0.9645 | 0.9617 | 169 |
| Darurat | 0.9453 | 0.9603 | 0.9528 | 126 |
| Prioritas | 0.9675 | 0.9675 | 0.9675 | 123 |
| Umum | 0.9752 | 0.9593 | 0.9672 | 123 |
| Lainnya | 0.9596 | 0.9500 | 0.9548 | 100 |
Confusion Matrix
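Rows are actual labels; columns are predicted labels.
| Actual \ Predicted | Pin | Dar | Pri | Umu | Lai |
|---|---|---|---|---|---|
| Pin | 163 | 2 | 1 | 0 | 3 |
| Dar | 2 | 121 | 2 | 0 | 1 |
| Pri | 0 | 3 | 119 | 1 | 0 |
| Umu | 2 | 2 | 1 | 118 | 0 |
| Lai | 3 | 0 | 0 | 2 | 95 |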
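The per-class figures can be re-derived from the confusion matrix. The sketch below uses plain Python and the standard definitions of precision, recall, and F1. It is not the author's evaluation script, and the function names are illustrative.

```python
# Confusion matrix from the validation set: rows = actual, columns = predicted.
LABELS = ["Pinalti", "Darurat", "Prioritas", "Umum", "Lainnya"]
CM = [
    [163, 2, 1, 0, 3],
    [2, 121, 2, 0, 1],
    [0, 3, 119, 1, 0],
    [2, 2, 1, 118, 0],
    [3, 0, 0, 2, 95],
]


def per_class_metrics(cm):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix."""
    metrics = {}
    for i in range(len(cm)):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: everything predicted as class i
        actual = sum(cm[i])                    # row sum: everything that truly is class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[i] = (precision, recall, f1)
    return metrics


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


metrics = per_class_metrics(CM)
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

for i, (p, r, f1) in metrics.items():
    print(f"{LABELS[i]:<10} P={p:.4f} R={r:.4f} F1={f1:.4f}")
print(f"Accuracy: {accuracy(CM):.4f}  Macro F1: {macro_f1:.4f}")
```

Rounded to four decimals, the results agree with the per-class table, the 96.10% validation accuracy, and the 0.9608 macro F1 reported in this card.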
Dataset Information
- Total Samples: 3,204 (before the train/validation split)
  - Pinalti: 844
  - Darurat: 630
  - Prioritas: 612
  - Umum: 616
  - Lainnya: 502
- Train/Val Split: 80% / 20% (2,563 / 641)
- Augmentation: Applied to balance classes
- Language: Indonesian (Bahasa Indonesia)
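As a quick consistency check, the validation support of each class can be compared with its total count. Every class lands close to 20%, which suggests the split was stratified, although the card does not state this. The dictionaries below simply restate numbers from the card.

```python
# Class counts for the full dataset and validation support from the per-class table.
total = {"Pinalti": 844, "Darurat": 630, "Prioritas": 612, "Umum": 616, "Lainnya": 502}
val = {"Pinalti": 169, "Darurat": 126, "Prioritas": 123, "Umum": 123, "Lainnya": 100}

# Share of each class that ended up in the validation set.
val_share = {label: val[label] / total[label] for label in total}

for label, share in val_share.items():
    print(f"{label:<10} {val[label]:>3}/{total[label]:<3} = {share:.2%}")

print("validation total:", sum(val.values()), "of", sum(total.values()))
```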
Quick Start
Installation
pip install transformers torch
Basic Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "Zulkifli1409/aduan-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Prepare input (inputs longer than 128 tokens are truncated)
text = "Ada kebakaran besar di pasar, tolong kirim pemadam segera!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
pred_idx = torch.argmax(probs, dim=1).item()
# Labels, in the order of the model's output indices
labels = ["PINALTI", "DARURAT", "PRIORITAS", "UMUM", "LAINNYA"]
print(f"Prediksi: {labels[pred_idx]}")
print(f"Confidence: {probs[0][pred_idx].item():.2%}")
print("\nAll probabilities:")
for label, prob in zip(labels, probs[0]):
    print(f"  {label}: {prob.item():.2%}")
Output:
Prediksi: DARURAT
Confidence: 96.03%
All probabilities:
PINALTI: 0.21%
DARURAT: 96.03%
PRIORITAS: 2.89%
UMUM: 0.45%
LAINNYA: 0.42%
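The post-processing step in the snippet above (softmax over the logits, then argmax) can be sketched without loading the model. The logits below are made up for illustration and are not model output. With the real model, `outputs.logits[0].tolist()` would supply them. `softmax` and `decode` are illustrative helper names.

```python
import math

LABELS = ["PINALTI", "DARURAT", "PRIORITAS", "UMUM", "LAINNYA"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=LABELS):
    """Return (top_label, top_probability, {label: probability})."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best], dict(zip(labels, probs))


# Hypothetical logits, chosen only to make DARURAT the top class.
label, confidence, scores = decode([-1.2, 4.9, 1.4, -0.5, -0.6])
print(label, f"{confidence:.2%}")
```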
Example Predictions
| Input Text | Prediction | Confidence |
|---|---|---|
| "Brengsek! Pejabat korup semua!" | PINALTI | 94.23% |
| "Ada orang kecelakaan parah butuh ambulans" | DARURAT | 95.67% |
| "Jalan berlubang perlu diperbaiki segera" | PRIORITAS | 92.34% |
| "Bagaimana cara mengurus surat izin usaha?" | UMUM | 89.45% |
| "Terima kasih atas bantuannya" | LAINNYA | 88.91% |
| "Konten porno tersebar di grup WhatsApp" | PINALTI | 91.78% |
| "Banjir tinggi merendam rumah warga" | DARURAT | 93.12% |
| "Sampah menumpuk di jalan sejak seminggu lalu" | PRIORITAS | 90.56% |
Batch Prediction
# Reuses `torch`, `tokenizer`, and `model` from Basic Usage
texts = [
    "Ada kebakaran di gedung!",
    "Jalan rusak parah",
    "Dasar bodoh pemerintah!",
    "Kapan jadwal vaksinasi?",
]
# Tokenize batch (shorter texts are padded to the longest one)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
# Predict
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
predictions = torch.argmax(probs, dim=1)
labels = ["PINALTI", "DARURAT", "PRIORITAS", "UMUM", "LAINNYA"]
# Convert predicted indices to plain ints before indexing the label list
for text, pred_idx, prob in zip(texts, predictions.tolist(), probs):
    pred_label = labels[pred_idx]
    confidence = prob[pred_idx].item()
    print(f"Text: {text}")
    print(f"Prediction: {pred_label} ({confidence:.2%})\n")
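For longer lists it can help to feed the model fixed-size chunks instead of one large batch. The helper below is a minimal sketch. `chunked` and the chunk size of 4 are illustrative choices, not part of the model or the card.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"aduan {i}" for i in range(10)]  # placeholder inputs

for batch in chunked(texts, 4):
    # Each `batch` would be passed to tokenizer(...) and model(...) as in Batch Prediction.
    print(len(batch), batch[0])
```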
API Deployment
The model is also available as a REST API hosted on Railway:
Base URL: https://api-klasifikasi-aduan.up.railway.app
cURL Example
curl -X POST https://api-klasifikasi-aduan.up.railway.app/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Ada kebakaran di pasar"}'
Response
{
  "label": "DARURAT",
  "confidence": 0.9603,
  "all_scores": {
    "PINALTI": 0.0021,
    "DARURAT": 0.9603,
    "PRIORITAS": 0.0289,
    "UMUM": 0.0045,
    "LAINNYA": 0.0042
  }
}
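The same call can be made from Python with only the standard library. This sketch assumes the endpoint and response schema are exactly as shown in the cURL example and sample response. `build_predict_request` and `parse_prediction` are illustrative helper names, and the network call itself is left as a comment.

```python
import json
import urllib.request

BASE_URL = "https://api-klasifikasi-aduan.up.railway.app"


def build_predict_request(text, base_url=BASE_URL):
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body):
    """Parse a response body and check that the label matches the top score."""
    result = json.loads(body)
    scores = result["all_scores"]
    top_label = max(scores, key=scores.get)
    if top_label != result["label"]:
        raise ValueError("label does not match the highest score")
    return result["label"], result["confidence"], scores


# To call the live service (network access required):
#   with urllib.request.urlopen(build_predict_request("..."), timeout=10) as resp:
#       label, confidence, scores = parse_prediction(resp.read())

# Offline check against the sample response from this card.
sample = (
    '{"label": "DARURAT", "confidence": 0.9603, "all_scores": '
    '{"PINALTI": 0.0021, "DARURAT": 0.9603, "PRIORITAS": 0.0289, '
    '"UMUM": 0.0045, "LAINNYA": 0.0042}}'
)
label, confidence, scores = parse_prediction(sample)
print(label, confidence)
```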
Training Details
Model Architecture
- Base Model: indobenchmark/indobert-base-p1
- Task: Sequence Classification (5 classes)
- Max Sequence Length: 128 tokens
- Hidden Size: 768
- Attention Heads: 12
- Layers: 12
Training Configuration
- GPU: Tesla T4 (14.74 GB VRAM)
- Precision: FP16 (Mixed Precision)
- Gradient Checkpointing: Enabled
- Batch Size: 2
- Learning Rate: 1.5e-5
- Epochs: 5
- Optimizer: AdamW
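- Best Epoch: 5 (the checkpoint behind the reported 96.10% validation accuracy and 0.9610 weighted F1; epoch 3 scored slightly higher on validation accuracy at 96.41%)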
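With a batch size of 2, the number of optimizer steps follows directly from the training-set size. The calculation below assumes no gradient accumulation and that the final partial batch is kept. The card states neither.

```python
import math

train_samples = 2563  # 80% training split reported under Dataset Information
batch_size = 2
epochs = 5

# Assumption: one optimizer step per batch, last partial batch included.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```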
Training Progress
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | Val F1 |
|---|---|---|---|---|---|
| 1 | 0.3688 | 74.87% | 0.0825 | 93.45% | 0.9346 |
| 2 | 0.0586 | 95.86% | 0.0604 | 96.10% | 0.9609 |
| 3 | 0.0179 | 98.52% | 0.0635 | 96.41% | 0.9641 |
| 4 | 0.0069 | 99.38% | 0.0668 | 96.10% | 0.9611 |
| 5 | 0.0021 | 99.88% | 0.0623 | 96.10% | 0.9610 |
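Which epoch counts as "best" depends on the selection metric. The sketch below restates the validation columns of the table and picks the best row under two common criteria. The card does not say which criterion was used during training, so this is illustrative only.

```python
# Validation metrics per epoch, copied from the Training Progress table.
history = [
    {"epoch": 1, "val_loss": 0.0825, "val_acc": 0.9345, "val_f1": 0.9346},
    {"epoch": 2, "val_loss": 0.0604, "val_acc": 0.9610, "val_f1": 0.9609},
    {"epoch": 3, "val_loss": 0.0635, "val_acc": 0.9641, "val_f1": 0.9641},
    {"epoch": 4, "val_loss": 0.0668, "val_acc": 0.9610, "val_f1": 0.9611},
    {"epoch": 5, "val_loss": 0.0623, "val_acc": 0.9610, "val_f1": 0.9610},
]

best_by_acc = max(history, key=lambda row: row["val_acc"])
best_by_f1 = max(history, key=lambda row: row["val_f1"])
best_by_loss = min(history, key=lambda row: row["val_loss"])

print("highest val accuracy: epoch", best_by_acc["epoch"])
print("highest val F1:       epoch", best_by_f1["epoch"])
print("lowest val loss:      epoch", best_by_loss["epoch"])
```

Training accuracy keeps rising after epoch 2 while validation accuracy stays within about 0.3 points, so the later epochs mostly fit the training set more tightly.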
Important Notes
Content Moderation (PINALTI)
The model can detect inappropriate content, but it is not perfect. For sensitive production applications, consider:
- An additional moderation layer
- Human review for borderline cases
- An explicit keyword whitelist/blacklist
- Combining the model with rule-based filtering
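One way to combine these safeguards is a thin wrapper around the model's prediction. The sketch below is an assumption-heavy illustration: the blocklist, the 0.90 review threshold, and the `moderate` function are all hypothetical and would need tuning on real data.

```python
REVIEW_THRESHOLD = 0.90              # hypothetical confidence cut-off for human review
BLOCKLIST = {"kampret", "brengsek"}  # hypothetical; words borrowed from the example inputs


def moderate(text, label, confidence):
    """Combine a keyword rule with the model's prediction.

    Returns (final_label, needs_human_review).
    """
    words = {word.strip(".,!?\"'").lower() for word in text.split()}
    if words & BLOCKLIST:
        # Rule-based filter: an explicit keyword forces PINALTI.
        # If the model disagreed, flag the case for a human to double-check.
        return "PINALTI", label != "PINALTI"
    # No rule hit: trust the model, but send low-confidence cases to review.
    return label, confidence < REVIEW_THRESHOLD


print(moderate("Kampret pejabat koruptor!", "UMUM", 0.95))
print(moderate("Jalan rusak parah", "PRIORITAS", 0.97))
print(moderate("Jalan rusak", "PRIORITAS", 0.60))
```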
Limitations
- The model was trained on public complaint data from Indonesia
- It performs best on texts of 10-100 words
- Accuracy may drop on slang or certain regional dialects
- Ambiguous context can lead to less accurate predictions
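A caller could flag inputs that fall outside the 10-100 word range before trusting a prediction. The helper below is a hypothetical sketch. It counts whitespace-separated words, which is not the same as the tokenizer's 128-token limit.

```python
def length_warning(text, min_words=10, max_words=100):
    """Return a warning string when `text` is outside the word range, else None."""
    n_words = len(text.split())
    if n_words < min_words:
        return f"short input ({n_words} words): prediction may be less reliable"
    if n_words > max_words:
        return f"long input ({n_words} words): text may be truncated at 128 tokens"
    return None


print(length_warning("Jalan rusak parah"))
print(length_warning(" ".join(["kata"] * 50)))
```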
License
This model is licensed under the Apache 2.0 License.
Citation & Contact
Developer: Zulkifli1409
Hugging Face: @Zulkifli1409
If you use this model in research or an application, please give appropriate credit.
BibTeX
@misc{zulkifli2025aduan,
  author = {Zulkifli},
  title = {Indonesian Complaint Classification Model with IndoBERT},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Zulkifli1409/aduan-model}}
}
Contributing
Feedback, bug reports, and contributions are very welcome!
Please open an issue in the repository or get in touch via Hugging Face.
© 2025 - Klasifikasi Aduan Model