# 👩🏾 padie-extended Language Detection Model
This is the trained model for padie-extended, a language detection system for Nigerian languages including English, Nigerian Pidgin, Yoruba, Hausa, and Igbo.
## Model Description

This transformer-based model is fine-tuned from afro-xlmr-base to accurately detect and classify text across five Nigerian languages.

- **Developed by:** Ayooluwaposi Olomo
- **Model type:** Text Classification / Language Detection
- **Language(s):** English, Nigerian Pidgin, Yoruba, Hausa, Igbo
- **License:** MIT
- **Base model:** afro-xlmr-base
## Supported Languages

| Language | Code | Example |
|---|---|---|
| English | en | "Hello, how are you?" |
| Nigerian Pidgin | pidgin | "How you dey?" |
| Yoruba | yo | "Bawo ni?" |
| Hausa | ha | "Sannu" |
| Igbo | ig | "Kedu?" |
## Usage

### Using the padie-extended Package (Recommended)

```bash
pip install padie-extended
```

```python
from padie_extended import LanguageDetector

# Initialize the detector
detector = LanguageDetector()

# Detect language from text
text = "Bawo ni, se daadaa ni?"
result = detector.predict(text)

print(f"Language: {result['language']}")
print(f"Confidence: {result['confidence']:.2%}")
```
### Direct Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "posi-olomo/padie-extended"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict
text = "How you dey?"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get the predicted language
labels = ['english', 'pidgin', 'yoruba', 'hausa', 'igbo']
predicted_label = labels[predictions.argmax().item()]
confidence = predictions.max().item()

print(f"Language: {predicted_label}")
print(f"Confidence: {confidence:.2%}")
```
## Performance
Tested on a diverse dataset of Nigerian texts:
| Metric | Score |
|---|---|
| Overall Accuracy | 95.3% |
| F1 Score (weighted) | 95.3% |
| Inference Speed | ~4.5 ms per text (GPU) |
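To sanity-check the latency figure on your own hardware, you can time repeated forward passes. This is a rough sketch, not an official benchmark; results depend on the GPU, batch size, and sequence length.

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "posi-olomo/padie-extended"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

inputs = tokenizer("How you dey?", return_tensors="pt", truncation=True).to(device)

# Warm-up pass so one-off initialization cost is not counted
with torch.no_grad():
    model(**inputs)
if device == "cuda":
    torch.cuda.synchronize()

n_runs = 100
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(**inputs)
if device == "cuda":
    torch.cuda.synchronize()

elapsed_ms = (time.perf_counter() - start) * 1000 / n_runs
print(f"~{elapsed_ms:.1f} ms per text on {device}")
```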
## Use Cases

- 🌐 Content moderation - Detect language in user-generated content
- 📱 Social media analysis - Analyze multilingual Nigerian social media posts
- 🤖 Chatbots - Route conversations based on detected language
- 📊 Research - Analyze language distribution in datasets
- 🎯 Language-specific processing - Trigger different pipelines per language (see the routing sketch below)
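For the chatbot and language-specific-processing cases, the detected label can be used to dispatch text to a per-language pipeline. The sketch below uses hypothetical handler functions; only the `LanguageDetector` call mirrors the package example above, and the dictionary keys follow the label names from the Transformers example.

```python
from padie_extended import LanguageDetector

detector = LanguageDetector()

# Hypothetical downstream handlers, one per supported language
def handle_english(text): ...
def handle_pidgin(text): ...
def handle_yoruba(text): ...
def handle_hausa(text): ...
def handle_igbo(text): ...

HANDLERS = {
    "english": handle_english,
    "pidgin": handle_pidgin,
    "yoruba": handle_yoruba,
    "hausa": handle_hausa,
    "igbo": handle_igbo,
}

def route(text: str):
    """Detect the language and forward the text to the matching handler."""
    result = detector.predict(text)
    handler = HANDLERS.get(result["language"], handle_english)  # fall back to English
    return handler(text)

route("Bawo ni, se daadaa ni?")  # expected to dispatch to handle_yoruba
```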
## Training Details

- **Base Model:** afro-xlmr-base
- **Training Data:** Diverse corpus of Nigerian language texts
- **Model Size:** ~1 GB
## Limitations

- The model performs better on long-form text than on very short phrases
- Accuracy may drop on code-switched or heavily dialectal text (see the per-sentence sketch below)
- Best results are obtained when the text is predominantly in one language
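One way to soften the code-switching limitation is to split the input into sentences, detect each segment separately, and aggregate the labels. The sketch below uses a naive regex splitter together with the package's `LanguageDetector`; a proper sentence segmenter would likely give cleaner segments.

```python
import re
from collections import Counter

from padie_extended import LanguageDetector

detector = LanguageDetector()

def detect_per_sentence(text: str) -> Counter:
    """Detect the language of each sentence and tally the results."""
    # Naive split on ., ?, ! followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    labels = [detector.predict(s)["language"] for s in sentences]
    return Counter(labels)

mixed = "How you dey? I hope say work no too stress you. Bawo ni ebi re?"
print(detect_per_sentence(mixed))  # e.g. Counter({'pidgin': 2, 'yoruba': 1})
```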
## Citation

If you use this model in your research, please cite:

```bibtex
@software{padie_extended,
  author = {Olomo, Ayooluwaposi},
  title  = {padie-extended: AI-powered Nigerian Language Detection},
  year   = {2025},
  url    = {https://github.com/posi-olomo/padie-extended}
}
```
## Acknowledgments

- Built upon the Padie project by @sir-temi and @pythonisoft
- Built with AWS cloud credits generously provided by Dr. Wálé Akínfadérìn
- Built with Hugging Face Transformers
- Thanks to the Nigerian NLP community
## Links
- Package GitHub: posi-olomo/padie-extended
- PyPI: padie-extended
- Issues: Report a bug
- Documentation: Full Documentation
## License

This model is licensed under the MIT License.

Made with ❤️ for the Nigerian tech community