# 👩🏾 padie-extended Language Detection Model
This is the trained model for padie-extended, a language detection system for Nigerian languages including English, Nigerian Pidgin, Yoruba, Hausa, and Igbo.
## Model Description

This transformer-based model is fine-tuned from afro-xlmr-base to accurately detect and classify text across five Nigerian languages.

- **Developed by:** Ayooluwaposi Olomo
- **Model type:** Text Classification / Language Detection
- **Language(s):** English, Nigerian Pidgin, Yoruba, Hausa, Igbo
- **License:** MIT
- **Base model:** afro-xlmr-base
## Supported Languages

| Language | Code | Example |
|---|---|---|
| English | en | "Hello, how are you?" |
| Nigerian Pidgin | pidgin | "How you dey?" |
| Yoruba | yo | "Bawo ni?" |
| Hausa | ha | "Sannu" |
| Igbo | ig | "Kedu?" |
## Usage

### Using the padie-extended Package (Recommended)

```bash
pip install padie-extended
```

```python
from padie_extended import LanguageDetector

# Initialize the detector
detector = LanguageDetector()

# Detect language from text
text = "Bawo ni, se daadaa ni?"
result = detector.predict(text)

print(f"Language: {result['language']}")
print(f"Confidence: {result['confidence']:.2%}")
```
### Direct Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "posi-olomo/padie-extended"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict
text = "How you dey?"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get the predicted language
labels = ['english', 'pidgin', 'yoruba', 'hausa', 'igbo']
predicted_label = labels[predictions.argmax().item()]
confidence = predictions.max().item()

print(f"Language: {predicted_label}")
print(f"Confidence: {confidence:.2%}")
```
## Performance
Tested on a diverse dataset of Nigerian texts:
| Metric | Score |
|---|---|
| Overall Accuracy | 95.3% |
| F1 Score (weighted) | 95.3% |
| Inference Speed | ~4.5 ms per text (GPU) |
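To sanity-check the latency figure on your own hardware, you can time repeated forward passes. This is a rough sketch, not an official benchmark; results depend on the GPU, batch size, and sequence length.

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "posi-olomo/padie-extended"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

inputs = tokenizer("How you dey?", return_tensors="pt", truncation=True).to(device)

# Warm-up pass so one-off initialization cost is not counted
with torch.no_grad():
    model(**inputs)
if device == "cuda":
    torch.cuda.synchronize()

n_runs = 100
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(**inputs)
if device == "cuda":
    torch.cuda.synchronize()

elapsed_ms = (time.perf_counter() - start) * 1000 / n_runs
print(f"~{elapsed_ms:.1f} ms per text on {device}")
```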
## Use Cases

- 🌐 Content moderation - Detect language in user-generated content
- 📱 Social media analysis - Analyze multilingual Nigerian social media posts
- 🤖 Chatbots - Route conversations based on detected language
- 📊 Research - Analyze language distribution in datasets
- 🎯 Language-specific processing - Trigger different pipelines per language (see the routing sketch below)
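For the chatbot and language-specific-processing cases, the detected label can be used to dispatch text to a per-language pipeline. The sketch below uses hypothetical handler functions; only the `LanguageDetector` call mirrors the package example above, and the dictionary keys follow the label names from the Transformers example.

```python
from padie_extended import LanguageDetector

detector = LanguageDetector()

# Hypothetical downstream handlers, one per supported language
def handle_english(text): ...
def handle_pidgin(text): ...
def handle_yoruba(text): ...
def handle_hausa(text): ...
def handle_igbo(text): ...

HANDLERS = {
    "english": handle_english,
    "pidgin": handle_pidgin,
    "yoruba": handle_yoruba,
    "hausa": handle_hausa,
    "igbo": handle_igbo,
}

def route(text: str):
    """Detect the language and forward the text to the matching handler."""
    result = detector.predict(text)
    handler = HANDLERS.get(result["language"], handle_english)  # fall back to English
    return handler(text)

route("Bawo ni, se daadaa ni?")  # expected to dispatch to handle_yoruba
```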
## Training Details

- **Base Model:** afro-xlmr-base
- **Training Data:** Diverse corpus of Nigerian language texts
- **Model Size:** ~1 GB
## Limitations

- The model performs better on long-form text than on very short phrases
- Accuracy may drop on code-switched or heavily dialectal text (see the per-sentence sketch below)
- Best results are obtained when the text is predominantly in one language
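One way to soften the code-switching limitation is to split the input into sentences, detect each segment separately, and aggregate the labels. The sketch below uses a naive regex splitter together with the package's `LanguageDetector`; a proper sentence segmenter would likely give cleaner segments.

```python
import re
from collections import Counter

from padie_extended import LanguageDetector

detector = LanguageDetector()

def detect_per_sentence(text: str) -> Counter:
    """Detect the language of each sentence and tally the results."""
    # Naive split on ., ?, ! followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    labels = [detector.predict(s)["language"] for s in sentences]
    return Counter(labels)

mixed = "How you dey? I hope say work no too stress you. Bawo ni ebi re?"
print(detect_per_sentence(mixed))  # e.g. Counter({'pidgin': 2, 'yoruba': 1})
```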
## Citation

If you use this model in your research, please cite:

```bibtex
@software{padie_extended,
  author = {Olomo, Ayooluwaposi},
  title  = {padie-extended: AI-powered Nigerian Language Detection},
  year   = {2025},
  url    = {https://github.com/posi-olomo/padie-extended}
}
```
## Acknowledgments

- Built upon the Padie project by @sir-temi and @pythonisoft
- Built with AWS cloud credits generously provided by Dr. Wálé Akínfadérìn
- Built with Hugging Face Transformers
- Thanks to the Nigerian NLP community
## Links
- Package GitHub: posi-olomo/padie-extended
- PyPI: padie-extended
- Issues: Report a bug
- Documentation: Full Documentation
## License

This model is licensed under the MIT License.

Made with ❤️ for the Nigerian tech community