👩🏿 padie-extended Language Detection Model

This is the trained model for padie-extended, a language detection system for five languages widely used in Nigeria: English, Nigerian Pidgin, Yoruba, Hausa, and Igbo.

Model Description

This transformer-based model is fine-tuned from afro-xlmr-base to classify a piece of text as one of the five supported languages.

Developed by: Ayooluwaposi Olomo
Model type: Text Classification / Language Detection
Language(s): English, Nigerian Pidgin, Yoruba, Hausa, Igbo
License: MIT
Base model: afro-xlmr-base

Supported Languages

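| Language        | Code   | Example               |
|-----------------|--------|-----------------------|
| English         | en     | "Hello, how are you?" |
| Nigerian Pidgin | pidgin | "How you dey?"        |
| Yoruba          | yo     | "Bawo ni?"            |
| Hausa           | ha     | "Sannu"               |
| Igbo            | ig     | "Kedu?"               |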

Usage

Using the padie-extended Package (Recommended)

pip install padie-extended

from padie_extended import LanguageDetector

# Initialize the detector
detector = LanguageDetector()

# Detect language from text
text = "Bawo ni, se daadaa ni?"
result = detector.predict(text)

print(f"Language: {result['language']}")
print(f"Confidence: {result['confidence']:.2%}")

Direct Usage with Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "posi-olomo/padie-extended"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

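# Predict (gradients are not needed for inference)
text = "How you dey?"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get the predicted language.
# The label order must match the order used in training;
# check it against model.config.id2label before relying on it.
labels = ['english', 'pidgin', 'yoruba', 'hausa', 'igbo']
predicted_label = labels[predictions.argmax(dim=-1).item()]
confidence = predictions.max().item()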

print(f"Language: {predicted_label}")
print(f"Confidence: {confidence:.2%}")
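
The softmax and argmax step above can be tried without downloading the model. The following is a minimal sketch in plain Python: the logits are invented for illustration, the helper names `softmax` and `decode` are hypothetical, and the label order is copied from the snippet above on the assumption that it matches training.

```python
import math

# Label order copied from the snippet above; assumed to match training.
LABELS = ["english", "pidgin", "yoruba", "hausa", "igbo"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=LABELS):
    """Return (label, confidence) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits, invented for illustration only.
example_logits = [0.2, 3.1, -0.5, 0.0, -1.2]
label, confidence = decode(example_logits)

print(f"Language: {label}")
print(f"Confidence: {confidence:.2%}")
```

The confidence is simply the largest softmax probability, so it says how peaked the distribution is, not how likely the prediction is to be correct.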

Performance

Tested on a diverse dataset of Nigerian texts:

| Metric              | Score                  |
|---------------------|------------------------|
| Overall Accuracy    | 95.3%                  |
| F1 Score (weighted) | 95.3%                  |
| Inference Speed     | ~4.5 ms per text (GPU) |
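
For readers unfamiliar with the two metrics, here is a small sketch of how accuracy and a support-weighted F1 score are usually computed. It assumes the weighted F1 above is the common definition (per-class F1 averaged by the number of true examples in each class). The toy labels are invented and are not the model's evaluation data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with each class weighted by its true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy labels, invented for illustration only.
y_true = ["yo", "yo", "ha", "ig", "en", "pidgin"]
y_pred = ["yo", "ha", "ha", "ig", "en", "en"]

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"Weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

Because the weights are the true class counts, a class with few test examples contributes little to the weighted score, so per-class figures are worth checking when one language matters more than the others.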

Use Cases

  • 🌐 Content moderation - Detect language in user-generated content
  • 📱 Social media analysis - Analyze multilingual Nigerian social media posts
  • 🤖 Chatbots - Route conversations based on detected language
  • 📊 Research - Analyze language distribution in datasets
  • 🎯 Language-specific processing - Trigger different pipelines per language
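
As an illustration of the routing and per-language pipeline use cases, the sketch below picks a handler from a prediction dictionary shaped like the one in the package example (`language` and `confidence` keys). The function `route`, the queue names, and the 0.80 threshold are all hypothetical, and the handler keys are assumed to match whatever language strings the detector actually returns.

```python
def route(result, handlers, fallback, min_confidence=0.80):
    """Choose a handler for a prediction, or the fallback when unsure.

    `result` is assumed to look like {"language": str, "confidence": float}.
    """
    language = result["language"]
    if result["confidence"] < min_confidence or language not in handlers:
        return fallback
    return handlers[language]


# Hypothetical destinations; replace with real pipeline entry points.
handlers = {
    "yoruba": "yoruba_support_queue",
    "pidgin": "pidgin_support_queue",
}

print(route({"language": "yoruba", "confidence": 0.97}, handlers, "human_review"))
print(route({"language": "pidgin", "confidence": 0.55}, handlers, "human_review"))
```

Sending low-confidence predictions to a fallback keeps short or mixed-language messages from being routed on a weak guess.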

Training Details

  • Base Model: afro-xlmr-base
  • Training Data: Diverse corpus of Nigerian language texts
  • Model Size: ~1GB

Limitations

  • The model performs better on long-form text than on very short phrases
  • Accuracy may drop on code-switched or heavily dialectal text
  • Results are best when the text is predominantly in one language
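
One possible workaround for mixed-language input, sketched below, is to detect the language sentence by sentence and skip sentences too short to trust. This is not a feature of the package: the helper names, the 15-character cutoff, and the keyword-based `fake_predict` stand-in are assumptions made so the example runs without the model.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def detect_by_sentence(text, predict, min_chars=15):
    """Run `predict` on each sentence; mark very short ones as None."""
    results = []
    for sentence in split_sentences(text):
        if len(sentence) < min_chars:
            results.append((sentence, None))  # too short to trust
        else:
            results.append((sentence, predict(sentence)["language"]))
    return results


def language_share(results):
    """Fraction of classified sentences per language."""
    counts = Counter(lang for _, lang in results if lang is not None)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}


def fake_predict(sentence):
    # Stand-in for the real detector so the sketch runs offline.
    language = "pidgin" if "dey" in sentence.lower() else "english"
    return {"language": language, "confidence": 1.0}


text = "I will call you tomorrow morning. Abeg no vex, I dey come now now. Ok."
results = detect_by_sentence(text, fake_predict)
print(results)
print(language_share(results))
```

With the package installed, `detector.predict` could be passed in place of `fake_predict`. Splitting makes each input shorter, which works against the first limitation above, so the cutoff is a trade-off to tune on real data.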

Citation

If you use this model in your research, please cite:

@software{padie_extended,
  author = {Olomo, Ayooluwaposi},
  title = {padie-extended: AI-powered Nigerian Language Detection},
  year = {2025},
  url = {https://github.com/posi-olomo/padie-extended}
}

License

This model is licensed under the MIT License.


Made with ❤️ for the Nigerian tech community
