Model Card for NLLB-200 English-to-Kannada (Fine-Tuned)
This model is a fine-tuned version of facebook/nllb-200-distilled-600M focused on translating text from English to Kannada. It was trained using LoRA (Low-Rank Adaptation) to efficiently adapt the multilingual model to this specific language pair.
Model Details
Model Description
This model improves upon the baseline NLLB-200 capabilities for English-Kannada translation by fine-tuning on a specialized parallel corpus. It utilizes the PEFT library and LoRA to fine-tune the attention layers of the original model while keeping the base model weights frozen.
- Developed by: rajaykumar12959
- Model type: Seq2Seq Transformer (Encoder-Decoder)
- Language(s) (NLP): English (eng_Latn) → Kannada (kan_Knda)
- License: MIT
- Finetuned from model: facebook/nllb-200-distilled-600M
Model Sources
- Dataset: Hemanth-thunder/english-to-kannada-mt
- Base Model Repository: NLLB-200
Uses
Direct Use
The model is intended for direct translation of short-to-medium length texts from English to Kannada. It performs well on general domain sentences.
Out-of-Scope Use
- This adapter is specialized for English → Kannada. While the base NLLB model is multilingual, the adapter was optimized for this direction only; other language pairs, including the reverse direction (Kannada → English), are out of scope.
- It may not perform well on very long documents (e.g., legal or medical texts) unless they are first split into sentences or short passages (see the chunking sketch below).
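For longer inputs, one workable approach is to split the text into sentences and translate each piece independently. The sketch below is illustrative only: it assumes the translate helper defined in the "How to Get Started with the Model" section further down, and uses a naive regex sentence splitter rather than a proper segmenter.

import re

def translate_long_text(text, translate_fn):
    # Naive split on ., !, ? followed by whitespace; swap in a real sentence
    # segmenter for production use.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return " ".join(translate_fn(sentence) for sentence in sentences)

# Usage (after defining `translate` as shown later in this card):
# kannada_text = translate_long_text(long_english_document, translate)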
Bias, Risks, and Limitations
The model inherits the biases present in the NLLB-200 base model and the Hemanth-thunder/english-to-kannada-mt training dataset. Translations should be verified for critical applications.
Recommendations
Users should double-check translations for accuracy, especially for nuanced or culturally specific content.
How to Get Started with the Model
Use the code below to load the model and run inference. You must use the forced_bos_token_id parameter to ensure the output is in Kannada.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# 1. Load Base Model and Tokenizer
base_model_id = "facebook/nllb-200-distilled-600M"
adapter_model_id = "rajaykumar12959/nllb-en-kn-v1" # Replace with your specific repo name
tokenizer = AutoTokenizer.from_pretrained(base_model_id, src_lang="eng_Latn", tgt_lang="kan_Knda")
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
# 2. Load the Fine-tuned Adapter
model = PeftModel.from_pretrained(base_model, adapter_model_id)
model.eval()
# 3. Define Translation Function
def translate(text):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        translated_tokens = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids("kan_Knda"),  # Vital for NLLB to target Kannada
            max_length=128,
            num_beams=5,
            early_stopping=True,
        )
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
# 4. Test
input_text = "Machine learning is the future of technology."
print(translate(input_text))
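If you prefer a standalone checkpoint with no peft dependency at inference time, the LoRA adapter can be merged into the base weights. This is a minimal sketch assuming peft's merge_and_unload API; the output directory name is illustrative.

# Optional: merge the adapter into the base weights and save a standalone model.
merged_model = model.merge_and_unload()             # returns a plain transformers model
merged_model.save_pretrained("nllb-en-kn-merged")   # illustrative output path
tokenizer.save_pretrained("nllb-en-kn-merged")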
Training Details
Training Data
The model was trained on the Hemanth-thunder/english-to-kannada-mt dataset, containing approximately 382k sentence pairs.
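For reference, the corpus can be loaded with the Hugging Face datasets library. The sketch below is illustrative: the 90/10 split mirrors the evaluation setup described later in this card, but the seed and the dataset's column names are not asserted here.

from datasets import load_dataset

# Load the parallel corpus used for fine-tuning (~382k sentence pairs).
dataset = load_dataset("Hemanth-thunder/english-to-kannada-mt")

# Recreate a 90/10 train/validation split (seed is illustrative, not the original one).
split = dataset["train"].train_test_split(test_size=0.1, seed=42)
print(split["train"].num_rows, split["test"].num_rows)
print(split["train"].column_names)  # inspect the actual column names before preprocessing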
Training Procedure
The model was fine-tuned using the peft library with LoRA (Low-Rank Adaptation) and 4-bit quantization via bitsandbytes.
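The snippet below sketches how a 4-bit base model can be prepared for this kind of training with bitsandbytes and peft. It is a reconstruction from the hyperparameters listed below, not the exact training script.

import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Load the base model in 4-bit (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)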
Training Hyperparameters
- Training regime: QLoRA (4-bit quantization)
- Epochs: 1
- Batch Size: 16
- Learning Rate: 1e-4
- LoRA Rank (r): 32
- LoRA Alpha: 32
- LoRA Dropout: 0.05
- Target Modules: q_proj, v_proj, k_proj, o_proj (see the configuration sketch below)
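These hyperparameters map onto a peft LoraConfig and transformers Seq2SeqTrainingArguments roughly as follows. This is a sketch: output paths and options not listed above are illustrative.

from peft import LoraConfig, get_peft_model
from transformers import Seq2SeqTrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-en-kn-lora",   # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=1e-4,
    fp16=True,
    logging_steps=100,              # not specified in this card
)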
Speeds, Sizes, Times
- Hardware: Trained on a T4 GPU (Google Colab).
- Precision: fp16 mixed precision.
Evaluation
Testing Data
A 10% split of the original dataset was held out for validation/testing.
Metrics
Loss was monitored during training. Subjective evaluation shows improved adherence to Kannada grammar compared to the base model's zero-shot performance.
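No automatic MT metric (BLEU/chrF) is reported here. If you want a quantitative comparison on the held-out split, a minimal sketch using the evaluate library is shown below; sacrebleu and chrF are assumed extra dependencies, and the test inputs/references are placeholders you must supply.

import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")

# english_sentences: your test inputs; kannada_references: gold translations (user-supplied).
predictions = [translate(s) for s in english_sentences]
references = [[r] for r in kannada_references]

print(bleu.compute(predictions=predictions, references=references))
print(chrf.compute(predictions=predictions, references=references))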
Environmental Impact
- Hardware Type: NVIDIA T4 GPU
- Hours used: < 2 hours
- Cloud Provider: Google Colab
Technical Specifications
Model Architecture and Objective
The model uses the Transformer encoder-decoder architecture. The objective was Sequence-to-Sequence (Seq2Seq) language modeling (Translation).
Software
- transformers
- peft
- bitsandbytes
- accelerate
Citation
BibTeX:
@misc{nllb200,
  title={NLLB: No Language Left Behind},
  author={Meta AI},
  year={2022},
  howpublished={\url{https://github.com/facebookresearch/fairseq/tree/nllb}}
}