Model Card for NLLB-200 English-to-Kannada (Fine-Tuned)
This model is a fine-tuned version of facebook/nllb-200-distilled-600M focused on translating text from English to Kannada. It was trained using LoRA (Low-Rank Adaptation) to efficiently adapt the multilingual model to this specific language pair.
Model Details
Model Description
This model improves upon the baseline NLLB-200 capabilities for English-Kannada translation by fine-tuning on a specialized parallel corpus. It utilizes the PEFT library and LoRA to fine-tune the attention layers of the original model while keeping the base model weights frozen.
- Developed by: rajaykumar12959
- Model type: Seq2Seq Transformer (Encoder-Decoder)
- Language(s) (NLP): English (eng_Latn) → Kannada (kan_Knda)
- License: MIT
- Finetuned from model: facebook/nllb-200-distilled-600M
Model Sources
- Dataset: Hemanth-thunder/english-to-kannada-mt
- Base Model Repository: NLLB-200
Uses
Direct Use
The model is intended for direct translation of short-to-medium length texts from English to Kannada. It performs well on general domain sentences.
Out-of-Scope Use
- This adapter is specialized for English → Kannada. While the base NLLB model is multilingual, the adapter was optimized for this direction only; other language pairs, including the reverse direction (Kannada → English), are out of scope.
- It may not perform well on very long documents (e.g., legal or medical texts) unless they are first split into sentences or short passages (see the chunking sketch below).
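For longer inputs, one workable approach is to split the text into sentences and translate each piece independently. The sketch below is illustrative only: it assumes the translate helper defined in the "How to Get Started with the Model" section further down, and uses a naive regex sentence splitter rather than a proper segmenter.

import re

def translate_long_text(text, translate_fn):
    # Naive split on ., !, ? followed by whitespace; swap in a real sentence
    # segmenter for production use.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return " ".join(translate_fn(sentence) for sentence in sentences)

# Usage (after defining `translate` as shown later in this card):
# kannada_text = translate_long_text(long_english_document, translate)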
Bias, Risks, and Limitations
The model inherits the biases present in the NLLB-200 base model and the Hemanth-thunder/english-to-kannada-mt training dataset. Translations should be verified for critical applications.
Recommendations
Users should double-check translations for accuracy, especially for nuanced or culturally specific content.
How to Get Started with the Model
Use the code below to load the model and run inference. You must use the forced_bos_token_id parameter to ensure the output is in Kannada.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# 1. Load Base Model and Tokenizer
base_model_id = "facebook/nllb-200-distilled-600M"
adapter_model_id = "rajaykumar12959/nllb-en-kn-v1" # Replace with your specific repo name
tokenizer = AutoTokenizer.from_pretrained(base_model_id, src_lang="eng_Latn", tgt_lang="kan_Knda")
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
# 2. Load the Fine-tuned Adapter
model = PeftModel.from_pretrained(base_model, adapter_model_id)
model.eval()
# 3. Define Translation Function
def translate(text):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        translated_tokens = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids("kan_Knda"),  # Vital for NLLB to target Kannada
            max_length=128,
            num_beams=5,
            early_stopping=True,
        )
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
# 4. Test
input_text = "Machine learning is the future of technology."
print(translate(input_text))
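If you prefer a standalone checkpoint with no peft dependency at inference time, the LoRA adapter can be merged into the base weights. This is a minimal sketch assuming peft's merge_and_unload API; the output directory name is illustrative.

# Optional: merge the adapter into the base weights and save a standalone model.
merged_model = model.merge_and_unload()             # returns a plain transformers model
merged_model.save_pretrained("nllb-en-kn-merged")   # illustrative output path
tokenizer.save_pretrained("nllb-en-kn-merged")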
Training Details
Training Data
The model was trained on the Hemanth-thunder/english-to-kannada-mt dataset, containing approximately 382k sentence pairs.
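For reference, the corpus can be loaded with the Hugging Face datasets library. The sketch below is illustrative: the 90/10 split mirrors the evaluation setup described later in this card, but the seed and the dataset's column names are not asserted here.

from datasets import load_dataset

# Load the parallel corpus used for fine-tuning (~382k sentence pairs).
dataset = load_dataset("Hemanth-thunder/english-to-kannada-mt")

# Recreate a 90/10 train/validation split (seed is illustrative, not the original one).
split = dataset["train"].train_test_split(test_size=0.1, seed=42)
print(split["train"].num_rows, split["test"].num_rows)
print(split["train"].column_names)  # inspect the actual column names before preprocessing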
Training Procedure
The model was fine-tuned using the peft library with LoRA (Low-Rank Adaptation) and 4-bit quantization via bitsandbytes.
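The snippet below sketches how a 4-bit base model can be prepared for this kind of training with bitsandbytes and peft. It is a reconstruction from the hyperparameters listed below, not the exact training script.

import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Load the base model in 4-bit (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)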
Training Hyperparameters
- Training regime: QLoRA (4-bit quantization)
- Epochs: 1
- Batch Size: 16
- Learning Rate: 1e-4
- LoRA Rank (r): 32
- LoRA Alpha: 32
- LoRA Dropout: 0.05
- Target Modules: q_proj, v_proj, k_proj, o_proj (see the configuration sketch below)
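These hyperparameters map onto a peft LoraConfig and transformers Seq2SeqTrainingArguments roughly as follows. This is a sketch: output paths and options not listed above are illustrative.

from peft import LoraConfig, get_peft_model
from transformers import Seq2SeqTrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-en-kn-lora",   # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=1e-4,
    fp16=True,
    logging_steps=100,              # not specified in this card
)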
Speeds, Sizes, Times
- Hardware: Trained on a T4 GPU (Google Colab).
- Precision: fp16 mixed precision.
Evaluation
Testing Data
A 10% split of the original dataset was held out for validation/testing.
Metrics
Loss was monitored during training. Subjective evaluation shows improved adherence to Kannada grammar compared to the base model's zero-shot performance.
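No automatic MT metric (BLEU/chrF) is reported here. If you want a quantitative comparison on the held-out split, a minimal sketch using the evaluate library is shown below; sacrebleu and chrF are assumed extra dependencies, and the test inputs/references are placeholders you must supply.

import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")

# english_sentences: your test inputs; kannada_references: gold translations (user-supplied).
predictions = [translate(s) for s in english_sentences]
references = [[r] for r in kannada_references]

print(bleu.compute(predictions=predictions, references=references))
print(chrf.compute(predictions=predictions, references=references))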
Environmental Impact
- Hardware Type: NVIDIA T4 GPU
- Hours used: < 2 hours
- Cloud Provider: Google Colab
Technical Specifications
Model Architecture and Objective
The model uses the Transformer encoder-decoder architecture. The objective was Sequence-to-Sequence (Seq2Seq) language modeling (Translation).
Software
- transformers
- peft
- bitsandbytes
- accelerate
Citation
BibTeX:
@misc{nllb200,
  title={NLLB: No Language Left Behind},
  author={Meta AI},
  year={2022},
  howpublished={\url{https://github.com/facebookresearch/fairseq/tree/nllb}}
}