Model Card for NLLB-200 English-to-Kannada (Fine-Tuned)

This model is a fine-tuned version of facebook/nllb-200-distilled-600M focused on translating text from English to Kannada. It was trained using LoRA (Low-Rank Adaptation) to efficiently adapt the multilingual model to this specific language pair.

Model Details

Model Description

This model improves upon the baseline NLLB-200 capabilities for English-Kannada translation by fine-tuning on a specialized parallel corpus. It utilizes the PEFT library and LoRA to fine-tune the attention layers of the original model while keeping the base model weights frozen.

  • Developed by: rajaykumar12959
  • Model type: Seq2Seq Transformer (Encoder-Decoder)
  • Language(s) (NLP): English (eng_Latn) → Kannada (kan_Knda)
  • License: MIT
  • Finetuned from model: facebook/nllb-200-distilled-600M

Uses

Direct Use

The model is intended for direct translation of short-to-medium length texts from English to Kannada. It performs well on general domain sentences.

Out-of-Scope Use

  • This model is specialized for English to Kannada. While the base NLLB model is multilingual, this adapter was optimized specifically for this direction.
  • It may not perform well on extremely long legal or medical documents without proper chunking (see the sketch below).
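
For longer documents, a simple sentence-level chunking loop is usually sufficient. The sketch below is illustrative only and assumes the translate() helper defined in the usage example further down; swap in a proper sentence segmenter for production use.

# Illustrative chunking sketch: split a long document into sentences and
# translate them one at a time using the translate() helper shown below.
import re

def translate_long(document, translate_fn):
    # Naive split on ., !, ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(translate_fn(s) for s in sentences if s)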

Bias, Risks, and Limitations

The model inherits the biases present in the NLLB-200 base model and the Hemanth-thunder/english-to-kannada-mt training dataset. Translations should be verified for critical applications.

Recommendations

Users should double-check translations for accuracy, especially for nuanced or culturally specific content.

How to Get Started with the Model

Use the code below to load the model and run inference. You must use the forced_bos_token_id parameter to ensure the output is in Kannada.

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# 1. Load Base Model and Tokenizer
base_model_id = "facebook/nllb-200-distilled-600M"
adapter_model_id = "rajaykumar12959/nllb-en-kn-v1" # Replace with your specific repo name

tokenizer = AutoTokenizer.from_pretrained(base_model_id, src_lang="eng_Latn", tgt_lang="kan_Knda")
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Load the Fine-tuned Adapter
model = PeftModel.from_pretrained(base_model, adapter_model_id)
model.eval()

# 3. Define Translation Function
def translate(text):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        translated_tokens = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids("kan_Knda"), # Vital for NLLB to target Kannada
            max_length=128,
            num_beams=5,
            early_stopping=True
        )
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

# 4. Test
input_text = "Machine learning is the future of technology."
print(translate(input_text))
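
For multiple sentences, batching through the tokenizer is usually faster than calling translate() in a loop. This is a minimal sketch building on the model and tokenizer loaded above, not part of the original card.

# Minimal batched-translation sketch (reuses model and tokenizer from above).
def translate_batch(texts, batch_size=8):
    outputs = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True, max_length=128
        ).to(model.device)
        with torch.no_grad():
            tokens = model.generate(
                **inputs,
                forced_bos_token_id=tokenizer.convert_tokens_to_ids("kan_Knda"),
                max_length=128,
                num_beams=5,
                early_stopping=True,
            )
        outputs.extend(tokenizer.batch_decode(tokens, skip_special_tokens=True))
    return outputs

print(translate_batch(["Good morning.", "How are you?"]))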

Training Details

Training Data

The model was trained on the Hemanth-thunder/english-to-kannada-mt dataset, containing approximately 382k sentence pairs.
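
For reference, the corpus can be inspected with the datasets library; check the actual column names against the dataset schema on the Hub.

# Sketch: load and inspect the training corpus.
from datasets import load_dataset

dataset = load_dataset("Hemanth-thunder/english-to-kannada-mt", split="train")
print(dataset)      # features and number of rows (~382k pairs)
print(dataset[0])   # one English-Kannada sentence pair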

Training Procedure

The model was fine-tuned using the peft library with LoRA (Low-Rank Adaptation) and 4-bit quantization via bitsandbytes.
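
The exact training script is not included in this card. The snippet below is a minimal sketch of how the base model could be loaded in 4-bit with bitsandbytes for QLoRA; the NF4 and double-quantization settings are typical defaults, not confirmed values from the training run.

# Sketch of 4-bit base-model loading for QLoRA (settings are typical defaults).
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # fp16 matches the T4 training setup
)

base_model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    quantization_config=bnb_config,
    device_map="auto",
)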

Training Hyperparameters

  • Training regime: QLoRA (4-bit quantization)
  • Epochs: 1
  • Batch Size: 16
  • Learning Rate: 1e-4
  • LoRA Rank (r): 32
  • LoRA Alpha: 32
  • LoRA Dropout: 0.05
  • Target Modules: q_proj, v_proj, k_proj, o_proj
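
Put together, the hyperparameters above correspond to a LoRA configuration along these lines. This is a reconstruction for illustration (continuing from the 4-bit loading sketch in Training Procedure), not the original training script.

# Reconstruction of the LoRA setup from the hyperparameters above (illustrative).
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = prepare_model_for_kbit_training(base_model)  # after 4-bit loading

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_2_SEQ_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable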

Speeds, Sizes, Times

  • Hardware: Trained on a T4 GPU (Google Colab).
  • Precision: fp16 mixed precision.

Evaluation

Testing Data

A 10% split of the original dataset was held out for validation/testing.
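
The exact split procedure is not documented; using the dataset object from the data-loading sketch above, a 90/10 split would look roughly like this (the seed is an assumption).

# Illustrative 90/10 split; the seed used in training is not documented.
split = dataset.train_test_split(test_size=0.1, seed=42)
train_data, eval_data = split["train"], split["test"]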

Metrics

Training and validation loss were monitored during training; no automatic metrics are reported. Qualitative inspection shows improved adherence to Kannada grammar compared to the base model's zero-shot output.
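
If you want to score the adapter yourself on the held-out split, a sketch with the evaluate library might look like the following. The english_sentences and kannada_references lists are placeholders for the source and reference sides of the held-out data; no such scores are reported in this card.

# Hypothetical scoring sketch -- the card itself reports no automatic metrics.
import evaluate

chrf = evaluate.load("chrf")
bleu = evaluate.load("sacrebleu")

predictions = [translate(s) for s in english_sentences]  # placeholder source list
references = [[r] for r in kannada_references]           # placeholder reference list
print(chrf.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=references))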

Environmental Impact

  • Hardware Type: NVIDIA T4 GPU
  • Hours used: < 2 hours
  • Cloud Provider: Google Colab

Technical Specifications

Model Architecture and Objective

The model uses the Transformer encoder-decoder architecture. The objective was Sequence-to-Sequence (Seq2Seq) language modeling (Translation).

Software

  • transformers
  • peft
  • bitsandbytes
  • accelerate

Citation

BibTeX:

@misc{nllb200,
  title={NLLB: No Language Left Behind},
  author={Meta AI},
  year={2022},
  howpublished={\url{https://github.com/facebookresearch/fairseq/tree/nllb}}
}