NagaLLaMA-3.2-3B-Instruct-Merged
NagaLLaMA-3.2-3B-Instruct-Merged is the standalone, merged version of the NagaLLaMA-3.2-3B-Instruct LoRA adapter. It combines the fine-tuned Nagamese weights directly with the Llama-3.2-3B-Instruct base model.
This model is optimized for easier deployment (e.g., vLLM, TGI, or GGUF conversion) as it does not require loading adapters separately. It serves as a general-purpose instruction-following assistant for the Nagamese language (Naga Pidgin/Creole).
Model Details
- Developer: Agniva Maiti
- Base Model: meta-llama/Llama-3.2-3B-Instruct
- Language: Nagamese (nag)
- Format: Merged Weights (Safetensors)
- Precision: fp16
Training Data
The model was trained on the NagaNLP Conversational Corpus, which contains 10,021 Nagamese instruction-following pairs.
Data Splitting:
- Training: 80% (approx. 8,000 samples)
- Validation: 10%
- Test: 10%
This model corresponds to the final checkpoint trained on 100% of the available training split.
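Purely as an illustration, the sketch below shows how an 80/10/10 split like the one above could be reproduced with the Hugging Face datasets library. The file name and seed are placeholders, not the actual training setup.

```python
# Illustrative 80/10/10 split with the `datasets` library (not the original pipeline).
from datasets import load_dataset

# Hypothetical local file; the NagaNLP Conversational Corpus may be distributed differently.
ds = load_dataset("json", data_files="naganlp_conversational_corpus.json", split="train")

# Carve off 20% for validation + test, then split that portion in half.
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
print(len(train_ds), len(val_ds), len(test_ds))  # roughly 8,000 / 1,000 / 1,000 pairs
```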
Training Hyperparameters (Original Adapter)
- Epochs: 3
- Batch Size: 2 (per device) with 8 gradient accumulation steps
- Sequence Length: 512
- Learning Rate: 2e-4
- LoRA Rank (r): 16
- LoRA Alpha: 32
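For reference, the sketch below shows how these hyperparameters would typically map onto a peft/transformers configuration. It is not the original training script; lora_dropout, target_modules, and other unlisted details are assumptions.

```python
# Sketch only: mapping the hyperparameters above onto peft/transformers config objects.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # LoRA alpha
    lora_dropout=0.05,    # assumed; not reported above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="nagallama-3.2-3b-lora",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    learning_rate=2e-4,
    fp16=True,
)
# The 512-token sequence length would be enforced when tokenizing/packing the data
# (e.g. via a trainer's max_seq_length argument).
```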
Intended Use
This model is intended for:
- Chatbots and assistants requiring Nagamese language support.
- Direct deployment in inference engines (vLLM, Ollama) without adapter management.
- Research into low-resource language modeling.
How to Use
Since this is a merged model, you do not need `peft`; you can load it directly with `transformers`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "agnivamaiti/NagaLLaMA-3.2-3B-Instruct-Merged"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inference
prompt = "Machine Learning ki ase aru kote use hoi?"
messages = [
    {"role": "user", "content": prompt},
]

# Apply the chat template and move inputs to the model's device
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.3,
    top_k=15,
    top_p=0.3,
    repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id,
    # Llama tokenizers may not define a pad token; fall back to EOS
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
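For the deployment scenarios mentioned under Intended Use, the merged weights can also be served by an inference engine directly. Below is a minimal offline-inference sketch with vLLM; it assumes a recent vLLM release with `LLM.chat` support and is not part of this repository.

```python
# Minimal vLLM sketch (assumes a recent vllm version with LLM.chat support).
from vllm import LLM, SamplingParams

llm = LLM(model="agnivamaiti/NagaLLaMA-3.2-3B-Instruct-Merged", dtype="float16")
sampling = SamplingParams(temperature=0.3, top_p=0.3, max_tokens=150, repetition_penalty=1.2)

# chat() applies the model's chat template before generation.
outputs = llm.chat(
    [{"role": "user", "content": "Machine Learning ki ase aru kote use hoi?"}],
    sampling,
)
print(outputs[0].outputs[0].text)
```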
Limitations & Safety
- Hallucinations: Like all LLMs, this model may generate incorrect information.
- Bias: The model inherits biases from the base Llama 3.2 model and the specific dialectal patterns found in the training data.
- Critical Use: Not suitable for medical, legal, or financial advice.
Credits
- Acknowledgments: Special thanks to the friends who validated the dataset and model outputs, and to RespAI Lab, KIIT for supporting the research and publication of this work.