---
license: cc-by-nc-4.0
language:
- ar
- en
base_model:
- Qwen/Qwen3-14B-Base
pipeline_tag: text-generation
---

# SUHAIL-14B-preview

> **14B Arabic LLM – LoRA fine-tuned from Qwen-3-14B-Base for instruction following and human-preference alignment**

---

## TL;DR

- **Base model**: Qwen-3-14B-Base (Transformer decoder, Rotary Positional Embeddings)
- **Fine-tuning**: two-stage **Low-Rank Adaptation (LoRA)**
  1. **Supervised Fine-Tuning (SFT)** on a curated Arabic/English instruction dataset
  2. **Human Preference Alignment** using binary accept/reject feedback
- **Data selection**: a **state-of-the-art encoder-based reranker** filtered the Efficient Instruction-Tuning corpus via **Style-Aligned Response Ranking**, retaining only stylistically coherent, high-quality samples
- **Context window**: 32k tokens
- **License**: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
- **Intended use**: Arabic content generation, multi-turn tool use (agentic systems), conversational agents, educational tools, and research (non-commercial only)
- **Training samples**: 33k (SFT), 66k (human preference alignment)
- **Training cost**: less than $500

---

## Table of Contents

1. [Model Description](#model-description)
2. [Quick Start](#quick-start)
3. [Limitations & Biases](#limitations--biases)
4. [License](#license)
5. [Citation](#citation)
6. [Changelog](#changelog)

---

## Model Description

**SUHAIL-14B-preview** extends the open-weight **Qwen-3-14B-Base** to better support Arabic instruction following using **Low-Rank Adaptation (LoRA)**. LoRA adds small trainable low-rank matrices to the attention and other linear layers while keeping the base weights frozen, enabling compact, efficient fine-tuning.
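As a rough illustration of this setup, the sketch below shows how LoRA adapters of this kind can be attached with the Hugging Face PEFT library. It is a minimal example, not the exact training recipe: the rank, scaling factor, dropout, and target modules are assumptions.

```python
# Minimal PEFT sketch of a LoRA setup like the one described above.
# The rank, alpha, dropout, and target modules are illustrative assumptions,
# not the recipe actually used to train SUHAIL-14B-preview.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                     # rank of the low-rank update matrices (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=[          # attention and MLP linear layers (assumed)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # base weights stay frozen; only the LoRA matrices train
```

Because gradients flow only through the low-rank matrices, the trainable parameter count remains a small fraction of the 14B base, which is what keeps a two-stage fine-tune at this scale inexpensive.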
### 1 · Supervised Fine-Tuning (SFT)

We first ran SFT on a high-quality instruction dataset in Arabic and English. The dataset was curated with **Style-Aligned Response Ranking**, a RoBERTa-based reranker that filters stylistically incoherent or low-quality samples out of the instruction-tuning corpus. This step improved factuality and stylistic consistency.

> **Result**: Up to a 22% improvement on internal benchmarks (e.g., IFEval).

### 2 · Human Preference Alignment

To align model behavior with user intent, we applied preference optimization using binary accept/reject feedback. Training on this direct signal steers the model toward helpful, honest, and harmless outputs at low alignment cost.

### 3 · Reinforcement Learning with Verifiable Rewards (RLVR) in Verifiable and Auditable Environments (TO-DO)

### 4 · Benchmarks (TO-DO)

> *Explicit benchmark scores are not yet included. We encourage users to evaluate the model in their specific contexts.*

---

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "01-ZeroOne/SUHAIL-14B-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# "Write a simple summary of the Internet, in Arabic."
prompt = "اكتب ملخصًا بسيطًا عن الإنترنت باللغة العربية."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,   # required for temperature to take effect
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

*The LoRA adapters are merged into the checkpoint on the Hub for ease of use.*

---

## Limitations & Biases

* **Factual reliability** – hallucinations remain; verify critical information.
* **Dialect coverage** – strongest on Gulf and Egyptian Arabic; less data for Maghrebi and Levantine.
* **Code completeness** – suitable for small code snippets, but outputs are not guaranteed bug-free.
* **Agentic function calling** – preliminary support included in SFT; future updates aim to strengthen reasoning and structured API-calling capabilities.

---

## License

Released under the **Creative Commons Attribution-NonCommercial 4.0 International** (CC BY-NC 4.0) license; non-commercial use only.

---

## Citation

```bibtex
@software{Suhail2025,
  author = {ZeroOne AI},
  title  = {SUHAIL-14B-preview},
  year   = {2025},
  url    = {https://huggingface.co/01-ZeroOne/SUHAIL-14B-preview}
}
```

---

## Changelog

| Version  | Date       | Notes                                                                                                                     |
| -------- | ---------- | ------------------------------------------------------------------------------------------------------------------------ |
| **v0.1** | 2025-07-05 | Initial public LoRA-merged release (SFT + human-preference alignment; data filtered with Style-Aligned Response Ranking) |

---

Maintained by **Mohammed Almaghrabi**, Founder of **ZeroOne AI**. This work was supported by **Khalid Alharbi**. Contributions are welcome! To contribute, please email: almaghrabima@gmail.com