## Model Description
**SUHAIL-14B-preview** extends the open-weight **Qwen-3-14B-Base** to better support Arabic instruction-following using **Low-Rank Adaptation (LoRA)**. LoRA injects small trainable low-rank matrices into the linear and attention layers while keeping the base weights frozen, enabling compact, efficient fine-tuning.
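As a rough illustration (not the actual SUHAIL training code; the rank, scaling factor, and dimensions below are toy values), a LoRA-adapted layer adds a trainable low-rank product `B @ A` on top of a frozen weight matrix:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass through a LoRA-adapted linear layer.

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    are trained. Their product, scaled by alpha / r, is added to W.
    """
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.standard_normal((d_out, d_in))       # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable
B = np.zeros((d_out, r))                     # trainable, initialized to zero
x = rng.standard_normal((1, d_in))

# With B = 0 the adapter is a no-op: the layer matches the base model exactly.
assert np.allclose(lora_forward(x, W, A, B, alpha=16, r=r), x @ W.T)
```

Because `B` starts at zero, training begins from the base model's behavior, and only the small `A`/`B` matrices need to be stored per adapter.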
### 1 · Supervised Fine-Tuning (SFT)
We first conducted SFT on a high-quality instruction dataset in Arabic and English, curated using **Style-Aligned Response Ranking**, a RoBERTa-based reranker that filters stylistically incoherent or low-quality samples out of the instruction-tuning corpus. This step enhanced factuality and stylistic consistency.
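The curation step can be pictured roughly as follows. This is only a sketch: `score_fn` stands in for the RoBERTa-based reranker, whose actual interface and score threshold are not published here, and the sample fields are hypothetical:

```python
def filter_samples(samples, score_fn, threshold=0.5):
    """Keep only samples the (stand-in) reranker scores at or above threshold."""
    return [s for s in samples if score_fn(s) >= threshold]

# Toy corpus; the "quality" field stands in for a real reranker score.
corpus = [
    {"instruction": "Summarize the article.", "response": "A clear, on-style summary.", "quality": 0.9},
    {"instruction": "Summarize the article.", "response": "lol idk", "quality": 0.1},
]
curated = filter_samples(corpus, score_fn=lambda s: s["quality"])
# Only the stylistically coherent sample survives the filter.
```

In practice the score would come from running the reranker over each (instruction, response) pair before SFT.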
> **Result**: Up to 22% performance improvement observed on internal benchmarks (e.g., IFEval).
### 2 · Human Preference Alignment