Update README.md
README.md CHANGED

```diff
@@ -33,7 +33,7 @@ We continual pretrain on the expanded vocabulary [homebrewltd/llama3.2-3B-s-whis
 ## Training process
 **Training Metrics Image**: Below is a snapshot of the training loss curve visualized.
 
-<removed line; only the fragment " library for the lat" survived extraction>
+<new training-loss curve image; not recoverable from the extraction>
 
 **MMLU**:
 
@@ -63,9 +63,9 @@
 | **Epoch** | 1 |
 | **Global batch size** | 480 |
 | **Learning Rate** | 2e-4 |
-| **Learning Scheduler** |
+| **Learning Scheduler** | LambdaLR with warmup |
 | **Optimizer** | AdamW fused |
-| **Warmup Steps** |
+| **Warmup Steps** | 80 |
 | **Weight Decay** | 0.01 |
 | **Max Sequence Length** | 512 |
 
```
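The hyperparameters the diff fills in (AdamW, weight decay 0.01, learning rate 2e-4, LambdaLR with 80 warmup steps) can be sketched in PyTorch roughly as below. This is an illustration, not the repo's training script: the `Linear` model is a stand-in for the actual checkpoint, and the post-warmup shape (held constant) is an assumption, since the README only says "LambdaLR with warmup". `fused=True` (as in the table) needs CUDA parameters, so it is left off here.

```python
import torch

# Stand-in model; the actual run continual-pretrains a Llama-3.2-3B checkpoint.
model = torch.nn.Linear(512, 512)

# Optimizer values from the hyperparameter table (fused=True omitted for CPU).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)

WARMUP_STEPS = 80  # "Warmup Steps" row

def lr_lambda(step: int) -> float:
    # Linear warmup over the first 80 steps, then constant (assumed shape).
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(100):
    optimizer.step()   # gradient computation omitted in this sketch
    scheduler.step()   # after warmup the LR sits at the full 2e-4
```

The multiplicative lambda starts at 1/80 of the base rate and reaches 1.0 at step 80, matching the table's 80-step warmup.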