Update README.md
README.md CHANGED

```diff
@@ -33,7 +33,7 @@ We continual pretrain on the expanded vocabulary [homebrewltd/llama3.2-3B-s-whis
 ## Training process
 **Training Metrics Image**: Below is a snapshot of the training loss curve visualized.
 
-<removed line; only the fragment " library for the lat" survived extraction>
+<new training-loss curve image; not recoverable from the extraction>
 
 **MMLU**:
 
@@ -63,9 +63,9 @@
 | **Epoch** | 1 |
 | **Global batch size** | 480 |
 | **Learning Rate** | 2e-4 |
-| **Learning Scheduler** |
+| **Learning Scheduler** | LambdaLR with warmup |
 | **Optimizer** | AdamW fused |
-| **Warmup Steps** |
+| **Warmup Steps** | 80 |
 | **Weight Decay** | 0.01 |
 | **Max Sequence Length** | 512 |
 
```
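The hyperparameters the diff fills in (AdamW, weight decay 0.01, learning rate 2e-4, LambdaLR with 80 warmup steps) can be sketched in PyTorch roughly as below. This is an illustration, not the repo's training script: the `Linear` model is a stand-in for the actual checkpoint, and the post-warmup shape (held constant) is an assumption, since the README only says "LambdaLR with warmup". `fused=True` (as in the table) needs CUDA parameters, so it is left off here.

```python
import torch

# Stand-in model; the actual run continual-pretrains a Llama-3.2-3B checkpoint.
model = torch.nn.Linear(512, 512)

# Optimizer values from the hyperparameter table (fused=True omitted for CPU).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)

WARMUP_STEPS = 80  # "Warmup Steps" row

def lr_lambda(step: int) -> float:
    # Linear warmup over the first 80 steps, then constant (assumed shape).
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(100):
    optimizer.step()   # gradient computation omitted in this sketch
    scheduler.step()   # after warmup the LR sits at the full 2e-4
```

The multiplicative lambda starts at 1/80 of the base rate and reaches 1.0 at step 80, matching the table's 80-step warmup.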