minor change
description1.md CHANGED (+5 -5)
@@ -7,10 +7,10 @@ From our findings, we need approximately 1/3 memory under ideal conditions (F, B
 Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).
 
 
-
-
-| Bubble Rate
-| Activation Memory <br> (
+| Method | 1F1B | V-Min | V-Half | V-ZB |
+|------------------------------------------|-------|----------|----------| ---- |
+| Bubble Rate <br> (assuming T_F=T_B=T_W) | ~ p/m | ~ 2p/3m | ~ p/ 2m | 0 |
+| Activation Memory <br> (by #micro-batch) | p | (p+4)//3 | (p+2)//2 | p |
 
 
-Bubble Rate here is calculated as `1 - (F+B+W)*m / longest_stage_time`.
+Bubble Rate here is calculated as `1 - (F+B+W)*m / longest_stage_time`.
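As a rough illustration of the table added in this change, the sketch below evaluates the approximate bubble rates and activation-memory counts, plus the bubble-rate formula quoted in the last line. It is not taken from the repository. It assumes that `p` is the number of pipeline stages and `m` the number of micro-batches, and that `F`, `B`, `W` are the per-micro-batch times of the forward, backward, and weight-gradient passes on one stage. The function names are hypothetical.

```python
# Illustrative helpers only; names and symbol meanings are assumptions,
# not part of the repository's API.

def activation_memory(method: str, p: int) -> int:
    """Peak activation memory in units of micro-batches, as listed in the table.

    Assumes p is the number of pipeline stages.
    """
    formulas = {
        "1F1B": lambda p: p,
        "V-Min": lambda p: (p + 4) // 3,
        "V-Half": lambda p: (p + 2) // 2,
        "V-ZB": lambda p: p,
    }
    return formulas[method](p)


def approx_bubble_rate(method: str, p: int, m: int) -> float:
    """Approximate bubble rate from the table, valid under T_F = T_B = T_W.

    Assumes m is the number of micro-batches.
    """
    factors = {"1F1B": 1.0, "V-Min": 2.0 / 3.0, "V-Half": 0.5, "V-ZB": 0.0}
    return factors[method] * p / m


def bubble_rate(f: float, b: float, w: float, m: int,
                longest_stage_time: float) -> float:
    """Bubble rate as 1 - (F+B+W)*m / longest_stage_time."""
    return 1.0 - (f + b + w) * m / longest_stage_time


if __name__ == "__main__":
    p, m = 8, 32
    for method in ("1F1B", "V-Min", "V-Half", "V-ZB"):
        print(method, activation_memory(method, p), approx_bubble_rate(method, p, m))
```

With `p = 8`, the table's formulas give 8, 4, 5, and 8 micro-batches of activation memory for 1F1B, V-Min, V-Half, and V-ZB respectively, which is where the "approximately 1/3" and "1/2" memory figures for V-Min and V-Half come from as `p` grows.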