Conformer is a popular Neural Network for speech recognition.
- **Language(s) (NLP):** English
- **License:** BigScience OpenRAIL-M v1.1

### Model Sources

<!-- Provide the basic links for the model. -->
We used the LibriSpeech 960h dataset.
If you want to train the Conformer model from scratch, you can do so by following the instructions at https://github.com/Arm-Examples/ML-examples/tree/main/pytorch-conformer-train-quantize/training
We used an AWS g5.24xlarge instance to train the network.

#### Preprocessing

We first train a tokenizer on the LibriSpeech dataset. The tokenizer converts labels into tokens. For example, in English it is very common to have 's at the end of words; the tokenizer identifies that pattern and assigns a dedicated token to the 's combination.
The code to obtain the tokenizer is available at https://github.com/Arm-Examples/ML-examples/blob/main/pytorch-conformer-train-quantize/training/build_sp_128_librispeech.py. The trained tokenizer is also available in the Hugging Face repository.
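To make the 's example concrete, here is a minimal sketch of how a subword tokenizer with a dedicated 's token splits text, using greedy longest-match lookup over a tiny hypothetical vocabulary. This is an illustration only; the actual tokenizer is trained by `build_sp_128_librispeech.py`, and the vocabulary below is invented for the example.

```python
# Toy greedy longest-match subword tokenizer (illustration only).
# The hypothetical vocabulary includes a dedicated "'s" token, mirroring
# the pattern described above; the real trained tokenizer's vocabulary differs.
def tokenize(text, vocab):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):  # try longest match first
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            tokens.append(text[i])  # no match: fall back to a character token
            i += 1
    return tokens

vocab = {"the", "dog", "'s", "bone", " "}
print(tokenize("the dog's bone", vocab))
# -> ['the', ' ', 'dog', "'s", ' ', 'bone']
```

Note how "dog's" is split into "dog" followed by the single "'s" token rather than three separate character tokens, which is exactly the benefit of giving frequent patterns their own vocabulary entry.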