gekkov committed · Commit f2bee07 · verified · 1 Parent(s): e54e045

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -34,7 +34,7 @@ Conformer is a popular Neural Network for speech recognition. This repository co
  - **Language(s) (NLP):** English
  - **License:** BigScience OpenRAIL-M v1.1
 
- ### Model Sources [optional]
+ ### Model Sources
 
  <!-- Provide the basic links for the model. -->
 
@@ -116,7 +116,7 @@ We used the LibriSpeech 960h dataset. The dataset is composed of 460h of clean a
  If you want to train the Conformer model from scratch, you can do so by following the instructions at https://github.com/Arm-Examples/ML-examples/tree/main/pytorch-conformer-train-quantize/training
  We used an AWS g5.24xlarge instance to train the NN.
 
- #### Preprocessing [optional]
+ #### Preprocessing
 
  We first train a tokenizer on the LibriSpeech dataset. The tokenizer converts labels into tokens. For example, in English it is very common to have 's at the end of words; the tokenizer will identify that pattern and assign a dedicated token to the 's combination.
  The code to obtain the tokenizer is available at https://github.com/Arm-Examples/ML-examples/blob/main/pytorch-conformer-train-quantize/training/build_sp_128_librispeech.py . The trained tokenizer is also available in the Hugging Face repository.
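
For illustration, here is a minimal sketch of how such a tokenizer could be trained. It assumes, based on the script name build_sp_128_librispeech.py, a 128-token SentencePiece model; the input path and the choice of model type below are hypothetical, and the authoritative settings are in the linked script.

```python
# Hypothetical sketch: train a small SentencePiece tokenizer on LibriSpeech
# transcripts, mirroring what build_sp_128_librispeech.py appears to do.
import sentencepiece as spm

# "librispeech_transcripts.txt" is an assumed file with one transcript per line.
spm.SentencePieceTrainer.train(
    input="librispeech_transcripts.txt",
    model_prefix="sp_128_librispeech",  # writes sp_128_librispeech.{model,vocab}
    vocab_size=128,                     # 128 tokens, per the script name
    model_type="unigram",               # assumption; the actual script may differ
)

# Labels are converted to subword tokens; a frequent pattern such as "'s"
# can end up with its own token in the learned vocabulary.
sp = spm.SentencePieceProcessor(model_file="sp_128_librispeech.model")
print(sp.encode("the dog's bone", out_type=str))
```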