gekkov committed · Commit f2bee07 · verified · 1 Parent(s): e54e045

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -34,7 +34,7 @@ Conformer is a popular Neural Network for speech recognition. This repository co
  - **Language(s) (NLP):** English
  - **License:** BigScience OpenRAIL-M v1.1
 
- ### Model Sources [optional]
+ ### Model Sources
 
  <!-- Provide the basic links for the model. -->
 
@@ -116,7 +116,7 @@ We used the LibriSpeech 960h dataset. The dataset is composed of 460h of clean a
  If you want to train the Conformer model from scratch, you can do so by following the instructions at https://github.com/Arm-Examples/ML-examples/tree/main/pytorch-conformer-train-quantize/training
  We used an AWS g5.24xlarge instance to train the NN.
 
- #### Preprocessing [optional]
+ #### Preprocessing
 
  We first train a tokenizer on the LibriSpeech dataset. The tokenizer converts labels into tokens. For example, in English it is very common to have 's at the end of words; the tokenizer will identify that pattern and assign a dedicated token to the 's combination.
  The code to obtain the tokenizer is available at https://github.com/Arm-Examples/ML-examples/blob/main/pytorch-conformer-train-quantize/training/build_sp_128_librispeech.py . The trained tokenizer is also available in the Hugging Face repository.
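
For illustration, here is a minimal sketch of how such a tokenizer could be trained. It assumes, based on the script name build_sp_128_librispeech.py, a 128-token SentencePiece model; the input path and the choice of model type below are hypothetical, and the authoritative settings are in the linked script.

```python
# Hypothetical sketch: train a small SentencePiece tokenizer on LibriSpeech
# transcripts, mirroring what build_sp_128_librispeech.py appears to do.
import sentencepiece as spm

# "librispeech_transcripts.txt" is an assumed file with one transcript per line.
spm.SentencePieceTrainer.train(
    input="librispeech_transcripts.txt",
    model_prefix="sp_128_librispeech",  # writes sp_128_librispeech.{model,vocab}
    vocab_size=128,                     # 128 tokens, per the script name
    model_type="unigram",               # assumption; the actual script may differ
)

# Labels are converted to subword tokens; a frequent pattern such as "'s"
# can end up with its own token in the learned vocabulary.
sp = spm.SentencePieceProcessor(model_file="sp_128_librispeech.model")
print(sp.encode("the dog's bone", out_type=str))
```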