ayymen committed on
Commit
404246f
1 Parent(s): 116fd9c

Create README.md

Files changed (1)
  1. README.md +160 -0
README.md ADDED
@@ -0,0 +1,160 @@
1
+ ---
2
+ language:
3
+ - zgh
4
+ - kab
5
+ - shi
6
+ - rif
7
+ - tzm
8
+ license: cc-by-4.0
9
+ library_name: nemo
10
+ datasets:
11
+ - mozilla-foundation/common_voice_17_0
12
+ thumbnail: null
13
+ tags:
14
+ - automatic-speech-recognition
15
+ - speech
16
+ - audio
17
+ - TDT
18
+ - FastConformer
19
+ - Transducer
20
+ - NeMo
21
+ - pytorch
22
+ model-index:
23
+ - name: stt_zgh_fastconformer_transducer_small
24
+   results:
25
+   - task:
26
+       type: automatic-speech-recognition
27
+       name: Automatic Speech Recognition
28
+     dataset:
29
+       name: Mozilla Common Voice 17.0
30
+       type: mozilla-foundation/common_voice_17_0
31
+       config: zgh
32
+       split: test
33
+       args:
34
+         language: zgh
35
+     metrics:
36
+     - name: Test WER
37
+       type: wer
38
+       value: 72.44
39
+   - task:
40
+       type: automatic-speech-recognition
41
+       name: Automatic Speech Recognition
42
+     dataset:
43
+       name: Mozilla Common Voice 17.0
44
+       type: mozilla-foundation/common_voice_17_0
45
+       config: zgh
46
+       split: test
47
+       args:
48
+         language: zgh
49
+     metrics:
50
+     - name: Test CER
51
+       type: cer
52
+       value: 26.56
53
+   - task:
54
+       type: automatic-speech-recognition
55
+       name: Automatic Speech Recognition
56
+     dataset:
57
+       name: Mozilla Common Voice 17.0
58
+       type: mozilla-foundation/common_voice_17_0
59
+       config: kab
60
+       split: test
61
+       args:
62
+         language: kab
63
+     metrics:
64
+     - name: Test WER
65
+       type: wer
66
+       value: 39.78
67
+   - task:
68
+       type: automatic-speech-recognition
69
+       name: Automatic Speech Recognition
70
+     dataset:
71
+       name: Mozilla Common Voice 17.0
72
+       type: mozilla-foundation/common_voice_17_0
73
+       config: kab
74
+       split: test
75
+       args:
76
+         language: kab
77
+     metrics:
78
+     - name: Test CER
79
+       type: cer
80
+       value: 15.81
81
+ metrics:
82
+ - wer
83
+ - cer
84
+ pipeline_tag: automatic-speech-recognition
85
+ ---
86
+ ## Model Overview
87
+
88
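+ A small FastConformer Transducer model for automatic speech recognition of Amazigh languages: Standard Moroccan Tamazight (*zgh*), Kabyle (*kab*), Tachelhit (*shi*), Tarifit (*rif*), and Central Atlas Tamazight (*tzm*).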
89
+
90
+ ## NVIDIA NeMo: Training
91
+
92
+ To train, fine-tune, or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
93
+ ```shell
94
+ pip install "nemo_toolkit[asr]"
95
+ ```
96
+
97
+ ## How to Use this Model
98
+
99
+ The model is available for use in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
100
+
101
+ ### Automatically instantiate the model
102
+
103
+ ```python
104
+ import nemo.collections.asr as nemo_asr
105
+ asr_model = nemo_asr.models.ASRModel.from_pretrained("ayymen/stt_zgh_fastconformer_transducer_small")
106
+ ```
107
+
108
+ ### Transcribing using Python
109
+ First, download a sample audio file:
110
+ ```shell
111
+ wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
112
+ ```
113
+ Then simply do:
114
+ ```python
115
+ asr_model.transcribe(['2086-149220-0033.wav'])
116
+ ```
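The return value of `transcribe` is not described in this card. As a minimal sketch, assuming it may be a plain list of strings, a `(best, all)` tuple of hypotheses (as some NeMo releases return for transducer models), or a list of objects with a `.text` attribute, the hypothetical helper `extract_texts` below normalizes all three shapes to a list of strings:

```python
from typing import Any, List


def extract_texts(result: Any) -> List[str]:
    """Normalize a transcribe() result to a plain list of strings.

    `extract_texts` is an illustrative helper, not part of NeMo.
    """
    # Assumed shape 1: (best_hypotheses, all_hypotheses) tuple; keep the best ones.
    if isinstance(result, tuple):
        result = result[0]
    texts = []
    for item in result:
        # Assumed shape 2: plain strings. Assumed shape 3: objects exposing `.text`.
        texts.append(item if isinstance(item, str) else item.text)
    return texts


# Usage (requires NeMo and the model loaded as shown above):
# texts = extract_texts(asr_model.transcribe(['2086-149220-0033.wav']))
```

Check the behavior against the NeMo version you have installed before relying on any of these shapes.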
117
+
118
+ ### Transcribing many audio files
119
+
120
+ ```shell
121
+ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ayymen/stt_zgh_fastconformer_transducer_small" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
122
+ ```
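Besides `audio_dir`, `transcribe_speech.py` can read a JSON-lines manifest through its `dataset_manifest` option. The sketch below builds such a manifest from a folder of PCM WAV files using only the standard library. The function name `write_manifest` and the empty `text` field are assumptions made for illustration; the keys `audio_filepath`, `duration`, and `text` follow the usual NeMo manifest convention.

```python
import json
import wave
from pathlib import Path


def write_manifest(audio_dir: str, manifest_path: str) -> int:
    """Write one JSON line per .wav file in audio_dir; return the entry count."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path in sorted(Path(audio_dir).glob("*.wav")):
            # Duration in seconds = number of frames / sampling rate.
            with wave.open(str(wav_path), "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
            entry = {
                "audio_filepath": str(wav_path),
                "duration": duration,
                "text": "",  # Assumption: no reference transcript is available.
            }
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The resulting file could then be passed as `dataset_manifest="<PATH TO MANIFEST>"` in place of `audio_dir`.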
123
+
124
+ ### Input
125
+
126
+ This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
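To catch format mismatches before transcription, a file can be checked with Python's standard `wave` module. This is a minimal sketch that only handles uncompressed PCM WAV files; the helper name `check_wav_format` is hypothetical.

```python
import wave


def check_wav_format(path: str, expected_rate: int = 16000) -> bool:
    """Return True if the WAV file is mono and sampled at expected_rate Hz."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == expected_rate


# Usage:
# if not check_wav_format("2086-149220-0033.wav"):
#     print("Resample to 16 kHz mono before transcribing.")
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice.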
127
+
128
+ ### Output
129
+
130
+ This model provides transcribed speech as a string for a given audio sample.
131
+
132
+ ## Model Architecture
133
+
134
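+ The model name and tags indicate a small FastConformer encoder with a Transducer decoder; the `TDT` tag suggests the Token-and-Duration Transducer variant.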
135
+
136
+ ## Training
137
+
138
+ The model was trained for 42 epochs on an NVIDIA GeForce RTX 4050 Laptop GPU.
139
+
140
+ ### Datasets
141
+
142
+ The *kab* and *zgh* subsets of Common Voice 17.0, plus Bible readings in Tachelhit and Tarifit.
143
+
144
+ ## Performance
145
+
146
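+ Metrics are computed on the cleaned, unpunctuated test sets of *zgh* and *kab* (converted to Tifinagh). Per the model-index metadata, the model reaches 72.44 WER / 26.56 CER on *zgh* and 39.78 WER / 15.81 CER on *kab*.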
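The exact cleaning and scoring scripts are not given here. As an illustration only, the sketch below computes corpus-level WER and CER with a plain Levenshtein distance after stripping Unicode punctuation. The function names and the punctuation rule are assumptions, and the results may differ from NeMo's own metric utilities.

```python
import unicodedata
from typing import List, Sequence


def strip_punctuation(text: str) -> str:
    """Drop Unicode punctuation characters and collapse whitespace."""
    kept = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(kept.split())


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level WER (unit='word') or CER (unit='char'), in percent.

    Assumes the references contain at least one word or character in total.
    """
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total
```

Errors are summed over the whole test set and divided by the total reference length, rather than averaging per-utterance rates, so long utterances weigh more than short ones.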
147
+
148
+ ## Limitations
149
+
150
+ Since this model was trained on publicly available speech datasets, its performance may degrade on speech that includes technical terms or vernacular absent from the training data. The model may also perform worse on accented speech.
154
+
155
+
156
+ ## References
157
+
158
+
160
+ [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)