carlosdanielhernandezmena committed on
Commit
2a21dfe
1 Parent(s): 2961748

Adding the info to the README file

  1. README.md +191 -0
README.md CHANGED
@@ -1,3 +1,194 @@
  ---
+ language: is
+ datasets:
+ - samromur_unverified_data_967h
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - icelandic
+ - whisper
+ - whisper-large
+ - iceland
+ - reykjavik
+ - samromur
  license: cc-by-4.0
+ widget:
+ model-index:
+ - name: whisper-large-icelandic-62640-steps-967h
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur (Test)
+       type: language-and-voice-lab/samromur_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 7.762
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur (Dev)
+       type: language-and-voice-lab/samromur_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 7.035
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur Children (Test)
+       type: language-and-voice-lab/samromur_children
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 7.047
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur Children (Dev)
+       type: language-and-voice-lab/samromur_children
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 4.425
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Malrómur (Test)
+       type: language-and-voice-lab/malromur_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 11.511
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Malrómur (Dev)
+       type: language-and-voice-lab/malromur_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 11.000
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Althingi (Test)
+       type: language-and-voice-lab/althingi_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 16.189
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Althingi (Dev)
+       type: language-and-voice-lab/althingi_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: 16.007
  ---
+ # whisper-large-icelandic-62640-steps-967h
+
+ "whisper-large-icelandic-62640-steps-967h" is an acoustic model for Automatic Speech Recognition in Icelandic. It is the result of fine-tuning the model ["openai/whisper-large"](https://huggingface.co/openai/whisper-large) for 62,640 steps on 967 hours of Icelandic data collected by the [Language and Voice Laboratory](https://huggingface.co/language-and-voice-lab) through the [Samrómur](https://samromur.is/) platform.
+
+ The specific data used to fine-tune the model is not publicly available at the moment, but it is the result of automatically verifying 1 million recordings from [Samrómur](https://samromur.is/). Note that this model was trained on different data than our previous model [whisper-large-icelandic-30k-steps-1000h](https://huggingface.co/language-and-voice-lab/whisper-large-icelandic-30k-steps-1000h).
+
+ The fine-tuning was performed in June 2023 on the servers of the Language and Voice Laboratory (https://lvl.ru.is/) at Reykjavík University (Iceland) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).
+
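For transcribing a single recording rather than scoring a whole dataset, a minimal sketch using the `transformers` speech-recognition pipeline might look like the following. This is an illustration, not part of the original README: the `transcribe` helper and the file name in the usage comment are hypothetical, and it assumes `transformers`, `torch`, and `ffmpeg` (for decoding audio files) are installed.

```python
MODEL_NAME = "language-and-voice-lab/whisper-large-icelandic-62640-steps-967h"


def transcribe(audio_path, device=-1):
    """Transcribe one audio file with the fine-tuned model.

    `transcribe` is a hypothetical helper for illustration.
    device=-1 runs on CPU; pass 0 to use the first CUDA GPU.
    """
    # Imported lazily so that defining the helper does not load the heavy library.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_NAME,
        chunk_length_s=30,  # split long recordings into 30-second chunks
        device=device,
    )
    # The pipeline returns a dict whose "text" field holds the transcription.
    return asr(audio_path)["text"]


# Usage (hypothetical file name):
# print(transcribe("sample.wav"))
```

Building the pipeline inside the function keeps the sketch short; when transcribing many files, it would be cheaper to create the pipeline once and reuse it.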
+ # Evaluation
+ ```python
+ import torch
+ from transformers import WhisperForConditionalGeneration, WhisperProcessor
+
+ # Load the processor and model.
+ MODEL_NAME = "language-and-voice-lab/whisper-large-icelandic-62640-steps-967h"
+ processor = WhisperProcessor.from_pretrained(MODEL_NAME)
+ model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
+
+ # Load the dataset.
+ from datasets import load_dataset, Audio
+ ds = load_dataset("language-and-voice-lab/samromur_children", split="test")
+
+ # Resample the audio to 16 kHz, the rate Whisper expects.
+ ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
+
+ # Transcribe one example and normalize both the reference and the prediction.
+ def map_to_pred(batch):
+     audio = batch["audio"]
+     input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
+     batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])
+
+     with torch.no_grad():
+         predicted_ids = model.generate(input_features.to("cuda"))[0]
+
+     transcription = processor.decode(predicted_ids)
+     batch["prediction"] = processor.tokenizer._normalize(transcription)
+
+     return batch
+
+ # Run the model over the whole split.
+ result = ds.map(map_to_pred)
+
+ # Compute the overall WER (as a percentage).
+ from evaluate import load
+
+ wer = load("wer")
+ WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
+ print(WER)
+ ```
+ **Test Result**: 7.743795695602924
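To make the reported number easier to interpret, the sketch below shows what a word error rate measures: the word-level edit distance between reference and prediction (substitutions, deletions, and insertions), divided by the number of reference words. It is a plain-Python illustration, not the code used by `evaluate`; it assumes, as the `wer` metric does to the best of our understanding, that errors and reference words are pooled over the whole corpus rather than averaged per utterance. The helper names and the two sentence pairs are made up for the example.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, predictions):
    """WER in percent, pooling errors and reference words over all pairs."""
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    return 100.0 * errors / total_words


# Made-up example: one substitution and one deletion over six reference words.
refs = ["góðan daginn", "hvað segir þú gott"]
hyps = ["góðan dag", "hvað segir þú"]
print(corpus_wer(refs, hyps))
```

On this scale, a value near 7.7 corresponds to roughly one word error for every thirteen reference words.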
+
+ # BibTeX entry and citation info
+ *When publishing results based on these models please refer to:*
+ ```bibtex
+ @misc{mena2023whisperlarge62640icelandic,
+       title={Acoustic Model in Icelandic: whisper-large-icelandic-62640-steps-967h.},
+       author={Hernandez Mena, Carlos Daniel},
+       year={2023},
+       url={https://huggingface.co/language-and-voice-lab/whisper-large-icelandic-62640-steps-967h},
+ }
+ ```
+
+ # Acknowledgements
+
+ Thanks to Jón Guðnason, head of the Language and Voice Lab, for providing the computational power that made this model possible.
+
+ We also want to thank the "Language Technology Programme for Icelandic 2019-2023", which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture. This model is an unexpected result of all the resources gathered by the Programme.
+
+ Special thanks to Björn Ingi Stefánsson for setting up the configuration of the server where this model was trained.
+
+