metadata

base_model: masoudmzb/wav2vec2-xlsr-multilingual-53-fa
metrics:
  - wer
widget:
  - example_title: Common Voice sample 1
    src: >-
      https://huggingface.co/m3hrdadfi/wav2vec2-large-xlsr-persian-v3/resolve/main/sample1.flac
  - example_title: Common Voice sample 2978
    src: >-
      https://huggingface.co/m3hrdadfi/wav2vec2-large-xlsr-persian-v3/resolve/main/sample2978.flac
  - example_title: Common Voice sample 5168
    src: >-
      https://huggingface.co/m3hrdadfi/wav2vec2-large-xlsr-persian-v3/resolve/main/sample5168.flac
model-index:
  - name: wav2vec2-large-xlsr-persian-asr-shemo_me7494
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 13.0 fa
          type: common_voice_13_0
          args: fa
        metrics:
          - name: Test WER
            type: wer
            value: 19.21
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: ShEMO
          type: shemo
          args: fa
        metrics:
          - name: Test WER
            type: wer
            value: 32.85
language:
  - fa
pipeline_tag: automatic-speech-recognition
tags:
  - audio
  - speech
  - automatic-speech-recognition
  - asr

Wav2Vec2 Large XLSR Persian ShEMO

This model is a fine-tuned version of masoudmzb/wav2vec2-xlsr-multilingual-53-fa on the ShEMO dataset for speech recognition in Persian (Farsi). When using this model, make sure that your speech input is sampled at 16 kHz.
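
As a minimal sketch of the 16 kHz requirement, the snippet below loads an audio file, downmixes it to mono, resamples it only when its rate differs from 16 kHz, and decodes with greedy CTC. It assumes torch, torchaudio, and transformers are installed. MODEL_ID is a hypothetical placeholder, since this card does not state the Hub repository id, and transcribe is an illustrative helper rather than part of any library.

```python
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input

# Hypothetical placeholder: replace "your-namespace" with the actual repository owner.
MODEL_ID = "your-namespace/wav2vec2-large-xlsr-persian-asr-shemo_me7494"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file, resampling to 16 kHz first if needed."""
    # Heavy dependencies are imported lazily, so defining this helper
    # does not require them to be installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

    # torchaudio.load returns a (channels, samples) tensor and the file's sample rate.
    waveform, sample_rate = torchaudio.load(audio_path)
    waveform = waveform.mean(dim=0)  # downmix to mono
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: take the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe("sample.wav") on a local file (the file name is hypothetical) downloads the checkpoint on first use and returns the predicted Persian text.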

It achieves the following results (WER values are percentages):

Loss on ShEMO train set: 0.7618
Loss on ShEMO dev set: 0.6728
WER on ShEMO train set: 30.47
WER on ShEMO dev set: 32.85
WER on Common Voice 13 test set: 19.21
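
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The function below is an illustrative sketch of that definition for a single utterance. It is not necessarily the implementation that produced the numbers above, and the name word_error_rate is an assumption made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion of a reference word
                    current[j - 1] + 1,  # insertion of a hypothesis word
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref_words)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Corpus-level WER is usually computed by summing the edits over all utterances and dividing by the total number of reference words, so it generally differs from the average of per-utterance values.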

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 2000
mixed_precision_training: Native AMP
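
Two of the settings above follow from simple arithmetic: the total batch size is the per-device batch size times the gradient accumulation steps, and a linear scheduler with warmup ramps the learning rate up to its peak and then decays it to zero. The sketch below reproduces both under the assumption of a single device, which 8 x 2 = 16 implies. The function names are illustrative, and the schedule mirrors the usual linear-warmup formula rather than calling the library itself.

```python
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 2000
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * accumulation * num_devices


def linear_schedule_lr(
    step: int,
    base_lr: float = LEARNING_RATE,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TRAINING_STEPS,
) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16
for step in (0, 250, 500, 1250, 2000):
    print(step, linear_schedule_lr(step))
```

Under this schedule the learning rate peaks at 1e-05 at step 500, is halfway down at step 1250, and reaches zero at step 2000.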

Training results

Training Loss	Epoch	Step	Validation Loss	WER
1.8553	0.62	100	1.4126	0.4866
1.4083	1.25	200	1.0428	0.4366
1.1718	1.88	300	0.8683	0.4127
0.9919	2.5	400	0.7921	0.3919
0.9493	3.12	500	0.7676	0.3744
0.9414	3.75	600	0.7247	0.3695
0.8897	4.38	700	0.7202	0.3598
0.8716	5.0	800	0.7096	0.3546
0.8467	5.62	900	0.7023	0.3499
0.8227	6.25	1000	0.6994	0.3411
0.855	6.88	1100	0.6883	0.3432
0.8457	7.5	1200	0.6773	0.3426
0.7614	8.12	1300	0.6913	0.3344
0.8127	8.75	1400	0.6827	0.3335
0.8443	9.38	1500	0.6725	0.3356
0.7548	10.0	1600	0.6759	0.3318
0.7839	10.62	1700	0.6773	0.3286
0.7912	11.25	1800	0.6748	0.3286
0.8238	11.88	1900	0.6735	0.3297
0.7618	12.5	2000	0.6728	0.3286
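
The Epoch and Step columns are related by a constant number of optimizer steps per epoch, which can be read off any row whose epoch value is exact. The short calculation below uses the row at step 800 (epoch 5.0) and the total batch size of 16 to estimate the length of one epoch. The sample count is an approximation inferred from the table, since the last batch of an epoch may be partial, and it is not a figure stated on this card.

```python
TOTAL_TRAIN_BATCH_SIZE = 16
TRAINING_STEPS = 2000


def steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps per epoch, derived from one (step, epoch) table row."""
    return step / epoch


per_epoch = steps_per_epoch(800, 5.0)  # 160.0 optimizer steps per epoch
total_epochs = TRAINING_STEPS / per_epoch  # 12.5 epochs, matching the last row
approx_samples = per_epoch * TOTAL_TRAIN_BATCH_SIZE  # roughly 2560 training samples

print(per_epoch, total_epochs, approx_samples)
```

The WER column in the table is a fraction, so the final value of 0.3286 corresponds to roughly 32.9 percent.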

Framework versions

Transformers 4.35.2
PyTorch 2.1.0+cu118
Datasets 2.15.0
Tokenizers 0.15.0