Inputs recorded at different sampling rates generate drastically different embeddings?

#15
by anuragrawal - opened

Hi @speechbrainteam ,

I am using this model to generate speaker embeddings for one of my projects. I know the model was trained on mono audio sampled at 16 kHz. My audio is recorded at 44.1 kHz. I am seeing drastically different outputs when I downsample my 44.1 kHz audio to 16 kHz compared with recording directly at 16 kHz: the outputs are much better when I record at 16 kHz. Have you experienced this scenario before?

I am trying to establish whether recording at 16 kHz is really beneficial. I have run some experiments, but it is not easy to capture two identical recordings, one at 16 kHz and one at 44.1 kHz, so I am reaching out here for your feedback. Please let me know if you need anything else.
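One way to make such a comparison controlled is to derive the 16 kHz version from the 44.1 kHz recording itself, so that both inputs come from the same take and only the resampling step differs. The block below is a minimal pure-Python sketch of band-limited downsampling with a Hann-windowed sinc kernel, plus a cosine-similarity helper for comparing two embedding vectors. The names `resample` and `cosine_similarity` and the `zeros` and `rolloff` parameters are hypothetical and chosen for illustration; they are not part of SpeechBrain, and this is not necessarily how the downsampling in the experiments above was done. In practice a library resampler such as `torchaudio.transforms.Resample` would likely be used instead.

```python
import math


def resample(samples, sr_in, sr_out, zeros=16, rolloff=0.9):
    """Band-limited resampling with a Hann-windowed sinc kernel."""
    # Kernel cutoff as a fraction of the input rate. When downsampling it
    # sits slightly below the new Nyquist frequency, so content that would
    # otherwise alias is attenuated before the rate is reduced.
    scale = rolloff * min(1.0, sr_out / sr_in)
    half_width = zeros / scale  # kernel half-length, in input samples
    n_out = len(samples) * sr_out // sr_in
    out = []
    for n in range(n_out):
        pos = n * sr_in / sr_out  # output instant on the input sample axis
        lo = max(0, math.ceil(pos - half_width))
        hi = min(len(samples) - 1, math.floor(pos + half_width))
        acc = 0.0
        for k in range(lo, hi + 1):
            d = pos - k
            u = scale * d
            sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            acc += samples[k] * scale * sinc * window
        out.append(acc)
    return out


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative input: 0.1 s of a 1 kHz tone standing in for a 44.1 kHz recording.
sr_in, sr_out = 44100, 16000
recording = [math.sin(2 * math.pi * 1000 * i / sr_in) for i in range(4410)]
downsampled = resample(recording, sr_in, sr_out)
```

If embeddings of the downsampled audio still differ sharply from embeddings of native 16 kHz recordings after a properly filtered resample, the cause is more likely to lie in the recording chain (microphone, gain, codec) than in the sampling rate itself, though that remains to be confirmed.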

Thanks!
Anurag Agrawal
