The Chinese training data of the model is contaminated

#35
by bookwoods123 - opened

I have tested many audio recordings longer than half an hour, and the transcripts contain many of the following phrases, none of which are present in the original audio:
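请不吝点赞 订阅 转发 打赏支持明镜与点点栏目 ("Please like, subscribe, share, and tip to support the Mingjing and Diandian programs")
字幕志愿者 杨茜茜优优独播剧场 ("Subtitle volunteer Yang Xixi; Youyou Exclusive Theater")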
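Until the model itself is fixed, one possible workaround is to post-process the transcript. The following is a minimal Python sketch that assumes the transcript is already available as a plain string. The names SUSPECT_PHRASES, find_suspect_phrases and strip_suspect_phrases are hypothetical. Splitting the second quoted line into two separate entries is also an assumption: it treats the line as two distinct inserted phrases. This only hides the symptom and does nothing about the training data.

```python
# Phrases reported in this thread as appearing in transcripts but not in the audio.
# Assumption: the second reported line is treated as two separate phrases.
# The list is illustrative, not exhaustive.
SUSPECT_PHRASES = [
    "请不吝点赞 订阅 转发 打赏支持明镜与点点栏目",
    "字幕志愿者 杨茜茜",
    "优优独播剧场",
]


def find_suspect_phrases(transcript: str) -> list[str]:
    """Return the suspect phrases found in the transcript, in list order."""
    return [p for p in SUSPECT_PHRASES if p in transcript]


def strip_suspect_phrases(transcript: str) -> str:
    """Remove every occurrence of the suspect phrases, then collapse whitespace."""
    for p in SUSPECT_PHRASES:
        transcript = transcript.replace(p, "")
    return " ".join(transcript.split())


if __name__ == "__main__":
    sample = "今天天气很好。 请不吝点赞 订阅 转发 打赏支持明镜与点点栏目"
    print(find_suspect_phrases(sample))
    print(strip_suspect_phrases(sample))
```

Plain substring matching is used because the reported phrases appear verbatim. If they show up with different punctuation or spacing, a looser match would be needed.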
