mBART-large Russian to Chinese and Chinese to Russian multilingual machine translation
This model is a fine-tuned checkpoint of mBART-large-50-many-to-many-mmt. joefox/mbart-large-ru-zh-ru-many-to-many-mmt
is fine-tuned for ru-zh and zh-ru machine translation.
The model can translate directly between any pair of Russian and Chinese languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the forced_bos_token_id
parameter to the generate
method..
This post in Russian gives more details.
Example translate Russian to Chinese
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
model = MBartForConditionalGeneration.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
src_text = "Съешь ещё этих мягких французских булок."
# translate Russian to Chinese
tokenizer.src_lang = "ru_RU"
encoded_ru = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(
**encoded_ru,
forced_bos_token_id=tokenizer.lang_code_to_id["zh_CN"]
)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#再吃一些法国面包。
and Example translate Chinese to Russian
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
model = MBartForConditionalGeneration.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
src_text = "吃一些软的法国面包。"
# translate Chinese to Russian
tokenizer.src_lang = "zh_CN"
encoded_zh = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(
**encoded_zh,
forced_bos_token_id=tokenizer.lang_code_to_id["ru_RU"]
)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Ешьте немного французского хлеба.
Languages covered
Russian (ru_RU), Chinese (zh_CN)
- Downloads last month
- 151
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.