ONNX Export

#5
by frederikschubert - opened

Are there plans to convert the model to ONNX for easier handling in, e.g., Vespa?

I tried to export the model using Optimum, but it fails during graph optimisation:
optimum-cli export onnx --model jinaai/jina-colbert-v2 --trust-remote-code --task feature-extraction --opset 20 jina-colbert-v2

Using framework PyTorch: 2.4.1+cu12
...
Traceback (most recent call last):
  File ".../.venv/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File ".../.venv/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File ".../.venv/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 265, in run
    main_export(
  File ".../.venv/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 365, in main_export
    onnx_export_from_model(
  File ".../.venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 1170, in onnx_export_from_model
    _, onnx_outputs = export_models(
                      ^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 776, in export_models
    export(
  File ".../.venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 881, in export
    export_output = export_pytorch(
                    ^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 577, in export_pytorch
    onnx_export(
  File ".../.venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 551, in export
    _export(
  File ".../.venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 1648, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 1174, in _model_to_graph
    graph = _optimize_graph(
            ^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 656, in _optimize_graph
    _C._jit_pass_peephole(graph, True)
IndexError: Argument passed to at() was not in the map.
Jina AI org

Hi @frederikschubert, yes, let me do it today or tomorrow.

I'm also interested in using this for Vespa, let me see if I can export then quantize. Thanks for sharing this model btw -- arguably the top retriever model currently available :)
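
In case it's useful to anyone trying the same route: dynamic quantization of an exported graph with onnxruntime looks roughly like this (a sketch; the file paths are placeholders, not files from this repo):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization stores the weights as int8 and computes activation
# scales at runtime, so no calibration dataset is needed.
quantize_dynamic(
    model_input='jina-colbert-v2/onnx/model.onnx',    # hypothetical paths
    model_output='jina-colbert-v2/onnx/model_int8.onnx',
    weight_type=QuantType.QInt8,
)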

Jina AI org

Hi @frederikschubert, the issue was due to the dynamic nature of the RoPE implementation. I've just uploaded the ONNX weights. Let me know how it works for you.
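
To illustrate the kind of problem (a minimal sketch of the general pattern, not the actual diff): a RoPE cache that is rebuilt on demand involves data-dependent Python control flow that torch.onnx.export cannot trace, whereas precomputing the cos/sin tables up to a fixed maximum length and slicing into them exports cleanly:

import torch

# Sketch of an export-friendly rotary embedding. Instead of rebuilding the
# cos/sin cache whenever seq_len exceeds the cached size (an `if` branch the
# tracer cannot follow), precompute buffers once up to max_positions and
# slice at forward time.
class ExportFriendlyRotary(torch.nn.Module):
    def __init__(self, dim: int, max_positions: int = 8192, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        freqs = torch.outer(torch.arange(max_positions).float(), inv_freq)
        self.register_buffer('cos_cached', freqs.cos(), persistent=False)
        self.register_buffer('sin_cached', freqs.sin(), persistent=False)

    def forward(self, seq_len: int):
        # Plain slicing traces cleanly to ONNX Slice ops.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]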

Great, thanks for your help! Would you mind sharing your conversion script? Either way, this is really helpful for us!

Jina AI org

Sure, here it is:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the ONNX-compatible implementation; flash attention is disabled
# because its custom kernels cannot be traced by the exporter.
model = AutoModel.from_pretrained('/home/admin/saba/jina-colbert-v2', trust_remote_code=True, use_flash_attn=False)
model.eval()

onnx_path = '/home/admin/saba/jina-colbert-v2/onnx/model.onnx'

# A small dummy batch is enough to trace the graph.
tokenizer = AutoTokenizer.from_pretrained('/home/admin/saba/jina-colbert-v2')
inputs = tokenizer(['jina', 'ai'], return_tensors='pt', padding='longest')
inps = inputs['input_ids']
mask = inputs['attention_mask']

torch.onnx.export(
    model,
    (inps, mask),
    onnx_path,
    export_params=True,
    do_constant_folding=True,
    input_names=['input_ids', 'attention_mask'],
    output_names=['text_embeds'],
    opset_version=16,
    # Mark batch and sequence dims as dynamic so the exported model
    # accepts arbitrary batch sizes and sequence lengths.
    dynamic_axes={
        'input_ids': {0: 'batch_size', 1: 'sequence_length'},
        'attention_mask': {0: 'batch_size', 1: 'sequence_length'},
        'text_embeds': {0: 'batch_size'},
    },
)
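
To sanity-check the export, you can run the graph with onnxruntime on the same dummy inputs (a quick sketch appended to the script above):

import onnxruntime as ort

# Run the exported graph on the inputs used for tracing.
session = ort.InferenceSession(onnx_path)
(text_embeds,) = session.run(
    ['text_embeds'],
    {'input_ids': inps.numpy(), 'attention_mask': mask.numpy()},
)
print(text_embeds.shape)  # per-token embeddings: (batch, seq_len, dim)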

But the main challenge was modifying the model implementation to make it compatible with ONNX; that's why you were seeing the error.

Great, thanks again! :)

frederikschubert changed discussion status to closed
frederikschubert changed discussion status to open

@jupyterjazz The exported ONNX model in this repository works, but the export script fails with the same error as above. I tried several ways of exporting the model, but it seems that the RoPE embeddings still cause the export to fail.

I have the modified implementation that I used for conversion separately; it's not public at the moment. Can I ask why you want to convert the model yourself?

Alright! I am currently figuring out my general workflow for converting, optimizing, and quantizing models and wanted to understand why it fails for me. Would you mind sharing your diff? I can see why updating the public model for everyone might be a hurdle.

Okay, we can share it. Here's the modified implementation: https://huggingface.co/jinaai/xlm-roberta-flash-implementation-onnx

The README includes all the changes I made and explains the reasoning behind them, as well as the conversion script.

A few notes:

  • The conversion script is written for the jina-embeddings-v3 model, which includes LoRA adapters. To make it work for ColBERT, you'll need to remove the task_id argument.
  • After conversion, it will probably output several files, each containing a part of the model. You can combine them using the convert_model_to_external_data function, as sketched below.
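
For the second point, combining the shards looks roughly like this (a sketch using onnx's external-data helpers; file names are placeholders):

import onnx
from onnx.external_data_helper import convert_model_to_external_data

# onnx.load resolves the external tensor files automatically
# as long as they sit next to the .onnx file.
model = onnx.load('onnx/model.onnx')

# Re-point every tensor above the size threshold at a single external file.
convert_model_to_external_data(
    model,
    all_tensors_to_one_file=True,
    location='model.onnx_data',
    size_threshold=1024,
)
onnx.save_model(model, 'onnx/model_combined.onnx')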

Hope this helps!

Perfect! :) Thank you for the quick reply and for the code!

frederikschubert changed discussion status to closed
