Will quantised version be available?

by angerhang - opened 5 days ago

Thanks for sharing! What are the recommended ways to quantise this model?
Or will a quantised model be made available, so that inference is less resource-intensive?

Thanks

victor

5 days ago

Did you see https://huggingface.co/models?other=base_model:quantized:nvidia/Llama-3.1-Nemotron-70B-Instruct-HF?
Use the model tree section on model pages to see what quantizations are available.
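The same search can be sketched in a few lines of Python. This is a minimal sketch, assuming that quantized derivatives carry the `base_model:quantized:<repo>` tag used in the search URL above, and that `HfApi.list_models(filter=...)` from the `huggingface_hub` package matches on that tag. The helper names (`quantized_tag`, `search_url`, `list_quantized`) are hypothetical and only for illustration.

```python
BASE_MODEL = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"


def quantized_tag(base_model: str) -> str:
    """Build the tag that (assumed) marks models quantized from `base_model`."""
    return f"base_model:quantized:{base_model}"


def search_url(base_model: str) -> str:
    """Build the Hub web search URL that lists quantized derivatives."""
    return "https://huggingface.co/models?other=" + quantized_tag(base_model)


def list_quantized(base_model: str, limit: int = 20) -> list[str]:
    """Query the Hub API for quantized derivatives (needs network access).

    Assumes `pip install huggingface_hub`; imported lazily so the helpers
    above work without it.
    """
    from huggingface_hub import HfApi

    api = HfApi()
    models = api.list_models(filter=quantized_tag(base_model), limit=limit)
    return [m.id for m in models]


if __name__ == "__main__":
    # Offline: just print the browser URL; call list_quantized() to hit the API.
    print(search_url(BASE_MODEL))
```

Printing the URL keeps the script usable offline; `list_quantized(BASE_MODEL)` returns repo ids you can then open or download.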

okuchaiev

NVIDIA org 4 days ago

NVIDIA hasn't released any quantized version yet. But there are several community quantization efforts mentioned above.

yangwang92

about 12 hours ago

We also provide versions quantized to between 4 and 1.5 bits with VPTQ (https://github.com/microsoft/VPTQ), available here: https://huggingface.co/collections/VPTQ-community/vptq-llama-31-nemotron-70b-instruct-hf-without-finetune-671730b96f16208d0b3fe942 Feel free to give us feedback!