Create GGUF for this please

#2 by ishanparihar

Create GGUF for this please.

@LoneStriker Can you quantize this model to exl2?

@MaziyarPanahi @LoneStriker
Can you confirm that these quants are functioning on your end?

Whenever I tested them (Q6_K and Q8_0), they produced nothing but gibberish.

Only the Q4_K_M by the original author works for me at the moment.
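
A minimal way to reproduce the check, sketched with llama-cpp-python (the model filename is a placeholder for whichever quant you are testing):

```python
# Quick gibberish check for a GGUF quant via llama-cpp-python
# (pip install llama-cpp-python). The filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./MixTAO-7Bx2-MoE-v8.1.Q6_K.gguf", n_ctx=2048)

out = llm("Q: What is the capital of France?\nA:", max_tokens=32)
print(out["choices"][0]["text"])  # nonsense here => broken quant
```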

Thanks for your efforts for the open-source community. 💖

Hi @ishanparihar,
Llama.cpp has recently introduced a lot of changes to MoE models. I think it's best if I redo the quants with a new build and test them again.
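
In case it helps, a rough sketch of redoing the conversion with a fresh llama.cpp checkout (the script name convert_hf_to_gguf.py matches recent llama.cpp checkouts; all paths here are assumptions about a local setup):

```python
# Re-convert the HF model to an f16 GGUF with a fresh llama.cpp build.
# Paths and the local model directory are placeholders.
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "MixTAO-7Bx2-MoE-v8.1",          # local HF model directory
        "--outfile", "MixTAO-7Bx2-MoE-v8.1.f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```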

Thanks @MaziyarPanahi for your prompt response and your willingness to help. Looking forward to the new builds. ⭐

My own (ZeroWw) quantizations: output and embed tensors quantized to f16, all other tensors quantized to q5_k or q6_k.

Result: both f16.q6 and f16.q5 are smaller than the standard q8_0 quantization, and they perform as well as the pure f16.

https://huggingface.co/ZeroWw/MixTAO-7Bx2-MoE-v8.1-GGUF
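
For anyone who wants to reproduce this recipe: recent llama.cpp builds expose per-tensor type overrides on the llama-quantize tool. A rough sketch (the binary location and filenames are assumptions about a local setup):

```python
# ZeroWw-style quant: keep output and embedding tensors at f16,
# quantize everything else to q6_k. Paths are placeholders.
import subprocess

subprocess.run(
    [
        "./llama.cpp/llama-quantize",
        "--output-tensor-type", "f16",    # output tensor stays f16
        "--token-embedding-type", "f16",  # token embeddings stay f16
        "MixTAO-7Bx2-MoE-v8.1.f16.gguf",  # unquantized source GGUF
        "MixTAO-7Bx2-MoE-v8.1.f16.q6.gguf",
        "Q6_K",                           # all remaining tensors
    ],
    check=True,
)
```

Running the same command with Q5_K as the final argument should give the f16.q5 variant.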
