
Loading the model

#3
by PyrroAiakid - opened

Forgive me if it's something very simple, but when I load the model I get an error.

(attached screenshot: Captura de pantalla 2023-08-15 212926.png)

They say I have to pass -gqa 8, but I don't know how to do it.

I'm running into issues loading this model too. Gotta love our super helpful community, right?

Why is the context length 2048? Was it cut in half? The base Llama 2 model has a 4096 context length. If it's really 2048, it wouldn't be the first time a model got massacred like that.

It's just the GQA, which should be 8 for 70B models. If you multiply 1024 by 8 you get 8192. Try adding the -gqa 8 parameter, or set gqa to 8.
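The arithmetic behind that advice can be sketched as follows. The head counts and hidden size below are assumptions based on the published Llama 2 70B configuration, not values taken from this thread:

```python
# Hypothetical sketch of the shape check behind the -gqa 8 advice.
# All values are assumed from the public Llama 2 70B configuration.
n_embd = 8192   # hidden size of Llama 2 70B
n_head = 64     # query heads
gqa = 8         # grouped-query factor passed via -gqa 8

n_head_kv = n_head // gqa        # 8 key/value heads under GQA
head_dim = n_embd // n_head      # 128
kv_width = n_head_kv * head_dim  # 1024 -- the K/V projection width in the file

# A loader that doesn't know about GQA expects kv_width == n_embd and
# errors out; multiplying by the group factor recovers the hidden size.
print(kv_width, kv_width * gqa)  # 1024 8192
```

So the 1024-vs-8192 mismatch in the error is exactly the grouped-query factor of 8, which is why telling the loader gqa = 8 fixes it.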
