Number of parameters

#9
by HugoLaurencon - opened

For my understanding, why is it called a 7B if it has 8.54B parameters in the safetensors?

Because yes

@HugoLaurencon I think they are trying to compete with Mistral-7B, so they are faking the name to make it seem smaller than it actually is, because the more popular model size is 7B parameters. If Google has a better explanation they can pitch in here.

Google org

(Disclaimer: I'm not from the Gemma development team, and this explanation is to the best of my understanding.) The model itself contains close to 7B parameters. However, the number you see on the model page on HF also includes the embedding layer, which adds to the overall count (but is not strictly part of the model size). If you look at the Mistral 7B model, its HF page also shows a parameter count slightly above 7B. However, the vocabulary for Gemma 7B is much larger (~8x), which results in a larger number of params shown on HF.
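For a back-of-envelope check of the embedding claim, here's a small sketch. The vocab and hidden sizes below are taken from the public HF configs for google/gemma-7b and mistralai/Mistral-7B-v0.1; treat the figures as approximate:

```python
# Rough embedding-parameter comparison (sizes from the public configs;
# this is a sketch, not an official breakdown).
gemma_vocab, gemma_hidden = 256_000, 3072       # google/gemma-7b
mistral_vocab, mistral_hidden = 32_000, 4096    # mistralai/Mistral-7B-v0.1

gemma_embed = gemma_vocab * gemma_hidden        # ~0.79B embedding params
mistral_embed = mistral_vocab * mistral_hidden  # ~0.13B embedding params

print(f"vocab ratio: {gemma_vocab / mistral_vocab:.0f}x")     # 8x
print(f"Gemma embeddings:   {gemma_embed / 1e9:.2f}B params")
print(f"Mistral embeddings: {mistral_embed / 1e9:.2f}B params")
```

So the ~8x vocabulary alone accounts for a sizable chunk (roughly 0.65B params) of the gap between the two reported counts.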

Ok thanks! I'll leave this open if people want to look at it, but feel free to close.

Google org

CodeGemma (https://goo.gle/codegemma) uses the term "size class".

I think it's better to represent it as an 8B model. Yes, Mistral-7B has slightly more than 7B parameters (7.24B), but we call it 7B by dropping the fractional part. It's a similar case with Llama-2, which has fewer than 7B parameters (6.7B IIRC) but gets rounded up to 7B. :)
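For illustration, here are the two naming conventions being debated, applied to approximate total counts taken from this thread and the HF model pages (a sketch; the exact figures may vary by checkpoint):

```python
# Approximate total parameter counts in billions (from the thread / HF pages).
param_counts = {"gemma-7b": 8.54, "mistral-7b": 7.24, "llama-2-7b": 6.74}

for name, billions in param_counts.items():
    truncated = int(billions)   # drop the fractional part: 8.54 -> 8
    rounded = round(billions)   # round to nearest:         6.74 -> 7
    print(f"{name}: truncate -> {truncated}B, round -> {rounded}B")
```

Under either convention, Gemma's 8.54B doesn't land on 7B.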
