proper embedding for llama-2-7b-chat.Q4_K_M.gguf

by awarity-dev - opened

Hi there, I am trying to get this model working with a LlamaIndex vector store.

I can follow this doc, with modifications for a local vector store of some markdown files, and get good responses.
That uses this model: ""
with this embedding: "BAAI/bge-small-en-v1.5"

I see on the model card that llama-2-7b-chat.Q4_K_M.gguf is recommended. However, when I use that model with any of the "BAAI/bge-small-en-v1.5" variants, I get a dimension mismatch error:
ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)

This occurs both when the model is downloaded from the URL and when a local copy is loaded.
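
For reference, the wording of the error matches what NumPy raises when a dot product is taken between two 1-D arrays of different lengths. The sketch below is only an illustration of that, not the code path inside LlamaIndex: the helper name `dot_error_message` is made up, and it assumes one vector has 1536 entries and the other 768, as in the traceback.

```python
import numpy as np


def dot_error_message(stored_dim: int, query_dim: int) -> str:
    """Return the error text NumPy raises for mismatched 1-D dot products.

    Returns an empty string when the two lengths agree.
    Hypothetical helper, for illustration only.
    """
    stored = np.zeros(stored_dim)  # stands in for a vector already in the index
    query = np.zeros(query_dim)    # stands in for a freshly embedded query
    try:
        np.dot(stored, query)
    except ValueError as exc:
        return str(exc)
    return ""


message = dot_error_message(1536, 768)
print(message)
```

If the two lengths are equal, the call succeeds and the helper returns an empty string, which suggests the error is about the two vectors' sizes rather than about the LLM file itself.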
Non-working example:

from llama_index.llms import LlamaCPP
model_url = ""
llm = LlamaCPP(
        # You can pass in the URL to a GGML model to download it automatically
        model_url=model_url,
        # optionally, you can set the path to a pre-downloaded model instead of model_url
        # model_path="./models/TheBloke/Llama-2-7B-chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf",
        # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
        # kwargs to pass to __call__()
        # kwargs to pass to __init__()
        # set to at least 1 to use GPU
        model_kwargs={"n_gpu_layers": 1},
        # transform inputs into Llama2 format
        # verbose=True,
)

Am I right that this is some sort of mismatch between the embedding model, the storage in a VectorStoreIndex, and the instantiation of the LLM version?
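
One way to test that guess is to compare the length of a vector already stored in the index with the length of a fresh query embedding before any similarity is computed. The sketch below uses plain Python lists and made-up helper names (`check_embedding_dims`, `cosine_similarity`); it is not LlamaIndex's API. As far as I know, bge-small-en-v1.5 produces 384-dimensional vectors, bge-base-en-v1.5 produces 768, and OpenAI's text-embedding-ada-002 produces 1536, so a 1536 vs 768 clash may point to an index built with one embedding model and queried with another.

```python
import math
from typing import Sequence


def check_embedding_dims(stored: Sequence[float], query: Sequence[float]) -> None:
    """Raise a descriptive error when the two vectors differ in length.

    Hypothetical helper, for illustration only.
    """
    if len(stored) != len(query):
        raise ValueError(
            f"stored vector has {len(stored)} dimensions but the query vector "
            f"has {len(query)}; they were probably produced by different "
            "embedding models"
        )


def cosine_similarity(stored: Sequence[float], query: Sequence[float]) -> float:
    """Cosine similarity that checks dimensions before doing any arithmetic."""
    check_embedding_dims(stored, query)
    dot = sum(s * q for s, q in zip(stored, query))
    norm = math.sqrt(sum(s * s for s in stored)) * math.sqrt(sum(q * q for q in query))
    if norm == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / norm


# Placeholder vectors: same length, so the check passes.
same_model_score = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# Different lengths, as in the traceback: the check reports the mismatch.
try:
    cosine_similarity([0.0] * 1536, [0.0] * 768)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If the check fails on real data, rebuilding the index with the same embedding model that is used at query time would be the first thing to try.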

Again, I am able to get the "llama-2-13b-chat.Q4_0.gguf" version to work, but not "llama-2-7b-chat.Q4_K_M.gguf".

Thanks for reading.

How did you solve this?
