Does it work fine, or does it still need fixes? I ask because of all the discussions on GitHub (claiming that it's worse than previous models).

I see that it works better after the corrections... but it's still probably not what it should be. In my tests, it performs chaotically (sometimes it solves puzzles that the previous L3 8b couldn't solve, and then it fails at something obvious that models generally don't have problems with).

It's weird... I just ran llama-server -v -ngl 0 -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -c 8192. This time I DIDN'T even choose the Llama 3 prompt style, I only changed the temperature to 0. And suddenly it started to answer hard questions correctly... It always gives me the correct answer to this question:
I have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?

PS: The Snake game in Python works too :) It also follows instructions to modify the game!

build\bin\llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 8196 --interactive -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0 --chat-template llama3


> I have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river? /
You start with 10 apples.

You find 3 gold coins in a river, so you now have 3 gold coins and still 10 apples.

You lose 4 apples but gain a gold coin, so you now have 11 gold coins and 6 apples.

Three birds drop 6 apples each, which is 18 apples. You now have 24 apples and 11 gold coins.

You play an online game and win 6 gold coins, but you have to share them equally with your teammates. This means that you get 2 of the 6 gold coins (since there are two teammates). So you now have 13 gold coins and still 24 apples.

The price of an apple is 0.5 coins, so you can buy 26 apples with all the coins you have (13 * 2 = 26).

You start with 24 apples and buy 26 more, which means that you now have 50 apples.

As for where the river is, it's not specified in the problem.
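For reference, the arithmetic of the puzzle can be checked step by step. This is a minimal sketch, assuming the 6 coins won in the game are split three ways (the player plus 2 teammates); the variable names are illustrative. It arrives at 36 apples, the figure later posts in this thread treat as correct, and it shows that the transcript above goes wrong at the "gain a gold coin" step (3 + 1 is 4 coins, not 11).

```python
# Track apples and coins through each step of the puzzle.
apples = 10
coins = 3                    # found at the bottom of the river
apples -= 4                  # lose 4 apples
coins += 1                   # gain a gold coin
apples += 3 * 6              # three birds drop 6 apples each
coins += 6 // 3              # 6 coins shared equally among 3 players
bought = int(coins / 0.5)    # each apple costs 0.5 coins
apples += bought             # spend every coin on apples

print(coins, bought, apples)  # 6 12 36
```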

Weird, even q4_k_m gets this one right for me

I also tested q4, q6, and q8; the answers are always bad. I used the built-in template and tried my own as well.

Here is q4_k_m:

\llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 8196 --interactive -ngl 99 --simple-io --in-prefix "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant." -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0
llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 8196 --interactive -ngl 99 --simple-io --in-prefix "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant." -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0
this is what I get with Q4_K_M:


You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.user

I have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?assistant

Let's break down the changes in your apple and gold coin counts step by step:

1. You start with 10 apples and find 3 gold coins.
2. You lose 4 apples, so you have 10 - 4 = 6 apples left.
3. You gain a gold coin, so you have 3 + 1 = 4 gold coins.
4. Three birds each drop 6 apples, so you get 3 x 6 = 18 apples. Now you have 6 + 18 = 24 apples.
5. You win 6 gold coins, but you have to share them equally with your 2 teammates. That means you get 6 / 3 = 2 gold coins. Now you have 4 + 2 = 6 gold coins.
6. You spend all your gold coins on apples. The price of an apple is 0.5 coins, so you can buy 6 / 0.5 = 12 apples.

So, after buying apples with all your gold coins, you have 24 (initial apples) + 12 (bought apples) = 36 apples.

As for the location of the river, it runs near a big city. Unfortunately, I don't have any information about a specific city that you mentioned, so I couldn't pinpoint the exact location of the river. [end of text]

Weird, even q4_k_m gets this one right for me

I already noticed that weirdness in several models. Many times, small quantized models perform better than q8 for example. I never could explain that behavior. 😮

For me, q4_k_m behaves even worse.

Why do you have "system" at the beginning and "[end of text]" at the end?

i just used ./llama-cli

./llama-cli -m /models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -c 500 -ngl 50

build\bin\llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --repeat-penalty 1.1 --ctx-size 8196 --interactive -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0 --chat-template llama3


Did you read my comment? Probably not. I said I did NOT choose the llama3 template. I ran llama-server -v -ngl 0 -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -c 8192. Maybe you should try the llama-server interface in the browser.
All at default settings. At temperature 0, always 100% correct answer. I compare it to LMSYS Chatbot Arena, seems to perform equally...
This also worked for me:
llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --temp 0 -p "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -c 500 -ngl 50 -c 8192 --conversation
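The hand-written -p strings in these commands all follow the same layout, so it may help to see it assembled programmatically. The sketch below only concatenates the special tokens that already appear in the commands in this thread; the function name and the begin_of_text parameter are hypothetical, added because one of the earlier commands prefixes <|begin_of_text|> and the others do not.

```python
def build_llama3_prompt(system: str, user: str, begin_of_text: bool = False) -> str:
    """Assemble a single-turn prompt from the tokens used in the thread's commands."""
    prefix = "<|begin_of_text|>" if begin_of_text else ""
    return (
        prefix
        + "<|start_header_id|>system<|end_header_id|>\n\n"
        + system + "<|eot_id|>"
        + "<|start_header_id|>user<|end_header_id|>\n\n"
        + user + "<|eot_id|>"
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_llama3_prompt(
    "You are a helpful, smart, kind, and efficient AI assistant.",
    "How many apples do I have?",
)
print(prompt)
```

Building the string in one place makes it easier to spot a missing <|eot_id|> or header token when two people compare settings.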

I tested it myself with the built-in template and with my own template, using these commands.

Here it is tested with the suggested command:

llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -p "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -c 500 -ngl 50 -c 8192 --conversation
The results are similar to mine, no big difference.

Correct answers in around 3-4 of 10 attempts
Using 10/10

Did you try at least 10 times in a row?
How are the results then?
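Comparing "3-4 of 10" against "10/10" is easier when everyone counts the same way. Here is a rough sketch of such a tally, under the assumption that a reply counts as correct if it contains the expected number as a standalone value; this is a crude check, and count_correct and its generate callback are hypothetical names. In practice generate would wrap one run of your own llama-cli or llama-server setup; here it is stubbed with canned replies.

```python
import re
from typing import Callable


def count_correct(generate: Callable[[], str], expected: str, attempts: int = 10) -> int:
    """Call generate() `attempts` times and count replies containing `expected`."""
    # Match the expected number only when it is not part of a longer number.
    pattern = re.compile(rf"(?<!\d){re.escape(expected)}(?!\d)")
    return sum(1 for _ in range(attempts) if pattern.search(generate()))


# Stubbed replies standing in for real model output.
replies = iter([
    "24 + 12 = 36 apples.",
    "you now have 50 apples.",
    "So you have 36 apples.",
])
print(count_correct(lambda: next(replies), "36", attempts=3))  # 2
```

At temperature 0 the runs should be nearly identical, so repeated attempts mainly matter for settings such as temp 0.6.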

LM-Studio @ Q5_K_M - Temp 0

Let's break down the events step by step:

1. You start with 10 apples.
2. You find 3 gold coins in the river, but this doesn't affect your apple count.
3. You lose 4 apples, so you now have 6 apples.
4. The birds drop 6 apples each, which means they dropped a total of 18 apples. Now you have 24 apples (6 + 18).
5. You play the online game and win 12 gold coins (6 x 2 teammates). This doesn't affect your apple count directly.
6. With all the coins you have now (3 from the river, 1 gained after losing 4 apples, and 12 won in the game), you have a total of 16 gold coins.
7. You buy apples for all the coins you have. Since each apple costs 0.5 coins, you can buy 32 apples (16 x 2).

So, you now have 56 apples (24 + 32). As for where the river is, it's near a big city that has something to do with what you can spend the coins on. Unfortunately, this information isn't specific enough to pinpoint an exact location. However, based on your ability to play online games and buy apples, I'm going to take a guess that the city might be somewhere in the world where internet access is widespread and there are markets or stores where you can buy apples. If you'd like to provide more context or clarify what you meant by "something to do with what you can spend the coins on," I'd be happy to try and help further!

Thanks Bartowski, this updated build of L3.1 8b performed notably better for me than the others.

However, if anybody is experiencing performance issues, I recommend adjusting the system prompt. I'm using "You're a helpful assistant.", which notably increased its test score over no system prompt (which I normally set during testing).

My theory is that since L3.1 uses COT fine-tuning, which negatively impacts multi-shot evaluation scores, not setting the system prompt makes L3.1 behave more like a base model than an instruct model (e.g. it doesn't follow instructions nearly as accurately, so it can be led by the pattern set by multi-shot examples).

Nope, those GGUFs are obsolete.
We have new, corrected ones.

@mirek190 Thanks. I'll try out the corrected ones. Is there a gguf you recommend?

@mirek190 Thanks for the link. Did anyone verify that the perplexity wasn't negatively impacted by the rope changes?

Edit: I noticed you wrote

"perplexity even with --repeat-penalty 1.0 is still 78.00
before rope changes and gguf update was 80.50"

I will be waiting for this PR to be finalized before regenerating, as of now any made with that PR may be final implementation but also may not be

The current work doesn't seem fully done yet. It is far better now, but it still makes more errors than the version on

Seeing the same. It hallucinates less in response to the same prompts on LMsys.

I tested it myself with the built-in template and with my own template, using these commands.

Here it is tested with the suggested command:

llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -p "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -c 500 -ngl 50 -c 8192 --conversation

UPDATE: WHERE is --temp 0? You did not copy my settings.
Very strange! With these settings I don't JUST get the right answer sometimes, I get the right answer, 36, 100% of the time. All the other hard questions that I use are also usually answered correctly. Maybe I have to tell you all the steps I'm doing :) Like downloading the latest llama.cpp release (, using Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf that I downloaded from here. What else? I don't know, it just works.

\llama-b3463-bin-win-vulkan-x64>llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --temp 0 -p "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -c 500 -ngl 3 -c 8192 --conversation --multiline-input --color
The comment you are referring to is outdated.

Model: q8
Second, after the rope updates and the new GGUF, with temp 0 the answers are correct and the same as on groq.com, BUT with temp 0.6 the answers are a bit worse than what I am getting on where the answers for

"If my BMI is 20.5 and my height is 172cm, how much would I weigh if I gained 5% of my current weight?"

"I have 10 apples. I find 3 gold coins in the bottom of a river. The river runs near a big city that has something to do with what I can spend the coins on. I then lose 4 apples but gain a gold coin. Three birds run into my path and drop 6 apples each. I play an online game and win 6 gold coins but I have to share them equally with my 2 teammates. I buy apples for all the coins I have. The price of an apple is 0.5 coins. How many apples do I have? And where is the river?"

"Making one candle requires 125 grams of wax and 1 wick. How many candles can I make with 500 grams of wax and 3 wicks? Be concise."

are ALWAYS correct (it has no temp 0 setting), 10/10,
but locally only around 8-9 out of 10 are correct.
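To make the BMI question above checkable, here is a small worked calculation. It assumes the standard definition, BMI = weight in kg divided by height in metres squared; the variable names are illustrative.

```python
bmi = 20.5
height_m = 1.72                      # 172 cm

current_kg = bmi * height_m ** 2     # invert BMI = kg / m^2
after_gain_kg = current_kg * 1.05    # gain 5% of the current weight

print(round(current_kg, 1), round(after_gain_kg, 1))  # 60.6 63.7
```

So an answer of roughly 63.7 kg is what a reply should be scored against.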

"Making one candle requires 125 grams of wax and 1 wick. How many candles can I make with 500 grams of wax and 3 wicks? Be concise."

Meta-Llama-3.1-8B-Instruct-Q4_K_M says "The limiting factor is the wax, so you can make 3 candles."
GPT-4o says "You can make 4 candles with 500 grams of wax and 3 wicks." (OK, maybe not always, but it sometimes gives a wrong answer. If the best model sometimes fails, why are we even judging Llama 8b :))

But Llama 3 8b always answers that question correctly ;)
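The candle question is a minimum over two resources, which a few lines make explicit (a minimal sketch; the names are illustrative). Note that 500 grams of wax is enough for 4 candles, so the count of 3 comes from the wicks. The quoted Llama 3.1 answer gets the number right while naming the wrong limiting factor.

```python
wax_grams, wicks = 500, 3
wax_per_candle, wicks_per_candle = 125, 1

by_wax = wax_grams // wax_per_candle     # candles the wax allows
by_wicks = wicks // wicks_per_candle     # candles the wicks allow
candles = min(by_wax, by_wicks)          # the scarcer resource decides

print(by_wax, by_wicks, candles)  # 4 3 3
```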

Is llama 3.1 8b working for everyone after the llama.cpp rope fix?

I'm noticing a major issue. Namely, all llama 3.1 8b ggufs regularly ignore very obvious prompt directives.

This issue doesn't appear to be native to llama 3.1 8b, since it doesn't occur on LMsys despite numerous attempts, but it regularly occurs with all Llama 3.1 8b GGUFs (pre and post rope fix), regardless of uploader or quantization (even Q8_0), and while using temp 0 or higher.

Anyways, I use scripts to combine text segments into unique learning prompts and I make it define a related term first (to help randomize the output) and end with requesting an interesting related fact. And about half the time it doesn't define the related term first, and periodically it doesn't end with an interesting related fact.

Example (didn't start with a related term): "Define denigrate and gi specialist, but only after first defining another related college-level term. Then end by sharing an interesting fact."

Example (didn't end with interesting fact): "Define chromosphere and feasibility study, but only after first defining another related college-level term. Then end by sharing an interesting related fact."
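For anyone who wants to reproduce this kind of test, the following is a rough sketch of how such prompts could be assembled. It is not the poster's actual script: the template wording is copied from the second example above, while make_prompt, TEMPLATE, and the term list are hypothetical.

```python
import random

# Wording taken from the example prompts in this post.
TEMPLATE = (
    "Define {a} and {b}, but only after first defining another related "
    "college-level term. Then end by sharing an interesting related fact."
)

# Terms borrowed from the two examples; any word list would do.
TERMS = ["denigrate", "gi specialist", "chromosphere", "feasibility study"]


def make_prompt(terms, rng):
    """Pick two distinct terms and slot them into the template."""
    a, b = rng.sample(terms, 2)
    return TEMPLATE.format(a=a, b=b)


rng = random.Random(0)   # fixed seed so a test run can be repeated
print(make_prompt(TERMS, rng))
```

Checking whether a reply starts with a different term and ends with a fact still has to be done by reading the output.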

Does anybody have any idea what may be causing this? The other 8b class LLMs, including Mistral, Qwen2, Internlm, and Gemma 2 almost always get this right. The prompts are so short and simple, plus "but only after first defining..." and "Then end by sharing..." seem like very obvious directives.

I'm starting to think there's a compatibility issue between llama.cpp and llama 3.1 (and to a lesser degree llama 3), since other LLMs, including Qwen2 and Gemma 2, appear to behave comparably when using GGUFs vs providers, such as LMsys.

Plus others are reporting significant quality differences in Llama 3 8b Instruct implementations hosted by various providers, such as the example linked below, which could be because they're using quantized versions.


Are you using the llama.cpp CLI?
If so, show the command.
Which template are you using?

For instance, I am using this command:

llama-cli.exe --model models/new3/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf --color --threads 30 --keep -1 --n-predict -1 --ctx-size 100000 --interactive -ngl 99 --simple-io -e --multiline-input --no-display-prompt --conversation --no-mmap --temp 0.6 --chat-template llama3

@mirek190 Thanks, that was very helpful!

I was using GPT4All and Koboldcpp, but when I tried the cli with your recommended settings I only got one error after 8 runs (actually 16 since each was run at temp 0 and 0.6), which is in line with what's expected. I'll have to figure out how to make GPT4ALL work as well, but at least now I know it's not the GGUF.

In addition, this fixed another issue (information retrieval). For example, "Which two actresses played the two ex-wives of Alan Harper in Two and a Half Men?". It always got it wrong before, but after two tries with your settings (temp 0.6) using the cli it got it right.

first wrong

  1. Constance Zimmer (as Missi)
  2. Melanie Lynskey (as Rose)

then right

  1. Judith Melnick (played by Marin Hinkle)
  2. Kandi (played by April Bowlby)

Even when using LMsys and the cli with your settings Llama 3.1 8b Instruct regularly goes on a correcting and second guessing rant when asked about weakly held info. The following was from LMsys to the aforementioned prompt. I wonder if this is caused by COT fine-tuning.

"...The two actresses who played the ex-wives of Alan Harper are:

Susan Sarandon's character was not the ex-wife of Alan Harper.
Megan Fox was not an ex-wife of Alan Harper.
Judy (Maggie), played by Melanie Lynskey, was not one of the ex-wives of Alan Harper
Charlotte, played by Emmanuelle Vaugier, was not one of the ex-wives of Alan Harper
Kandi, played by Sarah Chalke, was not one of the ex-wives of Alan Harper
Walden's mother, played by Constance Zimmer, was not the ex-wife of Alan Harper

However, the two actresses who played the ex-wives of Alan Harper are:

Berta's nemesis, Rose, played by Melanie Lynskey was not an ex-wife
Walden's mother was not an ex-wife
Judy was Walden's mom
Kandi was not an ex-wife
The two women who were Alan's ex-wives are Judith and Kandi.

The two actresses who played the ex-wives of Alan Harper are:

Judith, played by Marin Hinkle
Kandi, played by Judy

However, another ex-wife of Alan Harper is Jacqueline who was played by Judy Greer"

You're welcome .

I still don't understand... You just showed your llama-cli command with "--chat-template llama3" at the end. But I'm using this prompt template by bartowski:
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
And when I ask the same questions with your parameters, it doesn't give correct answers... Everything is the same, except that instead of "--chat-template llama3" I have -p "<|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\n\nToday Date: 26 Jul 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

I don't understand your question...

I'm asking the model my usual reasoning questions, and with your settings it fails very often.

give examples ...
