good

opened by McUH

I think it achieves the goal. It plays roleplay more interestingly than the basic CommandR 2024 32B and is not as lewd and horny as Star Command. I tried the exl2 4bpw quant and it works OK, though it would be nice to have some GGUFs at higher precision (like Q6).

I can make a Q6 soon, if one of the GGUF quantizers doesn't pick it up before then.

Is iMatrix worth using on Q6 these days? Guess I'll see...
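For reference, this is roughly the path I'd take with llama.cpp (just a sketch; the filenames are placeholders and the binaries were renamed at some point, older builds call them `imatrix` and `quantize`):

```python
# Sketch: imatrix-assisted Q6_K quant with llama.cpp (binaries assumed on PATH).
import subprocess

model_f16 = "star-command-lite-f16.gguf"  # placeholder: the f16 GGUF conversion
calib_txt = "calibration.txt"             # any representative text corpus

# 1. Collect the importance matrix over the calibration text.
subprocess.run(
    ["llama-imatrix", "-m", model_f16, "-f", calib_txt, "-o", "imatrix.dat"],
    check=True,
)

# 2. Quantize to Q6_K using that matrix.
subprocess.run(
    ["llama-quantize", "--imatrix", "imatrix.dat",
     model_f16, "star-command-lite-Q6_K.gguf", "Q6_K"],
    check=True,
)
```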

Not sure about imatrix and Q6. From what I have heard it produces different results, but whether it is actually better, I don't know. Generally I only use imatrix quants up to 4-bit.
I tried the exl2 4bpw a bit more. It is not bad, but it is very inconsistent, often contradicting itself, sometimes even within one message. I don't know if that is because of such a low quant or a general problem with this Star Command finetune. Either way it is unfortunate, because the exl2 4bpw quant could in theory be used for 60-80k context with 24GB VRAM, but if it can't really follow even 8k, then long context is not very useful.

Is the regular command R at 4bpw working better in that respect?

I've been trying both back to back, still not sure yet. I feel like both mess up in different areas, and like you said it may be due to the extreme quantization.

One thing I've found, btw, is that command R likes extremely low temperature, especially if you use quadratic smoothing or something. It is still non-deterministic even below 0.1, though I'm not sure about an optimal spot or anything.
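To put rough numbers on that (a toy softmax illustration, not measured from the model): when two candidate tokens have logits only 0.1 apart, the runner-up still keeps a real chance of being sampled at T=0.1, which lines up with the non-determinism I'm seeing.

```python
# Toy two-token softmax: how much probability the runner-up keeps at low temperature.
import math

def runner_up_prob(gap: float, temperature: float) -> float:
    """P(second token) when its logit sits `gap` below the top logit."""
    return 1.0 / (1.0 + math.exp(gap / temperature))

for t in (1.0, 0.1, 0.05, 0.02):
    print(f"T={t:>4}: runner-up gets {runner_up_prob(0.1, t):.1%}")
# Roughly: 47.5%, 26.9%, 11.9%, 0.7%
```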

Another note, the HF to GGUF conversion script errors out with this model for some reason:

ValueError: Can not map tensor 'lm_head.weight'

But not with regular command R?

There's also another quirk I discovered earlier where this raw model is a few gigabytes larger than regular Command R. It seemed to quantize to exl2 fine and end up at the same size, so I wrote it off... but now I'm not so sure. A linear merge seems to have the same result, as does manually specifying a tokenizer source.

Something might be messed up with mergekit and Command-R, not sure yet.
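One way to check where the extra gigabytes went (rough sketch; the path is a placeholder and it assumes the merge was saved as safetensors): Command R ships with tied embeddings (`tie_word_embeddings: true`), so the base checkpoint has no `lm_head.weight` tensor at all, and the GGUF converter has no mapping for one. If mergekit wrote out a separate copy of the embedding matrix as `lm_head.weight`, that alone would roughly account for the extra size and the conversion error.

```python
# Sketch: does the merged checkpoint carry a separate lm_head.weight, and is it
# just a duplicate of the (normally tied) embedding matrix?
import glob
import torch
from safetensors import safe_open

merged_dir = "./star-command-lite"  # placeholder: local path to the merged model
wanted = {"lm_head.weight": None, "model.embed_tokens.weight": None}

for shard in glob.glob(f"{merged_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if name in wanted:
                wanted[name] = f.get_tensor(name)

lm_head, embed = wanted["lm_head.weight"], wanted["model.embed_tokens.weight"]
if lm_head is None:
    print("No separate lm_head.weight - same layout as base Command R.")
elif embed is not None and torch.equal(lm_head, embed):
    print("lm_head.weight is an exact copy of embed_tokens - could be dropped / re-tied.")
else:
    print("lm_head.weight genuinely differs - the merge untied the embeddings.")
```

If it turns out to be an exact copy, deleting that tensor (and leaving the config tied) should bring the size back in line and let the converter run, though I haven't tried that on this model yet.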

Yeah, you are probably right. Checking my notes from my original testing of c4ai-command-r-08-2024 32B Q6_K_L, I wrote "seems not very smart".
I did not try low temperature (actually I later went on to experiment with higher temperature, since base CommandR 2024 often just gets stuck in a scene, but yes, that makes it even more chaotic). Maybe low temperature can work for Star Command / lite though, as they are no longer as dry and repetitive as base CommandR 2024.

I no longer use quadratic smoothing. In general I try not to mess with the token distribution much nowadays (except temperature, a low min_p like 0.02 to remove the tail, and DRY). As someone pointed out, the models train hard for a long time on insane hardware to learn token predictions. A simple sampler function that changes the distribution is not going to improve that; it is more likely to just mess with what they learned.

> A simple sampler function that changes the distribution is not going to improve that; it is more likely to just mess with what they learned.

I tend to agree with this. That being said, there's no "true" distribution, since sampling is largely about picking a not-most-likely token to keep the model from looping... but now that you say it, I will try skipping the distribution warping.

But yes, I find this model does not like a lot of rep penalty, nor a lot of temperature (I am using 0.05 for short completions atm). Unfortunately I am not using DRY at the moment, as text-generation-webui is mega laggy at long context :(
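For what it's worth, here is the whole setup as a request to text-generation-webui's OpenAI-compatible endpoint (a sketch with the values mentioned above; the extra sampler fields are webui-specific and their names may differ between versions):

```python
# Sketch: the sampler settings discussed above, sent to a local
# text-generation-webui instance (default OpenAI-compatible API on port 5000).
import requests

payload = {
    "prompt": "Continue the scene:",   # placeholder prompt
    "max_tokens": 300,
    "temperature": 0.05,        # very low temp, as discussed
    "min_p": 0.02,              # just trim the tail
    "repetition_penalty": 1.0,  # the model reportedly dislikes rep penalty
    "smoothing_factor": 0.0,    # quadratic smoothing off
    "dry_multiplier": 0.0,      # DRY off here (laggy at long context); ~0.8 when enabled
}

resp = requests.post("http://127.0.0.1:5000/v1/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["text"])
```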
