---
language:
  - en
tags:
  - rp
  - erp
  - chat
  - miqu
---

# Melusine 103b


This model is a rotating-stack merge of three 70b models in a 103b (120 layer) configuration inspired by Venus 103b. All components are miqu-based, and the result appears to retain the long-context capabilities of the base model.

Component models for the rotating stack are:

- ShinojiResearch/Senku-70B-Full
- Undi95/Miqu-70B-Alpaca-DPO
- alchemonaut/QuartetAnemoi-70B-t0.0001

This model is mostly de-censored and is capable of generating objectionable material. Depending on prompts, remnants of the original censorship may pop up. Due to some of the constituent parts, extremely objectionable material may also be generated under certain circumstances. As with any LLM, no factual claims made by the model should be taken at face value. You know that boilerplate safety disclaimer that most professional models have? Assume this has it too. This model is for entertainment purposes only.

## GGUFs

## Sample output

{{[INPUT]}}
Write a detailed and humorous story about a cute and fluffy bunny that goes to a Gwar concert.

## Prompt format

This model seems to have the strongest affinity for Alpaca and ChatML prompts; the standard templates are shown below for reference.
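
For reference, these are the common community versions of the Alpaca and ChatML templates (not copied from this repo's files):

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```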

## WTF is a rotating-stack merge?

Inspired by Undi's experiments with stacked merges, Jeb Carter found that output quality and model initiative could be significantly improved by reversing the model order in the stack and then doing a linear merge between the original and reversed stacks. That is what I did here. To preserve as much of the "smarts" and long-context awareness from miqu as possible while still adding the flavor from the other models, there is effectively twice as much miqu in the mix as any other single component. The exact merge configs can be found in the recipe.txt file.
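
As a rough sketch of the idea (the layer ranges and 50/50 weighting below are hypothetical placeholders, not the values in recipe.txt), a rotating-stack merge looks something like this:

```python
# Hypothetical sketch of a rotating-stack merge. Slice ranges and the 50/50
# weighting are illustrative only; the real configuration lives in recipe.txt.
COMPONENTS = [
    "ShinojiResearch/Senku-70B-Full",
    "Undi95/Miqu-70B-Alpaca-DPO",
    "alchemonaut/QuartetAnemoi-70B-t0.0001",
]

def passthrough_stack(models, span=40, step=20, total_layers=80):
    """Overlapping slices of 80-layer 70b models: 3 slices of 40 = 120 layers (103b-class)."""
    starts = range(0, total_layers - span + 1, step)  # 0, 20, 40
    return [(models[i % len(models)], (s, s + span)) for i, s in enumerate(starts)]

forward = passthrough_stack(COMPONENTS)         # components in original order
reverse = passthrough_stack(COMPONENTS[::-1])   # same slices, order reversed

assert sum(hi - lo for _, (lo, hi) in forward) == 120

# The final model is a linear merge of the two stacks, slice by slice.
for i, ((m_f, r_f), (m_r, r_r)) in enumerate(zip(forward, reverse)):
    print(f"slice {i}: 0.5 * {m_f} layers {r_f}  +  0.5 * {m_r} layers {r_r}")
```

In a layout like this, the middle component lines up with itself in the overlapping slices, so it ends up with roughly twice the effective weight of the outer two.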