---
license: apache-2.0
language:
- en
---
<div align="center">

# TinyMix-8x1b-Chat

</div>
This is a MoE-ification of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) using the [Mixtral branch of mergekit](https://github.com/cg123/mergekit).

The goal was to MoE-fy the TinyLlama model and then use the result as a base model to finetune from. The intuition is that finetuning the 8x1B mixture should give better results than finetuning the 1B model on its own.

More work coming!
# Chat Template
```python
def make_prompt(instruction):
    return f"<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n"

# `llm` stands in for whatever generation interface you are using.
llm.generate(make_prompt('What is quantum tunneling?'))
```
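For context, here is a minimal end-to-end sketch of the same template with `transformers`. The repo id is an assumption (swap in the actual checkpoint path), and `device_map="auto"` additionally requires `accelerate`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eastwind/tinymix-8x1b-chat"  # assumed repo id; adjust if it differs

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# make_prompt as defined in the snippet above
prompt = make_prompt("What is quantum tunneling?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```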
# Mergekit Config
```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts: [""]
```
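To reproduce the merge, a config like this is passed to mergekit's MoE entry point. The invocation below is a sketch; the config filename and output directory are illustrative:

```bash
# Sketch: build the 8-expert MoE from the config above.
mergekit-moe config.yml ./tinymix-8x1b-chat
```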
# Eval
Thanks to u/mhenrichsen for the HellaSwag score.
```
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml |none | 0|acc |0.4657|± |0.0050|
| | |none | 0|acc_norm|0.6042|± |0.0049|
```
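The table matches lm-evaluation-harness output, so a zero-shot run along these lines should reproduce it (model id assumed, as above):

```bash
# Sketch: zero-shot HellaSwag via lm-evaluation-harness.
lm_eval --model hf \
  --model_args pretrained=eastwind/tinymix-8x1b-chat,dtype=bfloat16 \
  --tasks hellaswag \
  --num_fewshot 0
```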