|
--- |
|
base_model: |
|
- maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 |
|
- beomi/Llama-3-KoEn-8B-Instruct-preview |
|
- asiansoul/Llama-3-Open-Ko-Linear-8B |
|
- NousResearch/Meta-Llama-3-8B |
|
- NousResearch/Meta-Llama-3-8B-Instruct |
|
- ajibawa-2023/Code-Llama-3-8B |
|
- defog/llama-3-sqlcoder-8b |
|
- NousResearch/Hermes-2-Pro-Llama-3-8B |
|
- Locutusque/llama-3-neural-chat-v2.2-8B |
|
- asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1 |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
|
|
--- |
|
# Joah-Llama-3-KoEn-8B-Coder-v2 |
|
|
|
<a href="https://ibb.co/k8hmBF4"><img src="https://i.ibb.co/J7z3tPv/Screenshot-2024-05-11-at-7-48-08-PM.png" alt="Screenshot-2024-05-11-at-7-48-08-PM" border="0"></a> |
|
|
|
"A cool merge model with swag" |
|
|
|
"Joah" by AsianSoul |
|
|
|
A multilingual merge based on this model is coming soon, starting with German (Korean / English / German).
|
|
|
Where to use Joah: medical, Korean, English, translation, code, science, and more.
|
|
|
<u>SQL coding and other scientific capabilities are strengthened compared to v1.</u>
|
|
|
## Merge Details
|
|
|
|
|
The performance of this merge model seems reasonably good so far (just my opinion ^^).
|
|
|
This may not be a model that satisfies you. But if we continue to overcome its shortcomings,

won't we someday find the answer we want?

Don't worry even if you don't get the results you want.

I'll keep searching for the answer for you.
|
|
|
Coming soon: real PoSE to extend Llama's context length to 64k, combined with my merge method, [Reborn](https://medium.com/@puffanddmx82/reborn-elevating-model-adaptation-with-merging-for-superior-nlp-performance-f604e8e307b2).
|
|
|
I have found that most merged models published so far do not actually have a 64k context length in their configs. I will improve this in my next merge with Reborn. If that doesn't work, I guess I'll have to find another way, right?
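
To verify a merged model's configured context window yourself, you can inspect its config without downloading any weights. A minimal sketch using `transformers` (the repo id here is a placeholder):

```python
# Check a repo's configured context window; only config.json is fetched.
from transformers import AutoConfig

repo_id = "some-org/some-llama-3-merge"  # placeholder; substitute any Hugging Face repo id
config = AutoConfig.from_pretrained(repo_id)

# Llama-3 ships with 8192 positions; a genuine 64k model should report 65536 here.
print(config.max_position_embeddings)
```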
|
|
|
256k is not possible for now; my computer runs out of memory.
|
|
|
If you support me, I will try it on a machine with maximum specs, and I would also like to run thorough tests for you on a network with high-capacity traffic and high-speed 10G connectivity.
|
|
|
### Merge Method
|
|
|
This model was merged using the [DARE](https://arxiv.org/abs/2311.03099)-[TIES](https://arxiv.org/abs/2306.01708) merge method, with [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) as the base.
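
For intuition: DARE randomly drops a fraction of each fine-tuned model's parameter deltas (keeping roughly `density` of them) and rescales the survivors, while TIES resolves sign conflicts between models by majority vote before summing. A rough single-tensor sketch, illustrative only and not mergekit's actual implementation:

```python
import torch

def dare_ties_merge(base, finetuned, densities, weights):
    """Merge one tensor from several fine-tunes into `base` (conceptual sketch)."""
    deltas = []
    for ft, density, weight in zip(finetuned, densities, weights):
        delta = ft - base
        # DARE: drop each delta entry with probability (1 - density), rescale survivors.
        mask = torch.bernoulli(torch.full_like(delta, density))
        deltas.append(weight * mask * delta / density)
    stacked = torch.stack(deltas)
    # TIES: elect a per-parameter sign by majority, keep only agreeing deltas.
    elected_sign = torch.sign(stacked.sum(dim=0))
    merged_delta = (stacked * (torch.sign(stacked) == elected_sign)).sum(dim=0)
    return base + merged_delta
```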
|
|
|
### Models Merged
|
|
|
The following models were included in the merge: |
|
* [maum-ai/Llama-3-MAAL-8B-Instruct-v0.1](https://huggingface.co/maum-ai/Llama-3-MAAL-8B-Instruct-v0.1) |
|
* [beomi/Llama-3-KoEn-8B-Instruct-preview](https://huggingface.co/beomi/Llama-3-KoEn-8B-Instruct-preview) |
|
* [asiansoul/Llama-3-Open-Ko-Linear-8B](https://huggingface.co/asiansoul/Llama-3-Open-Ko-Linear-8B) |
|
* [NousResearch/Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct) |
|
* [ajibawa-2023/Code-Llama-3-8B](https://huggingface.co/ajibawa-2023/Code-Llama-3-8B) |
|
* [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) |
|
* [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) |
|
* [Locutusque/llama-3-neural-chat-v2.2-8B](https://huggingface.co/Locutusque/llama-3-neural-chat-v2.2-8B) |
|
* [asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1](https://huggingface.co/asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1) |
|
|
|
### Ollama
|
|
|
`Modelfile_Q5_K_M` (the system prompt instructs the model to answer as kindly and thoroughly as possible, always in Korean):
|
|
|
``` |
|
FROM joah-llama-3-koen-8b-coder-v2-Q5_K_M.gguf |
|
TEMPLATE """ |
|
{{- if .System }} |
|
system |
|
<s>{{ .System }}</s> |
|
{{- end }} |
|
user |
|
<s>Human: |
|
{{ .Prompt }}</s> |
|
assistant |
|
<s>Assistant: |
|
""" |
|
|
|
SYSTEM """ |
|
친절한 챗봇으로서 상대방의 요청에 최대한 자세하고 친절하게 답하자. 모든 대답은 한국어(Korean)으로 대답해줘.
|
""" |
|
|
|
PARAMETER temperature 0.7 |
|
PARAMETER num_predict 3000 |
|
PARAMETER num_ctx 4096 |
|
PARAMETER stop "<s>" |
|
PARAMETER stop "</s>" |
|
``` |
|
|
|
``` |
|
ollama create joah -f ./Modelfile_Q5_K_M |
|
``` |
|
|
|
`Modelfile_Q5_K_M` is the default. I encourage you to try the other quantized files uploaded to this repo by changing the `FROM` line and creating the Ollama model again. Once created, you can chat with it via `ollama run joah`.
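
If you would rather use the weights directly with `transformers` than a GGUF through Ollama, here is a minimal sketch (the repo id is assumed from this card's title; adjust it to the actual upload path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asiansoul/Joah-Llama-3-KoEn-8B-Coder-v2"  # assumed from the card title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Answer in Korean."},
    {"role": "user", "content": "Write a SQL query that returns the top 5 customers by revenue."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```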
|
|
|
### Configuration
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml |
|
models: |
|
- model: NousResearch/Meta-Llama-3-8B |
|
# Base model providing a general foundation without specific parameters |
|
|
|
- model: NousResearch/Meta-Llama-3-8B-Instruct |
|
parameters: |
|
density: 0.60 |
|
weight: 0.25 |
|
|
|
- model: beomi/Llama-3-KoEn-8B-Instruct-preview |
|
parameters: |
|
density: 0.55 |
|
weight: 0.15 |
|
|
|
- model: asiansoul/Llama-3-Open-Ko-Linear-8B |
|
parameters: |
|
density: 0.55 |
|
weight: 0.1 |
|
|
|
- model: maum-ai/Llama-3-MAAL-8B-Instruct-v0.1 |
|
parameters: |
|
density: 0.55 |
|
weight: 0.1 |
|
|
|
- model: asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1 |
|
parameters: |
|
density: 0.55 |
|
weight: 0.2 |
|
|
|
- model: ajibawa-2023/Code-Llama-3-8B |
|
parameters: |
|
density: 0.55 |
|
weight: 0.05 |
|
|
|
- model: defog/llama-3-sqlcoder-8b |
|
parameters: |
|
density: 0.55 |
|
weight: 0.1 |
|
|
|
- model: Locutusque/llama-3-neural-chat-v2.2-8B |
|
parameters: |
|
density: 0.55 |
|
weight: 0.1 |
|
|
|
- model: NousResearch/Hermes-2-Pro-Llama-3-8B |
|
parameters: |
|
density: 0.55 |
|
weight: 0.05 |
|
|
|
merge_method: dare_ties |
|
base_model: NousResearch/Meta-Llama-3-8B |
|
parameters: |
|
int8_mask: true |
|
dtype: bfloat16 |
|
|
|
|
|
``` |
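
To reproduce the merge, save the config above to a file and run it with [mergekit](https://github.com/arcee-ai/mergekit). A minimal sketch of mergekit's Python API as shown in its README (verify the names against your installed version):

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("joah_v2.yaml") as f:  # the YAML above, saved under an illustrative name
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    "./Joah-Llama-3-KoEn-8B-Coder-v2",  # output directory
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```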