---
base_model:
- beomi/Llama-3-KoEn-8B-Instruct-preview
- Danielbrdz/Barcenas-Llama3-8b-ORPO
- maum-ai/Llama-3-MAAL-8B-Instruct-v0.1
- rombodawg/Llama-3-8B-Instruct-Coder
- NousResearch/Meta-Llama-3-8B-Instruct
- rombodawg/Llama-3-8B-Base-Coder-v3.5-10k
- cognitivecomputations/dolphin-2.9-llama3-8b
- asiansoul/Llama-3-Open-Ko-Linear-8B
- NousResearch/Meta-Llama-3-8B
- aaditya/Llama3-OpenBioLLM-8B
library_name: transformers
tags:
- mergekit
- merge
---
# Joah-Llama-3-KoEn-8B-Coder-v1
<a href="https://ibb.co/8XPkwP8"><img src="https://i.ibb.co/kMqZTqc/Joah.png" alt="Joah" border="0"></a><br />
Starting today, a merge model for all of you, to be a light for one another:
"Joah" (좋아, Korean for "good") by AsianSoul
Coming soon: a multilingual model merge based on this one, starting with German (Korean / English / German).
## Merge Details
The performance of this merged model does not seem bad, though that is just my opinion.
This may not be a model that satisfies you. But if we continue to overcome our shortcomings,
won't we someday find the answer we want?
Don't worry even if you don't get the results you want.
I'll find the answer for you.
Coming soon: real PoSE to extend Llama's context length to 64k, using my merge method, [reborn](https://medium.com/@puffanddmx82/reborn-elevating-model-adaptation-with-merging-for-superior-nlp-performance-f604e8e307b2).
I have found that most merged models published elsewhere so far do not actually have 64k in their configs. I will improve on this in the next merge with reborn. If that doesn't work, I guess I'll have to find another way, right?
256k is not possible: my computer runs out of memory.
If you support me, I will try it on a computer with maximum specifications. I would also like to run large-scale tests for you by building a high-capacity network with 10G speeds.
### Merge Method
This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) as the base.
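
For readers new to the method, here is a minimal toy sketch of the two ideas behind DARE TIES, written against plain Python lists rather than real model tensors. It is not mergekit's code: the function names (`dare_sparsify`, `ties_sign_merge`, `dare_ties_merge`) are hypothetical, and details such as `int8_mask` are left out. DARE randomly drops entries of each fine-tune's delta from the base and rescales the survivors by `1 / density`; TIES then elects a sign per entry and only adds the deltas that agree with it.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each entry with probability `density`; rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def sign(x):
    """Return -1, 0 or 1."""
    return (x > 0) - (x < 0)


def ties_sign_merge(deltas, weights):
    """Per entry: elect the sign of the weighted sum, then add only agreeing terms."""
    merged = []
    for column in zip(*deltas):
        terms = [w * d for w, d in zip(weights, column)]
        elected = sign(sum(terms))
        merged.append(sum(t for t in terms if sign(t) == elected))
    return merged


def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Toy DARE TIES: sparsify each delta from the base, then sign-merge them."""
    rng = random.Random(seed)
    deltas = [
        dare_sparsify([f - b for f, b in zip(ft, base)], density, rng)
        for ft, density in zip(finetuned, densities)
    ]
    task_vector = ties_sign_merge(deltas, weights)
    return [b + t for b, t in zip(base, task_vector)]


# With density 1.0 nothing is dropped, so only the sign election is visible.
base = [0.0, 0.0, 0.0]
model_a = [1.0, -1.0, 2.0]
model_b = [1.0, 1.0, -1.0]
print(dare_ties_merge(base, [model_a, model_b], [1.0, 1.0], [0.5, 0.5]))
# -> [1.0, 0.0, 1.0]
```

In the toy run, the second entry cancels out because the two deltas disagree with equal weight, and the third entry keeps only the positive contribution.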
### Models Merged
The following models were included in the merge:
* [beomi/Llama-3-KoEn-8B-Instruct-preview](https://huggingface.co/beomi/Llama-3-KoEn-8B-Instruct-preview)
* [Danielbrdz/Barcenas-Llama3-8b-ORPO](https://huggingface.co/Danielbrdz/Barcenas-Llama3-8b-ORPO)
* [maum-ai/Llama-3-MAAL-8B-Instruct-v0.1](https://huggingface.co/maum-ai/Llama-3-MAAL-8B-Instruct-v0.1)
* [rombodawg/Llama-3-8B-Instruct-Coder](https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder)
* [NousResearch/Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
* [rombodawg/Llama-3-8B-Base-Coder-v3.5-10k](https://huggingface.co/rombodawg/Llama-3-8B-Base-Coder-v3.5-10k)
* [cognitivecomputations/dolphin-2.9-llama3-8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b)
* [asiansoul/Llama-3-Open-Ko-Linear-8B](https://huggingface.co/asiansoul/Llama-3-Open-Ko-Linear-8B)
* [aaditya/Llama3-OpenBioLLM-8B](https://huggingface.co/aaditya/Llama3-OpenBioLLM-8B)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: NousResearch/Meta-Llama-3-8B
    # Base model providing a general foundation without specific parameters
  - model: NousResearch/Meta-Llama-3-8B-Instruct
    parameters:
      density: 0.60
      weight: 0.25
  - model: beomi/Llama-3-KoEn-8B-Instruct-preview
    parameters:
      density: 0.55
      weight: 0.15
  - model: asiansoul/Llama-3-Open-Ko-Linear-8B
    parameters:
      density: 0.55
      weight: 0.2
  - model: maum-ai/Llama-3-MAAL-8B-Instruct-v0.1
    parameters:
      density: 0.55
      weight: 0.1
  - model: rombodawg/Llama-3-8B-Instruct-Coder
    parameters:
      density: 0.55
      weight: 0.1
  - model: rombodawg/Llama-3-8B-Base-Coder-v3.5-10k
    parameters:
      density: 0.55
      weight: 0.1
  - model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      density: 0.55
      weight: 0.05
  - model: Danielbrdz/Barcenas-Llama3-8b-ORPO
    parameters:
      density: 0.55
      weight: 0.05
  - model: aaditya/Llama3-OpenBioLLM-8B
    parameters:
      density: 0.55
      weight: 0.1
merge_method: dare_ties
base_model: NousResearch/Meta-Llama-3-8B
parameters:
  int8_mask: true
dtype: bfloat16
```
```
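
As a quick sanity check on the configuration above, the sketch below copies the per-model `density` and `weight` values by hand and reports each model's relative share of the total weight. The `ENTRIES` list and the `weight_shares` helper are illustrative names, not part of mergekit. The weights add up to 1.10 rather than 1.0; how that is handled depends on mergekit's `normalize` option, which this configuration does not set.

```python
import math

# (model, density, weight) copied from the YAML above. The base model is
# omitted because it carries no parameters of its own.
ENTRIES = [
    ("NousResearch/Meta-Llama-3-8B-Instruct", 0.60, 0.25),
    ("beomi/Llama-3-KoEn-8B-Instruct-preview", 0.55, 0.15),
    ("asiansoul/Llama-3-Open-Ko-Linear-8B", 0.55, 0.2),
    ("maum-ai/Llama-3-MAAL-8B-Instruct-v0.1", 0.55, 0.1),
    ("rombodawg/Llama-3-8B-Instruct-Coder", 0.55, 0.1),
    ("rombodawg/Llama-3-8B-Base-Coder-v3.5-10k", 0.55, 0.1),
    ("cognitivecomputations/dolphin-2.9-llama3-8b", 0.55, 0.05),
    ("Danielbrdz/Barcenas-Llama3-8b-ORPO", 0.55, 0.05),
    ("aaditya/Llama3-OpenBioLLM-8B", 0.55, 0.1),
]


def weight_shares(entries):
    """Return the total weight and each model's fraction of that total."""
    total = math.fsum(weight for _, _, weight in entries)
    return total, {name: weight / total for name, _, weight in entries}


total, shares = weight_shares(ENTRIES)
print(f"total weight: {total:.2f}")  # -> total weight: 1.10
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{share:6.1%}  {name}")
```

The largest single share goes to `NousResearch/Meta-Llama-3-8B-Instruct`, followed by the two Korean-focused models.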