xmanii committed
Commit 0f6287a
1 Parent(s): a395dca

Update README.md

Files changed (1)
README.md +45 -22
README.md CHANGED
@@ -1,22 +1,45 @@
- ---
- base_model: unsloth/llama-3-8b-instruct-bnb-4bit
- language:
- - en
- license: apache-2.0
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - llama
- - trl
- ---
-
- # Uploaded model
-
- - **Developed by:** xmanii
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3-8b-instruct-bnb-4bit
-
- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ # Model Information
+
+ - **Developed by:** xmanii
+ - **License:** Apache-2.0
+ - **Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit
+
+ This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca-style chat conversations consisting of approximately 8,000 rows. Training ran on two H100 GPUs and completed in just under an hour. We leveraged Unsloth and Hugging Face's TRL library to accelerate training by 2x.
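+
+ For reference, below is a minimal sketch of this kind of Unsloth + TRL fine-tuning setup. The dataset id, LoRA rank, and training hyperparameters are illustrative assumptions, not the exact values we used:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import TrainingArguments
+ from trl import SFTTrainer
+ from unsloth import FastLanguageModel
+
+ # Hypothetical dataset id standing in for our ~8,000-row Persian Alpaca chat dataset
+ dataset = load_dataset("your-username/persian-alpaca-chat", split="train")
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="unsloth/llama-3-8b-instruct-bnb-4bit",
+     max_seq_length=4096,
+     load_in_4bit=True,
+ )
+ # Attach LoRA adapters; rank and alpha here are placeholder values
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=16,
+     lora_alpha=16,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+ )
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=dataset,
+     dataset_text_field="text",  # Assumes conversations are pre-rendered into a "text" column
+     max_seq_length=4096,
+     args=TrainingArguments(
+         per_device_train_batch_size=2,
+         gradient_accumulation_steps=4,
+         num_train_epochs=1,
+         learning_rate=2e-4,
+         output_dir="outputs",
+     ),
+ )
+ trainer.train()
+ ```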
6
+
7
+ <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
8
+
9
+ This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.
+
+ ## Using the Adapters with Unsloth
+
+ To run the model with the adapters, use the following code (you will need the `unsloth` package installed):
+
+ ```python
+ import torch
+ from unsloth import FastLanguageModel
+ from unsloth.chat_templates import get_chat_template
+
+ model_save_path = "path to the download folder"  # Adjust this path as needed
+
+ # Load the 4-bit model together with its LoRA adapters
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name=model_save_path,
+     max_seq_length=4096,
+     load_in_4bit=True,
+ )
+ FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
+
+ # Apply the Llama-3 chat template with a ShareGPT-style key mapping
+ tokenizer = get_chat_template(
+     tokenizer,
+     chat_template="llama-3",  # Also supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
+     mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
+ )
+
+ messages = [
+     {"from": "human", "value": "your prompt"},
+ ]
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,  # Must be set for generation
+     return_tensors="pt",
+ ).to("cuda")
+
+ outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
+ response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+ print(response)
+ ```
+
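+ If you prefer streaming tokens as they are generated instead of decoding at the end, Transformers' `TextStreamer` plugs into the same `generate` call. A minimal sketch (`skip_prompt=True`, which hides the echoed prompt, is our suggestion rather than part of the original snippet):
+
+ ```python
+ from transformers import TextStreamer
+
+ # Prints tokens to stdout as they are generated, reusing `model`, `tokenizer`, and `inputs` from above
+ streamer = TextStreamer(tokenizer, skip_prompt=True)
+ _ = model.generate(input_ids=inputs, streamer=streamer, max_new_tokens=2048, use_cache=True)
+ ```
+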
+ We are working on quantizing the models and bringing them to Ollama.
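+
+ Until those builds are ready, a local GGUF export is possible with Unsloth's `save_pretrained_gguf` helper; the resulting file can be loaded by Ollama through a `Modelfile` `FROM` line. A sketch under our assumptions (the output directory and the `q4_k_m` quantization method are illustrative choices, not the settings we will ship):
+
+ ```python
+ # Assumes `model` and `tokenizer` from the inference snippet above
+ model.save_pretrained_gguf("persian-llama3-gguf", tokenizer, quantization_method="q4_k_m")
+ ```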