xmanii/Llama3-8b-simorgh

xmanii committed on Jun 17

Commit

0f6287a

1 Parent(s): a395dca

Update README.md

Files changed (1)

README.md +45 -22

@@ -1,22 +1,45 @@
----
-base_model: unsloth/llama-3-8b-instruct-bnb-4bit
-language:
-- en
-license: apache-2.0
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- llama
-- trl
----
-# Uploaded  model
-- **Developed by:** xmanii
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/llama-3-8b-instruct-bnb-4bit
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+Model Information
+Developed by: xmanii
+License: Apache-2.0
+Finetuned from model: unsloth/llama-3-8b-instruct-bnb-4bit
+This Llama model was fine-tuned on a unique Persian dataset of Alpaca chat conversations, consisting of approximately 8,000 rows. Training ran on two H100 GPUs and finished in just under 1 hour. Unsloth and Hugging Face's TRL library were used to speed up training by 2x.
+<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
+This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.
+Using Adapters with Unsloth
+To run the model with adapters, use the following code (requires the unsloth package):
+import torch
+from unsloth import FastLanguageModel
+from unsloth.chat_templates import get_chat_template
+model_save_path = "path to the download folder"  # Adjust this path as needed
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name=model_save_path,
+    max_seq_length=4096,
+    load_in_4bit=True,
+)
+FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
+tokenizer = get_chat_template(
+    tokenizer,
+    chat_template="llama-3",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
+    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
+)
+messages = [{"from": "human", "value": "your prompt"}]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,  # Must add for generation
+    return_tensors="pt",
+).to("cuda")
+outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
+response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+print(response)
+We are working on quantizing the models and bringing them to Ollama.
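
The `mapping` argument in the snippet above tells the chat template to read ShareGPT-style messages (`from`/`value` keys, `human`/`gpt` roles) instead of the more common `role`/`content` layout. As a rough illustration of that correspondence only, the following sketch converts one layout into the other in plain Python. The helper `sharegpt_to_standard` and the `SHAREGPT_TO_STANDARD_ROLES` table are hypothetical names introduced here; they are not part of Unsloth and do not show how Unsloth applies the mapping internally.

```python
# Hypothetical helper, not part of Unsloth: it only illustrates how the
# ShareGPT keys and roles named in `mapping` line up with role/content.
SHAREGPT_TO_STANDARD_ROLES = {
    "human": "user",       # "user": "human" in the mapping
    "gpt": "assistant",    # "assistant": "gpt" in the mapping
}


def sharegpt_to_standard(messages):
    """Convert [{"from": ..., "value": ...}] into [{"role": ..., "content": ...}]."""
    converted = []
    for message in messages:
        # "from" holds the speaker, "value" holds the text.
        role = SHAREGPT_TO_STANDARD_ROLES[message["from"]]
        converted.append({"role": role, "content": message["value"]})
    return converted


example = [{"from": "human", "value": "your prompt"}]
standard = sharegpt_to_standard(example)
print(standard)
```

If the tokenizer is loaded without the ShareGPT mapping, messages in the `role`/`content` layout produced by a conversion like this are what `apply_chat_template` would normally expect.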