model-index:
  - name: xmanii/llama-3-8b-instruct-bnb-4bit-persian
    description: |
      **Model Information**

      **Developed by:** xmanii
      **License:** Apache-2.0
      **Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit

      **Model Description**

      This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca chat conversations consisting of approximately 8,000 rows. Training ran on two H100 GPUs and completed in just under one hour. We used Unsloth together with Hugging Face's TRL library, which accelerated training by 2x.

      ![Unsloth Made with Love](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)

      **Training Resources**

      * 2x H100 GPUs
      * Unsloth and Hugging Face's TRL library

      **Dataset**

      * Unique Persian dataset of Alpaca chat conversations
      * Approximately 8,000 rows

      **Open-Source Contribution**

      This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.

      **Using Adapters with Unsloth**

      To run the model with adapters, you can use the following code:

      ```python
      import torch
      from unsloth import FastLanguageModel
      from unsloth.chat_templates import get_chat_template

      model_save_path = "path to the download folder"  # path to the local folder pulled from the Hugging Face Hub

      model, tokenizer = FastLanguageModel.from_pretrained(
          model_name=model_save_path,
          max_seq_length=4096,
          load_in_4bit=True,
      )
      FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

      tokenizer = get_chat_template(
          tokenizer,
          chat_template="llama-3",  # use the llama-3 template
          mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # map the ShareGPT-style message keys onto the template
      )

      messages = [{"from": "human", "value": "your prompt"}]  # add your prompt here as the human turn
      inputs = tokenizer.apply_chat_template(
          messages,
          tokenize=True,
          add_generation_prompt=True,  # Must add for generation
          return_tensors="pt",
      ).to("cuda")

      outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
      response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
      print(response)
      ```

      **Full 16-bit Merged Model**

      For a full 16-bit merged model, please check out xmanii/Llama3-8b-simorgh-16bit.
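
      For reference, here is a minimal sketch of loading the merged 16-bit checkpoint with plain `transformers` (this assumes the repository follows the standard Hub layout and ships the Llama-3 chat template; adjust dtype and device placement to your hardware):

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "xmanii/Llama3-8b-simorgh-16bit"

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on the target GPU
          device_map="auto",
      )

      # Build the prompt with the tokenizer's built-in chat template
      messages = [{"role": "user", "content": "your prompt"}]
      inputs = tokenizer.apply_chat_template(
          messages,
          add_generation_prompt=True,
          return_tensors="pt",
      ).to(model.device)

      outputs = model.generate(input_ids=inputs, max_new_tokens=512)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      ```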

      **Future Work**

      We are working on quantizing the models and bringing them to Ollama.