---
library_name: transformers
tags:
- language-model
- fine-tuned
- instruction-following
- PEFT
- LoRA
- BitsAndBytes
- Persian
- Farsi
- text-generation
datasets:
- taesiri/TinyStories-Farsi
model_name: LLaMA-3.1-8B-Persian-Instruct
pipeline_tag: text-generation
---

# LLaMA-3.1-8B-Persian-Instruct

This model is a fine-tuned version of `meta-llama/Meta-Llama-3.1-8B-Instruct`, tailored for generating and understanding Persian text. It was fine-tuned on the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset, a collection of short stories in Persian. The primary goal of the fine-tuning was to improve the model's performance on instruction-following tasks in Persian.

## Model Details

### Model Description

This model is a fine-tuned version of Llama-3.1-8B-Instruct, released by Meta. Training on Persian short stories is intended to help the model capture the relationship between English and Persian in a more meaningful way.

- **Developed by:** Meta AI (base model)
- **Model type:** Language Model
- **License:** Apache 2.0
- **Base Model:** `meta-llama/Meta-Llama-3.1-8B-Instruct`

### Model Sources

- **Repository:** [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)

## Training Details

### Training Data

The model was fine-tuned on the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset. The dataset provides a varied linguistic context that helps the model better understand and generate Persian text.

### Training Procedure

The fine-tuning process used the following setup:

- **Epochs:** 4
- **Batch Size:** 8
- **Gradient Accumulation Steps:** 2
- **Hardware:** NVIDIA A100 GPU

### Fine-Tuning Strategy

To keep fine-tuning efficient, PEFT (Parameter-Efficient Fine-Tuning) techniques were employed.
Specifically, the base model was loaded with `BitsAndBytesConfig(load_in_4bit=True)`, so fine-tuning ran on 4-bit quantized weights. This significantly reduced the compute and memory required and brought training time down to approximately 2 hours. The lower compute requirement also reduces the environmental impact of training.

## Uses

### Direct Use

This model is suited to generating Persian text, particularly for instruction-following tasks. It can be used in applications such as chatbots, customer support systems, and educational tools that need accurate, context-aware Persian language generation.

### Out-of-Scope Use

The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or context beyond the immediate prompt. It is also not designed for generating text in languages other than Persian.

## How to Get Started with the Model

Here is how you can use this model:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define the base model and the adapter model
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_model = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft"

# Use CUDA if it is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model, apply the adapter with PEFT, and move it to the device
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)
model = model.to(device)
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Reuse the EOS token for padding if no pad token is defined.
# (Adding a brand-new token would also require resizing the model embeddings.)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Example usage
# English: "How can I find information about the stocks of American companies?"
input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟"

# Tokenize the input and get both input IDs and attention mask
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
input_ids = inputs["input_ids"].to(device)
attention_mask = inputs["attention_mask"].to(device)

# Generate text without tracking gradients
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=512,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
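
The settings under Training Procedure and Fine-Tuning Strategy imply a couple of derived quantities that may help when planning hardware. The sketch below is a back-of-the-envelope estimate, not a measurement from the actual run. It assumes a single GPU and roughly 8 billion parameters (both assumptions, not stated figures), and it ignores quantization metadata, activations, optimizer state, and adapter weights. The function names `effective_batch_size` and `weight_memory_gib` are hypothetical helpers introduced only for illustration.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the raw weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: about 8 billion parameters, inferred from the "8B" in the model name.
APPROX_PARAMS = 8_000_000_000

# Batch size 8 with 2 gradient accumulation steps, assuming one GPU.
print("effective batch size:", effective_batch_size(8, 2))

# Rough weight footprint at 4-bit versus 16-bit precision.
print("4-bit weights (GiB): ", round(weight_memory_gib(APPROX_PARAMS, 4), 2))
print("16-bit weights (GiB):", round(weight_memory_gib(APPROX_PARAMS, 16), 2))
```

Under these assumptions, each optimizer step sees 16 samples, and the 4-bit weights occupy about a quarter of the memory that 16-bit weights would. Real memory use during fine-tuning will be higher because of the overheads the sketch leaves out.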