|
--- |
|
library_name: transformers |
|
tags: |
|
- language-model |
|
- fine-tuned |
|
- instruction-following |
|
- PEFT |
|
- LoRA |
|
- BitsAndBytes |
|
- Persian |
|
- Farsi |
|
- text-generation |
|
datasets: |
|
- taesiri/TinyStories-Farsi |
|
model_name: LLaMA-3.1-8B-Persian-Instruct |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
|
|
# LLaMA-3.1-8B-Persian-Instruct |
|
|
|
This model is a fine-tuned version of the `meta-llama/Meta-Llama-3.1-8B-Instruct` model, specifically tailored for generating and understanding Persian text. The fine-tuning was conducted using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset, which includes a diverse set of short stories in Persian. The primary goal of this fine-tuning was to enhance the model's performance in instruction-following tasks within the Persian language. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This model is a fine-tuned version of Llama-3.1-8B-Instruct that meta has released. By training this model on persian short stories, the new model gets to understand the relation between English and Persian in a more meaning full way. |
|
|
|
- **Developed by:** Meta AI |
|
- **Model type:** Language Model |
|
- **License:** Apache 2.0 |
|
- **Base Model:** `meta-llama/Meta-Llama-3.1-8B-Instruct` |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
The model was fine-tuned using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian. |
|
|
|
### Training Procedure |
|
The fine-tuning process was conducted using the following setup: |
|
|
|
- **Epochs:** 4 |
|
- **Batch Size:** 8 |
|
- **Gradient Accumulation Steps:** 2 |
|
- **Hardware:** NVIDIA A100 GPU |
|
|
|
### Fine-Tuning Strategy |
|
|
|
To make the fine-tuning process efficient and effective, PEFT (Parameter-Efficient Fine-Tuning) techniques were employed. Specifically, the `BitsAndBytesConfig(load_in_4bit=True)` configuration was used, allowing the model to be fine-tuned in 4-bit precision. This approach significantly reduced the computational resources required while maintaining high performance, resulting in a training time of approximately 2 hours. The use of `BitsAndBytesConfig(load_in_4bit=True)` helped reduce the environmental impact by minimizing the computational resources required. |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model is well-suited for generating text in Persian, particularly for instruction-following tasks. It can be used in applications like chatbots, customer support systems, educational tools, and more where accurate and context-aware Persian language generation is needed. |
|
|
|
### Out-of-Scope Use |
|
|
|
The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian. |
|
|
|
## How to Get Started with the Model |
|
|
|
Here is how you can use this model: |
|
|
|
```python |
|
from peft import PeftModel |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
import torch |
|
|
|
# Define the base model and the adapter model |
|
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct" |
|
adapter_model = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft" |
|
|
|
# Load the base model and apply the adapter model using PEFT |
|
model = AutoModelForCausalLM.from_pretrained(base_model, device_map={"": 0}) |
|
model = PeftModel.from_pretrained(model, adapter_model) |
|
|
|
# Check if CUDA is available, otherwise use CPU |
|
device = "cuda" if torch.cuda.is_available() else "cpu" |
|
model = model.to(device) |
|
|
|
# Load the tokenizer |
|
tokenizer = AutoTokenizer.from_pretrained(base_model) |
|
|
|
# Add a new pad token if necessary |
|
if tokenizer.pad_token is None: |
|
tokenizer.add_special_tokens({'pad_token': '[PAD]'}) # Adding a distinct pad token |
|
|
|
# Example usage |
|
input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟" |
|
|
|
# Tokenize the input and get both input IDs and attention mask |
|
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True) |
|
input_ids = inputs['input_ids'].to(device) |
|
attention_mask = inputs['attention_mask'].to(device) |
|
|
|
# Generate text |
|
outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=512, pad_token_id=tokenizer.pad_token_id) |
|
|
|
# Decode and print the output |
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(response) |
|
``` |