metadata

library_name: transformers
tags:
  - language-model
  - fine-tuned
  - instruction-following
  - PEFT
  - LoRA
  - BitsAndBytes
  - Persian
  - Farsi
  - text-generation
datasets:
  - taesiri/TinyStories-Farsi
model_name: LLaMA-3.1-8B-Persian-Instruct
pipeline_tag: text-generation

LLaMA-3.1-8B-Persian-Instruct

This model is a fine-tuned version of the meta-llama/Meta-Llama-3.1-8B-Instruct model, specifically tailored for generating and understanding Persian text. The fine-tuning was conducted using the TinyStories-Farsi dataset, which includes a diverse set of short stories in Persian. The primary goal of this fine-tuning was to enhance the model's performance in instruction-following tasks within the Persian language.

Model Details

Model Description

This model is a fine-tuned version of Llama-3.1-8B-Instruct that meta has released. By training this model on persian short stories, the new model gets to understand the relation between English and Persian in a more meaning full way.

Developed by: Meta AI
Model type: Language Model
License: Apache 2.0
Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct

Model Sources

Repository: Llama-3.1-8B-Instruct on Hugging Face

Training Details

Training Data

The model was fine-tuned using the TinyStories-Farsi dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian.

Training Procedure

The fine-tuning process was conducted using the following setup:

Epochs: 4
Batch Size: 8
Gradient Accumulation Steps: 2
Hardware: NVIDIA A100 GPU

Fine-Tuning Strategy

To make the fine-tuning process efficient and effective, PEFT (Parameter-Efficient Fine-Tuning) techniques were employed. Specifically, the BitsAndBytesConfig(load_in_4bit=True) configuration was used, allowing the model to be fine-tuned in 4-bit precision. This approach significantly reduced the computational resources required while maintaining high performance, resulting in a training time of approximately 2 hours. The use of BitsAndBytesConfig(load_in_4bit=True) helped reduce the environmental impact by minimizing the computational resources required.

Uses

Direct Use

This model is well-suited for generating text in Persian, particularly for instruction-following tasks. It can be used in applications like chatbots, customer support systems, educational tools, and more where accurate and context-aware Persian language generation is needed.

Out-of-Scope Use

The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian.

How to Get Started with the Model

Here is how you can use this model:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define the base model and the adapter model
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_model = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft"

# Load the base model and apply the adapter model using PEFT
model = AutoModelForCausalLM.from_pretrained(base_model, device_map={"": 0})
model = PeftModel.from_pretrained(model, adapter_model)

# Check if CUDA is available, otherwise use CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Add a new pad token if necessary
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})  # Adding a distinct pad token

# Example usage
input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟"

# Tokenize the input and get both input IDs and attention mask
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
input_ids = inputs['input_ids'].to(device)
attention_mask = inputs['attention_mask'].to(device)

# Generate text
outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=512, pad_token_id=tokenizer.pad_token_id)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)