	---
	library_name: transformers
	tags:
	- language-model
	- fine-tuned
	- instruction-following
	- PEFT
	- LoRA
	- BitsAndBytes
	- Persian
	- Farsi
	- text-generation
	datasets:
	- taesiri/TinyStories-Farsi
	model_name: LLaMA-3.1-8B-Persian-Instruct
	pipeline_tag: text-generation
	---


	# LLaMA-3.1-8B-Persian-Instruct

	This model is a fine-tuned version of `meta-llama/Meta-Llama-3.1-8B-Instruct`, tailored for generating and understanding Persian text. It was fine-tuned on the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset, a diverse set of short stories in Persian. The primary goal of this fine-tuning was to improve the model's instruction-following in Persian.

	## Model Details

	### Model Description

	This model is a fine-tuned version of Meta's Llama-3.1-8B-Instruct. Training it on Persian short stories is intended to help the model relate English and Persian in a more meaningful way.

	- Base model developed by: Meta AI
	- Model type: Causal language model (PEFT/LoRA adapter on top of the base model)
	- License: Apache 2.0 (the base model is distributed under the Llama 3.1 Community License)
	- Base Model: `meta-llama/Meta-Llama-3.1-8B-Instruct`

	### Model Sources

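	- Base model repository: [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
	- Adapter repository: [AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft](https://huggingface.co/AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft)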

	## Training Details

	### Training Data
	The model was fine-tuned using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian.

	### Training Procedure
	The fine-tuning process was conducted using the following setup:

	- Epochs: 4
	- Batch Size: 8
	- Gradient Accumulation Steps: 2
	- Hardware: NVIDIA A100 GPU
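The batch settings above combine into an effective batch size of 8 × 2 = 16 examples per optimizer update on a single GPU. The minimal arithmetic sketch below shows this, assuming one device as listed; the helper names and the `num_examples` value are hypothetical (the card does not state how many training examples were used), and the step count is an approximation whose rounding may differ slightly from a particular trainer's.

```python
import math

def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices

def optimizer_steps(num_examples: int, per_device_batch: int, grad_accum_steps: int,
                    epochs: int, num_devices: int = 1) -> int:
    """Approximate number of optimizer updates over the whole run."""
    # Forward/backward passes (micro-batches) needed to see every example once
    micro_batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # Gradients from grad_accum_steps micro-batches are summed before each update
    steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
    return steps_per_epoch * epochs

# Settings listed above: batch size 8, gradient accumulation 2, 4 epochs, one GPU
print(effective_batch_size(per_device_batch=8, grad_accum_steps=2))  # 16

# Hypothetical dataset size, for illustration only
print(optimizer_steps(num_examples=10_000, per_device_batch=8, grad_accum_steps=2, epochs=4))  # 2500
```

Gradient accumulation lets a single A100 reach the larger effective batch without holding 16 sequences in memory at once.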

	### Fine-Tuning Strategy

	To make fine-tuning efficient, PEFT (Parameter-Efficient Fine-Tuning) with LoRA adapters was used. The base model was loaded with `BitsAndBytesConfig(load_in_4bit=True)`, so its frozen weights are stored in 4-bit precision while only the small LoRA adapter weights are updated during training. This significantly reduced the memory and compute required, resulting in a training time of approximately 2 hours, and the lower resource use also reduces the environmental impact of training.
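As a rough illustration of why this setup is cheap, the sketch below estimates the memory taken by the frozen base-model weights at 16-bit versus 4-bit precision, and counts the parameters a LoRA adapter would train. It assumes a rounded 8B parameter count and a hypothetical LoRA configuration (rank 16 on the attention query and value projections); the card does not report the rank or target modules actually used, and these numbers are estimates, not measurements.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1e9 bytes).

    Ignores quantization constants, activations, optimizer state, and any
    layers kept in higher precision, so real usage is higher.
    """
    return num_params * bits_per_param / 8 / 1e9

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA keeps the (out x in) weight W frozen and learns the update as B @ A,
    with A of shape (rank x in) and B of shape (out x rank); only A and B train."""
    return rank * (in_features + out_features)

NUM_PARAMS = 8_000_000_000  # rounded "8B" parameter count

print(f"16-bit weights: {weight_memory_gb(NUM_PARAMS, 16):.1f} GB")  # 16.0 GB
print(f" 4-bit weights: {weight_memory_gb(NUM_PARAMS, 4):.1f} GB")   # 4.0 GB

# Hypothetical LoRA setup: rank 16 on q_proj and v_proj in each of the 32 decoder
# layers. Llama-3.1-8B has hidden size 4096 and grouped-query attention with
# 8 key/value heads of dimension 128, so v_proj maps 4096 -> 1024.
HIDDEN, KV_DIM, LAYERS, RANK = 4096, 1024, 32, 16
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * LAYERS
print(f"trainable LoRA params: {trainable:,}")  # 6,815,744
print(f"fraction of base model: {100 * trainable / NUM_PARAMS:.3f}%")
```

Under these assumptions the adapter trains well under 0.1% of the base model's parameters, while 4-bit loading cuts the frozen weights to roughly a quarter of their 16-bit size.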

	## Uses

	### Direct Use

	This model is well-suited for generating text in Persian, particularly for instruction-following tasks. It can be used in chatbots, customer support systems, educational tools, and other applications where accurate, context-aware Persian text generation is needed.

	### Out-of-Scope Use

	The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian.

	## How to Get Started with the Model

	Here is how you can use this model:

	```python
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Define the base model and the LoRA adapter
	base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
	adapter_model = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft"

	# Use the GPU if available, otherwise fall back to CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Load the base model in bfloat16 (half the memory of float32) and apply the adapter using PEFT
	model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
	model = PeftModel.from_pretrained(model, adapter_model)
	model = model.to(device)

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained(base_model)

	# Llama 3.1 defines no pad token; reuse EOS rather than adding a new token,
	# which would also require resizing the model's embedding matrix
	if tokenizer.pad_token is None:
	    tokenizer.pad_token = tokenizer.eos_token

	# Example prompt: "How can I find information about the stocks of American companies?"
	input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟"

	# Tokenize the input and get both input IDs and attention mask
	inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
	input_ids = inputs["input_ids"].to(device)
	attention_mask = inputs["attention_mask"].to(device)

	# Generate up to 512 new tokens (max_length would also count the prompt tokens)
	outputs = model.generate(
	    input_ids,
	    attention_mask=attention_mask,
	    max_new_tokens=512,
	    pad_token_id=tokenizer.pad_token_id,
	)

	# Decode only the newly generated tokens (skip the echoed prompt) and print the response
	response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
	print(response)
	```