Update README.md

eda3926 verified 2 months ago

4.66 kB

	---
	library_name: transformers
	tags:
	- language-model
	- fine-tuned
	- instruction-following
	- PEFT
	- LoRA
	- BitsAndBytes
	- Persian
	- Farsi
	- text-generation
	datasets:
	- taesiri/TinyStories-Farsi
	model_name: LLaMA-3.1-8B-Persian-Instruct
	pipeline_tag: text-generation
	---


	# LLaMA-3.1-8B-Persian-Instruct

	This model is a fine-tuned version of the `meta-llama/Meta-Llama-3.1-8B-Instruct` model, specifically tailored for generating and understanding Persian text. The fine-tuning was conducted using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset, which includes a diverse set of short stories in Persian. The primary goal of this fine-tuning was to enhance the model's performance in instruction-following tasks within the Persian language.

	## Model Details

	### Model Description

	The `LLaMA-3.1-8B-Persian-Instruct` model is part of the LLaMA series known for its robust performance across various NLP tasks. This version is adapted to Persian, making it more effective for generating coherent and contextually relevant responses in this language.

	- Developed by: Meta AI, fine-tuned by Amir Mohseni
	- Model type: Language Model
	- Language(s) (NLP): Persian (Farsi)
	- License: Apache 2.0
	- Finetuned from model: `meta-llama/Meta-Llama-3.1-8B-Instruct`

	### Model Sources

	- Repository: [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)

	## Training Details

	### Training Data
	The model was fine-tuned using the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian.

	### Training Procedure
	The fine-tuning process was conducted using the following setup:

	- Epochs: 4
	- Batch Size: 8
	- Gradient Accumulation Steps: 2
	- Hardware: NVIDIA A100 GPU

	### Fine-Tuning Strategy

	To make the fine-tuning process efficient and effective, PEFT (Parameter-Efficient Fine-Tuning) techniques were employed. Specifically, the `BitsAndBytesConfig(load_in_4bit=True)` configuration was used, allowing the model to be fine-tuned in 4-bit precision. This approach significantly reduced the computational resources required while maintaining high performance, resulting in a training time of approximately 2 hours. The use of `BitsAndBytesConfig(load_in_4bit=True)` helped reduce the environmental impact by minimizing the computational resources required.

	## Uses

	### Direct Use

	This model is well-suited for generating text in Persian, particularly for instruction-following tasks. It can be used in applications like chatbots, customer support systems, educational tools, and more where accurate and context-aware Persian language generation is needed.

	### Out-of-Scope Use

	The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian.

	## How to Get Started with the Model

	Here is how you can use this model:

	```python
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Define the base model and the adapter model
	base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
	adapter_model = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft"

	# Load the base model and apply the adapter model using PEFT
	model = AutoModelForCausalLM.from_pretrained(base_model, device_map={"": 0})
	model = PeftModel.from_pretrained(model, adapter_model)

	# Check if CUDA is available, otherwise use CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model = model.to(device)

	# Load the tokenizer
	tokenizer = AutoTokenizer.from_pretrained(base_model)

	# Add a new pad token if necessary
	if tokenizer.pad_token is None:
	tokenizer.add_special_tokens({'pad_token': '[PAD]'}) # Adding a distinct pad token

	# Example usage
	input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟"

	# Tokenize the input and get both input IDs and attention mask
	inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
	input_ids = inputs['input_ids'].to(device)
	attention_mask = inputs['attention_mask'].to(device)

	# Generate text
	outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=512, pad_token_id=tokenizer.pad_token_id)

	# Decode and print the output
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```