🐐 FinGEITje 7B DPO

A large open Dutch financial language model aligned through AI feedback.

This model is a fine-tuned version of snoels/FinGEITje-7B-sft on the BramVanroy/ultra_feedback_dutch dataset.

📖 Model Description

FinGEITje-7B-dpo is a large open Dutch financial language model with 7 billion parameters, based on Mistral 7B. It has been further trained with Direct Preference Optimization (DPO) on AI-generated preference data to align its Dutch-language responses with human-like preferences. This alignment is intended to make the model's responses more helpful, coherent, and user-aligned in financial contexts.

📊 Training

Training Data

FinGEITje-7B-dpo was fine-tuned on the BramVanroy/ultra_feedback_dutch dataset, which consists of synthetic preference data in Dutch. This dataset includes prompts along with preferred and less preferred responses, allowing the model to learn to generate more aligned responses through DPO.
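
To make the role of the preferred and less preferred responses concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration and not the training code used for this model: the function names are hypothetical, the log-probabilities are made-up numbers, and `beta=0.1` is an assumed value, since the card does not state which beta was used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1):
    """DPO loss for one (prompt, chosen, rejected) triple.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    beta is an ASSUMED value; the model card does not report it.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen reward exceeds the rejected reward.
    loss = -log_sigmoid(chosen_reward - rejected_reward)
    return loss, chosen_reward, rejected_reward


# Hypothetical log-probabilities, for illustration only.
loss, r_chosen, r_rejected = dpo_loss(-580.0, -890.0, -560.0, -840.0)
print(f"loss={loss:.4f} chosen={r_chosen:.2f} rejected={r_rejected:.2f}")
```

When the policy and the reference model assign identical log-probabilities, both implicit rewards are zero and the loss equals log 2; it falls toward zero as the policy favors the chosen response more strongly than the reference model does.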

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
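
The relationship between these values can be checked with a few lines of arithmetic. The sketch below derives the total train batch size from the list above and traces a cosine learning-rate schedule with linear warmup. The step count of 753 is an assumption, estimated from the step-to-epoch ratio in the training results (100 steps at epoch 0.1327), and the schedule formula is the common half-cosine decay, not something confirmed by the card.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 5e-6
train_batch_size = 1              # per device
num_devices = 4
gradient_accumulation_steps = 16
warmup_ratio = 0.1

# Examples consumed per optimizer step across all devices.
effective_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# ASSUMPTION: about 753 optimizer steps in the single epoch, estimated
# from the results table (100 / 0.1327); the card does not state this.
total_steps = 753
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup followed by half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

The product 1 × 4 × 16 reproduces the reported total_train_batch_size of 64.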

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1029 | 0.1327 | 100 | 0.1099 | -1.8067 | -5.3683 | 0.9679 | 3.5616 | -892.3373 | -579.9115 | -2.4775 | -2.3705 |
| 0.042 | 0.2654 | 200 | 0.0430 | -3.5129 | -10.6778 | 0.9828 | 7.1649 | -1423.2883 | -750.5289 | -1.9744 | -1.9895 |
| 0.0278 | 0.3981 | 300 | 0.0344 | -3.7335 | -13.5153 | 0.9828 | 9.7818 | -1707.0360 | -772.5893 | -1.7454 | -1.8191 |
| 0.0223 | 0.5308 | 400 | 0.0308 | -3.6554 | -13.7712 | 0.9858 | 10.1158 | -1732.6289 | -764.7831 | -1.8020 | -1.9184 |
| 0.0378 | 0.6635 | 500 | 0.0297 | -4.0018 | -16.3285 | 0.9851 | 12.3266 | -1988.3542 | -799.4221 | -1.6924 | -1.8650 |
| 0.0352 | 0.7962 | 600 | 0.0278 | -3.8104 | -15.6430 | 0.9836 | 11.8327 | -1919.8119 | -780.2752 | -1.7437 | -1.8978 |
| 0.0238 | 0.9289 | 700 | 0.0279 | -3.8974 | -15.9642 | 0.9828 | 12.0668 | -1951.9310 | -788.9780 | -1.7371 | -1.8937 |
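
As a reading aid for the reward columns: under the usual DPO logging convention, which is assumed here and not stated in the card, Rewards/margins is the chosen reward minus the rejected reward. The short check below recomputes the margin from the values reported above and compares it with the reported column. Small differences in the last digit are expected, because the logged values are rounded averages.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from
# the evaluation rows of the training results above.
rows = [
    (100, -1.8067, -5.3683, 3.5616),
    (200, -3.5129, -10.6778, 7.1649),
    (300, -3.7335, -13.5153, 9.7818),
    (400, -3.6554, -13.7712, 10.1158),
    (500, -4.0018, -16.3285, 12.3266),
    (600, -3.8104, -15.6430, 11.8327),
    (700, -3.8974, -15.9642, 12.0668),
]

for step, chosen, rejected, reported in rows:
    computed = chosen - rejected
    print(f"step {step}: computed {computed:.4f}, reported {reported:.4f}")

# Largest absolute gap between recomputed and reported margins.
max_abs_diff = max(abs((c - r) - m) for _, c, r, m in rows)
print(f"max abs difference: {max_abs_diff:.4f}")
```

Both rewards become more negative during training, but the rejected reward falls much faster, which is why the margin grows while the reward accuracy stays around 0.98.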

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.4
  • Pytorch 2.3.1
  • Datasets 2.20.0
  • Tokenizers 0.19.1

🛠️ How to Use

FinGEITje-7B-dpo can be loaded with the Hugging Face Transformers library, using PEFT to apply the adapter weights on top of the base model.

Installation

Ensure you have the necessary libraries installed:

pip install torch transformers peft accelerate

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("BramVanroy/GEITje-7B-ultra", use_fast=False)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("BramVanroy/GEITje-7B-ultra", device_map='auto')

# Load the FinGEITje-7B-dpo model with PEFT adapters
model = PeftModel.from_pretrained(base_model, "snoels/FinGEITje-7B-dpo", device_map='auto')

Generating Text

# Prepare the input
input_text = "Wat zijn de laatste trends in de Nederlandse banksector?"
input_ids = tokenizer.encode(input_text, return_tensors='pt').to(model.device)

# Generate a response
outputs = model.generate(input_ids, max_length=200, num_return_sequences=1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)

🙏 Acknowledgements

We would like to thank:

📝 Citation

Link to the paper: https://arxiv.org/abs/2410.12835

If you use FinGEITje-7B-dpo in your work, please cite:

@article{FinGEITje2024,
  title={A Dutch Financial Large Language Model},
  author={Noels, Sander and De Blaere, Jorne and De Bie, Tijl},
  journal={arXiv preprint arXiv:2410.12835},
  year={2024},
  url={https://arxiv.org/abs/2410.12835}
}

📜 License

This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

📧 Contact

For any inquiries or questions, please contact Sander Noels.
