	---
	language:
	- en
	license: apache-2.0
	model-index:
	- name: luxia-21.4b-alignment-v1.0
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 77.47
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saltlux/luxia-21.4b-alignment-v1.0
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 91.88
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saltlux/luxia-21.4b-alignment-v1.0
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 68.1
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saltlux/luxia-21.4b-alignment-v1.0
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 79.17
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saltlux/luxia-21.4b-alignment-v1.0
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 87.45
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saltlux/luxia-21.4b-alignment-v1.0
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 62.4
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saltlux/luxia-21.4b-alignment-v1.0
	      name: Open LLM Leaderboard
	---

	# Introduction
	We introduce luxia-21.4b-alignment-v1.0, an instruction-tuned and aligned model based on luxia-21.4b.
	Please refer to the evaluation results table for details.

	# Instruction Fine-tuning Strategy
	We use state-of-the-art instruction fine-tuning methods, including supervised fine-tuning (SFT) and direct preference optimization (DPO).

	# Data Contamination Test Results
	Results will be updated soon.

	# Evaluation Results
	Results will be updated soon.


	# Usage Instructions

	### How to use
	```python
	# pip install transformers==4.35.2
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Repository ID of this model card
	model_id = "saltlux/luxia-21.4b-alignment-v1.0"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",  # place layers on available devices (requires accelerate)
	    torch_dtype=torch.float16,  # load weights in half precision
	)
	```
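
Once `model` and `tokenizer` are loaded as above, text can be generated with the standard `generate` API. The helper below is a minimal sketch, not an official recipe: `generate_text` is a hypothetical name, the plain-text prompt is an assumption (this card does not document a prompt template), and greedy decoding with `max_new_tokens=128` is an arbitrary choice.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=128):
    """Generate a continuation for `prompt` using greedy decoding.

    `model` and `tokenizer` are the objects returned by `from_pretrained`.
    The prompt is passed as plain text; no chat template is assumed.
    """
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding; pass do_sample=True (and e.g. temperature) to sample instead.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )

    # Decode the first (only) sequence; the result includes the prompt text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (requires the model to be loaded first):
# print(generate_text(model, tokenizer, "Explain what a transformer model is."))
```

Loading the weights in float16 takes roughly two bytes per parameter, so expect the 21.4B-parameter model to need on the order of 43 GB of accelerator memory, possibly split across devices by `device_map="auto"`.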

	### License
	- [saltlux/luxia-21.4b-alignment-v1.0](https://huggingface.co/saltlux/luxia-21.4b-alignment-v1.0): apache-2.0


	### Contact Us
	Questions and suggestions are welcome in the discussion tab.
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_saltlux__luxia-21.4b-alignment-v1.0).

	| Metric                            | Value |
	|-----------------------------------|------:|
	| Avg.                              | 77.74 |
	| AI2 Reasoning Challenge (25-Shot) | 77.47 |
	| HellaSwag (10-Shot)               | 91.88 |
	| MMLU (5-Shot)                     | 68.10 |
	| TruthfulQA (0-shot)               | 79.17 |
	| Winogrande (5-shot)               | 87.45 |
	| GSM8k (5-shot)                    | 62.40 |
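
The Avg. row appears to be the unweighted mean of the six benchmark scores. The short check below recomputes it from the rounded values in the table; because these inputs are already rounded to two decimals, the last digit may differ slightly from the leaderboard's own figure.

```python
from statistics import mean

# Per-benchmark scores (percent), copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 77.47,
    "HellaSwag (10-Shot)": 91.88,
    "MMLU (5-Shot)": 68.10,
    "TruthfulQA (0-shot)": 79.17,
    "Winogrande (5-shot)": 87.45,
    "GSM8k (5-shot)": 62.40,
}

# Unweighted mean across all benchmarks.
average = mean(scores.values())
print(f"Average over {len(scores)} benchmarks: {average:.3f}")
```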