---
license: cc
model-index:
  - name: go-bruins-v2.1.1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 72.87
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rwitz2/go-bruins-v2.1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 88.33
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rwitz2/go-bruins-v2.1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 65.18
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rwitz2/go-bruins-v2.1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 69.8
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rwitz2/go-bruins-v2.1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 82.24
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rwitz2/go-bruins-v2.1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 71.27
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rwitz2/go-bruins-v2.1.1
          name: Open LLM Leaderboard
---

# go-bruins-v2.1.1

This model is jan-hq/trinity-v1, DPO-trained on Intel/orca_dpo_pairs.
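
The DPO step can be reproduced in spirit with Hugging Face TRL. The sketch below is a minimal illustration, not the actual training script: the hyperparameters, the prompt/column mapping, and the trl 0.7.x-era `DPOTrainer` signature are all assumptions.

```python
# Illustrative sketch only, not the actual training script.
# Assumes a trl 0.7.x-era DPOTrainer; hyperparameters and the
# prompt/column mapping are guesses, not the author's settings.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "jan-hq/trinity-v1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Intel/orca_dpo_pairs ships system/question/chosen/rejected columns;
# DPOTrainer expects prompt/chosen/rejected.
ds = load_dataset("Intel/orca_dpo_pairs", split="train")
ds = ds.map(lambda r: {"prompt": r["system"] + "\n" + r["question"]})

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,  # KL-penalty strength (assumed; trl's default)
    args=TrainingArguments(
        output_dir="go-bruins-dpo",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
        remove_unused_columns=False,  # DPOTrainer needs the raw columns
    ),
    train_dataset=ds,
    tokenizer=tokenizer,
)
trainer.train()
```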

The #1 model of any size on the Open LLM Leaderboard as of 12/16/2023.

12/18 update: Some of the datasets used to create the model I fine-tuned may have been contaminated. I am doing my best to remove this contamination in future models. Thanks for your patience. Contains traces of Cybertron-2:

```
@misc{unacybertron7b,
  title={Cybertron: Uniform Neural Alignment},
  author={Xavier Murias},
  year={2023},
  publisher={HuggingFace},
  journal={HuggingFace repository},
  howpublished={\url{https://huggingface.co/fblgit/una-cybertron-7b-v2-bf16}},
}
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rwitz2__go-bruins-v2.1.1).
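
These numbers come from the Eleuther lm-evaluation-harness runs behind the leaderboard. A single metric can be sanity-checked locally along the following lines; note that the leaderboard pins its own harness revision, so the `lm_eval >= 0.4` API and result key used here are assumptions, and scores may differ slightly.

```python
# Rough local check of the 25-shot ARC number; assumes lm-eval >= 0.4.
# The leaderboard pins its own harness revision, so expect small deltas.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=rwitz2/go-bruins-v2.1.1",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"]["acc_norm,none"])
```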

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |74.95|
|AI2 Reasoning Challenge (25-Shot)|72.87|
|HellaSwag (10-Shot)              |88.33|
|MMLU (5-Shot)                    |65.18|
|TruthfulQA (0-shot)              |69.80|
|Winogrande (5-shot)              |82.24|
|GSM8k (5-shot)                   |71.27|
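
For completeness, the model loads like any other Hugging Face causal LM. The repo id below is taken from the leaderboard links above; the prompt and generation settings are arbitrary.

```python
# Minimal inference example; generation settings are arbitrary.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rwitz2/go-bruins-v2.1.1"  # repo id from the leaderboard links above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain DPO fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```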