pszemraj committed
Commit a0c273b (parent: 900c716)

Update README.md

Files changed (1):
  1. README.md +5 -48
README.md CHANGED
```diff
@@ -1,12 +1,12 @@
 ---
 license: apache-2.0
 base_model: Qwen/Qwen2-1.5B
-tags:
-- generated_from_trainer
 metrics:
 - accuracy
 datasets:
 - BEE-spoke-data/stepbasin-books
+language:
+- en
 ---
 
 [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/pszemraj/long-generation-tests/runs/ethp25f9)
@@ -15,52 +15,9 @@ datasets:
 > [!IMPORTANT]
 > this was finetuned at 16384 context length
 
-This model is a fine-tuned version of [Qwen/Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B) on the BEE-spoke-data/stepbasin-books dataset.
+This model is a fine-tuned version of [Qwen/Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B) on https://github.com/stepbasin/books/tree/master/books
+
 It achieves the following results on the evaluation set:
 - Loss: 2.8110
 - Accuracy: 0.4298
-- Num Input Tokens Seen: 44040192
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 3e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 80085
-- gradient_accumulation_steps: 32
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.05
-- num_epochs: 3.0
-
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
-|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
-| 2.7792 | 0.9967 | 28 | 2.8183 | 0.4287 | 14729216 |
-| 2.6971 | 1.9933 | 56 | 2.8112 | 0.4297 | 29458432 |
-| 2.7116 | 2.9900 | 84 | 2.8110 | 0.4298 | 44040192 |
-
-
-### Framework versions
-
-- Transformers 4.42.4
-- Pytorch 2.3.1+cu121
-- Datasets 2.20.0
-- Tokenizers 0.19.1
+- Num Input Tokens Seen: 44040192
```
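The "Num Input Tokens Seen" figure retained by this commit can be cross-checked against the hyperparameters removed in the same diff. A minimal sketch, using only numbers that appear in the diff above (84 final steps, per-device batch size 1, 32 gradient-accumulation steps, 16384-token context):

```python
# Token accounting for the training run described in this card.
# All figures are taken from the diff above; nothing is measured independently.

context_length = 16384            # "finetuned at 16384 context length"
train_batch_size = 1              # per-device batch size
gradient_accumulation_steps = 32

# Effective batch size per optimizer step, matching total_train_batch_size: 32.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 32

# Tokens consumed per optimizer step at full context length.
tokens_per_step = total_train_batch_size * context_length  # 524288

# The final log row reports step 84; the product matches "Num Input Tokens Seen".
print(84 * tokens_per_step)  # 44040192
```

Note the intermediate log rows (steps 28 and 56) report slightly more tokens than this formula gives, so the check is exact only for the final step.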