vishalkatheriya18 committed
Commit 053fce4
1 Parent(s): 9b897af

End of training

README.md ADDED
@@ -0,0 +1,166 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/convnextv2-tiny-1k-224
+ tags:
+ - generated_from_trainer
+ datasets:
+ - imagefolder
+ metrics:
+ - accuracy
+ model-index:
+ - name: convnextv2-tiny-1k-224-finetuned-print
+   results:
+   - task:
+       name: Image Classification
+       type: image-classification
+     dataset:
+       name: imagefolder
+       type: imagefolder
+       config: default
+       split: train
+       args: default
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.9263157894736842
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # convnextv2-tiny-1k-224-finetuned-print
+
+ This model is a fine-tuned version of [facebook/convnextv2-tiny-1k-224](https://huggingface.co/facebook/convnextv2-tiny-1k-224) on the imagefolder dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.1811
+ - Accuracy: 0.9263
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 100
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-------:|:----:|:---------------:|:--------:|
+ | No log | 0.8889 | 6 | 1.6653 | 0.1158 |
+ | 1.6915 | 1.9259 | 13 | 1.5942 | 0.2316 |
+ | 1.5895 | 2.9630 | 20 | 1.4918 | 0.3895 |
+ | 1.5895 | 4.0 | 27 | 1.3637 | 0.5053 |
+ | 1.433 | 4.8889 | 33 | 1.2241 | 0.6737 |
+ | 1.2158 | 5.9259 | 40 | 1.0279 | 0.7474 |
+ | 1.2158 | 6.9630 | 47 | 0.8303 | 0.8105 |
+ | 0.9438 | 8.0 | 54 | 0.6756 | 0.8211 |
+ | 0.727 | 8.8889 | 60 | 0.5751 | 0.8421 |
+ | 0.727 | 9.9259 | 67 | 0.4679 | 0.8632 |
+ | 0.5516 | 10.9630 | 74 | 0.4432 | 0.8526 |
+ | 0.4337 | 12.0 | 81 | 0.3815 | 0.8526 |
+ | 0.4337 | 12.8889 | 87 | 0.3641 | 0.8842 |
+ | 0.3757 | 13.9259 | 94 | 0.3233 | 0.8737 |
+ | 0.3017 | 14.9630 | 101 | 0.3230 | 0.9158 |
+ | 0.3017 | 16.0 | 108 | 0.3018 | 0.8842 |
+ | 0.2495 | 16.8889 | 114 | 0.3445 | 0.9053 |
+ | 0.2177 | 17.9259 | 121 | 0.2987 | 0.8947 |
+ | 0.2177 | 18.9630 | 128 | 0.2727 | 0.8947 |
+ | 0.1738 | 20.0 | 135 | 0.2865 | 0.8842 |
+ | 0.1572 | 20.8889 | 141 | 0.2646 | 0.9263 |
+ | 0.1572 | 21.9259 | 148 | 0.3100 | 0.9053 |
+ | 0.1165 | 22.9630 | 155 | 0.3039 | 0.9263 |
+ | 0.1057 | 24.0 | 162 | 0.3023 | 0.9053 |
+ | 0.1057 | 24.8889 | 168 | 0.2254 | 0.9158 |
+ | 0.0825 | 25.9259 | 175 | 0.3308 | 0.8737 |
+ | 0.0795 | 26.9630 | 182 | 0.2040 | 0.9368 |
+ | 0.0795 | 28.0 | 189 | 0.2148 | 0.9263 |
+ | 0.072 | 28.8889 | 195 | 0.3450 | 0.8632 |
+ | 0.0701 | 29.9259 | 202 | 0.2418 | 0.9263 |
+ | 0.0701 | 30.9630 | 209 | 0.2495 | 0.9263 |
+ | 0.0635 | 32.0 | 216 | 0.3267 | 0.8947 |
+ | 0.0537 | 32.8889 | 222 | 0.3728 | 0.9158 |
+ | 0.0537 | 33.9259 | 229 | 0.2852 | 0.9053 |
+ | 0.0607 | 34.9630 | 236 | 0.2386 | 0.9474 |
+ | 0.052 | 36.0 | 243 | 0.2070 | 0.9158 |
+ | 0.052 | 36.8889 | 249 | 0.1860 | 0.9474 |
+ | 0.049 | 37.9259 | 256 | 0.3069 | 0.8947 |
+ | 0.0578 | 38.9630 | 263 | 0.4477 | 0.8737 |
+ | 0.0533 | 40.0 | 270 | 0.2612 | 0.8947 |
+ | 0.0533 | 40.8889 | 276 | 0.2649 | 0.8842 |
+ | 0.0505 | 41.9259 | 283 | 0.1950 | 0.9263 |
+ | 0.0433 | 42.9630 | 290 | 0.2903 | 0.8842 |
+ | 0.0433 | 44.0 | 297 | 0.2526 | 0.9368 |
+ | 0.0395 | 44.8889 | 303 | 0.3016 | 0.8842 |
+ | 0.035 | 45.9259 | 310 | 0.3509 | 0.8947 |
+ | 0.035 | 46.9630 | 317 | 0.2943 | 0.8842 |
+ | 0.0335 | 48.0 | 324 | 0.2613 | 0.8842 |
+ | 0.0408 | 48.8889 | 330 | 0.2165 | 0.9158 |
+ | 0.0408 | 49.9259 | 337 | 0.2872 | 0.9263 |
+ | 0.0244 | 50.9630 | 344 | 0.3134 | 0.8842 |
+ | 0.0323 | 52.0 | 351 | 0.3006 | 0.9158 |
+ | 0.0323 | 52.8889 | 357 | 0.3758 | 0.8737 |
+ | 0.0241 | 53.9259 | 364 | 0.3033 | 0.9263 |
+ | 0.0193 | 54.9630 | 371 | 0.2741 | 0.9368 |
+ | 0.0193 | 56.0 | 378 | 0.1684 | 0.9368 |
+ | 0.0273 | 56.8889 | 384 | 0.2403 | 0.9474 |
+ | 0.0244 | 57.9259 | 391 | 0.1500 | 0.9474 |
+ | 0.0244 | 58.9630 | 398 | 0.1377 | 0.9368 |
+ | 0.0268 | 60.0 | 405 | 0.1898 | 0.9158 |
+ | 0.0405 | 60.8889 | 411 | 0.1756 | 0.9053 |
+ | 0.0405 | 61.9259 | 418 | 0.1907 | 0.9263 |
+ | 0.0219 | 62.9630 | 425 | 0.1790 | 0.9053 |
+ | 0.0329 | 64.0 | 432 | 0.1885 | 0.9368 |
+ | 0.0329 | 64.8889 | 438 | 0.1550 | 0.9368 |
+ | 0.019 | 65.9259 | 445 | 0.1811 | 0.9158 |
+ | 0.0205 | 66.9630 | 452 | 0.2166 | 0.9263 |
+ | 0.0205 | 68.0 | 459 | 0.1701 | 0.9053 |
+ | 0.0232 | 68.8889 | 465 | 0.2153 | 0.9158 |
+ | 0.0269 | 69.9259 | 472 | 0.2229 | 0.9263 |
+ | 0.0269 | 70.9630 | 479 | 0.2237 | 0.9263 |
+ | 0.0306 | 72.0 | 486 | 0.1828 | 0.9368 |
+ | 0.0298 | 72.8889 | 492 | 0.1448 | 0.9368 |
+ | 0.0298 | 73.9259 | 499 | 0.1948 | 0.9158 |
+ | 0.0154 | 74.9630 | 506 | 0.2570 | 0.9158 |
+ | 0.0193 | 76.0 | 513 | 0.2462 | 0.9263 |
+ | 0.0193 | 76.8889 | 519 | 0.2194 | 0.9158 |
+ | 0.0188 | 77.9259 | 526 | 0.2254 | 0.9158 |
+ | 0.0198 | 78.9630 | 533 | 0.1924 | 0.9263 |
+ | 0.0147 | 80.0 | 540 | 0.1525 | 0.9368 |
+ | 0.0147 | 80.8889 | 546 | 0.1314 | 0.9474 |
+ | 0.0282 | 81.9259 | 553 | 0.1381 | 0.9368 |
+ | 0.0168 | 82.9630 | 560 | 0.1496 | 0.9158 |
+ | 0.0168 | 84.0 | 567 | 0.1806 | 0.9263 |
+ | 0.018 | 84.8889 | 573 | 0.2080 | 0.9263 |
+ | 0.0172 | 85.9259 | 580 | 0.2199 | 0.9158 |
+ | 0.0172 | 86.9630 | 587 | 0.1939 | 0.9263 |
+ | 0.0117 | 88.0 | 594 | 0.1815 | 0.9263 |
+ | 0.0149 | 88.8889 | 600 | 0.1811 | 0.9263 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.0
+ - Pytorch 2.4.0
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1
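Review note: the hyperparameters in the README hunk (`learning_rate: 5e-05`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.1`) can be cross-checked against the `learning_rate` values logged in `trainer_state.json`. The sketch below is an independent re-derivation of a linear warmup and decay schedule, not the library's own code; it assumes the total step count is the `global_step` of 600 reported in this commit, and the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, peak_lr=5e-05, total_steps=600, warmup_ratio=0.1):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay.

    Assumption: total_steps is the global_step (600) from trainer_state.json.
    """
    warmup_steps = round(total_steps * warmup_ratio)  # 60 for this run
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Effective batch size: per-device batch size times gradient accumulation steps.
effective_batch_size = 32 * 4

# Compare against the logged values at steps 10, 60 and 70.
for step in (10, 60, 70):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the function gives roughly 8.33e-06 at step 10, 5e-05 at step 60 and 4.907e-05 at step 70, which agrees with the logged `learning_rate` entries for those steps, and the effective batch size agrees with `total_train_batch_size: 128`.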
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 88.88888888888889,
+     "eval_accuracy": 0.9263157894736842,
+     "eval_loss": 0.1811211258172989,
+     "eval_runtime": 1.544,
+     "eval_samples_per_second": 61.527,
+     "eval_steps_per_second": 1.943,
+     "total_flos": 1.9132429834630103e+18,
+     "train_loss": 0.19732174752900997,
+     "train_runtime": 1871.4873,
+     "train_samples_per_second": 45.686,
+     "train_steps_per_second": 0.321
+ }
config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "_name_or_path": "facebook/convnextv2-tiny-1k-224",
+   "architectures": [
+     "ConvNextV2ForImageClassification"
+   ],
+   "depths": [
+     3,
+     3,
+     9,
+     3
+   ],
+   "drop_path_rate": 0.0,
+   "hidden_act": "gelu",
+   "hidden_sizes": [
+     96,
+     192,
+     384,
+     768
+   ],
+   "id2label": {
+     "0": "Floral",
+     "1": "Leopard",
+     "2": "Sequined",
+     "3": "Tie_and_Dye",
+     "4": "Zebra"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "label2id": {
+     "Floral": 0,
+     "Leopard": 1,
+     "Sequined": 2,
+     "Tie_and_Dye": 3,
+     "Zebra": 4
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "convnextv2",
+   "num_channels": 3,
+   "num_stages": 4,
+   "out_features": [
+     "stage4"
+   ],
+   "out_indices": [
+     4
+   ],
+   "patch_size": 4,
+   "problem_type": "single_label_classification",
+   "stage_names": [
+     "stem",
+     "stage1",
+     "stage2",
+     "stage3",
+     "stage4"
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.0"
+ }
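Review note: `id2label` and `problem_type: single_label_classification` in `config.json` imply that the classifier emits five logits and the predicted class is the highest-scoring index. The sketch below illustrates that mapping in plain Python; the logits are made up for illustration, and the helper names `softmax` and `top_label` are hypothetical, not part of any library.

```python
import math

# Copied from the "id2label" entry in config.json (JSON keys are strings).
id2label = {
    "0": "Floral",
    "1": "Leopard",
    "2": "Sequined",
    "3": "Tie_and_Dye",
    "4": "Zebra",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[str(best)], probs[best]


# Hypothetical logits for one image; these values are not from the commit.
example_logits = [0.2, 3.1, -1.0, 0.5, 1.4]
label, prob = top_label(example_logits)
print(label, round(prob, 3))
```

With a loaded model, the same lookup would normally go through the config object's `id2label` mapping rather than a hand-copied dictionary.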
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 88.88888888888889,
+     "eval_accuracy": 0.9263157894736842,
+     "eval_loss": 0.1811211258172989,
+     "eval_runtime": 1.544,
+     "eval_samples_per_second": 61.527,
+     "eval_steps_per_second": 1.943
+ }
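Review note: the README does not state the size of the evaluation set, but the throughput figures in `eval_results.json` allow a rough estimate. The sketch below assumes that `eval_samples_per_second` is the sample count divided by `eval_runtime`, and that accuracy is the fraction of correct predictions; both are assumptions about how the metrics were produced.

```python
import math

# Values copied from eval_results.json.
eval_runtime = 1.544
eval_samples_per_second = 61.527
eval_steps_per_second = 1.943
eval_accuracy = 0.9263157894736842
eval_batch_size = 32  # from the README hyperparameters

# Assumption: throughput = count / runtime, so count = throughput * runtime.
n_eval = round(eval_samples_per_second * eval_runtime)
n_batches = round(eval_steps_per_second * eval_runtime)
n_correct = round(eval_accuracy * n_eval)

print(n_eval, n_batches, n_correct)
# Consistency check: the batch count should match ceil(n_eval / batch size).
print(math.ceil(n_eval / eval_batch_size) == n_batches)
```

If the estimate holds, each misclassified image moves the accuracy by a bit more than one percentage point, which is worth keeping in mind when reading the epoch-to-epoch fluctuations in the training results table.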
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0143674d58f7b45de93c4477981c0c96a74f7f7fdebef703066160b9c3003f3
+ size 111505052
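Review note: the `model.safetensors` hunk is a Git LFS pointer, so it records only the SHA-256 digest and byte size of the real weights file. A downloaded copy can be checked against those two fields. The sketch below is one way to do that with the standard library; the function name `matches_lfs_pointer` and the local path in the usage comment are hypothetical.

```python
import hashlib
import os


def matches_lfs_pointer(path, expected_oid, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` has the given size and SHA-256 digest."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Usage with the values from the pointer (the local path is an assumption):
# matches_lfs_pointer(
#     "model.safetensors",
#     "d0143674d58f7b45de93c4477981c0c96a74f7f7fdebef703066160b9c3003f3",
#     111505052,
# )
```

The size check runs first because it is cheap and rejects truncated downloads without hashing the whole file.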
preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "crop_pct": 0.875,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_processor_type": "ConvNextImageProcessor",
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
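Review note: the values in `preprocessor_config.json` describe a rescale by 1/255 followed by per-channel normalization with the usual ImageNet mean and standard deviation. The sketch below works through that arithmetic for a single RGB pixel. It also computes the intermediate resize target under the assumption that, for a `shortest_edge` below 384, the processor resizes to `shortest_edge / crop_pct` and then center-crops to `shortest_edge`; that reading of the `ConvNextImageProcessor` behaviour should be checked against the library documentation. The helper name `normalize_pixel` is hypothetical.

```python
# Values copied from preprocessor_config.json.
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]
rescale_factor = 0.00392156862745098  # approximately 1 / 255
crop_pct = 0.875
shortest_edge = 224


def normalize_pixel(rgb):
    """Rescale 0..255 channel values to 0..1, then subtract mean and divide by std."""
    return [
        (value * rescale_factor - mean) / std
        for value, mean, std in zip(rgb, image_mean, image_std)
    ]


# Assumed resize-then-crop behaviour: resize the short side to 224 / 0.875,
# then take a 224 x 224 center crop.
resize_target = int(shortest_edge / crop_pct)

print(resize_target)
print(normalize_pixel([255, 255, 255]))
print(normalize_pixel([0, 0, 0]))
```

Inputs fed to the model outside the bundled processor would need the same rescale and normalization to match what the network saw during fine-tuning.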
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 88.88888888888889,
+     "total_flos": 1.9132429834630103e+18,
+     "train_loss": 0.19732174752900997,
+     "train_runtime": 1871.4873,
+     "train_samples_per_second": 45.686,
+     "train_steps_per_second": 0.321
+ }
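Review note: the README lists `num_epochs: 100`, yet every results file reports `epoch: 88.88888888888889`. The figures are consistent with gradient accumulation dropping the leftover batches of each epoch from the step count. The sketch below reconstructs that arithmetic; it assumes a single device, that `train_samples_per_second` equals dataset size times requested epochs divided by runtime, and that optimizer steps per epoch are the floor of batches per epoch over the accumulation factor. The training-set size it derives is an estimate, not a figure stated in the commit.

```python
import math

# Values copied from train_results.json, trainer_state.json and the README.
train_runtime = 1871.4873
train_samples_per_second = 45.686
num_epochs = 100
train_batch_size = 32
gradient_accumulation_steps = 4
global_step = 600

# Assumption: throughput = (dataset size * requested epochs) / runtime.
n_train = round(train_samples_per_second * train_runtime / num_epochs)

batches_per_epoch = math.ceil(n_train / train_batch_size)
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_updates = updates_per_epoch * num_epochs

# Fractional epoch reached after `global_step` optimizer updates.
reported_epoch = global_step * gradient_accumulation_steps / batches_per_epoch

print(n_train, batches_per_epoch, updates_per_epoch, total_updates)
print(reported_epoch)
```

Under these assumptions the run performed the planned 600 optimizer steps, and the epoch counter stops short of 100 only because 27 batches per epoch do not divide evenly by 4.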
trainer_state.json ADDED
@@ -0,0 +1,1263 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 88.88888888888889,
+   "eval_steps": 500,
+   "global_step": 600,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.8888888888888888,
+       "eval_accuracy": 0.11578947368421053,
+       "eval_loss": 1.6652907133102417,
+       "eval_runtime": 1.8033,
+       "eval_samples_per_second": 52.682,
+       "eval_steps_per_second": 1.664,
+       "step": 6
+     },
+     {
+       "epoch": 1.4814814814814814,
+       "grad_norm": 5.346019268035889,
+       "learning_rate": 8.333333333333334e-06,
+       "loss": 1.6915,
+       "step": 10
+     },
+     {
+       "epoch": 1.925925925925926,
+       "eval_accuracy": 0.23157894736842105,
+       "eval_loss": 1.5941598415374756,
+       "eval_runtime": 1.5205,
+       "eval_samples_per_second": 62.481,
+       "eval_steps_per_second": 1.973,
+       "step": 13
+     },
+     {
+       "epoch": 2.962962962962963,
+       "grad_norm": 8.961028099060059,
+       "learning_rate": 1.6666666666666667e-05,
+       "loss": 1.5895,
+       "step": 20
+     },
+     {
+       "epoch": 2.962962962962963,
+       "eval_accuracy": 0.3894736842105263,
+       "eval_loss": 1.4918299913406372,
+       "eval_runtime": 1.5278,
+       "eval_samples_per_second": 62.179,
+       "eval_steps_per_second": 1.964,
+       "step": 20
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.5052631578947369,
+       "eval_loss": 1.363681674003601,
+       "eval_runtime": 1.5355,
+       "eval_samples_per_second": 61.868,
+       "eval_steps_per_second": 1.954,
+       "step": 27
+     },
+     {
+       "epoch": 4.444444444444445,
+       "grad_norm": 12.23503303527832,
+       "learning_rate": 2.5e-05,
+       "loss": 1.433,
+       "step": 30
+     },
+     {
+       "epoch": 4.888888888888889,
+       "eval_accuracy": 0.6736842105263158,
+       "eval_loss": 1.2241352796554565,
+       "eval_runtime": 1.595,
+       "eval_samples_per_second": 59.559,
+       "eval_steps_per_second": 1.881,
+       "step": 33
+     },
+     {
+       "epoch": 5.925925925925926,
+       "grad_norm": 29.141326904296875,
+       "learning_rate": 3.3333333333333335e-05,
+       "loss": 1.2158,
+       "step": 40
+     },
+     {
+       "epoch": 5.925925925925926,
+       "eval_accuracy": 0.7473684210526316,
+       "eval_loss": 1.0278701782226562,
+       "eval_runtime": 1.5753,
+       "eval_samples_per_second": 60.307,
+       "eval_steps_per_second": 1.904,
+       "step": 40
+     },
+     {
+       "epoch": 6.962962962962963,
+       "eval_accuracy": 0.8105263157894737,
+       "eval_loss": 0.8302664756774902,
+       "eval_runtime": 1.6713,
+       "eval_samples_per_second": 56.842,
+       "eval_steps_per_second": 1.795,
+       "step": 47
+     },
+     {
+       "epoch": 7.407407407407407,
+       "grad_norm": 28.541772842407227,
+       "learning_rate": 4.166666666666667e-05,
+       "loss": 0.9438,
+       "step": 50
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.8210526315789474,
+       "eval_loss": 0.6755695343017578,
+       "eval_runtime": 1.5959,
+       "eval_samples_per_second": 59.527,
+       "eval_steps_per_second": 1.88,
+       "step": 54
+     },
+     {
+       "epoch": 8.88888888888889,
+       "grad_norm": 14.663153648376465,
+       "learning_rate": 5e-05,
+       "loss": 0.727,
+       "step": 60
+     },
+     {
+       "epoch": 8.88888888888889,
+       "eval_accuracy": 0.8421052631578947,
+       "eval_loss": 0.5751134157180786,
+       "eval_runtime": 1.7131,
+       "eval_samples_per_second": 55.455,
+       "eval_steps_per_second": 1.751,
+       "step": 60
+     },
+     {
+       "epoch": 9.925925925925926,
+       "eval_accuracy": 0.8631578947368421,
+       "eval_loss": 0.4678628146648407,
+       "eval_runtime": 1.5414,
+       "eval_samples_per_second": 61.633,
+       "eval_steps_per_second": 1.946,
+       "step": 67
+     },
+     {
+       "epoch": 10.37037037037037,
+       "grad_norm": 36.71305847167969,
+       "learning_rate": 4.9074074074074075e-05,
+       "loss": 0.5516,
+       "step": 70
+     },
+     {
+       "epoch": 10.962962962962964,
+       "eval_accuracy": 0.8526315789473684,
+       "eval_loss": 0.4432494640350342,
+       "eval_runtime": 1.5533,
+       "eval_samples_per_second": 61.159,
+       "eval_steps_per_second": 1.931,
+       "step": 74
+     },
+     {
+       "epoch": 11.851851851851851,
+       "grad_norm": 25.495798110961914,
+       "learning_rate": 4.814814814814815e-05,
+       "loss": 0.4337,
+       "step": 80
+     },
+     {
+       "epoch": 12.0,
+       "eval_accuracy": 0.8526315789473684,
+       "eval_loss": 0.38154730200767517,
+       "eval_runtime": 1.7941,
+       "eval_samples_per_second": 52.95,
+       "eval_steps_per_second": 1.672,
+       "step": 81
+     },
+     {
+       "epoch": 12.88888888888889,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.36410775780677795,
+       "eval_runtime": 1.5444,
+       "eval_samples_per_second": 61.514,
+       "eval_steps_per_second": 1.943,
+       "step": 87
+     },
+     {
+       "epoch": 13.333333333333334,
+       "grad_norm": 20.939546585083008,
+       "learning_rate": 4.722222222222222e-05,
+       "loss": 0.3757,
+       "step": 90
+     },
+     {
+       "epoch": 13.925925925925926,
+       "eval_accuracy": 0.8736842105263158,
+       "eval_loss": 0.3233407735824585,
+       "eval_runtime": 1.5363,
+       "eval_samples_per_second": 61.836,
+       "eval_steps_per_second": 1.953,
+       "step": 94
+     },
+     {
+       "epoch": 14.814814814814815,
+       "grad_norm": 87.37303924560547,
+       "learning_rate": 4.62962962962963e-05,
+       "loss": 0.3017,
+       "step": 100
+     },
+     {
+       "epoch": 14.962962962962964,
+       "eval_accuracy": 0.9157894736842105,
+       "eval_loss": 0.32304683327674866,
+       "eval_runtime": 1.5993,
+       "eval_samples_per_second": 59.403,
+       "eval_steps_per_second": 1.876,
+       "step": 101
+     },
+     {
+       "epoch": 16.0,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.30177825689315796,
+       "eval_runtime": 1.5164,
+       "eval_samples_per_second": 62.65,
+       "eval_steps_per_second": 1.978,
+       "step": 108
+     },
+     {
+       "epoch": 16.296296296296298,
+       "grad_norm": 25.734861373901367,
+       "learning_rate": 4.5370370370370374e-05,
+       "loss": 0.2495,
+       "step": 110
+     },
+     {
+       "epoch": 16.88888888888889,
+       "eval_accuracy": 0.9052631578947369,
+       "eval_loss": 0.34445926547050476,
+       "eval_runtime": 1.5124,
+       "eval_samples_per_second": 62.812,
+       "eval_steps_per_second": 1.984,
+       "step": 114
+     },
+     {
+       "epoch": 17.77777777777778,
+       "grad_norm": 22.723478317260742,
+       "learning_rate": 4.4444444444444447e-05,
+       "loss": 0.2177,
+       "step": 120
+     },
+     {
+       "epoch": 17.925925925925927,
+       "eval_accuracy": 0.8947368421052632,
+       "eval_loss": 0.29870516061782837,
+       "eval_runtime": 1.6176,
+       "eval_samples_per_second": 58.728,
+       "eval_steps_per_second": 1.855,
+       "step": 121
+     },
+     {
+       "epoch": 18.962962962962962,
+       "eval_accuracy": 0.8947368421052632,
+       "eval_loss": 0.27269816398620605,
+       "eval_runtime": 1.5486,
+       "eval_samples_per_second": 61.345,
+       "eval_steps_per_second": 1.937,
+       "step": 128
+     },
+     {
+       "epoch": 19.25925925925926,
+       "grad_norm": 21.11945343017578,
+       "learning_rate": 4.351851851851852e-05,
+       "loss": 0.1738,
+       "step": 130
+     },
+     {
+       "epoch": 20.0,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.28645142912864685,
+       "eval_runtime": 1.7081,
+       "eval_samples_per_second": 55.618,
+       "eval_steps_per_second": 1.756,
+       "step": 135
+     },
+     {
+       "epoch": 20.74074074074074,
+       "grad_norm": 55.72514343261719,
+       "learning_rate": 4.259259259259259e-05,
+       "loss": 0.1572,
+       "step": 140
+     },
+     {
+       "epoch": 20.88888888888889,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.2645561993122101,
+       "eval_runtime": 1.8502,
+       "eval_samples_per_second": 51.345,
+       "eval_steps_per_second": 1.621,
+       "step": 141
+     },
+     {
+       "epoch": 21.925925925925927,
+       "eval_accuracy": 0.9052631578947369,
+       "eval_loss": 0.31001701951026917,
+       "eval_runtime": 1.5502,
+       "eval_samples_per_second": 61.282,
+       "eval_steps_per_second": 1.935,
+       "step": 148
+     },
+     {
+       "epoch": 22.22222222222222,
+       "grad_norm": 11.173165321350098,
+       "learning_rate": 4.166666666666667e-05,
+       "loss": 0.1165,
+       "step": 150
+     },
+     {
+       "epoch": 22.962962962962962,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.30389800667762756,
+       "eval_runtime": 1.5599,
+       "eval_samples_per_second": 60.903,
+       "eval_steps_per_second": 1.923,
+       "step": 155
+     },
+     {
+       "epoch": 23.703703703703702,
+       "grad_norm": 13.963062286376953,
+       "learning_rate": 4.074074074074074e-05,
+       "loss": 0.1057,
+       "step": 160
+     },
+     {
+       "epoch": 24.0,
+       "eval_accuracy": 0.9052631578947369,
+       "eval_loss": 0.3022925555706024,
+       "eval_runtime": 1.682,
+       "eval_samples_per_second": 56.482,
+       "eval_steps_per_second": 1.784,
+       "step": 162
+     },
+     {
+       "epoch": 24.88888888888889,
+       "eval_accuracy": 0.9157894736842105,
+       "eval_loss": 0.2254202663898468,
+       "eval_runtime": 1.5031,
+       "eval_samples_per_second": 63.201,
+       "eval_steps_per_second": 1.996,
+       "step": 168
+     },
+     {
+       "epoch": 25.185185185185187,
+       "grad_norm": 22.49197769165039,
+       "learning_rate": 3.981481481481482e-05,
+       "loss": 0.0825,
+       "step": 170
+     },
+     {
+       "epoch": 25.925925925925927,
+       "eval_accuracy": 0.8736842105263158,
+       "eval_loss": 0.3308357298374176,
+       "eval_runtime": 1.5096,
+       "eval_samples_per_second": 62.929,
+       "eval_steps_per_second": 1.987,
+       "step": 175
+     },
+     {
+       "epoch": 26.666666666666668,
+       "grad_norm": 17.941728591918945,
+       "learning_rate": 3.888888888888889e-05,
+       "loss": 0.0795,
+       "step": 180
+     },
+     {
+       "epoch": 26.962962962962962,
+       "eval_accuracy": 0.9368421052631579,
+       "eval_loss": 0.20397017896175385,
+       "eval_runtime": 1.5389,
+       "eval_samples_per_second": 61.731,
+       "eval_steps_per_second": 1.949,
+       "step": 182
+     },
+     {
+       "epoch": 28.0,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.21477854251861572,
+       "eval_runtime": 1.5127,
+       "eval_samples_per_second": 62.8,
+       "eval_steps_per_second": 1.983,
+       "step": 189
+     },
+     {
+       "epoch": 28.14814814814815,
+       "grad_norm": 11.859210014343262,
+       "learning_rate": 3.7962962962962964e-05,
+       "loss": 0.072,
+       "step": 190
+     },
+     {
+       "epoch": 28.88888888888889,
+       "eval_accuracy": 0.8631578947368421,
+       "eval_loss": 0.3449535667896271,
+       "eval_runtime": 1.5393,
+       "eval_samples_per_second": 61.716,
+       "eval_steps_per_second": 1.949,
+       "step": 195
+     },
+     {
+       "epoch": 29.62962962962963,
+       "grad_norm": 17.80461883544922,
+       "learning_rate": 3.7037037037037037e-05,
+       "loss": 0.0701,
+       "step": 200
+     },
+     {
+       "epoch": 29.925925925925927,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.24177835881710052,
+       "eval_runtime": 1.7581,
+       "eval_samples_per_second": 54.036,
+       "eval_steps_per_second": 1.706,
+       "step": 202
+     },
+     {
+       "epoch": 30.962962962962962,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.24954235553741455,
+       "eval_runtime": 1.5375,
+       "eval_samples_per_second": 61.787,
+       "eval_steps_per_second": 1.951,
+       "step": 209
+     },
+     {
+       "epoch": 31.11111111111111,
+       "grad_norm": 11.903740882873535,
+       "learning_rate": 3.611111111111111e-05,
+       "loss": 0.0635,
+       "step": 210
+     },
+     {
+       "epoch": 32.0,
+       "eval_accuracy": 0.8947368421052632,
+       "eval_loss": 0.3266756236553192,
+       "eval_runtime": 1.5287,
+       "eval_samples_per_second": 62.145,
+       "eval_steps_per_second": 1.962,
+       "step": 216
+     },
+     {
+       "epoch": 32.592592592592595,
+       "grad_norm": 9.013089179992676,
+       "learning_rate": 3.518518518518519e-05,
+       "loss": 0.0537,
+       "step": 220
+     },
+     {
+       "epoch": 32.888888888888886,
+       "eval_accuracy": 0.9157894736842105,
+       "eval_loss": 0.3727841377258301,
+       "eval_runtime": 1.7008,
+       "eval_samples_per_second": 55.857,
+       "eval_steps_per_second": 1.764,
+       "step": 222
+     },
+     {
+       "epoch": 33.925925925925924,
+       "eval_accuracy": 0.9052631578947369,
+       "eval_loss": 0.28518009185791016,
+       "eval_runtime": 1.5355,
+       "eval_samples_per_second": 61.868,
+       "eval_steps_per_second": 1.954,
+       "step": 229
+     },
+     {
+       "epoch": 34.074074074074076,
+       "grad_norm": 37.26246643066406,
+       "learning_rate": 3.425925925925926e-05,
+       "loss": 0.0607,
+       "step": 230
+     },
+     {
+       "epoch": 34.96296296296296,
+       "eval_accuracy": 0.9473684210526315,
+       "eval_loss": 0.23858819901943207,
+       "eval_runtime": 1.5204,
+       "eval_samples_per_second": 62.485,
+       "eval_steps_per_second": 1.973,
+       "step": 236
+     },
+     {
+       "epoch": 35.55555555555556,
+       "grad_norm": 14.775269508361816,
+       "learning_rate": 3.3333333333333335e-05,
+       "loss": 0.052,
+       "step": 240
+     },
+     {
+       "epoch": 36.0,
+       "eval_accuracy": 0.9157894736842105,
+       "eval_loss": 0.20699043571949005,
+       "eval_runtime": 1.6355,
+       "eval_samples_per_second": 58.086,
+       "eval_steps_per_second": 1.834,
+       "step": 243
+     },
+     {
+       "epoch": 36.888888888888886,
+       "eval_accuracy": 0.9473684210526315,
+       "eval_loss": 0.18596884608268738,
+       "eval_runtime": 1.5409,
+       "eval_samples_per_second": 61.654,
+       "eval_steps_per_second": 1.947,
+       "step": 249
+     },
+     {
+       "epoch": 37.03703703703704,
+       "grad_norm": 7.913994312286377,
+       "learning_rate": 3.240740740740741e-05,
+       "loss": 0.049,
+       "step": 250
+     },
+     {
+       "epoch": 37.925925925925924,
+       "eval_accuracy": 0.8947368421052632,
+       "eval_loss": 0.3068939745426178,
+       "eval_runtime": 1.5267,
+       "eval_samples_per_second": 62.227,
+       "eval_steps_per_second": 1.965,
+       "step": 256
+     },
+     {
+       "epoch": 38.51851851851852,
+       "grad_norm": 28.389638900756836,
+       "learning_rate": 3.148148148148148e-05,
+       "loss": 0.0578,
+       "step": 260
+     },
+     {
+       "epoch": 38.96296296296296,
+       "eval_accuracy": 0.8736842105263158,
+       "eval_loss": 0.4477124810218811,
+       "eval_runtime": 1.6335,
+       "eval_samples_per_second": 58.158,
+       "eval_steps_per_second": 1.837,
+       "step": 263
+     },
+     {
+       "epoch": 40.0,
+       "grad_norm": 66.57148742675781,
+       "learning_rate": 3.055555555555556e-05,
+       "loss": 0.0533,
+       "step": 270
+     },
+     {
+       "epoch": 40.0,
+       "eval_accuracy": 0.8947368421052632,
+       "eval_loss": 0.26121658086776733,
+       "eval_runtime": 1.5219,
+       "eval_samples_per_second": 62.422,
+       "eval_steps_per_second": 1.971,
+       "step": 270
+     },
+     {
+       "epoch": 40.888888888888886,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.264914333820343,
+       "eval_runtime": 1.509,
+       "eval_samples_per_second": 62.957,
+       "eval_steps_per_second": 1.988,
+       "step": 276
+     },
+     {
+       "epoch": 41.48148148148148,
+       "grad_norm": 5.600070953369141,
+       "learning_rate": 2.962962962962963e-05,
+       "loss": 0.0505,
+       "step": 280
+     },
+     {
+       "epoch": 41.925925925925924,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.19498933851718903,
+       "eval_runtime": 1.6324,
+       "eval_samples_per_second": 58.195,
+       "eval_steps_per_second": 1.838,
+       "step": 283
+     },
+     {
+       "epoch": 42.96296296296296,
+       "grad_norm": 5.095949172973633,
+       "learning_rate": 2.8703703703703706e-05,
+       "loss": 0.0433,
+       "step": 290
+     },
+     {
+       "epoch": 42.96296296296296,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.29025542736053467,
+       "eval_runtime": 1.5094,
+       "eval_samples_per_second": 62.938,
+       "eval_steps_per_second": 1.988,
+       "step": 290
+     },
+     {
+       "epoch": 44.0,
+       "eval_accuracy": 0.9368421052631579,
+       "eval_loss": 0.2526479661464691,
+       "eval_runtime": 1.523,
+       "eval_samples_per_second": 62.376,
+       "eval_steps_per_second": 1.97,
+       "step": 297
+     },
+     {
+       "epoch": 44.44444444444444,
+       "grad_norm": 9.107303619384766,
+       "learning_rate": 2.777777777777778e-05,
+       "loss": 0.0395,
+       "step": 300
+     },
+     {
+       "epoch": 44.888888888888886,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.30155882239341736,
+       "eval_runtime": 1.7275,
+       "eval_samples_per_second": 54.991,
+       "eval_steps_per_second": 1.737,
+       "step": 303
+     },
+     {
+       "epoch": 45.925925925925924,
+       "grad_norm": 10.148141860961914,
+       "learning_rate": 2.6851851851851855e-05,
+       "loss": 0.035,
+       "step": 310
+     },
+     {
+       "epoch": 45.925925925925924,
+       "eval_accuracy": 0.8947368421052632,
+       "eval_loss": 0.3509025275707245,
+       "eval_runtime": 1.5108,
+       "eval_samples_per_second": 62.88,
+       "eval_steps_per_second": 1.986,
+       "step": 310
+     },
+     {
+       "epoch": 46.96296296296296,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.29430651664733887,
+       "eval_runtime": 1.5062,
+       "eval_samples_per_second": 63.071,
+       "eval_steps_per_second": 1.992,
+       "step": 317
+     },
+     {
+       "epoch": 47.407407407407405,
+       "grad_norm": 4.681646347045898,
+       "learning_rate": 2.5925925925925925e-05,
+       "loss": 0.0335,
+       "step": 320
+     },
+     {
+       "epoch": 48.0,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.2613106667995453,
+       "eval_runtime": 1.6847,
+       "eval_samples_per_second": 56.39,
+       "eval_steps_per_second": 1.781,
+       "step": 324
+     },
+     {
+       "epoch": 48.888888888888886,
+       "grad_norm": 10.262737274169922,
+       "learning_rate": 2.5e-05,
+       "loss": 0.0408,
+       "step": 330
+     },
+     {
+       "epoch": 48.888888888888886,
+       "eval_accuracy": 0.9157894736842105,
+       "eval_loss": 0.21650759875774384,
+       "eval_runtime": 1.5245,
+       "eval_samples_per_second": 62.315,
+       "eval_steps_per_second": 1.968,
+       "step": 330
+     },
+     {
+       "epoch": 49.925925925925924,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.28715887665748596,
+       "eval_runtime": 1.5241,
+       "eval_samples_per_second": 62.331,
+       "eval_steps_per_second": 1.968,
+       "step": 337
+     },
+     {
+       "epoch": 50.370370370370374,
+       "grad_norm": 9.619718551635742,
+       "learning_rate": 2.4074074074074074e-05,
+       "loss": 0.0244,
+       "step": 340
+     },
+     {
+       "epoch": 50.96296296296296,
+       "eval_accuracy": 0.8842105263157894,
+       "eval_loss": 0.31339362263679504,
+       "eval_runtime": 1.6247,
+       "eval_samples_per_second": 58.472,
+       "eval_steps_per_second": 1.846,
+       "step": 344
+     },
+     {
+       "epoch": 51.851851851851855,
+       "grad_norm": 6.175246715545654,
+       "learning_rate": 2.314814814814815e-05,
+       "loss": 0.0323,
+       "step": 350
+     },
+     {
+       "epoch": 52.0,
+       "eval_accuracy": 0.9157894736842105,
+       "eval_loss": 0.3006380796432495,
+       "eval_runtime": 1.5132,
+       "eval_samples_per_second": 62.78,
+       "eval_steps_per_second": 1.983,
+       "step": 351
+     },
+     {
+       "epoch": 52.888888888888886,
+       "eval_accuracy": 0.8736842105263158,
+       "eval_loss": 0.37583690881729126,
+       "eval_runtime": 1.5262,
+       "eval_samples_per_second": 62.247,
+       "eval_steps_per_second": 1.966,
+       "step": 357
+     },
+     {
+       "epoch": 53.333333333333336,
+       "grad_norm": 11.230114936828613,
+       "learning_rate": 2.2222222222222223e-05,
+       "loss": 0.0241,
+       "step": 360
+     },
+     {
+       "epoch": 53.925925925925924,
+       "eval_accuracy": 0.9263157894736842,
+       "eval_loss": 0.3033463954925537,
+       "eval_runtime": 1.6639,
+       "eval_samples_per_second": 57.093,
+       "eval_steps_per_second": 1.803,
+       "step": 364
+     },
+     {
+       "epoch": 54.81481481481482,
+       "grad_norm": 6.497807502746582,
+       "learning_rate": 2.1296296296296296e-05,
+       "loss": 0.0193,
+       "step": 370
+     },
+     {
+       "epoch": 54.96296296296296,
+       "eval_accuracy": 0.9368421052631579,
+       "eval_loss": 0.27406617999076843,
+       "eval_runtime": 1.5407,
+       "eval_samples_per_second": 61.662,
+       "eval_steps_per_second": 1.947,
+       "step": 371
+     },
+     {
+       "epoch": 56.0,
+       "eval_accuracy": 0.9368421052631579,
+       "eval_loss": 0.1684454083442688,
+       "eval_runtime": 1.5222,
+       "eval_samples_per_second": 62.409,
+       "eval_steps_per_second": 1.971,
+       "step": 378
+     },
+     {
+       "epoch": 56.2962962962963,
+       "grad_norm": 1.369285225868225,
+       "learning_rate": 2.037037037037037e-05,
+       "loss": 0.0273,
+       "step": 380
+     },
+     {
+       "epoch": 56.888888888888886,
+       "eval_accuracy": 0.9473684210526315,
+       "eval_loss": 0.240325465798378,
+       "eval_runtime": 1.6193,
+       "eval_samples_per_second": 58.666,
+       "eval_steps_per_second": 1.853,
+       "step": 384
+     },
+     {
+       "epoch": 57.77777777777778,
+       "grad_norm": 16.219879150390625,
793
+ "learning_rate": 1.9444444444444445e-05,
794
+ "loss": 0.0244,
795
+ "step": 390
796
+ },
797
+ {
798
+ "epoch": 57.925925925925924,
799
+ "eval_accuracy": 0.9473684210526315,
800
+ "eval_loss": 0.14995306730270386,
801
+ "eval_runtime": 1.509,
802
+ "eval_samples_per_second": 62.958,
803
+ "eval_steps_per_second": 1.988,
804
+ "step": 391
805
+ },
806
+ {
807
+ "epoch": 58.96296296296296,
808
+ "eval_accuracy": 0.9368421052631579,
809
+ "eval_loss": 0.13768525421619415,
810
+ "eval_runtime": 1.5002,
811
+ "eval_samples_per_second": 63.323,
812
+ "eval_steps_per_second": 2.0,
813
+ "step": 398
814
+ },
815
+ {
816
+ "epoch": 59.25925925925926,
817
+ "grad_norm": 17.92609214782715,
818
+ "learning_rate": 1.8518518518518518e-05,
819
+ "loss": 0.0268,
820
+ "step": 400
821
+ },
822
+ {
823
+ "epoch": 60.0,
824
+ "eval_accuracy": 0.9157894736842105,
825
+ "eval_loss": 0.18984580039978027,
826
+ "eval_runtime": 1.5627,
827
+ "eval_samples_per_second": 60.793,
828
+ "eval_steps_per_second": 1.92,
829
+ "step": 405
830
+ },
831
+ {
832
+ "epoch": 60.74074074074074,
833
+ "grad_norm": 4.754770278930664,
834
+ "learning_rate": 1.7592592592592595e-05,
835
+ "loss": 0.0405,
836
+ "step": 410
837
+ },
838
+ {
839
+ "epoch": 60.888888888888886,
840
+ "eval_accuracy": 0.9052631578947369,
841
+ "eval_loss": 0.17558255791664124,
842
+ "eval_runtime": 1.5023,
843
+ "eval_samples_per_second": 63.237,
844
+ "eval_steps_per_second": 1.997,
845
+ "step": 411
846
+ },
847
+ {
848
+ "epoch": 61.925925925925924,
849
+ "eval_accuracy": 0.9263157894736842,
850
+ "eval_loss": 0.1907454878091812,
851
+ "eval_runtime": 1.5123,
852
+ "eval_samples_per_second": 62.817,
853
+ "eval_steps_per_second": 1.984,
854
+ "step": 418
855
+ },
856
+ {
857
+ "epoch": 62.22222222222222,
858
+ "grad_norm": 6.852460861206055,
859
+ "learning_rate": 1.6666666666666667e-05,
860
+ "loss": 0.0219,
861
+ "step": 420
862
+ },
863
+ {
864
+ "epoch": 62.96296296296296,
865
+ "eval_accuracy": 0.9052631578947369,
866
+ "eval_loss": 0.17904597520828247,
867
+ "eval_runtime": 1.5634,
868
+ "eval_samples_per_second": 60.763,
869
+ "eval_steps_per_second": 1.919,
870
+ "step": 425
871
+ },
872
+ {
873
+ "epoch": 63.7037037037037,
874
+ "grad_norm": 17.433446884155273,
875
+ "learning_rate": 1.574074074074074e-05,
876
+ "loss": 0.0329,
877
+ "step": 430
878
+ },
879
+ {
880
+ "epoch": 64.0,
881
+ "eval_accuracy": 0.9368421052631579,
882
+ "eval_loss": 0.18854853510856628,
883
+ "eval_runtime": 1.5466,
884
+ "eval_samples_per_second": 61.427,
885
+ "eval_steps_per_second": 1.94,
886
+ "step": 432
887
+ },
888
+ {
889
+ "epoch": 64.88888888888889,
890
+ "eval_accuracy": 0.9368421052631579,
891
+ "eval_loss": 0.15500715374946594,
892
+ "eval_runtime": 1.5091,
893
+ "eval_samples_per_second": 62.951,
894
+ "eval_steps_per_second": 1.988,
895
+ "step": 438
896
+ },
897
+ {
898
+ "epoch": 65.18518518518519,
899
+ "grad_norm": 12.199813842773438,
900
+ "learning_rate": 1.4814814814814815e-05,
901
+ "loss": 0.019,
902
+ "step": 440
903
+ },
904
+ {
905
+ "epoch": 65.92592592592592,
906
+ "eval_accuracy": 0.9157894736842105,
907
+ "eval_loss": 0.18106068670749664,
908
+ "eval_runtime": 1.5434,
909
+ "eval_samples_per_second": 61.553,
910
+ "eval_steps_per_second": 1.944,
911
+ "step": 445
912
+ },
913
+ {
914
+ "epoch": 66.66666666666667,
915
+ "grad_norm": 5.085737228393555,
916
+ "learning_rate": 1.388888888888889e-05,
917
+ "loss": 0.0205,
918
+ "step": 450
919
+ },
920
+ {
921
+ "epoch": 66.96296296296296,
922
+ "eval_accuracy": 0.9263157894736842,
923
+ "eval_loss": 0.2165663242340088,
924
+ "eval_runtime": 1.5356,
925
+ "eval_samples_per_second": 61.864,
926
+ "eval_steps_per_second": 1.954,
927
+ "step": 452
928
+ },
929
+ {
930
+ "epoch": 68.0,
931
+ "eval_accuracy": 0.9052631578947369,
932
+ "eval_loss": 0.17012225091457367,
933
+ "eval_runtime": 1.505,
934
+ "eval_samples_per_second": 63.122,
935
+ "eval_steps_per_second": 1.993,
936
+ "step": 459
937
+ },
938
+ {
939
+ "epoch": 68.14814814814815,
940
+ "grad_norm": 4.042739391326904,
941
+ "learning_rate": 1.2962962962962962e-05,
942
+ "loss": 0.0232,
943
+ "step": 460
944
+ },
945
+ {
946
+ "epoch": 68.88888888888889,
947
+ "eval_accuracy": 0.9157894736842105,
948
+ "eval_loss": 0.21532098948955536,
949
+ "eval_runtime": 1.4982,
950
+ "eval_samples_per_second": 63.41,
951
+ "eval_steps_per_second": 2.002,
952
+ "step": 465
953
+ },
954
+ {
955
+ "epoch": 69.62962962962963,
956
+ "grad_norm": 3.4432425498962402,
957
+ "learning_rate": 1.2037037037037037e-05,
958
+ "loss": 0.0269,
959
+ "step": 470
960
+ },
961
+ {
962
+ "epoch": 69.92592592592592,
963
+ "eval_accuracy": 0.9263157894736842,
964
+ "eval_loss": 0.22287000715732574,
965
+ "eval_runtime": 1.5307,
966
+ "eval_samples_per_second": 62.063,
967
+ "eval_steps_per_second": 1.96,
968
+ "step": 472
969
+ },
970
+ {
971
+ "epoch": 70.96296296296296,
972
+ "eval_accuracy": 0.9263157894736842,
973
+ "eval_loss": 0.2237367182970047,
974
+ "eval_runtime": 1.5031,
975
+ "eval_samples_per_second": 63.204,
976
+ "eval_steps_per_second": 1.996,
977
+ "step": 479
978
+ },
979
+ {
980
+ "epoch": 71.11111111111111,
981
+ "grad_norm": 9.640032768249512,
982
+ "learning_rate": 1.1111111111111112e-05,
983
+ "loss": 0.0306,
984
+ "step": 480
985
+ },
986
+ {
987
+ "epoch": 72.0,
988
+ "eval_accuracy": 0.9368421052631579,
989
+ "eval_loss": 0.18282432854175568,
990
+ "eval_runtime": 1.5,
991
+ "eval_samples_per_second": 63.334,
992
+ "eval_steps_per_second": 2.0,
993
+ "step": 486
994
+ },
995
+ {
996
+ "epoch": 72.5925925925926,
997
+ "grad_norm": 9.28294849395752,
998
+ "learning_rate": 1.0185185185185185e-05,
999
+ "loss": 0.0298,
1000
+ "step": 490
1001
+ },
1002
+ {
1003
+ "epoch": 72.88888888888889,
1004
+ "eval_accuracy": 0.9368421052631579,
1005
+ "eval_loss": 0.14476627111434937,
1006
+ "eval_runtime": 1.5255,
1007
+ "eval_samples_per_second": 62.276,
1008
+ "eval_steps_per_second": 1.967,
1009
+ "step": 492
1010
+ },
1011
+ {
1012
+ "epoch": 73.92592592592592,
1013
+ "eval_accuracy": 0.9157894736842105,
1014
+ "eval_loss": 0.19477160274982452,
1015
+ "eval_runtime": 1.5341,
1016
+ "eval_samples_per_second": 61.925,
1017
+ "eval_steps_per_second": 1.956,
1018
+ "step": 499
1019
+ },
1020
+ {
1021
+ "epoch": 74.07407407407408,
1022
+ "grad_norm": 0.5367471575737,
1023
+ "learning_rate": 9.259259259259259e-06,
1024
+ "loss": 0.0154,
1025
+ "step": 500
1026
+ },
1027
+ {
1028
+ "epoch": 74.96296296296296,
1029
+ "eval_accuracy": 0.9157894736842105,
1030
+ "eval_loss": 0.25699615478515625,
1031
+ "eval_runtime": 1.5191,
1032
+ "eval_samples_per_second": 62.538,
1033
+ "eval_steps_per_second": 1.975,
1034
+ "step": 506
1035
+ },
1036
+ {
1037
+ "epoch": 75.55555555555556,
1038
+ "grad_norm": 16.879281997680664,
1039
+ "learning_rate": 8.333333333333334e-06,
1040
+ "loss": 0.0193,
1041
+ "step": 510
1042
+ },
1043
+ {
1044
+ "epoch": 76.0,
1045
+ "eval_accuracy": 0.9263157894736842,
1046
+ "eval_loss": 0.24617379903793335,
1047
+ "eval_runtime": 1.5485,
1048
+ "eval_samples_per_second": 61.351,
1049
+ "eval_steps_per_second": 1.937,
1050
+ "step": 513
1051
+ },
1052
+ {
1053
+ "epoch": 76.88888888888889,
1054
+ "eval_accuracy": 0.9157894736842105,
1055
+ "eval_loss": 0.21939770877361298,
1056
+ "eval_runtime": 1.5513,
1057
+ "eval_samples_per_second": 61.24,
1058
+ "eval_steps_per_second": 1.934,
1059
+ "step": 519
1060
+ },
1061
+ {
1062
+ "epoch": 77.03703703703704,
1063
+ "grad_norm": 9.813831329345703,
1064
+ "learning_rate": 7.4074074074074075e-06,
1065
+ "loss": 0.0188,
1066
+ "step": 520
1067
+ },
1068
+ {
1069
+ "epoch": 77.92592592592592,
1070
+ "eval_accuracy": 0.9157894736842105,
1071
+ "eval_loss": 0.22538267076015472,
1072
+ "eval_runtime": 1.5394,
1073
+ "eval_samples_per_second": 61.71,
1074
+ "eval_steps_per_second": 1.949,
1075
+ "step": 526
1076
+ },
1077
+ {
1078
+ "epoch": 78.51851851851852,
1079
+ "grad_norm": 6.672292709350586,
1080
+ "learning_rate": 6.481481481481481e-06,
1081
+ "loss": 0.0198,
1082
+ "step": 530
1083
+ },
1084
+ {
1085
+ "epoch": 78.96296296296296,
1086
+ "eval_accuracy": 0.9263157894736842,
1087
+ "eval_loss": 0.192366823554039,
1088
+ "eval_runtime": 1.511,
1089
+ "eval_samples_per_second": 62.873,
1090
+ "eval_steps_per_second": 1.985,
1091
+ "step": 533
1092
+ },
1093
+ {
1094
+ "epoch": 80.0,
1095
+ "grad_norm": 12.444893836975098,
1096
+ "learning_rate": 5.555555555555556e-06,
1097
+ "loss": 0.0147,
1098
+ "step": 540
1099
+ },
1100
+ {
1101
+ "epoch": 80.0,
1102
+ "eval_accuracy": 0.9368421052631579,
1103
+ "eval_loss": 0.15250863134860992,
1104
+ "eval_runtime": 1.5101,
1105
+ "eval_samples_per_second": 62.91,
1106
+ "eval_steps_per_second": 1.987,
1107
+ "step": 540
1108
+ },
1109
+ {
1110
+ "epoch": 80.88888888888889,
1111
+ "eval_accuracy": 0.9473684210526315,
1112
+ "eval_loss": 0.13136789202690125,
1113
+ "eval_runtime": 1.5108,
1114
+ "eval_samples_per_second": 62.88,
1115
+ "eval_steps_per_second": 1.986,
1116
+ "step": 546
1117
+ },
1118
+ {
1119
+ "epoch": 81.48148148148148,
1120
+ "grad_norm": 5.455392837524414,
1121
+ "learning_rate": 4.6296296296296296e-06,
1122
+ "loss": 0.0282,
1123
+ "step": 550
1124
+ },
1125
+ {
1126
+ "epoch": 81.92592592592592,
1127
+ "eval_accuracy": 0.9368421052631579,
1128
+ "eval_loss": 0.13807313144207,
1129
+ "eval_runtime": 1.5108,
1130
+ "eval_samples_per_second": 62.881,
1131
+ "eval_steps_per_second": 1.986,
1132
+ "step": 553
1133
+ },
1134
+ {
1135
+ "epoch": 82.96296296296296,
1136
+ "grad_norm": 0.557128369808197,
1137
+ "learning_rate": 3.7037037037037037e-06,
1138
+ "loss": 0.0168,
1139
+ "step": 560
1140
+ },
1141
+ {
1142
+ "epoch": 82.96296296296296,
1143
+ "eval_accuracy": 0.9157894736842105,
1144
+ "eval_loss": 0.14955918490886688,
1145
+ "eval_runtime": 1.5176,
1146
+ "eval_samples_per_second": 62.599,
1147
+ "eval_steps_per_second": 1.977,
1148
+ "step": 560
1149
+ },
1150
+ {
1151
+ "epoch": 84.0,
1152
+ "eval_accuracy": 0.9263157894736842,
1153
+ "eval_loss": 0.1806280016899109,
1154
+ "eval_runtime": 1.5082,
1155
+ "eval_samples_per_second": 62.991,
1156
+ "eval_steps_per_second": 1.989,
1157
+ "step": 567
1158
+ },
1159
+ {
1160
+ "epoch": 84.44444444444444,
1161
+ "grad_norm": 6.701883316040039,
1162
+ "learning_rate": 2.777777777777778e-06,
1163
+ "loss": 0.018,
1164
+ "step": 570
1165
+ },
1166
+ {
1167
+ "epoch": 84.88888888888889,
1168
+ "eval_accuracy": 0.9263157894736842,
1169
+ "eval_loss": 0.20803439617156982,
1170
+ "eval_runtime": 1.5233,
1171
+ "eval_samples_per_second": 62.366,
1172
+ "eval_steps_per_second": 1.969,
1173
+ "step": 573
1174
+ },
1175
+ {
1176
+ "epoch": 85.92592592592592,
1177
+ "grad_norm": 5.107237339019775,
1178
+ "learning_rate": 1.8518518518518519e-06,
1179
+ "loss": 0.0172,
1180
+ "step": 580
1181
+ },
1182
+ {
1183
+ "epoch": 85.92592592592592,
1184
+ "eval_accuracy": 0.9157894736842105,
1185
+ "eval_loss": 0.21987786889076233,
1186
+ "eval_runtime": 1.5032,
1187
+ "eval_samples_per_second": 63.198,
1188
+ "eval_steps_per_second": 1.996,
1189
+ "step": 580
1190
+ },
1191
+ {
1192
+ "epoch": 86.96296296296296,
1193
+ "eval_accuracy": 0.9263157894736842,
1194
+ "eval_loss": 0.19394682347774506,
1195
+ "eval_runtime": 1.5704,
1196
+ "eval_samples_per_second": 60.494,
1197
+ "eval_steps_per_second": 1.91,
1198
+ "step": 587
1199
+ },
1200
+ {
1201
+ "epoch": 87.4074074074074,
1202
+ "grad_norm": 13.376716613769531,
1203
+ "learning_rate": 9.259259259259259e-07,
1204
+ "loss": 0.0117,
1205
+ "step": 590
1206
+ },
1207
+ {
1208
+ "epoch": 88.0,
1209
+ "eval_accuracy": 0.9263157894736842,
1210
+ "eval_loss": 0.18152017891407013,
1211
+ "eval_runtime": 1.5369,
1212
+ "eval_samples_per_second": 61.812,
1213
+ "eval_steps_per_second": 1.952,
1214
+ "step": 594
1215
+ },
1216
+ {
1217
+ "epoch": 88.88888888888889,
1218
+ "grad_norm": 8.48166561126709,
1219
+ "learning_rate": 0.0,
1220
+ "loss": 0.0149,
1221
+ "step": 600
1222
+ },
1223
+ {
1224
+ "epoch": 88.88888888888889,
1225
+ "eval_accuracy": 0.9263157894736842,
1226
+ "eval_loss": 0.1811211258172989,
1227
+ "eval_runtime": 1.529,
1228
+ "eval_samples_per_second": 62.13,
1229
+ "eval_steps_per_second": 1.962,
1230
+ "step": 600
1231
+ },
1232
+ {
1233
+ "epoch": 88.88888888888889,
1234
+ "step": 600,
1235
+ "total_flos": 1.9132429834630103e+18,
1236
+ "train_loss": 0.19732174752900997,
1237
+ "train_runtime": 1871.4873,
1238
+ "train_samples_per_second": 45.686,
1239
+ "train_steps_per_second": 0.321
1240
+ }
1241
+ ],
1242
+ "logging_steps": 10,
1243
+ "max_steps": 600,
1244
+ "num_input_tokens_seen": 0,
1245
+ "num_train_epochs": 100,
1246
+ "save_steps": 500,
1247
+ "stateful_callbacks": {
1248
+ "TrainerControl": {
1249
+ "args": {
1250
+ "should_epoch_stop": false,
1251
+ "should_evaluate": false,
1252
+ "should_log": false,
1253
+ "should_save": false,
1254
+ "should_training_stop": false
1255
+ },
1256
+ "attributes": {}
1257
+ }
1258
+ },
1259
+ "total_flos": 1.9132429834630103e+18,
1260
+ "train_batch_size": 32,
1261
+ "trial_name": null,
1262
+ "trial_params": null
1263
+ }
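
Note on reading the log above: the following is a minimal sketch, not part of the commit, of how the `log_history` records in `trainer_state.json` might be inspected. The helper names (`load_log_history`, `split_log_history`, `best_eval`, `linear_lr`) are hypothetical. The 60 warmup steps are an assumption inferred from the logged values (for example, `learning_rate` is 2.5e-05 at step 330 with a 5e-05 base rate and `max_steps` of 600); the file itself does not state them.

```python
import json
from pathlib import Path


def load_log_history(path):
    """Read the log_history list from a trainer_state.json file."""
    state = json.loads(Path(path).read_text())
    return state["log_history"]


def split_log_history(log_history):
    """Separate training-loss records from evaluation records.

    The final summary record uses "train_loss", so it lands in neither list.
    """
    train = [r for r in log_history if "loss" in r]
    evals = [r for r in log_history if "eval_loss" in r]
    return train, evals


def best_eval(evals, key="eval_accuracy", higher_is_better=True):
    """Return the evaluation record with the best value for `key`.

    On ties, the earliest record in the list is returned.
    """
    pick = max if higher_is_better else min
    return pick(evals, key=lambda r: r[key])


def linear_lr(step, base_lr=5e-5, warmup_steps=60, max_steps=600):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=60 is an assumption inferred from the logged learning rates.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, max_steps - step) / (max_steps - warmup_steps)


if __name__ == "__main__":
    # A few records copied from the log above, used as a small sample.
    sample = [
        {"step": 590, "loss": 0.0117, "learning_rate": 9.259259259259259e-07},
        {"step": 546, "eval_accuracy": 0.9473684210526315,
         "eval_loss": 0.13136789202690125},
        {"step": 600, "eval_accuracy": 0.9263157894736842,
         "eval_loss": 0.1811211258172989},
    ]
    _, evals = split_log_history(sample)
    print(best_eval(evals)["step"])
    print(linear_lr(330))
```

In this excerpt the lowest `eval_loss` (0.1314, with accuracy 0.9474) is logged at step 546, while the final record at step 600 reports 0.1811 and 0.9263. Comparing checkpoints this way may be more informative than reading only the last entry.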
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:18be3493fba34b5ff2a06754d242143c7cc13c3f17fadae41c1895a702cd0069
3
+ size 5240