Severian committed on
Commit
e7dc96e
1 Parent(s): 9d5345a

Upload MistralForCausalLM

README.md CHANGED
@@ -1,197 +1,23 @@
  ---
- library_name: transformers
- pipeline_tag: text-generation
  license: mit
  datasets:
  - Severian/Internal-Knowledge-Map
  ---

- <img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/NjdIwN-vXhF7STzDdjM2x.webp" width="500" height="500">
-
- # New Fixed Version with extended training available now!
-
- This model is the second trained with the experimental 'Internal Knowledge Map' dataset. Developed with the aim of going beyond the scope of usual data-processing capabilities, this model is trained to build comprehensive understanding and reasoning across a wide range of knowledge domains, guided by elaborate guidelines. It bases its reasoning on a specially selected dataset emphasizing the interrelations of diverse disciplines, which aims to synthesize, integrate, and apply complex information in ways that mimic human-like abstract reasoning and creative thought processes.
-
- At the very core of this model's development is the desire to ensure that LLMs engage in a kind of cognitive activity not limited to memory but actually taking on abstract reasoning, problem-solving, and the generation of new insights. To achieve this, 'Nexus-IKM-Mistral-7B' has been fine-tuned to convergence (~15 epochs) on this unique dataset, which resulted in the model demonstrating greater capability for producing insights and problem-solving in complex, multi-disciplinary settings. This involves an improved ability to draw links between different pieces of knowledge, reason through complex scenarios, and propose innovative solutions that cut across various domains, including science, technology, environmental studies, and the humanities.

- Test this out and see if you find anything interesting or intriguing. I will keep iterating more versions, but this one seems like a fun and useful way to start.


  ## GGUF Q8 Version: https://huggingface.co/Severian/Nexus-IKM-Mistral-7B-GGUF


  **If you'd like to train your own version, here is the full notebook to recreate the training on Unsloth yourself (https://colab.research.google.com/drive/1828t77iO2nLRXVfB8HoI11eFu-79-Oe7?usp=sharing). You'll just have to drop in the train.jsonl from the Dataset repo (https://huggingface.co/datasets/Severian/Internal-Knowledge-Map) into your Colab directory and rename it to dataset.jsonl.**


- # Training Snapshot
-
- ```
- Step Training Loss
- 1 3.223000
- 2 3.221300
- 3 3.215900
- 4 3.210600
- 5 3.203000
- 6 3.193500
- 7 3.184000
- 8 3.173400
- 9 3.162400
- 10 3.151500
- 11 3.140500
- 12 3.128800
- 13 3.117600
- 14 3.106700
- 15 3.095500
- 16 3.084700
- 17 3.073700
- 18 3.062700
- 19 3.052300
- 20 3.041800
-
- 201 1.273200
- 202 1.257600
- 203 1.241900
- 204 1.226100
- 205 1.210800
- 206 1.195500
- 207 1.180800
- 208 1.166000
- 209 1.151200
- 210 1.136900
- 211 1.122000
- 212 1.106600
- 213 1.091200
- 214 1.075200
- 215 1.059200
- 216 1.042900
- 217 1.026600
- 218 1.010300
- 219 0.994200
-
- 416 0.041700
- 417 0.041700
- 418 0.041600
- 419 0.041600
- 420 0.041600
- 421 0.041600
- 422 0.041500
- 423 0.041500
- 424 0.041500
- 425 0.041400
- 426 0.041400
- 427 0.041400
-
- 668 0.035200
- 669 0.035100
- 670 0.035100
- 671 0.035100
- 672 0.035100
- 673 0.035000
- 674 0.035000
- 675 0.035000
- 676 0.035000
- 677 0.034900
- 678 0.034900
- 679 0.034900
- 680 0.034800
- 681 0.034800
- 682 0.034800
- 683 0.034800
- 684 0.034800
- 685 0.034700
- 686 0.034700
-
- 1209 0.006600
- 1210 0.006500
- 1211 0.006300
- 1212 0.006200
- 1213 0.006100
- 1214 0.006000
- 1215 0.005800
- 1216 0.005700
- 1217 0.005600
- 1218 0.005500
- 1219 0.005400
- 1220 0.005300
- 1221 0.005100
- 1222 0.004900
- 1223 0.004800
- 1224 0.004700
- 1225 0.004600
- 1226 0.004500
- 1227 0.004400
- 1228 0.004300
- 1229 0.004200
- 1230 0.004000
- 1231 0.003900
- 1232 0.003800
- 1233 0.003700
- 1234 0.003500
- 1235 0.003400
- 1236 0.003300
- 1237 0.003200
- 1238 0.003000
- 1239 0.003000
- 1240 0.002900
- 1241 0.002800
- 1242 0.002700
- 1243 0.002600
- 1244 0.002500
- 1245 0.002400
- 1246 0.002300
- 1247 0.002200
- 1248 0.002100
- 1249 0.002000
- 1250 0.001900
- 1251 0.001800
- 1252 0.001800
- 1253 0.001700
- 1254 0.001600
- 1255 0.001600
- 1256 0.001500
- 1257 0.001400
- 1258 0.001300
- 1259 0.001300
- 1260 0.001200
- 1261 0.001200
- 1262 0.001100
- 1263 0.001100
- 1264 0.001000
- 1265 0.001000
- 1266 0.000900
- 1267 0.000900
- 1268 0.000800
- 1269 0.000800
- 1270 0.000800
- 1271 0.000800
- 1272 0.000700
- 1273 0.000700
- 1274 0.000700
- 1275 0.000600
- 1276 0.000600
- 1277 0.000600
- 1278 0.000600
- 1279 0.000500
- 1280 0.000500
- 1281 0.000500
- 1282 0.000500
- 1283 0.000500
- 1284 0.000500
- 1285 0.000500
- 1286 0.000400
- 1287 0.000400
- 1288 0.000400
- 1289 0.000400
- 1290 0.000400
- 1291 0.000400
- 1292 0.000400
- 1293 0.000400
- 1294 0.000400
- 1295 0.000400
- 1296 0.000400
- 1297 0.000300
- 1298 0.000300
- ```
 
  ---
  license: mit
+ library_name: transformers
  datasets:
  - Severian/Internal-Knowledge-Map
+ pipeline_tag: text-generation
  ---

+ # New Fixed Version with extended training being uploaded by end of day 3/5!

+ ## Unfortunately, there are some issues with how this current model was fused during training, leading to bad outputs. I am retraining and will re-upload ASAP. In the meantime, you can still use the Q8 GGUF version, which works great.

  ## GGUF Q8 Version: https://huggingface.co/Severian/Nexus-IKM-Mistral-7B-GGUF


  **If you'd like to train your own version, here is the full notebook to recreate the training on Unsloth yourself (https://colab.research.google.com/drive/1828t77iO2nLRXVfB8HoI11eFu-79-Oe7?usp=sharing). You'll just have to drop in the train.jsonl from the Dataset repo (https://huggingface.co/datasets/Severian/Internal-Knowledge-Map) into your Colab directory and rename it to dataset.jsonl.**
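As a minimal sketch of that preparation step (assuming train.jsonl sits at the top level of the dataset repo; everything else here is illustrative, not part of the notebook itself):

```python
# Sketch: fetch train.jsonl from the dataset repo and save a copy as
# dataset.jsonl, the filename the Unsloth notebook expects.
import shutil
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="Severian/Internal-Knowledge-Map",
    filename="train.jsonl",       # assumed to live at the repo root
    repo_type="dataset",
)
shutil.copy(local_path, "dataset.jsonl")
```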
+ This model is the second trained with the experimental 'Internal Knowledge Map' dataset. Developed with the aim of going beyond the scope of usual data-processing capabilities, this model is trained to build comprehensive understanding and reasoning across a wide range of knowledge domains, guided by elaborate guidelines. It bases its reasoning on a specially selected dataset emphasizing the interrelations of diverse disciplines, which aims to synthesize, integrate, and apply complex information in ways that mimic human-like abstract reasoning and creative thought processes.

+ At the very core of this model's development is the desire to ensure that LLMs engage in a kind of cognitive activity not limited to memory but actually taking on abstract reasoning, problem-solving, and the generation of new insights. To achieve this, 'Nexus-IKM-Mistral-7B' has been fine-tuned for 10 epochs on this unique dataset, which resulted in the model demonstrating greater capability for producing insights and problem-solving in complex, multi-disciplinary settings. This involves an improved ability to draw links between different pieces of knowledge, reason through complex scenarios, and propose innovative solutions that cut across various domains, including science, technology, environmental studies, and the humanities.

+ Test this out and see if you find anything interesting or intriguing. I will keep iterating more versions, but this one seems like a fun and useful way to start.
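Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal loading sketch would look like the following (prompt and sampling settings are illustrative; `device_map="auto"` assumes accelerate is installed and a GPU with enough memory for fp16 Mistral-7B):

```python
# Minimal sketch: load the checkpoint and run a quick generation test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Severian/Nexus-IKM-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain how ocean currents influence regional climates."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```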
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
  "architectures": [
  "MistralForCausalLM"
  ],
@@ -21,7 +21,7 @@
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
- "transformers_version": "4.38.2",
  "unsloth_version": "2024.3",
  "use_cache": true,
  "vocab_size": 32000
 
  {
+ "_name_or_path": "/Users/anima/text-generation-webui/models/Nexus-Mistral7B-v2",
  "architectures": [
  "MistralForCausalLM"
  ],

  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
+ "transformers_version": "4.38.1",
  "unsloth_version": "2024.3",
  "use_cache": true,
  "vocab_size": 32000
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.38.1"
+ }
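transformers picks these defaults up automatically at load time; as a quick sketch, they can also be inspected directly (repo id assumed from this model page):

```python
# Sketch: read the generation defaults shipped with the checkpoint.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("Severian/Nexus-IKM-Mistral-7B")
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 1, 2 per this commit
```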
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1147f81bd4893b47bd533ab912adc2acd841db4d6d6ad0a2fbbe9cde45a88400
3
- size 4943161664
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:472816b5d2e5305557cac6388019cef040d5fdfaf1be0ef379bbdf25a56a4da9
3
+ size 4943162240
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:56b6afc4aca0bf0ad7e9c06b487f14dce81c15449a39dcd8c35048a563cb0ea9
- size 4999818600

  version https://git-lfs.github.com/spec/v1
+ oid sha256:f774df44e29de2d0cafc93aafb26259298dc1577de6eb108761b185f8bef9c9d
+ size 4999819232
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cd2d82a346acc4ca2b3255f2dc7d51e7c06e06ae5beee51a24ed4bab1717f941
- size 4278371624

  version https://git-lfs.github.com/spec/v1
+ oid sha256:2d207361453f0496cd7d52cd9d8a369009cdeef3feae04ade3a147ce10465034
+ size 4540516256
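These CHANGED entries are Git LFS pointer files: each records only the sha256 oid and byte size of the shard it stands in for. As a small sketch, a downloaded shard can be checked against its pointer (the oid and size below are the new model-00001 values from this commit; the path is illustrative):

```python
# Sketch: verify a downloaded shard against the sha256/size in its LFS pointer.
import hashlib
import os

path = "model-00001-of-00003.safetensors"
expected_oid = "472816b5d2e5305557cac6388019cef040d5fdfaf1be0ef379bbdf25a56a4da9"
expected_size = 4943162240

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert h.hexdigest() == expected_oid, "sha256 mismatch"
print("shard matches its LFS pointer")
```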
model.safetensors.index.json CHANGED
@@ -1,297 +1,298 @@
  {
  "metadata": {
- "total_size": 14221320192
  },
  "weight_map": {
- "embed_tokens.weight": "model-00001-of-00003.safetensors",
- "layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
- "layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
- "layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
- "layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
- "layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
- "layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
- "layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
- "layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
- "layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
- "layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
- "layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
- "norm.weight": "model-00003-of-00003.safetensors"
  }
  }
 
  {
  "metadata": {
+ "total_size": 14483464192
  },
  "weight_map": {
+ "lm_head.weight": "model-00003-of-00003.safetensors",
+ "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+ "model.norm.weight": "model-00003-of-00003.safetensors"
  }
  }
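The substance of this index change: every tensor key now carries the standard `model.` prefix that transformers expects for MistralForCausalLM, and `lm_head.weight` is stored explicitly (the total_size growth of 262,144,000 bytes matches a 32000 × 4096 fp16 lm_head exactly, which also explains the larger third shard). As a hypothetical sketch of that key remap, not necessarily how this commit was produced:

```python
# Hypothetical sketch: remap tensor keys from the old, un-prefixed layout
# to the "model."-prefixed layout shown in the new index.
# Paths are illustrative; assumes the safetensors package.
from safetensors.torch import load_file, save_file

state = load_file("old_shard.safetensors")
remapped = {
    # lm_head.weight stays at the top level; everything else gains "model."
    key if key == "lm_head.weight" else f"model.{key}": tensor
    for key, tensor in state.items()
}
save_file(remapped, "fixed_shard.safetensors")
```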