davidadamczyk committed

Commit b77082e
1 Parent(s): 5c593fa

Add SetFit model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
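The pooling config above enables only masked mean pooling over token embeddings (`pooling_mode_mean_tokens: true`). A minimal numpy sketch of that operation, using toy tensors rather than this model's actual outputs:

```python
import numpy as np

# Masked mean pooling: average token embeddings, ignoring padding positions.
# token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1.
def mean_pool(token_embeddings, attention_mask):
    mask = attention_mask[:, None].astype(float)      # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)    # sum over real tokens
    count = max(mask.sum(), 1e-9)                     # avoid division by zero
    return summed / count

emb = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([1, 1, 0])        # last position is padding
pooled = mean_pool(emb, mask)
print(pooled)                     # → [2. 3.] (mean of the first two rows)
```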
README.md ADDED
@@ -0,0 +1,264 @@
+ ---
+ base_model: sentence-transformers/all-mpnet-base-v2
+ library_name: setfit
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: 'John Ondespot Help me out. So Yellen has to tell the President that they
+     cannot afford to pay bondholders in the favour of US civil servants and military
+     and homeless to keep society rolling and let the big banks hold out for money
+     down the line? To float the entire USA financial system from collapse but also
+     from societal rioting on Capitol Hill? I am getting this? Cause the more I read
+     this is quite a debt watched by the major credit leaders of the US commercial
+     and credit banking system?
+
+     '
+ - text: 'Independent I disagree that, in your words, Lula "is the biggest thief in
+     Brazil''s history." The excellent Guardian article you cite requires a careful
+     reading to the end. To me, it seems like the Brazilian parliamentary system practically
+     encourages corruption and has been rife with corruption in most administrations. Lula
+     too fell into corruption to gain political support to enact his social reforms
+     when faced with a minority in Congress. (This reminds me of the leftist Peruvian
+     president who tried to dissolve the conservative dominated Congress that block
+     any of his reforms.) Lula resorted to bribes to get support from minority parties.
+     From the Guardian article: "Although illegal, this allowed the Workers’ Party
+     to get things done. Lula’s first term delivered impressive progress on alleviating
+     poverty, social spending and environmental controls."At the same time, "it was
+     the Workers’ Party that had put in place the judicial reforms that allowed the
+     investigation to go ahead. There would have been no Car Wash if the government
+     had not appointed, in September 2013, an independent attorney general."So maybe
+     Lula will prove to be a better president today.
+
+     '
+ - text: 'The reality is that in Brazil the level of corruption has exceeded all limits,
+     our system is similar to the American one, but imagine that a former president
+     convicted of corruption in which he should have served a sentence of 9 years in
+     2018 was released for cheating by the judiciary and could still run for office
+     (which is illegal under our constitution).Lula is not just a communist, he is
+     the "kingpin" these protests are a sample of the desperation of people who fear
+     for their freedom and integrity.
+
+     '
+ - text: 'The ‘Trump of the Tropics’ Goes Bust The definitive challenge for Luiz Inácio
+     Lula da Silva: to be president for all the people. SÃO PAULO, Brazil — As a shocked
+     nation watched live on television and social media, thousands of radical supporters
+     of a defeated president marched on the seat of the federal government, convinced
+     that an election had been stolen. The mob ransacked the Congress, the Supreme
+     Court and the presidential palace. It took the authorities several hours to arrest
+     hundreds of people and finally restore order. The definitive challenge for Luiz
+     Inácio Lula da Silva: to be president for all the people.
+
+     '
+ - text: 'Friends,Speaker McCarthy and Representative Taylor Greene aren''t the problems---WE
+     ARE!!!! And, by we, I mean the people who registered and voted for them. These
+     clowns aren''t in the House of Representatives by osmosis, our fellow citizens
+     voted them into office. Obviously, some Americans want the US to be run this way.
+     But if you don''t, you can do something about it. Find out who''s going to be
+     running for office in your area (county, city, state, federal) and start asking
+     them questions? Are they running to represent you or someone else? Go ahead and
+     ask them personal questions, tell them you read about it on "deepfake" website.
+     But more importantly, don''t complain online. You can do something to stop them.
+     It''s a simple 4 step process: 1) Clean out your ears! 2) Support the people you
+     think will actually help you. 3) Register and 4) Vote. Yes, vote. Vote it like
+     my life depends on it because it does!
+
+     '
+ inference: true
+ model-index:
+ - name: SetFit with sentence-transformers/all-mpnet-base-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 1.0
+       name: Accuracy
+ ---
+
+ # SetFit with sentence-transformers/all-mpnet-base-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
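Step 1 turns the few labeled examples into sentence pairs for contrastive fine-tuning: roughly, same-label pairs are pushed together and cross-label pairs apart. A simplified pure-Python sketch of that pair construction (illustrative only, not the SetFit library's internal sampler; the example texts are shortened from this card's samples):

```python
from itertools import combinations

# Build contrastive pairs from labeled texts: same-label pairs get a
# similarity target of 1.0, cross-label pairs get 0.0.
def make_contrastive_pairs(examples):
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs

examples = [
    ("Lula said he will prosecute crimes of the previous administration.", "yes"),
    ("He said it was time to reach out in the families and end divisions.", "yes"),
    ("The recent layoffs are different than most.", "no"),
]
pairs = make_contrastive_pairs(examples)
# 3 texts yield 3 pairs: one same-label (target 1.0), two cross-label (0.0)
```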
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 384 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | yes | <ul><li>"NYT.1/1/2023. As Lula Becomes Brazil's President, Bolsonaro Flees to Florida.Kudos to the NYT journalism for a first-rate article about the chaotic and surrealistic end of the ex-military president Bolsonaro's administration. Among his many policy mistakes, some described as of criminal nature, the death of his political career was to escape the country before passing the presidential sash to President Lula. Bolsonaro is lucky to be a politician and no longer a military man. For an army officer to flee from a combat theater leaving behind his comrades, is a court martial offense. One thing is for sure. He destroyed any hope of the Brazilian military to one day return to power. Moreover, President Lula's success or failure depends on how his administration deals with the economy rather than on political opposition from Bolsonaro that from Orlando or Rio de Janeiro will fade away.\n"</li><li>'A few days ago I listened to an interview with the left-of-center new President of Brazil, Luiz Inácio Lula da Silva. He said education, health care and food for poor people aren’t cost, but investments.How I wish American legislatures would think like him.\n'</li><li>'After the dictatorship there was a blanket pardon. No military men was ever prosecuted for the assassinations, torture, rapes committed in the name of the government. Lula said he will be the president for all Brazilians, including the ones who did not vote for him. He said it was time to reach out in the families and end divisions. But he said he will prosecute crimes of the previous administration. He is correct. Brazil lost (proportionally) more people than any other country to COVID. A country thst has been a leader and an example in mass vaccinations. The hundreds of thousands who died did not need to die. And they should not be hidden under the carpet as if nothing happened.\n'</li></ul> |
+ | no | <ul><li>'rivvir No, they didn\'t just want to "die in a war," they also didn\'t want to kill other people they have no reason to kill in some utterly immoral war...that\'s a far cry from the "same danger" as being "poor and desperate."Also, while the journey north has it perils for sure, have a look at the Rio Grande in a southern climate, then look at the Bearing Sea in fall weather!\n'</li><li>'"Spectacle produced fame, which produced power, which produced influence and possibly control." Yes, indeed. And since the Republicans have nothing to sell BUT spectacle -- because "more tax breaks for the wealthy" somehow doesn\'t get sufficient votes from the hoi polloi -- they kept offering it and the hoi polloi (or about a third of us) kept buying it, and now they\'re caught in their own trap. They created the monster that\'s taken control from them.\n'</li><li>"While undoubtedly all this is true, the recent layoffs are different than most. Because what we have is companies, some of the richest in the world, laying off many thousands of employees even though they continue to be profitable. So the ask of managers is difficult. It's not just look the person in the eye. It is: look the person in the eye and tell them that the company to which they'll loyally devoted many years of service has decided to make them unemployed, not out of necessity, not because the company is at risk, but so that some greedy shareholders can earn a few more pennies. They would be asking the manager to defend the indefensible. And if the manager doesn't agree with the lay-offs, it puts them in a very awkward position. Should they resign in disgust (and so one more person without a way to feed their family or pay their mortgage)? Or should they at least tell the employee they don't agree (but what consequences could this have for them if word gets back to their superiors)? Or should they pretend to agree that this appalling, cynical lay-off is somehow appropriate and just a measured, proportionate response to the fact that some activist shareholder only earned $3.2 billion this year? Somehow, while it is totally wrong, it also feels appropriate that these most cynical and inhumane of lay-offs be executed in the most cynical inhumane way.\n"</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 1.0 |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("davidadamczyk/setfit-model-7")
+ # Run inference
+ preds = model("John Ondespot Help me out. So Yellen has to tell the President that they cannot afford to pay bondholders in the favour of US civil servants and military and homeless to keep society rolling and let the big banks hold out for money down the line? To float the entire USA financial system from collapse but also from societal rioting on Capitol Hill? I am getting this? Cause the more I read this is quite a debt watched by the major credit leaders of the US commercial and credit banking system?
+ ")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 23 | 107.2 | 272 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | no | 18 |
+ | yes | 22 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 120
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
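The `loss: CosineSimilarityLoss` listed above fits the cosine similarity of each pair's embeddings to the pair's similarity target with a squared error. A toy numpy illustration of that objective, using hypothetical 2-D vectors rather than real embeddings:

```python
import numpy as np

# Cosine similarity between two vectors.
def cosine_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# CosineSimilarityLoss objective (sketch): squared error between the
# pair's cosine similarity and its target score (1.0 similar, 0.0 not).
def cosine_similarity_loss(u, v, target):
    return (cosine_sim(u, v) - target) ** 2

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
# Orthogonal vectors labeled "similar" incur the maximum penalty:
print(cosine_similarity_loss(u, v, 1.0))  # → 1.0
```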
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0017 | 1 | 0.3073 | - |
+ | 0.0833 | 50 | 0.1154 | - |
+ | 0.1667 | 100 | 0.0012 | - |
+ | 0.25 | 150 | 0.0002 | - |
+ | 0.3333 | 200 | 0.0002 | - |
+ | 0.4167 | 250 | 0.0001 | - |
+ | 0.5 | 300 | 0.0001 | - |
+ | 0.5833 | 350 | 0.0001 | - |
+ | 0.6667 | 400 | 0.0001 | - |
+ | 0.75 | 450 | 0.0001 | - |
+ | 0.8333 | 500 | 0.0001 | - |
+ | 0.9167 | 550 | 0.0001 | - |
+ | 1.0 | 600 | 0.0001 | - |
+
+ ### Framework Versions
+ - Python: 3.10.13
+ - SetFit: 1.1.0
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.45.2
+ - PyTorch: 2.4.0+cu124
+ - Datasets: 2.21.0
+ - Tokenizers: 0.20.0
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.4.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "normalize_embeddings": false,
+   "labels": [
+     "no",
+     "yes"
+   ]
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e0f9688faea23948459ab90568a8d0cb74e9b4d09e476d04ff2a8edcfc8a9ad
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28f7610a9921eb077b641a5b463c4e841f7890bff0a61eafdea2888cc978851c
+ size 7023
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
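The module pipeline above runs Transformer → Pooling → Normalize, so the final embeddings are L2-normalized and cosine similarity reduces to a plain dot product. A small numpy sketch of that last normalization step, on a toy vector:

```python
import numpy as np

# L2-normalize an embedding so it has unit length (the Normalize module).
def l2_normalize(embedding):
    return embedding / np.linalg.norm(embedding)

vec = np.array([3.0, 4.0])   # norm is 5.0
unit = l2_normalize(vec)
print(unit)                  # → [0.6 0.8]
```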
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff