Add SetFit model

Files changed (13)

1_Pooling/config.json +9 -0
README.md +225 -0
config.json +65 -0
config_sentence_transformers.json +7 -0
config_setfit.json +9 -0
model.safetensors +3 -0
model_head.pkl +3 -0
modules.json +14 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +73 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,9 @@

+{
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false
+}
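
The pooling config above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of its token embeddings. A minimal pure-Python sketch of masked mean pooling follows; the function name `mean_pool` and the toy inputs are illustrative assumptions, not code from this repository, and the real model pools 768-dimensional vectors rather than 4-dimensional ones.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, skipping padding positions.

    token_embeddings: list of sequences, each a list of equal-length vectors.
    attention_mask:   list of sequences of 0/1 flags (1 = real token).
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        # Keep only the vectors whose mask flag marks a real token.
        kept = [vec for vec, flag in zip(vectors, mask) if flag == 1]
        count = max(len(kept), 1)  # guard against an all-padding sequence
        dim = len(vectors[0])
        pooled.append([sum(vec[d] for vec in kept) / count for d in range(dim)])
    return pooled


# Toy batch: 2 sequences, 3 token positions, hidden size 4 (hypothetical values).
embeddings = [
    [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]],
    [[12.0, 13.0, 14.0, 15.0], [16.0, 17.0, 18.0, 19.0], [20.0, 21.0, 22.0, 23.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]  # the last token of the first sequence is padding
pooled = mean_pool(embeddings, mask)
```

Excluding padding from both the sum and the count keeps short inputs from being pulled toward zero.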

README.md ADDED

	@@ -0,0 +1,225 @@

+---
+library_name: setfit
+tags:
+- setfit
+- sentence-transformers
+- text-classification
+- generated_from_setfit_trainer
+metrics:
+- accuracy
+widget:
+- text: Specific information applicable to Parties, including regional economic integration
+ organizations and their member States, that have reached an agreement to act jointly
+ under Article 4, paragraph 2, of the Paris Agreement, including the Parties that
+ agreed to act jointly and the terms of the agreement, in accordance with Article
+ 4, paragraphs 16–18, of the Paris Agreement. Not applicable. (c). How the Party’s
+ preparation of its nationally determined contribution has been informed by the
+ outcomes of the global stocktake, in accordance with Article 4, paragraph 9, of
+ the Paris Agreement.
+- text: 'In the shipping and aviation sectors, emission reduction efforts will be
+ focused on distributing eco-friendly ships and enhancing the operational efficiency
+ of aircraft. Agriculture, livestock farming and fisheries: The Republic Korea
+ is introducing various options to accelerate low-carbon farming, for instance,
+ improving irrigation techniques in rice paddies and adopting low-input systems
+ for nitrogen fertilizers.'
+- text: As part of this commitment, Oman s upstream oil and gas industry is developing
+ economically viable solutions to phase out routine flaring as quickly as possible
+ and ahead of the World Bank s target date. IV. Climate Preparedness and Resilience.
+ The Sultanate of Oman has stepped up its efforts in advancing its expertise and
+ methodologies to better manage the climate change risks over the past five years.
+ The adaptation efforts are underway, and the status of adaptation planning is
+ still at a nascent stage.
+- text: 'Synergy and coherence 46 VII- Gender and youth 46 VIII- Education and employment
+ 48 ANNEXES. 49 Annex No. 1: Details of mitigation measures, conditional and non-conditional,
+ by sector 49 Annex No.2: List of adaptation actions proposed by sectors. 57 Annex
+ No.3: GCF project portfolio. 63 CONTRIBUTION DENTERMINEE AT NATIONAL LEVEL CDN
+ MAURITANIE LIST OF TABLES Table 1: Summary of funding needs for the CND 2021-2030
+ updated. 12 Table 2: CND 2021-2030 mitigation measures updated by sector (cumulative
+ cost and reduction potential for the period). 14 Table 3: CND 2021-2030 adaptation
+ measures updated by sector. Error!'
+- text: In the transport sector, restructuing is planned through a number of large
+ infrastructure initiatives aiming to revive the role of public transport and achieving
+ a relevant share of fuel efficient vehicles. Under both the conditional and unconditional
+ mitigation scenarios, Lebanon will achieve sizeable emission reductions. With
+ regards to adaptation, Lebanon has planned comprehensive sectoral actions related
+ to water, agriculture/forestry and biodiversity, for example related to irrigation,
+ forest management, etc. It also continues developing adaptation strategies in
+ the remaining sectors.
+pipeline_tag: text-classification
+inference: false
+co2_eq_emissions:
+ emissions: 25.8151164022705
+ source: codecarbon
+ training_type: fine-tuning
+ on_cloud: false
+ cpu_model: Intel(R) Xeon(R) CPU @ 2.00GHz
+ ram_total_size: 12.674781799316406
+ hours_used: 0.622
+ hardware_used: 1 x Tesla T4
+base_model: ppsingh/SECTOR-multilabel-mpnet_w
+---
+# SetFit with ppsingh/SECTOR-multilabel-mpnet_w
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [ppsingh/SECTOR-multilabel-mpnet_w](https://huggingface.co/ppsingh/SECTOR-multilabel-mpnet_w) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.
+The model has been trained using an efficient few-shot learning technique that involves:
+1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+2. Training a classification head with features from the fine-tuned Sentence Transformer.
+## Model Details
+### Model Description
+- **Model Type:** SetFit
+- **Sentence Transformer body:** [ppsingh/SECTOR-multilabel-mpnet_w](https://huggingface.co/ppsingh/SECTOR-multilabel-mpnet_w)
+- **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Classes:** 4 classes
+<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+## Uses
+### Direct Use for Inference
+First install the SetFit library:
+```bash
+pip install setfit
+```
+Then you can load this model and run inference.
+```python
+from setfit import SetFitModel
+# Download from the 🤗 Hub
+model = SetFitModel.from_pretrained("ppsingh/iki_sector_setfit")
+# Run inference
+preds = model("In the shipping and aviation sectors, emission reduction efforts will be focused on distributing eco-friendly ships and enhancing the operational efficiency of aircraft. Agriculture, livestock farming and fisheries: The Republic Korea is introducing various options to accelerate low-carbon farming, for instance, improving irrigation techniques in rice paddies and adopting low-input systems for nitrogen fertilizers.")
+```
+<!--
+### Downstream Use
+*List how someone could finetune this model on their own dataset.*
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Set Metrics
+| Training set | Min | Median | Max |
+|:-------------|:----|:-------|:----|
+| Word count | 35 | 76.164 | 170 |
+### Training Hyperparameters
+- batch_size: (16, 2)
+- num_epochs: (1, 0)
+- max_steps: -1
+- sampling_strategy: oversampling
+- body_learning_rate: (2e-05, 1e-05)
+- head_learning_rate: 0.01
+- loss: CosineSimilarityLoss
+- distance_metric: cosine_distance
+- margin: 0.25
+- end_to_end: False
+- use_amp: False
+- warmup_proportion: 0.01
+- seed: 42
+- eval_max_steps: -1
+- load_best_model_at_end: False
+### Training Results
+| Epoch | Step | Training Loss | Validation Loss |
+|:------:|:----:|:-------------:|:---------------:|
+| 0.0005 | 1 | 0.2029 | - |
+| 0.0993 | 200 | 0.0111 | 0.1124 |
+| 0.1985 | 400 | 0.0063 | 0.111 |
+| 0.2978 | 600 | 0.0183 | 0.1214 |
+| 0.3970 | 800 | 0.0197 | 0.1248 |
+| 0.4963 | 1000 | 0.0387 | 0.1339 |
+| 0.5955 | 1200 | 0.0026 | 0.1181 |
+| 0.6948 | 1400 | 0.0378 | 0.1208 |
+| 0.7940 | 1600 | 0.0285 | 0.1267 |
+| 0.8933 | 1800 | 0.0129 | 0.1254 |
+| 0.9926 | 2000 | 0.0341 | 0.1271 |
+### Environmental Impact
+Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+- **Carbon Emitted**: 0.026 kg of CO2
+- **Hours Used**: 0.622 hours
+### Training Hardware
+- **On Cloud**: No
+- **GPU Model**: 1 x Tesla T4
+- **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
+- **RAM Size**: 12.67 GB
+### Framework Versions
+- Python: 3.10.12
+- SetFit: 1.0.3
+- Sentence Transformers: 2.3.1
+- Transformers: 4.35.2
+- PyTorch: 2.1.0+cu121
+- Datasets: 2.3.0
+- Tokenizers: 0.15.1
+## Citation
+### BibTeX
+```bibtex
+@article{https://doi.org/10.48550/arxiv.2209.11055,
+ doi = {10.48550/ARXIV.2209.11055},
+ url = {https://arxiv.org/abs/2209.11055},
+ author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+ keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+ title = {Efficient Few-Shot Learning Without Prompts},
+ publisher = {arXiv},
+ year = {2022},
+ copyright = {Creative Commons Attribution 4.0 International}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->

config.json ADDED

	@@ -0,0 +1,65 @@

+{
+ "_name_or_path": "ppsingh/SECTOR-multilabel-mpnet_w",
+ "architectures": [
+ "MPNetModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "Agriculture",
+ "1": "Buildings",
+ "2": "Coastal Zone",
+ "3": "Cross-Cutting Area",
+ "4": "Disaster Risk Management (DRM)",
+ "5": "Economy-wide",
+ "6": "Education",
+ "7": "Energy",
+ "8": "Environment",
+ "9": "Health",
+ "10": "Industries",
+ "11": "LULUCF/Forestry",
+ "12": "Social Development",
+ "13": "Tourism",
+ "14": "Transport",
+ "15": "Urban",
+ "16": "Waste",
+ "17": "Water"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "Agriculture": 0,
+ "Buildings": 1,
+ "Coastal Zone": 2,
+ "Cross-Cutting Area": 3,
+ "Disaster Risk Management (DRM)": 4,
+ "Economy-wide": 5,
+ "Education": 6,
+ "Energy": 7,
+ "Environment": 8,
+ "Health": 9,
+ "Industries": 10,
+ "LULUCF/Forestry": 11,
+ "Social Development": 12,
+ "Tourism": 13,
+ "Transport": 14,
+ "Urban": 15,
+ "Waste": 16,
+ "Water": 17
+ },
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "mpnet",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "problem_type": "multi_label_classification",
+ "relative_attention_num_buckets": 32,
+ "torch_dtype": "float32",
+ "transformers_version": "4.35.2",
+ "vocab_size": 30527
+}

config_sentence_transformers.json ADDED

	@@ -0,0 +1,7 @@

+{
+ "__version__": {
+ "sentence_transformers": "2.3.1",
+ "transformers": "4.35.2",
+ "pytorch": "2.1.0+cu121"
+ }
+}

config_setfit.json ADDED

	@@ -0,0 +1,9 @@

+{
+ "normalize_embeddings": true,
+ "labels": [
+ "Economy-wide",
+ "Energy",
+ "Other Sector",
+ "Transport"
+ ]
+}
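
The `labels` list above names the four classes mentioned in the README, whereas the 18-entry `id2label` map in `config.json` appears to be carried over from the base model. The sketch below shows one way head outputs could be mapped back to these names. Whether the head returns a single class index or a multi-hot vector is not stated in this commit, so both helpers (`index_to_label` and `multi_hot_to_labels`, hypothetical names) are assumptions for illustration.

```python
import json

# Contents of config_setfit.json as added in this commit.
CONFIG_SETFIT = """
{
  "normalize_embeddings": true,
  "labels": ["Economy-wide", "Energy", "Other Sector", "Transport"]
}
"""

labels = json.loads(CONFIG_SETFIT)["labels"]


def index_to_label(class_index):
    """Map a single predicted class index to its label name."""
    return labels[class_index]


def multi_hot_to_labels(row):
    """Map a 0/1 vector (one flag per class) to the list of active label names."""
    return [name for name, flag in zip(labels, row) if flag]


single = index_to_label(3)
active = multi_hot_to_labels([0, 1, 0, 1])
```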

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c5f137bb2e1e7da1eebf8b21d4b5878675c41b4240614b3e4bccb248029eb52c
+size 437967672

model_head.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88559af6420967265b7519c9b35c1a3efa8a7ef3ee3c4b40d3f5f3225ffab36b
+size 13858
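
Both `model.safetensors` and `model_head.pkl` are stored as Git LFS pointers, so the diff shows only three `key value` lines rather than the binary payload. A small sketch of reading such a pointer is given below; `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer for model_head.pkl, copied from the hunk above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:88559af6420967265b7519c9b35c1a3efa8a7ef3ee3c4b40d3f5f3225ffab36b
size 13858
"""

info = parse_lfs_pointer(POINTER)
```

The parsed `size_bytes` and `digest` can be compared against a downloaded file to check that the transfer completed intact.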

modules.json ADDED

	@@ -0,0 +1,14 @@

+[
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+]
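
`modules.json` lists the Sentence Transformer pipeline stages in order: the transformer at the repository root (empty `path`), followed by the pooling layer in `1_Pooling`. The sketch below only parses that listing and orders it by `idx`. It does not load any weights, and the `pipeline` structure is an illustrative assumption rather than the library's own loader.

```python
import json

# Contents of modules.json as added in this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""

modules = sorted(json.loads(MODULES_JSON), key=lambda module: module["idx"])

# Pair each stage's class name with the directory its config lives in;
# an empty path refers to the repository root, written here as ".".
pipeline = [
    (module["type"].rsplit(".", 1)[-1], module["path"] or ".")
    for module in modules
]
```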

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+ "max_seq_length": 512,
+ "do_lower_case": false
+}

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,73 @@

+{
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "30526": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "do_lower_case": true,
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "max_length": 128,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "problem_type": "multi_label_classification",
+ "sep_token": "</s>",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "MPNetTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff