---
license: apache-2.0
datasets:
- wikitext
- ptb_text_only
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
model-index:
- name: distilgpt2
  results:
  - task:
      type: text-generation
    dataset:
      name: penn_treebank
      type: ptb_text_only
    metrics:
    - name: perplexity@BASELINE
      type: dmx-perplexity
      value: 63.45857238769531
    - name: perplexity@FALLBACK
      type: dmx-perplexity
      value: 64.36720275878906
  - task:
      type: text-generation
    dataset:
      name: wikitext2
      type: wikitext-2-raw-v1
    metrics:
    - name: perplexity@BASELINE
      type: dmx-perplexity
      value: 46.05925369262695
    - name: perplexity@FALLBACK
      type: dmx-perplexity
      value: 46.570838928222656
---

This is a quantized version of [DistilGPT2](https://huggingface.co/distilbert/distilgpt2). We provide the following two quantization configurations:

- **BASELINE**: everything kept in the original format; equivalent to the original model.
- **FALLBACK**: Linear and Conv1D layers quantized to BFP16, with approximation functions added for LayerNorm, GELU, and Softmax.

### Usage Example

Prerequisites:

- Install dmx-mltools: `pip install dmx-mltools`
- Clone this repo and `cd` into the cloned repo.

```python
>>> import os
>>> import torch
>>> from mltools import dmx
>>> from transformers import pipeline
>>> import evaluate
>>> from datasets import load_dataset

# Get the model and wrap it in a dmx.Model for quantization
>>> my_hf_token = os.environ.get("Dmatrix_HF_Token")
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> pipe = pipeline(
...     "text-generation",
...     model="d-matrix/distilgpt2",
...     device=device,
...     use_auth_token=my_hf_token,
... )
>>> pipe.model = dmx.Model(
...     pipe.model, monkey_patched=False, hf=True, input_names=["input_ids", "labels"]
... )

# Configure quantization formats
>>> pipe.model.transform("FALLBACK.yaml")

# Evaluate perplexity on the Penn Treebank test split
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("ptb_text_only", "penn_treebank", split="test")["sentence"]
>>> pipe.model.eval()
>>> results = perplexity.compute(model=pipe.model.body, references=input_texts)
>>> print(results)
{'loss': 4.164604187011719, 'perplexity': 64.36720275878906}
```
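
The same pipeline can be switched to the other configuration or evaluated on WikiText-2. The snippet below is a minimal sketch rather than part of the original card: it assumes the cloned repo also ships a `BASELINE.yaml` config alongside `FALLBACK.yaml`, and it reuses the `pipe` and `perplexity` objects created in the example above.

```python
# Minimal sketch (assumption): switch to the BASELINE configuration.
# Assumes the cloned repo also contains a BASELINE.yaml config file.
>>> pipe.model.transform("BASELINE.yaml")

# Evaluate on the raw WikiText-2 test split instead of Penn Treebank
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
>>> pipe.model.eval()
>>> results = perplexity.compute(model=pipe.model.body, references=input_texts)
>>> print(results)  # expected to be close to the perplexity@BASELINE value reported above
```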