metadata

license: apache-2.0
datasets:
  - wikitext
  - ptb_text_only
language:
  - en
metrics:
  - perplexity
pipeline_tag: text-generation
model-index:
  - name: distilgpt2
    results:
      - task:
          type: text-generation
        dataset:
          name: penn_treebank
          type: ptb_text_only
        metrics:
          - name: perplexity@BASELINE
            type: dmx-perlexity
            value: 63.45857238769531
          - name: perplexity@FALLBACK
            type: dmx-perlexity
            value: 64.36720275878906
      - task:
          type: text-generation
        dataset:
          name: wikitext2
          type: wikitext-2-raw-v1
        metrics:
          - name: perplexity@BASELINE
            type: dmx-perlexity
            value: 46.05925369262695
          - name: perplexity@FALLBACK
            type: dmx-perlexity
            value: 46.570838928222656

This is a quantized version of DistilGPT2. We provide the following two quantization configurations:

BASELINE: Every layer is kept in its original format, so the model is equivalent to the original.

FALLBACK: Linear and Conv1D layers are quantized to BFP16, and approximation functions are added for Layer Norm, GELU and Softmax.
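
To give a feel for what quantizing a layer's weights to a block format involves, here is a minimal sketch, assuming that BFP stands for block floating point: values are grouped into blocks, each block shares one exponent, and each value keeps only a small signed integer mantissa. The function name `bfp_quantize` and the `block_size` and `mantissa_bits` defaults are hypothetical choices for illustration. This card does not state the actual BFP16 parameters, and this is not how dmx-mltools implements the format.

```python
import math


def bfp_quantize(values, block_size=8, mantissa_bits=8):
    """Round values to a shared-exponent grid, one exponent per block.

    block_size and mantissa_bits are illustrative assumptions, not the
    parameters of the BFP16 format used by this model.
    """
    out = []
    max_mag = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa magnitude
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        peak = max(abs(v) for v in block)
        if peak == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # frexp returns peak = m * 2**e with 0.5 <= m < 1.
        _, shared_exp = math.frexp(peak)
        # Step size of the integer grid shared by the whole block.
        scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
        for v in block:
            mantissa = max(-max_mag, min(max_mag, round(v / scale)))
            out.append(mantissa * scale)
    return out


weights = [0.5, -0.25, 0.125, 1.0, 0.003, -0.9, 0.33, 0.0]
quantized = bfp_quantize(weights, block_size=4)
for original, approx in zip(weights, quantized):
    print(f"{original:+.6f} -> {approx:+.6f}")
```

In this sketch, small values that share a block with a much larger value lose the most precision, because the shared exponent is set by the block's largest magnitude.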

Usage Example

Prerequisites:

Install dmx-mltools: "pip install dmx-mltools"
Clone this repo, then "cd" into the cloned repo.

>>> import os
>>> import torch
>>> from mltools import dmx
>>> from transformers import pipeline, AutoModelForCausalLM
>>> import evaluate
>>> from datasets import load_dataset

# Get model
>>> my_hf_token = os.environ.get("Dmatrix_HF_Token")
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

>>> pipe = pipeline(
...     "text-generation",
...     model="d-matrix/distilgpt2",
...     device=device,
...     use_auth_token=my_hf_token,
... )
>>> pipe.model = dmx.Model(pipe.model, monkey_patched=False, hf=True, input_names=["input_ids", "labels"])

# Configure quantization formats
>>> pipe.model.transform('FALLBACK.yaml')

# Evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("ptb_text_only", "penn_treebank", split="test")["sentence"]
>>> pipe.model.eval()
>>> results = perplexity.compute(model=pipe.model.body, references=input_texts)
>>> print(results)
{'loss': 4.164604187011719, 'perplexity': 64.36720275878906}
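
The two numbers in the printed result appear to be related by perplexity = exp(loss). The sketch below checks that relationship and computes how much FALLBACK raises perplexity relative to BASELINE. All values are copied from this card. The helper names `perplexity_from_loss` and `relative_increase` are hypothetical and are not part of dmx-mltools or the `dmx_perplexity` metric.

```python
import math

# Values copied from this model card.
fallback_loss = 4.164604187011719
reported = {
    "penn_treebank": {"BASELINE": 63.45857238769531, "FALLBACK": 64.36720275878906},
    "wikitext2": {"BASELINE": 46.05925369262695, "FALLBACK": 46.570838928222656},
}


def perplexity_from_loss(loss):
    """Perplexity as the exponential of the mean negative log-likelihood."""
    return math.exp(loss)


def relative_increase(baseline, quantized):
    """Fractional increase in perplexity caused by quantization."""
    return (quantized - baseline) / baseline


ppl = perplexity_from_loss(fallback_loss)
print(f"exp(loss) = {ppl:.4f}")

for dataset, scores in reported.items():
    delta = relative_increase(scores["BASELINE"], scores["FALLBACK"])
    print(f"{dataset}: FALLBACK perplexity is {delta:.2%} above BASELINE")
```

On the reported numbers, the FALLBACK configuration costs roughly 1.4% in perplexity on Penn Treebank and roughly 1.1% on WikiText-2.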