d-matrix
/

gpt2

Text Generation

Model card Files Files and versions Community

gpt2 / README.md

d-matrix

Create README.md

93cadd1 verified 8 months ago

|

2.26 kB

	---
	license: apache-2.0
	datasets:
	- wikitext
	- ptb_text_only
	language:
	- en
	metrics:
	- perplexity
	pipeline_tag: text-generation
	model-index:
	- name: distilgpt2
	results:
	- task:
	type: text-generation
	dataset:
	name: penn_treebank
	type: ptb_text_only
	metrics:
	- name: perlexity@BASELINE
	type: dmx-perlexity
	value: 63.45857238769531
	- name: perlexity@FALLBACK
	type: dmx-perlexity
	value: 64.36720275878906
	- task:
	type: text-generation
	dataset:
	name: wikitext2
	type: wikitext-2-raw-v1
	metrics:
	- name: perlexity@BASELINE
	type: dmx-perlexity
	value: 46.05925369262695
	- name: perlexity@FALLBACK
	type: dmx-perlexity
	value: 46.570838928222656
	---
	This is a quantized version of [DistilGPT2](https://huggingface.co/distilbert/distilgpt2). We provide the following two quantization configurations:

	BASELINE: Everything in original format, equivalent to original model.

	FALLBACK: Quantized Linear and Conv1D layers to BFP16. Added approximation functions for Layer Norm, GELU and Softmax.

	### Usage Example

	Prerequisites:
	- Install dmx-mltools: "pip install dmx-mltools"
	- clone this repo. "cd" to the cloned repo.
	```python
	>>> import os
	>>> import torch
	>>> from mltools import dmx
	>>> from transformers import pipeline,AutoModelForCausalLM
	>>> import evaluate
	>>> from datasets import load_dataset

	# Get model
	>>> my_hf_token = os.environ.get("Dmatrix_HF_Token")
	>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

	>>> pipe = pipeline(
	>>> "text-generation",
	>>> model="d-matrix/distilgpt2",
	>>> device=device,
	>>> use_auth_token=my_hf_token,
	>>> )
	>>> pipe.model = dmx.Model(pipe.model,monkey_patched=False,hf=True,input_names=["input_ids", "labels"])

	# Configure quantization formats
	>>> pipe.model.transform('FALLBACK.yaml')

	# Evaluate
	>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
	>>> input_texts = load_dataset("ptb_text_only", "penn_treebank", split="test")["sentence"]
	>>> pipe.model.eval()
	>>> results = perplexity.compute(model=pipe.model.body,references=input_texts)
	>>> print(results)
	{'loss': 4.164604187011719, 'perplexity': 64.36720275878906}
	```