
roberta-base + DAPT + Task Transfer for Domain-Specific QA

Objective: This is roberta-base with Domain-Adaptive Pretraining (DAPT) on movie corpora, then fine-tuned for the NER task on the MIT Movie dataset, and finally given a new head and fine-tuned on the SQuAD task. The result is a QA model capable of answering questions in the movie domain, with additional signal coming from a different task (NER - task transfer).
https://huggingface.co/thatdramebaazguy/movie-roberta-base was used as the movie-domain RoBERTa (Movie Roberta).
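The rough recipe, in Transformers terms, looks like the sketch below (illustrative only: checkpoint paths, the NER label count, and the actual fine-tuning loops are placeholders, not the exact scripts used for this model):

from transformers import (
    AutoModelForQuestionAnswering,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

base = "thatdramebaazguy/movie-roberta-base"  # step 1: DAPT encoder
tokenizer = AutoTokenizer.from_pretrained(base)

# Step 2: put a token-classification (NER) head on the domain-adapted encoder
# and fine-tune it on MIT Movie (num_labels is illustrative).
ner_model = AutoModelForTokenClassification.from_pretrained(base, num_labels=25)
# ... fine-tune on MIT Movie, save to "mitmovie-ner-checkpoint" (hypothetical path) ...

# Step 3: reload that checkpoint with a freshly initialized QA (span-prediction) head;
# the shared encoder weights carry over, the NER head is discarded.
qa_model = AutoModelForQuestionAnswering.from_pretrained("mitmovie-ner-checkpoint")
# ... fine-tune on SQuADv1 to obtain movie-roberta-MITmovie-squad ...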

from transformers import pipeline

model_name = "thatdramebaazguy/movie-roberta-MITmovie-squad"
qa_pipeline = pipeline(task="question-answering", model=model_name, tokenizer=model_name, revision="v1.0")
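
For example (the question and context below are made up for illustration):

result = qa_pipeline(
    question="Who directed Inception?",
    context="Inception is a 2010 science fiction film written and directed by Christopher Nolan.",
)
print(result["answer"], result["score"])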

Overview

Language model: roberta-base
Language: English
Downstream-task: NER --> QA
Training data: IMDb, Polarity movie data, Cornell Movie Dialogue, MovieLens 25M movie names, MIT Movie, SQuADv1
Eval data: MoviesQA (From https://github.com/ibm-aur-nlp/domain-specific-QA)
Infrastructure: 4x Tesla V100
Code: See example

Hyperparameters

Num examples = 88567  
Num Epochs = 3  
Instantaneous batch size per device = 32  
Total train batch size (w. parallel, distributed & accumulation) = 128  
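
These settings map onto a Trainer configuration roughly as follows (a sketch; the output directory is a placeholder, and the 4-GPU distributed launch that yields the 128 effective batch size is assumed):

from transformers import TrainingArguments

# 32 examples per device x 4 GPUs = 128 effective train batch size, 3 epochs
training_args = TrainingArguments(
    output_dir="movie-roberta-MITmovie-squad",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=32,
)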

Performance

Eval on SQuADv1

  • eval_samples = 10790
  • exact_match = 83.0274
  • f1 = 90.1615

Eval on MoviesQA

  • eval_samples = 5032
  • exact_match = 51.64944
  • f1 = 65.53983
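
Both evaluations report the standard SQuAD exact-match and F1 metrics; they can be recomputed from model predictions with the evaluate library, for example (the prediction and reference below are made up):

import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "q1", "prediction_text": "Christopher Nolan"}]
references = [{"id": "q1", "answers": {"text": ["Christopher Nolan"], "answer_start": [62]}}]
print(squad_metric.compute(predictions=predictions, references=references))
# -> {'exact_match': 100.0, 'f1': 100.0}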

GitHub Repo:

