julianrisch committed
Commit e5f3d13
1 Parent(s): a17f728

Update README.md

Files changed (1)
  1. README.md +58 -56
README.md CHANGED
@@ -34,7 +34,7 @@ model-index:
  **Downstream-task:** Extractive QA
  **Training data:** SQuAD 2.0
  **Eval data:** SQuAD 2.0 dev set - German MLQA - German XQuAD
- **Code:** See [example](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering.py) in [FARM](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering.py)
  **Infrastructure**: 4x Tesla v100

  ## Hyperparameters
@@ -50,31 +50,34 @@ learning_rate=2e-5,
  Corresponding experiment logs in mlflow: [link](https://public-mlflow.deepset.ai/#/experiments/2/runs/b25ec75e07614accb3f1ce03d43dbe08)


- ## Performance
- Evaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).
- ```
- "exact": 73.91560683904657
- "f1": 77.14103746689592
- ```

- Evaluated on German MLQA: test-context-de-question-de.json
- "exact": 33.67279167589108
- "f1": 44.34437105434842
- "total": 4517

- Evaluated on German XQuAD: xquad.de.json
- "exact": 48.739495798319325
- "f1": 62.552615701071495
- "total": 1190

- ## Usage

  ### In Transformers
  ```python
- from transformers.pipelines import pipeline
- from transformers.modeling_auto import AutoModelForQuestionAnswering
- from transformers.tokenization_auto import AutoTokenizer

  model_name = "deepset/xlm-roberta-base-squad2"

@@ -91,34 +94,22 @@ model = AutoModelForQuestionAnswering.from_pretrained(model_name)
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  ```

- ### In FARM
-
- ```python
- from farm.modeling.adaptive_model import AdaptiveModel
- from farm.modeling.tokenization import Tokenizer
- from farm.infer import Inferencer
-
- model_name = "deepset/xlm-roberta-base-squad2"
-
- # a) Get predictions
- nlp = Inferencer.load(model_name, task_type="question_answering")
- QA_input = [{"questions": ["Why is model conversion important?"],
-              "text": "The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks."}]
- res = nlp.inference_from_dicts(dicts=QA_input, rest_api_schema=True)
-
- # b) Load model & tokenizer
- model = AdaptiveModel.convert_from_transformers(model_name, device="cpu", task_type="question_answering")
- tokenizer = Tokenizer.load(model_name)
  ```
-
- ### In haystack
- For doing QA at scale (i.e. many docs instead of single paragraph), you can load the model also in [haystack](https://github.com/deepset-ai/haystack/):
- ```python
- reader = FARMReader(model_name_or_path="deepset/xlm-roberta-base-squad2")
- # or
- reader = TransformersReader(model="deepset/roberta-base-squad2",tokenizer="deepset/xlm-roberta-base-squad2")
  ```

  ## Authors
  Branden Chan: `branden.chan [at] deepset.ai`
@@ -127,18 +118,29 @@ Malte Pietsch: `malte.pietsch [at] deepset.ai`
  Tanay Soni: `tanay.soni [at] deepset.ai`

  ## About us
- ![deepset logo](https://workablehr.s3.amazonaws.com/uploads/account/logo/476306/logo)

- We bring NLP to the industry via open source!
- Our focus: Industry specific language models & large scale QA systems.
-
- Some of our work:
- - [German BERT (aka "bert-base-german-cased")](https://deepset.ai/german-bert)
- - [GermanQuAD and GermanDPR datasets and models (aka "gelectra-base-germanquad", "gbert-base-germandpr")](https://deepset.ai/germanquad)
- - [FARM](https://github.com/deepset-ai/FARM)
- - [Haystack](https://github.com/deepset-ai/haystack/)

- Get in touch:
- [Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)

- By the way: [we're hiring!](http://www.deepset.ai/jobs)

  **Downstream-task:** Extractive QA
  **Training data:** SQuAD 2.0
  **Eval data:** SQuAD 2.0 dev set - German MLQA - German XQuAD
+ **Code:** See [an example extractive QA pipeline built with Haystack](https://haystack.deepset.ai/tutorials/34_extractive_qa_pipeline)
  **Infrastructure**: 4x Tesla v100

  ## Hyperparameters
 
  Corresponding experiment logs in mlflow: [link](https://public-mlflow.deepset.ai/#/experiments/2/runs/b25ec75e07614accb3f1ce03d43dbe08)


+ ## Usage

+ ### In Haystack
+ Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents.
+ To load and run the model with [Haystack](https://github.com/deepset-ai/haystack/):
+ ```python
+ # After running pip install haystack-ai "transformers[torch,sentencepiece]"
+
+ from haystack import Document
+ from haystack.components.readers import ExtractiveReader
+
+ docs = [
+     Document(content="Python is a popular programming language"),
+     Document(content="python ist eine beliebte Programmiersprache"),
+ ]
+
+ reader = ExtractiveReader(model="deepset/xlm-roberta-base-squad2")
+ reader.warm_up()
+
+ question = "What is a popular programming language?"
+ result = reader.run(query=question, documents=docs)
+ # {'answers': [ExtractedAnswer(query='What is a popular programming language?', score=0.5740374326705933, data='python', document=Document(id=..., content: '...'), context=None, document_offset=ExtractedAnswer.Span(start=0, end=6),...)]}
+ ```
+ For a complete example with an extractive question answering pipeline that scales over many documents, check out the [corresponding Haystack tutorial](https://haystack.deepset.ai/tutorials/34_extractive_qa_pipeline).

  ### In Transformers
  ```python
+ from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

  model_name = "deepset/xlm-roberta-base-squad2"

  tokenizer = AutoTokenizer.from_pretrained(model_name)
  ```

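The hunk above only shows the consolidated import line; the unchanged lines in between run the model. End-to-end inference can be sketched roughly as follows (a hedged example using the `question-answering` pipeline; the sample question and context are borrowed from the model card's earlier FARM snippet):

```python
from transformers import pipeline

model_name = "deepset/xlm-roberta-base-squad2"

# Build a question-answering pipeline; downloads the model on first use.
nlp = pipeline("question-answering", model=model_name, tokenizer=model_name)

res = nlp(
    question="Why is model conversion important?",
    context=(
        "The option to convert models between FARM and transformers gives "
        "freedom to the user and let people easily switch between frameworks."
    ),
)
# res is a dict with 'score', 'start', 'end', and 'answer' keys
```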
+ ## Performance
+ Evaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).
  ```
+ "exact": 73.91560683904657
+ "f1": 77.14103746689592
  ```

+ Evaluated on German MLQA: test-context-de-question-de.json
+ "exact": 33.67279167589108
+ "f1": 44.34437105434842
+ "total": 4517
+
+ Evaluated on German XQuAD: xquad.de.json
+ "exact": 48.739495798319325
+ "f1": 62.552615701071495
+ "total": 1190

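For intuition about these numbers, here is a simplified sketch of how SQuAD-style "exact" and "f1" are computed for a single prediction/gold pair. The official eval script additionally handles multiple gold answers and SQuAD 2.0's no-answer threshold; the function names below are illustrative, not from the script:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    # SQuAD convention: lowercase, drop punctuation and the articles
    # a/an/the, and collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, truth: str) -> float:
    # 1.0 iff the normalized strings are identical.
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction: str, truth: str) -> float:
    # Token-level F1 over the bag of normalized tokens.
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))        # 1.0
print(f1_score("eiffel tower", "the eiffel tower in paris"))  # ~0.667
```

The reported scores are these per-example values averaged over the whole dev set.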
  ## Authors
  Branden Chan: `branden.chan [at] deepset.ai`

  Tanay Soni: `tanay.soni [at] deepset.ai`

  ## About us

+ <div class="grid lg:grid-cols-2 gap-x-4 gap-y-3">
+     <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
+         <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/deepset-logo-colored.png" class="w-40"/>
+     </div>
+     <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
+         <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/haystack-logo-colored.png" class="w-40"/>
+     </div>
+ </div>
+
+ [deepset](http://deepset.ai/) is the company behind the production-ready open-source AI framework [Haystack](https://haystack.deepset.ai/).
+
+ Some of our other work:
+ - [Distilled roberta-base-squad2 (aka "tinyroberta-squad2")](https://huggingface.co/deepset/tinyroberta-squad2)
+ - [German BERT](https://deepset.ai/german-bert), [GermanQuAD and GermanDPR](https://deepset.ai/germanquad), [German embedding model](https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1)
+ - [deepset Cloud](https://www.deepset.ai/deepset-cloud-product), [deepset Studio](https://www.deepset.ai/deepset-studio)
+
+ ## Get in touch and join the Haystack community
+
+ <p>For more info on Haystack, visit our <strong><a href="https://github.com/deepset-ai/haystack">GitHub</a></strong> repo and <strong><a href="https://docs.haystack.deepset.ai">Documentation</a></strong>.
+
+ We also have a <strong><a class="h-7" href="https://haystack.deepset.ai/community">Discord community open to everyone!</a></strong></p>

+ [Twitter](https://twitter.com/Haystack_AI) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://haystack.deepset.ai/) | [YouTube](https://www.youtube.com/@deepset_ai)

+ By the way: [we're hiring!](http://www.deepset.ai/jobs)