update readme
README.md
CHANGED
@@ -9,11 +9,11 @@ datasets:
 inference: false
 ---

-## English NER in Flair (Ontonotes
+## English NER in Flair (Ontonotes default model)

 This is the 18-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).

-F1-Score: **89.
+F1-Score: **89.27** (Ontonotes)

 Predicts 18 tags:

@@ -51,7 +51,7 @@ from flair.data import Sentence
 from flair.models import SequenceTagger

 # load tagger
-tagger = SequenceTagger.load("flair/ner-english")
+tagger = SequenceTagger.load("flair/ner-english-ontonotes")

 # make example sentence
 sentence = Sentence("On September 1st George Washington won 1 dollar.")
@@ -77,7 +77,7 @@ Span [4,5]: "George Washington" [− Labels: PERSON (0.9604)]
 Span [7,8]: "1 dollar" [− Labels: MONEY (0.9837)]
 ```

-So, the entities "*September 1st*" (labeled as a **date**), "*George Washington*" (labeled as a **person**) and "*1 dollar*" (labeled as a **money**) are found in the sentence "*George Washington
+So, the entities "*September 1st*" (labeled as a **date**), "*George Washington*" (labeled as a **person**) and "*1 dollar*" (labeled as a **money**) are found in the sentence "*On September 1st George Washington won 1 dollar*".


 ---
@@ -91,8 +91,12 @@ from flair.data import Corpus
 from flair.datasets import CONLL_03
 from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

-# 1.
-corpus: Corpus =
+# 1. load the corpus (Ontonotes does not ship with Flair, you need to download and reformat into a column format yourself)
+corpus: Corpus = ColumnCorpus(
+    "resources/tasks/onto-ner",
+    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
+    tag_to_bioes="ner",
+)

 # 2. what tag do we want to predict?
 tag_type = 'ner'
@@ -104,7 +108,7 @@ tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
 embedding_types = [

     # GloVe embeddings
-    WordEmbeddings('
+    WordEmbeddings('en-crawl'),

     # contextual string embeddings, forward
     FlairEmbeddings('news-forward'),
@@ -130,7 +134,7 @@ from flair.trainers import ModelTrainer
 trainer = ModelTrainer(tagger, corpus)

 # 7. run training
-trainer.train('resources/taggers/ner-english',
+trainer.train('resources/taggers/ner-english-ontonotes',
               train_with_dev=True,
               max_epochs=150)
 ```
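
The new step 1 reads Ontonotes from a column file that users prepare themselves, with the columns mapped as `{0: "text", 1: "pos", 2: "upos", 3: "ner"}` and `tag_to_bioes="ner"`. As a rough illustration of what such a file might look like and what a BIO-to-BIOES conversion does, here is a dependency-free sketch. The sample rows, the whitespace-separated layout, and the helper names `read_columns` and `bio_to_bioes` are assumptions made for this example; it is not Flair's implementation.

```python
from typing import Dict, List

# Hypothetical sample in the four-column layout named in the diff:
# column 0 = text, 1 = pos, 2 = upos, 3 = ner (BIO tags).
SAMPLE = """\
On IN ADP O
September NNP PROPN B-DATE
1st NN NOUN I-DATE
George NNP PROPN B-PERSON
Washington NNP PROPN I-PERSON
won VBD VERB O
1 CD NUM B-MONEY
dollar NN NOUN I-MONEY
. . PUNCT O
"""

COLUMN_FORMAT = {0: "text", 1: "pos", 2: "upos", 3: "ner"}


def read_columns(raw: str, column_format: Dict[int, str]) -> List[List[Dict[str, str]]]:
    """Split raw column text into sentences; blank lines separate sentences."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in raw.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append({name: fields[idx] for idx, name in column_format.items()})
    if current:
        sentences.append(current)
    return sentences


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert BIO tags to BIOES: span-final tokens become E-, single-token spans S-."""
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{label}"  # does the same span go on?
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


sentences = read_columns(SAMPLE, COLUMN_FORMAT)
tokens = sentences[0]
bioes = bio_to_bioes([tok["ner"] for tok in tokens])
for tok, tag in zip(tokens, bioes):
    print(f"{tok['text']}\t{tag}")
```

In this sketch the last token of each multi-token entity gets an `E-` prefix, and a one-token entity would get `S-`; the `pos` and `upos` columns are parsed but not used by the conversion.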