Aref Sadeghian committed
Commit • ec25715
Parent(s): 1a5a1f8
Update README.md
README.md CHANGED
@@ -8,8 +8,8 @@ license: mit
### Named Entity Recognition
briefly:
- * We
- * Fine tunning started from a pre-traind model named "bert-base-arabertv02" and after 3 epoch of training model on the above mentioned dataset (80% splitted to training data and 20% to validation data),

![alt text](./eval.jpg)

@@ -23,7 +23,7 @@ model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)
- s = "
token_classifier(s)
```


### Named Entity Recognition
briefly:
+ * We had to prepare the CANERCorpus dataset, which is available at [huggingface](https://huggingface.co/datasets/caner/). The dataset was not in BIO format, so the model could not learn anything from it. We used an HTML version of the dataset available on GitHub and extracted a HuggingFace-format dataset with BIO tags from it.
+ * Fine-tuning started from the pre-trained model "bert-base-arabertv02"; after 3 epochs of training on the above-mentioned dataset (split 80% training / 20% validation), it reached the following results (evaluation uses the compute-metrics functions of the Python evaluate module; note that "precision" is overall precision, "recall" is overall recall, and so on):
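The BIO conversion described in the first bullet could be sketched as follows. This is only a minimal illustration under assumed token-level span annotations, not the authors' actual extraction script, and the `PERS` label is just an example tag:

```python
# Minimal BIO-tagging sketch (illustration only, not the authors' script).
# Given tokens and entity spans over token indices, emit one tag per token.
def to_bio(tokens, entity_spans):
    """entity_spans: list of (start, end, label), end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"    # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"    # continuation tokens
    return tags

tokens = ["حدثنا", "سليمان", "بن", "بلال"]
print(to_bio(tokens, [(1, 4, "PERS")]))
# ['O', 'B-PERS', 'I-PERS', 'I-PERS']
```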
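For the 80/20 split mentioned in the second bullet, a plain shuffled split suffices; the exact method and seed the authors used are not stated, so this is only a sketch:

```python
import random

# Sketch of an 80/20 train/validation split (the exact split method and
# seed used by the authors are not stated in the card).
def split_80_20(examples, seed=0):
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(examples))
    return [examples[i] for i in idx[:cut]], [examples[i] for i in idx[cut:]]

train, valid = split_80_20(list(range(100)))
print(len(train), len(valid))  # 80 20
```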
![alt text](./eval.jpg)
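The results above report overall (micro-averaged, entity-level) scores. A pure-Python sketch of what those "overall" precision/recall numbers mean — the card itself relies on the evaluate module's metrics — assuming well-formed BIO sequences:

```python
# Sketch of entity-level "overall" precision/recall/F1: entity spans are
# pooled across all labels (micro average), in the style of the seqeval
# metrics exposed by the `evaluate` module. Assumes well-formed BIO tags.
def spans(tags):
    out, start, label = [], None, None
    for i, t in enumerate(tags + ["O"]):
        if start is not None and not (t.startswith("I-") and t[2:] == label):
            out.append((start, i, label))   # close the open entity
            start, label = None, None
        if t.startswith("B-"):
            start, label = i, t[2:]         # open a new entity
    return out

def overall_prf(references, predictions):
    ref = {(i, s) for i, seq in enumerate(references) for s in spans(seq)}
    pred = {(i, s) for i, seq in enumerate(predictions) for s in spans(seq)}
    tp = len(ref & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Per-label precision/recall would instead restrict the two span sets to one label at a time.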
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)
+ s = "حَدَّثَنَا سُلَيْمَانُ بْنُ بِلَالٍ، حَدَّثَنَا عَلِيُّ بْنُ الْحَسَنِ بْنِ الْفَضْلِ، حَدَّثَنَا مُحَمَّدُ بْنُ الْحَسَنِ الصَّبَّاحِيُّ، حَدَّثَنَا الْقَاسِمُ بْنُ إِبْرَاهِيمَ، حَدَّثَنَا عَبْدُ اللَّهِ بْنُ سَبَأٍ، عَنْ جَابِرِ بْنِ عَبْدِ اللَّهِ، عَنِ النَّبِيِّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: \"مَنْ كُنْتُ مَوْلَاهُ فَعَلِيٌّ مَوْلَاهُ، اللَّهُمَّ وَالِ مَنْ وَالَاهُ، وَعَادِ مَنْ عَادَاهُ\" فَقَالَ جَابِرٌ: فَرَأَيْتُ وُجُوهَ النَّاسِ يَوْمَئِذٍ فَلَمَّا قَالَ النَّبِيُّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: \"عَلِيٌّ مَوْلَاهُ\" صَعِدُوا عَلَى عَاتِقِ بَعْضِهِمْ، يَقُولُ بَعْضُهُمْ لِبَعْضٍ: عَلِيٌّ مَوْلَاهُ، فَقَالَ رَسُولُ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: \"اللَّهُمَّ الْحَقُّ وَالْحَقُّ\""
token_classifier(s)
```
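With `aggregation_strategy="simple"`, the Transformers token-classification pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end`. A sketch of consuming that structure follows; the `result` below is a hypothetical example of the shape, not the model's actual prediction on `s`:

```python
# The token-classification pipeline with aggregation_strategy="simple"
# returns a list of dicts. `result` here is a HYPOTHETICAL example of
# that structure, not a real prediction from this model.
result = [
    {"entity_group": "PERS", "score": 0.98, "word": "سليمان بن بلال", "start": 10, "end": 24},
    {"entity_group": "PERS", "score": 0.97, "word": "جابر بن عبد الله", "start": 80, "end": 96},
]

# Keep confident predictions and group surface forms by entity type.
by_label = {}
for ent in result:
    if ent["score"] >= 0.9:
        by_label.setdefault(ent["entity_group"], []).append(ent["word"])

print(by_label)
# {'PERS': ['سليمان بن بلال', 'جابر بن عبد الله']}
```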