Aref Sadeghian committed
Commit f93759e • 1 Parent(s): 884a82e
Update README.md
README.md CHANGED
@@ -1,3 +1,31 @@
---
license: mit
---
<h1 align="center">arabert-finetuned-caner</h1>

<p align="center">An ongoing project applying NLP methods to the field of Islamic studies.</p>

### Named Entity Recognition
Briefly:
* We had to prepare the CANERCorpus dataset ourselves. The version available on [Hugging Face](https://huggingface.co/datasets/caner/) is not in BIO format, so the model could not learn anything from it. Instead, we took an HTML version of the dataset available on GitHub and extracted a Hugging Face-format dataset with BIO tags from it.
* Fine-tuning started from a pre-trained model named "bert-base-arabertv02". After 3 epochs of training on the dataset described above (split 80% for training and 20% for validation), the model reached the following results. Evaluation was done with the metrics computed by the Python `evaluate` module; note that precision is the overall precision, recall is the overall recall, and so on.
<p align="center">
<img src="https://github.com/NLP-Final-Projects/NoorHedayat/blob/main/img/eval.jpg" style="width: 80%;">
</p>
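
To make "overall" precision and recall concrete, here is a minimal sketch of entity-level scoring over BIO tags. It assumes the metric follows the usual span-matching convention (a predicted entity counts as correct only if both its type and its boundaries match the reference); the README does not name the exact metric, so treat this as an illustration and not as the evaluation code that was actually run. The function names and the `Pers`/`Loc` labels in the example are chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, exclusive end index)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect the entity spans of one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, label = 0, None
    for i, tag in enumerate(tags + ["O"]):  # the extra "O" closes a trailing entity
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current entity
        if label is not None:
            spans.add((label, start, i))
        # A B- tag (or a stray I- tag) opens a new span; "O" opens nothing.
        label = tag[2:] if tag != "O" else None
        start = i
    return spans


def overall_scores(references: List[List[str]],
                   predictions: List[List[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over all entity types."""
    gold = pred = correct = 0
    for ref_tags, pred_tags in zip(references, predictions):
        ref_spans = extract_spans(ref_tags)
        pred_spans = extract_spans(pred_tags)
        gold += len(ref_spans)
        pred += len(pred_spans)
        correct += len(ref_spans & pred_spans)
    precision = correct / pred if pred else 0.0
    recall = correct / gold if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


references = [["B-Pers", "I-Pers", "O", "B-Loc"]]
predictions = [["B-Pers", "O", "O", "B-Loc"]]
print(overall_scores(references, predictions))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

In the example, the location is matched exactly, while the person entity is predicted with the wrong boundary and therefore counts as both a missed reference entity and a wrong prediction.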
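
The BIO conversion mentioned in the first point can be sketched as follows. This is a simplified illustration, not the script used for the project: it assumes the source data has already been parsed into `(token, label)` pairs where the label is either an entity class or `"O"`, and the helper name `to_bio` and the sample tokens are hypothetical.

```python
from typing import List, Tuple


def to_bio(tokens_with_labels: List[Tuple[str, str]], outside: str = "O") -> List[str]:
    """Turn per-token class labels into BIO tags.

    The first token of a run of identical labels gets a B- prefix,
    the following tokens of that run get an I- prefix.
    """
    bio_tags = []
    previous = outside
    for _token, label in tokens_with_labels:
        if label == outside:
            bio_tags.append(outside)
        elif label == previous:
            bio_tags.append(f"I-{label}")
        else:
            bio_tags.append(f"B-{label}")
        previous = label
    return bio_tags


# Hypothetical, transliterated sample; the real corpus is in Arabic.
sample = [("Ali", "Pers"), ("bin", "Pers"), ("Muhammad", "Pers"),
          ("said", "O"), ("Makkah", "Loc")]
print(to_bio(sample))
# ['B-Pers', 'I-Pers', 'I-Pers', 'O', 'B-Loc']
```

One limitation of this sketch: two distinct entities of the same class that sit directly next to each other are merged into a single span, because the per-token labels alone do not mark where one entity ends and the next begins.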
* The trained model is available on [Hugging Face](https://huggingface.co/ArefSadeghian/arabert-finetuned-caner) and can be used with the following code snippet:

```python
# Install the dependency first, e.g. in a notebook: !pip install transformers
from transformers import pipeline

# Fine-tuned checkpoint on the Hugging Face Hub; replace it with a newer one if available
model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)
s = "علي بن محمد، عن علي بن الحكم، عن حماد بن عثمان الخثعمي، عن معاوية بن حمران، عن زيد بن ثابت، قال: سمعت النبي ص، يقول: إن أول ما خلق الله عز وجل القلم، فقال: اكتب يا قلم، فقال: ما أكتب؟، فقال: اكتب ما يقوله الله عز وجل إلى يوم القيامة، فقال: يا رب ما بالي أكتب ما يقول، صدقك الله أنت الذي يقول والباقيات الصالحات خير عند ربك ثوابا وخير أملًا."
# Each result is a dict with the entity group, score, word, and character offsets
print(token_classifier(s))
```
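
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries, one per detected entity, with keys such as `entity_group`, `score` and `word`. As a small sketch of how that output could be post-processed, the helper below groups the detected words by entity type. The helper name `group_entities`, the score threshold, and the sample results are all hypothetical: the sample is hand-written for illustration, trimmed to the keys used here, and is not real output of this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group detected words by entity type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for result in results:
        if result["score"] >= min_score:
            grouped[result["entity_group"]].append(result["word"])
    return dict(grouped)


# Hand-written sample in the shape of the pipeline output (not real model output).
sample_results = [
    {"entity_group": "Pers", "score": 0.99, "word": "علي بن محمد"},
    {"entity_group": "Pers", "score": 0.97, "word": "زيد بن ثابت"},
    {"entity_group": "Loc", "score": 0.42, "word": "القلم"},
]
print(group_entities(sample_results, min_score=0.5))
```

In practice, `sample_results` would be replaced by the return value of `token_classifier(s)`.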

* The model is also deployed on a Hugging Face Space using Gradio, so you can try it online [here](https://huggingface.co/spaces/ArefSadeghian/ArefSadeghian-arabert-finetuned-caner)!