---
license: mit
---
<h1 align="center">arabert-finetuned-caner</h1>

<p align="center">An ongoing project applying NLP methods to the field of Islamic studies.</p>


### Named Entity Recognition
Briefly:
* We first had to prepare the CANERCorpus dataset, which is available on [Hugging Face](https://huggingface.co/datasets/caner/). That version is not in BIO format, so the model could not learn anything from it. We therefore took an HTML version of the dataset available on GitHub and extracted from it a Hugging Face-format dataset with BIO tags.
* Fine-tuning started from the pre-trained model "bert-base-arabertv02". After 3 epochs of training on the dataset described above (80% used for training and 20% for validation), it reached the results below. Evaluation uses the metrics computed by the Python `evaluate` module; note that precision is the overall precision, recall is the overall recall, and so on.

![Evaluation results](./eval.jpg)
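
The BIO conversion mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing script: it assumes each token carries a plain class label and that a run of consecutive tokens with the same label forms one entity. The helper `to_bio`, the label names `Pers` and `Loc`, and the sample tokens are hypothetical.

```python
from typing import List


def to_bio(labels: List[str], outside: str = "O") -> List[str]:
    """Convert per-token class labels (e.g. "Pers") to BIO tags (e.g. "B-Pers").

    Assumption: consecutive tokens sharing a label belong to the same entity.
    """
    bio_tags = []
    previous = outside
    for label in labels:
        if label == outside:
            bio_tags.append(outside)          # token is not part of any entity
        elif label == previous:
            bio_tags.append(f"I-{label}")     # continues the current entity
        else:
            bio_tags.append(f"B-{label}")     # starts a new entity
        previous = label
    return bio_tags


# Hypothetical tokens and labels, for illustration only.
tokens = ["قال", "عبد", "الله", "في", "مكة"]
labels = ["O", "Pers", "Pers", "O", "Loc"]

bio = to_bio(labels)
for token, tag in zip(tokens, bio):
    print(token, tag)
```

Under this assumption, two adjacent entities of the same class are merged into one, so a source that marks entity boundaries explicitly is preferable when one is available.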

* The trained model is available on [Hugging Face](https://huggingface.co/Montazer/arabert-finetuned-caner) and can be used with the following code snippet:

```python
# Install the dependency first (in a notebook: !pip install transformers)
from transformers import pipeline

# Hub ID of the fine-tuned model; replace it with your own checkpoint if you have a newer one
model_checkpoint = "Montazer/arabert-finetuned-caner"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)
s = "حَدَّثَنَا عَبْد اللَّهِ، حَدَّثَنِي عُبَيْدُ اللَّهِ بْنُ عُمَرَ الْقَوَارِيرِيُّ، حَدَّثَنَا يُونُسُ بْنُ أَرْقَمَ، حَدَّثَنَا يَزِيدُ بْنُ أَبِي زِيَادٍ، عَنْ عَبْدِ الرَّحْمَنِ بْنِ أَبِي لَيْلَى، قَالَ شَهِدْتُ عَلِيًّا رَضِيَ اللَّهُ عَنْهُ فِي الرَّحَبَةِ يَنْشُدُ النَّاسَ أَنْشُدُ اللَّهَ مَنْ سَمِعَ رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ يَقُولُ يَوْمَ غَدِيرِ خُمٍّ مَنْ كُنْتُ مَوْلَاهُ فَعَلِيٌّ مَوْلَاهُ لَمَّا قَامَ فَشَهِدَ قَالَ عَبْدُ الرَّحْمَنِ فَقَامَ اثْنَا عَشَرَ بَدْرِيًّا كَأَنِّي أَنْظُرُ إِلَى أَحَدِهِمْ فَقَالُوا نَشْهَدُ أَنَّا سَمِعْنَا رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ يَقُولُ يَوْمَ غَدِيرِ خُمٍّ أَلَسْتُ أَوْلَى بِالْمُؤْمِنِينَ مِنْ أَنْفُسِهِمْ وَأَزْوَاجِي أُمَّهَاتُهُمْ فَقُلْنَا بَلَى يَا رَسُولَ اللَّهِ قَالَ فَمَنْ كُنْتُ مَوْلَاهُ فَعَلِيٌّ مَوْلَاهُ اللَّهُمَّ وَالِ مَنْ وَالَاهُ وَعَادِ مَنْ عَادَاهُ"
# Run the classifier and print the recognized entities
print(token_classifier(s))
```
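
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries, one per detected entity, with the keys `entity_group`, `score`, `word`, `start`, and `end`. A small sketch of grouping that output by entity type follows. The helper `group_entities`, the `min_score` threshold, and the sample predictions (labels, scores, and offsets) are made up for illustration and are not actual model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(predictions: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group aggregated token-classification results by entity type.

    Entities whose score is below `min_score` are dropped.
    """
    grouped = defaultdict(list)
    for item in predictions:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hand-written sample in the shape the pipeline returns; values are hypothetical.
sample_predictions = [
    {"entity_group": "Pers", "score": 0.98, "word": "عبد الله", "start": 10, "end": 18},
    {"entity_group": "Pers", "score": 0.40, "word": "يونس", "start": 60, "end": 64},
    {"entity_group": "Loc", "score": 0.91, "word": "الرحبة", "start": 150, "end": 156},
]

entities_by_type = group_entities(sample_predictions, min_score=0.5)
print(entities_by_type)
```

In practice, `token_classifier(s)` from the snippet above would be passed in place of `sample_predictions`.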

* The model is also deployed on a Hugging Face Space using Gradio, so you can try it online [here](https://huggingface.co/spaces/Montazer/arabert-finetuned-on-caner)!