Aref Sadeghian committed on
Commit f93759e • 1 Parent(s): 884a82e

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -1,3 +1,31 @@
  ---
  license: mit
  ---
+ <h1 align="center">arabert-finetuned-caner</h1>
+
+ <p align="center">An ongoing project applying NLP methods to the field of Islamic studies.</p>
+
+
+ ### Named Entity Recognition
+ Briefly:
+ * We had to prepare the CANERCorpus dataset, which is available on [huggingface](https://huggingface.co/datasets/caner/). The dataset was not in BIO format, so the model couldn't learn anything from it. We used an HTML version of the dataset available on GitHub and extracted a HuggingFace-format dataset with BIO tags from it.
+ * Fine-tuning started from a pre-trained model named "bert-base-arabertv02". After 3 epochs of training on the above-mentioned dataset (split 80% for training and 20% for validation), the model reached the following results. (Evaluation uses the compute metrics of the Python evaluate module. Note that precision is overall precision, recall is overall recall, and so on.)
+ <p align="center">
+ <img src="https://github.com/NLP-Final-Projects/NoorHedayat/blob/main/img/eval.jpg" style="width: 80%;">
+ </p>
+
+ * The trained model is available on [huggingface](https://huggingface.co/ArefSadeghian/arabert-finetuned-caner), and you can use it with the following code snippet:
+
+ ```python
+ # Install the dependency first, e.g. `pip install transformers`
+ # (in a notebook: `!pip install transformers`).
+ from transformers import pipeline
+
+ # Hub checkpoint to load; swap in a newer checkpoint if one is available.
+ model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"
+ token_classifier = pipeline(
+     "token-classification", model=model_checkpoint, aggregation_strategy="simple"
+ )
+ s = "علي بن محمد، عن علي بن الحكم، عن حماد بن عثمان الخثعمي، عن معاوية بن حمران، عن زيد بن ثابت، قال: سمعت النبي ص، يقول: إن أول ما خلق الله عز وجل القلم، فقال: اكتب يا قلم، فقال: ما أكتب؟، فقال: اكتب ما يقوله الله عز وجل إلى يوم القيامة، فقال: يا رب ما بالي أكتب ما يقول، صدقك الله أنت الذي يقول والباقيات الصالحات خير عند ربك ثوابا وخير أملًا."
+ token_classifier(s)
+ ```
+
+ * This model is deployed on a Hugging Face Space using Gradio, so you can try it online [here](https://huggingface.co/spaces/ArefSadeghian/ArefSadeghian-arabert-finetuned-caner)!
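
The README says the corpus had to be converted to BIO tags before the model could learn from it. The snippet below is a minimal sketch of that kind of conversion, not the project's actual preprocessing code. It assumes each token already carries a plain class label, with `O` marking non-entity tokens. The function name `to_bio` and the label names `Pers` and `Loc` are placeholders chosen for illustration.

```python
from typing import List


def to_bio(labels: List[str], outside: str = "O") -> List[str]:
    """Convert per-token class labels into BIO tags.

    A token gets `B-<label>` when it starts a run of that label and
    `I-<label>` when it continues the run. Outside tokens stay unchanged.
    """
    bio: List[str] = []
    prev = outside
    for label in labels:
        if label == outside:
            bio.append(outside)
        elif label == prev:
            # Same class as the previous token: continue the entity.
            bio.append(f"I-{label}")
        else:
            # New class (or first entity token after an outside token).
            bio.append(f"B-{label}")
        prev = label
    return bio


# Hypothetical labels for a five-token sentence.
example = ["Pers", "Pers", "O", "Loc", "Pers"]
bio_tags = to_bio(example)
print(bio_tags)
```

One limitation of this approach: two distinct entities of the same class that sit directly next to each other are merged into a single span, because plain class labels carry no boundary information. If the source data marks entity boundaries explicitly, that information should be used instead.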