Aref Sadeghian committed on
Commit ec25715 • 1 Parent(s): 1a5a1f8

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -8,8 +8,8 @@ license: mit
 
  ### Named Entity Recognition
  briefly:
- * We forced to prepair CANERCorpus dataset which is avialable at [huggingface](https://huggingface.co/datasets/caner/). The dataset was not in the BIO format so model couldn't learn anything from it. We used an html version of dataset available on github and extracted a HuggingFace format dataset from it with BIO tags.
- * Fine tunning started from a pre-traind model named "bert-base-arabertv02" and after 3 epoch of training model on the above mentioned dataset (80% splitted to training data and 20% to validation data), reched the following results: (evaluation is done by using compute metrics of python evaluate module. note that precision is overall precision, recall is overall recall and so on.)
 
  ![alt text](./eval.jpg)
 
@@ -23,7 +23,7 @@ model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"
  token_classifier = pipeline(
  "token-classification", model=model_checkpoint, aggregation_strategy="simple"
  )
- s = "علي بن محمد، عن علي بن الحكم، عن حماد بن عثمان الخثعمي، عن معاوية بن حمران، عن زيد بن ثابت، قال: سمعت النبي ص، يقول: إن أول ما خلق الله عز وجل القلم، فقال: اكتب يا قلم، فقال: ما أكتب؟، فقال: اكتب ما يقوله الله عز وجل إلى يوم القيامة، فقال: يا رب ما بالي أكتب ما يقول، صدقك الله أنت الذي يقول والباقيات الصالحات خير عند ربك ثوابا وخير أملًا."
  token_classifier(s)
  ```
 
 
 
  ### Named Entity Recognition
  briefly:
+ * We had to prepair CANERCorpus dataset which is avialable at [huggingface](https://huggingface.co/datasets/caner/). The dataset was not in the BIO format so model couldn't learn anything from it. We used an html version of dataset available on github and extracted a HuggingFace format dataset from it with BIO tags.
+ * Fine tunning started from a pre-traind model named "bert-base-arabertv02" and after 3 epoch of training model on the above mentioned dataset (80% splitted to training data and 20% to validation data), reached the following results: (evaluation is done by using compute metrics of python evaluate module. note that precision is overall precision, recall is overall recall and so on.)
 
  ![alt text](./eval.jpg)
 
 
  token_classifier = pipeline(
  "token-classification", model=model_checkpoint, aggregation_strategy="simple"
  )
+ s = "حَدَّثَنَا سُلَيْمَانُ بْنُ بِلاَل، حَدَّثَنَا عَلِيُّ بْنُ الْحَسَنِ بْنِ الْفَضْلِ، حَدَّثَنَا مُحَمَّدُ بْنُ الْحَسَنِ الصَّبَّاحِيُّ، حَدَّثَنَا الْقَاسِمُ بْنُ إِبْرَاهِيمَ، حَدَّثَنَا عَبْدُ اللَّهِ بْنُ سَبْأٍ، عَنْ جَابِرِ بْنِ عَبْدِ اللَّهِ، عَنِ النَّبِيِّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "مَنْ كُنْتُ مَوْلاَهُ فَعَلِيٌّ مَوْلاَهُ، اللَّهُمَّ وَالِ مَنْ وَالَاهُ، وَعَادِ مَنْ عَادَاهُ" فَقَالَ جَابِرٌ: فَرَأَيْتُ وُجُوهَ النَّاسِ يَوْمَئِذٍ كُلَّمَا قَالَ النَّبِيُّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "عَلِيٌّ مَوْلاَهُ" صَعِدُوا عَلَى عَاتِقِ بَعْضِهِمْ، يَقُولُ بَعْضُهُمْ لِبَعْضٍ: عَلِيٌّ مَوْلاَهُ، فَقَالَ رَسُولُ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "اللَّهُمَّ الْحَقُّ وَالْحَقُّ، "
  token_classifier(s)
  ```
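
The README text in this diff says the dataset was not in BIO format and had to be rebuilt with BIO tags. The commit does not show how that was done, so the following is only a minimal sketch of one common approach, assuming the source data carries one flat label per token. The function name `to_bio`, the label `Pers`, and the sample tokens are hypothetical and chosen for illustration only.

```python
from typing import List


def to_bio(labels: List[str], outside: str = "O") -> List[str]:
    """Convert flat per-token labels (e.g. "Pers", "O") into BIO tags.

    Assumption: a run of identical non-outside labels is one entity.
    Flat labels cannot distinguish two adjacent entities of the same
    type, so such pairs are merged into a single span here.
    """
    bio_tags = []
    previous = outside
    for label in labels:
        if label == outside:
            bio_tags.append(outside)
        elif label == previous:
            # Same label as the previous token: continue the entity.
            bio_tags.append(f"I-{label}")
        else:
            # Label changed: a new entity begins at this token.
            bio_tags.append(f"B-{label}")
        previous = label
    return bio_tags


# Hypothetical sample: tokens with flat labels, as assumed above.
tokens = ["عن", "زيد", "بن", "ثابت", "قال"]
flat_labels = ["O", "Pers", "Pers", "Pers", "O"]

for token, tag in zip(tokens, to_bio(flat_labels)):
    print(token, tag)
```

With tags in this form, a token-classification model can tell where each entity starts, which is also what span-level metrics such as overall precision and recall rely on.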