Montazer/arabert-finetuned-caner

Token Classification

Transformers

PyTorch

bert

Inference Endpoints


Aref Sadeghian committed on Apr 18, 2023

Commit ec25715 • 1 parent: 1a5a1f8

Update README.md


README.md +3 -3


@@ -8,8 +8,8 @@ license: mit
 ### Named Entity Recognition
 briefly:
-* We forced to prepair CANERCorpus dataset which is avialable at [huggingface](https://huggingface.co/datasets/caner/). The dataset was not in the BIO format so model couldn't learn anything from it. We used an html version of dataset available on github and extracted a HuggingFace format dataset from it with BIO tags.
-* Fine tunning started from a pre-traind model named "bert-base-arabertv02" and after 3 epoch of training model on the above mentioned dataset (80% splitted to training data and 20% to validation data), reched the following results: (evaluation is done by using compute metrics of python evaluate module. note that precision is overall precision, recall is overall recall and so on.)
 ![alt text](./eval.jpg)
@@ -23,7 +23,7 @@ model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"
 token_classifier = pipeline(
     "token-classification", model=model_checkpoint, aggregation_strategy="simple"
 )
-s = "علي بن محمد، عن علي بن الحكم، عن حماد بن عثمان الخثعمي، عن معاوية بن حمران، عن زيد بن ثابت، قال: سمعت النبي ص، يقول: إن أول ما خلق الله عز وجل القلم، فقال: اكتب يا قلم، فقال: ما أكتب؟، فقال: اكتب ما يقوله الله عز وجل إلى يوم القيامة، فقال: يا رب ما بالي أكتب ما يقول، صدقك الله أنت الذي يقول والباقيات الصالحات خير عند ربك ثوابا وخير أملًا."
 token_classifier(s)
 ```

 ### Named Entity Recognition
 briefly:
+* We had to prepare the CANERCorpus dataset, which is available on [Hugging Face](https://huggingface.co/datasets/caner/). The dataset was not in BIO format, so the model could not learn anything from it. We used an HTML version of the dataset available on GitHub and extracted a Hugging Face format dataset with BIO tags from it.
+* Fine-tuning started from the pre-trained model "bert-base-arabertv02". After 3 epochs of training on the dataset described above (split 80% for training and 20% for validation), the model reached the following results. Evaluation used the compute metrics of the Python evaluate module; precision is overall precision, recall is overall recall, and so on.
 ![alt text](./eval.jpg)
 token_classifier = pipeline(
     "token-classification", model=model_checkpoint, aggregation_strategy="simple"
 )
+s = 'حَدَّثَنَا سُلَيْمَانُ بْنُ بِلاَل، حَدَّثَنَا عَلِيُّ بْنُ الْحَسَنِ بْنِ الْفَضْلِ، حَدَّثَنَا مُحَمَّدُ بْنُ الْحَسَنِ الصَّبَّاحِيُّ، حَدَّثَنَا الْقَاسِمُ بْنُ إِبْرَاهِيمَ، حَدَّثَنَا عَبْدُ اللَّهِ بْنُ سَبْأٍ، عَنْ جَابِرِ بْنِ عَبْدِ اللَّهِ، عَنِ النَّبِيِّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "مَنْ كُنْتُ مَوْلاَهُ فَعَلِيٌّ مَوْلاَهُ، اللَّهُمَّ وَالِ مَنْ وَالَاهُ، وَعَادِ مَنْ عَادَاهُ" فَقَالَ جَابِرٌ: فَرَأَيْتُ وُجُوهَ النَّاسِ يَوْمَئِذٍ كُلَّمَا قَالَ النَّبِيُّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "عَلِيٌّ مَوْلاَهُ" صَعِدُوا عَلَى عَاتِقِ بَعْضِهِمْ، يَقُولُ بَعْضُهُمْ لِبَعْضٍ: عَلِيٌّ مَوْلاَهُ، فَقَالَ رَسُولُ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "اللَّهُمَّ الْحَقُّ وَالْحَقُّ، '
 token_classifier(s)
 ```
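The README says the original corpus was not in BIO format and had to be converted before training. The conversion script is not part of this commit, so the following is only a minimal sketch of what such a step might look like. It assumes each token carries a plain entity-type label, with `O` for non-entities. The function name `to_bio` and the label names `Pers` and `Loc` are hypothetical and used for illustration only.

```python
from typing import List


def to_bio(labels: List[str], outside: str = "O") -> List[str]:
    """Convert plain per-token entity labels to BIO tags.

    Assumption: consecutive tokens with the same label belong to one
    entity. Two adjacent entities of the same type cannot be told apart
    from plain labels alone, so they are merged into a single span.
    """
    bio = []
    prev = outside
    for label in labels:
        if label == outside:
            # Non-entity tokens keep the outside tag unchanged.
            bio.append(outside)
        elif label == prev:
            # Same label as the previous token: continue the entity.
            bio.append(f"I-{label}")
        else:
            # Label differs from the previous token: start a new entity.
            bio.append(f"B-{label}")
        prev = label
    return bio


# Hypothetical labels for a five-token sentence.
example = ["Pers", "Pers", "O", "Loc", "Pers"]
print(to_bio(example))  # ['B-Pers', 'I-Pers', 'O', 'B-Loc', 'B-Pers']
```

With `B-`/`I-` prefixes in place, entity boundaries become explicit. Span-level metrics and the pipeline's `aggregation_strategy="simple"` both rely on those boundaries to group tokens into entities.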