---
license: mit
---

arabert-finetuned-caner

An ongoing project on implementing NLP methods in the field of Islamic studies.

### Named Entity Recognition briefly:

* We had to prepare the CANERCorpus dataset, which is available at [Hugging Face](https://huggingface.co/datasets/caner/). That version of the dataset is not in BIO format, so the model could not learn anything from it. We used an HTML version of the dataset available on GitHub and extracted a Hugging Face-format dataset with BIO tags from it.
* Fine-tuning started from a pre-trained model named "bert-base-arabertv02". After 3 epochs of training on the dataset described above (split 80% for training and 20% for validation), the model reached the following results. (Evaluation uses the metric computation of the Python `evaluate` module. Note that precision is the overall precision, recall is the overall recall, and so on.)

  ![alt text](./eval.jpg)
* The trained model is available at [Hugging Face](https://huggingface.co/ArefSadeghian/arabert-finetuned-caner), and you can use it with the following code snippet:

  ```python
  # Install the dependency first (in a notebook: !pip install transformers)
  from transformers import pipeline

  # Replace this with the latest checkpoint
  model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"
  token_classifier = pipeline(
      "token-classification", model=model_checkpoint, aggregation_strategy="simple"
  )
  # Single quotes delimit the string because the text itself contains double quotes
  s = 'حَدَّثَنَا سُلَيْمَانُ بْنُ بِلاَل، حَدَّثَنَا عَلِيُّ بْنُ الْحَسَنِ بْنِ الْفَضْلِ، حَدَّثَنَا مُحَمَّدُ بْنُ الْحَسَنِ الصَّبَّاحِيُّ، حَدَّثَنَا الْقَاسِمُ بْنُ إِبْرَاهِيمَ، حَدَّثَنَا عَبْدُ اللَّهِ بْنُ سَبْأٍ، عَنْ جَابِرِ بْنِ عَبْدِ اللَّهِ، عَنِ النَّبِيِّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "مَنْ كُنْتُ مَوْلاَهُ فَعَلِيٌّ مَوْلاَهُ، اللَّهُمَّ وَالِ مَنْ وَالَاهُ، وَعَادِ مَنْ عَادَاهُ" فَقَالَ جَابِرٌ: فَرَأَيْتُ وُجُوهَ النَّاسِ يَوْمَئِذٍ كُلَّمَا قَالَ النَّبِيُّ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "عَلِيٌّ مَوْلاَهُ" صَعِدُوا عَلَى عَاتِقِ بَعْضِهِمْ، يَقُولُ بَعْضُهُمْ لِبَعْضٍ: عَلِيٌّ مَوْلاَهُ، فَقَالَ رَسُولُ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ: "اللَّهُمَّ الْحَقُّ وَالْحَقُّ، '
  token_classifier(s)
  ```
* This model is deployed on a Hugging Face Space using Gradio, so you can use it online [here](https://huggingface.co/spaces/ArefSadeghian/ArefSadeghian-arabert-finetuned-caner)!
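
The BIO conversion step described above can be illustrated with a minimal sketch. It is not the project's actual extraction script: the helper name `to_bio`, the sample tokens, and the `Pers` label are hypothetical, and the sketch assumes each token carries a single entity label with `O` marking non-entity tokens. Under that assumption, the first token of a run of identical labels gets a `B-` prefix and the following tokens get `I-`.

```python
def to_bio(labels, outside="O"):
    """Convert per-token entity labels into BIO tags.

    Assumption: consecutive tokens sharing the same non-outside label
    belong to one entity, so two adjacent entities of the same type
    would be merged into one.
    """
    bio_tags = []
    previous = outside
    for label in labels:
        if label == outside:
            bio_tags.append(outside)          # non-entity token stays "O"
        elif label == previous:
            bio_tags.append(f"I-{label}")     # continues the current entity
        else:
            bio_tags.append(f"B-{label}")     # starts a new entity
        previous = label
    return bio_tags


# Hypothetical sample: tokens with flat (non-BIO) labels
tokens = ["حدثنا", "عبد", "الله", "بن", "يوسف", "عن", "مالك"]
labels = ["O", "Pers", "Pers", "Pers", "Pers", "O", "Pers"]

for token, tag in zip(tokens, to_bio(labels)):
    print(token, tag)
```

With tags in this form, a token-classification model can distinguish where an entity begins from where it continues, which flat per-token labels do not make explicit.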