---
license: mit
---

arabert-finetuned-caner

An ongoing project applying NLP methods to the field of Islamic studies.

### Named Entity Recognition briefly:

* We had to prepare the CANERCorpus dataset, which is available at [huggingface](https://huggingface.co/datasets/caner/). That version is not in BIO format, so the model could not learn anything from it. We used an HTML version of the dataset available on GitHub and extracted a Hugging Face format dataset with BIO tags from it.
* Fine-tuning started from a pre-trained model named "bert-base-arabertv02". After 3 epochs of training on the dataset described above (split 80% for training and 20% for validation), the model reached the following results. Evaluation uses the compute metrics of the Python `evaluate` module; note that precision is overall precision, recall is overall recall, and so on.

  ![alt text](./eval.jpg)
* The trained model is available at [huggingface](https://huggingface.co/ArefSadeghian/arabert-finetuned-caner) and you can use it with the following code snippet:

```python
# Install the dependency first (in a notebook: !pip install transformers)
from transformers import pipeline

model_checkpoint = "ArefSadeghian/arabert-finetuned-caner"  # Replace this with the latest checkpoint above
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)

s = "علي بن محمد، عن علي بن الحكم، عن حماد بن عثمان الخثعمي، عن معاوية بن حمران، عن زيد بن ثابت، قال: سمعت النبي ص، يقول: إن أول ما خلق الله عز وجل القلم، فقال: اكتب يا قلم، فقال: ما أكتب؟، فقال: اكتب ما يقوله الله عز وجل إلى يوم القيامة، فقال: يا رب ما بالي أكتب ما يقول، صدقك الله أنت الذي يقول والباقيات الصالحات خير عند ربك ثوابا وخير أملًا."

# Each returned item is a grouped entity with its label, score, and character span
print(token_classifier(s))
```

* This model is deployed on a Hugging Face Space using Gradio, so you can use it online [here](https://huggingface.co/spaces/ArefSadeghian/ArefSadeghian-arabert-finetuned-caner)!
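
The BIO conversion and the 80/20 split mentioned above can be sketched roughly as follows. This is a minimal illustration, not the script used for this project: it assumes the raw data gives one plain class label per token (with `"O"` for non-entities), and the helper names `to_bio` and `split_80_20` as well as the `"PER"` label are hypothetical. One known limitation of this simple scheme is that two adjacent entities of the same class are merged into one.

```python
import random


def to_bio(labels, outside="O"):
    """Convert per-token class labels (e.g. "PER", "O") to BIO tags.

    Assumption: a run of identical non-outside labels is a single entity.
    """
    bio = []
    previous = outside
    for label in labels:
        if label == outside:
            bio.append(outside)
        elif label == previous:
            bio.append(f"I-{label}")  # continues the current entity
        else:
            bio.append(f"B-{label}")  # starts a new entity
        previous = label
    return bio


def split_80_20(examples, seed=0):
    """Shuffle deterministically and split into 80% train, 20% validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical example: tokens of a short narrator chain with plain labels
tokens = ["علي", "بن", "محمد", "عن", "زيد"]
labels = ["PER", "PER", "PER", "O", "PER"]
print(to_bio(labels))  # ['B-PER', 'I-PER', 'I-PER', 'O', 'B-PER']
```

With tags in this form, each sentence can be stored as a pair of token and tag lists, which is the layout that token-classification fine-tuning in Hugging Face usually expects.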