
arabert-finetuned-caner

An ongoing project to apply NLP methods in the field of Islamic studies.

Named Entity Recognition

Briefly:

  • We first had to prepare the CANERCorpus dataset, which is available on Hugging Face. That version is not in BIO format, so the model could not learn anything from it. We therefore took an HTML version of the dataset available on GitHub and extracted from it a Hugging Face-format dataset with BIO tags.
  • Fine-tuning started from the pre-trained model "bert-base-arabertv02". After 3 epochs of training on the dataset described above (split 80% for training and 20% for validation), the model reached the results shown below. Metrics were computed with the Python evaluate module; note that precision is the overall precision, recall is the overall recall, and so on.

[Image: evaluation results]
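
The two ideas above, BIO tagging and "overall" scores, can be illustrated with a minimal plain-Python sketch. It is not the project's conversion script and not the evaluate module's implementation. The function names (`to_bio`, `spans`, `overall_scores`) and the `PER`/`LOC` labels are hypothetical, chosen only for illustration. It assumes each token carries one entity-type label or `O`, and that "overall" means scores pooled over all entity types at the span level.

```python
from typing import List, Set, Tuple


def to_bio(labels: List[str], outside: str = "O") -> List[str]:
    """Turn per-token type labels (e.g. PER, PER, O) into BIO tags.

    Simplification: two adjacent entities of the same type are merged,
    because flat labels carry no boundary between them.
    """
    bio, previous = [], outside
    for label in labels:
        if label == outside:
            bio.append(outside)
        elif label == previous:
            bio.append(f"I-{label}")  # same entity type continues
        else:
            bio.append(f"B-{label}")  # a new entity starts here
        previous = label
    return bio


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, type) spans from BIO tags; end is exclusive."""
    found, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not continues:
            found.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return found


def overall_scores(reference: List[str], predicted: List[str]):
    """Span-level precision, recall and F1 pooled over all entity types."""
    gold, pred = spans(reference), spans(predicted)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


reference = to_bio(["PER", "PER", "O", "LOC"])
predicted = ["B-PER", "I-PER", "O", "O"]  # the LOC entity was missed
print(reference)
print(overall_scores(reference, predicted))
```

In this toy case the prediction finds one of the two reference entities and predicts nothing spurious, so precision is 1.0 and recall is 0.5.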

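  • The trained model is available on Hugging Face, and you can use it with the following code snippet:
# Notebook cell (Jupyter/Colab). From a shell, run the install line without the leading "!".
!pip install transformers
from transformers import pipeline

# Hub ID of the fine-tuned model; replace it with the latest checkpoint if a newer one is available.
model_checkpoint = "Montazer/arabert-finetuned-caner"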
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)
s = "حَدَّثَنَا عَبْد اللَّهِ، حَدَّثَنِي عُبَيْدُ اللَّهِ بْنُ عُمَرَ الْقَوَارِيرِيُّ، حَدَّثَنَا يُونُسُ بْنُ أَرْقَمَ، حَدَّثَنَا يَزِيدُ بْنُ أَبِي زِيَادٍ، عَنْ عَبْدِ الرَّحْمَنِ بْنِ أَبِي لَيْلَى، قَالَ شَهِدْتُ عَلِيًّا رَضِيَ اللَّهُ عَنْهُ فِي الرَّحَبَةِ يَنْشُدُ النَّاسَ أَنْشُدُ اللَّهَ مَنْ سَمِعَ رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ يَقُولُ يَوْمَ غَدِيرِ خُمٍّ مَنْ كُنْتُ مَوْلَاهُ فَعَلِيٌّ مَوْلَاهُ لَمَّا قَامَ فَشَهِدَ قَالَ عَبْدُ الرَّحْمَنِ فَقَامَ اثْنَا عَشَرَ بَدْرِيًّا كَأَنِّي أَنْظُرُ إِلَى أَحَدِهِمْ فَقَالُوا نَشْهَدُ أَنَّا سَمِعْنَا رَسُولَ اللَّهِ صَلَّى اللَّهُ عَلَيْهِ وَسَلَّمَ يَقُولُ يَوْمَ غَدِيرِ خُمٍّ أَلَسْتُ أَوْلَى بِالْمُؤْمِنِينَ مِنْ أَنْفُسِهِمْ وَأَزْوَاجِي أُمَّهَاتُهُمْ فَقُلْنَا بَلَى يَا رَسُولَ اللَّهِ قَالَ فَمَنْ كُنْتُ مَوْلَاهُ فَعَلِيٌّ مَوْلَاهُ اللَّهُمَّ وَالِ مَنْ وَالَاهُ وَعَادِ مَنْ عَادَاهُ"
token_classifier(s)
  • The model is also deployed on a Hugging Face Space built with Gradio, so you can try it online.
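
With `aggregation_strategy="simple"`, the token-classification pipeline returns a list of dictionaries with keys such as `entity_group`, `score`, `word`, `start`, and `end`. A minimal sketch of grouping that output by entity type follows. The helper name `group_entities`, the `min_score` threshold, and the sample records (including the `PER`/`LOC` labels and placeholder words) are assumptions for illustration, not actual model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group recognized words by entity type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hypothetical records shaped like the pipeline output; real labels and
# words depend on the model and the input text.
sample = [
    {"entity_group": "PER", "score": 0.98, "word": "first person", "start": 0, "end": 12},
    {"entity_group": "PER", "score": 0.91, "word": "second person", "start": 20, "end": 33},
    {"entity_group": "LOC", "score": 0.87, "word": "some place", "start": 40, "end": 50},
    {"entity_group": "LOC", "score": 0.30, "word": "uncertain", "start": 55, "end": 64},
]

print(group_entities(sample))
```

In practice, the list returned by `token_classifier(s)` can be passed in place of `sample`.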