MBMMurad/BanglaBERT_Person_Name_Extractor

@@ -8,28 +8,6 @@ pipeline_tag: token-classification
 # Bangla-Person-Name-Extractor
 This repository contains the implementation of a Bangla Person Name Extractor model that extracts person-name entities from a given sentence. We approached it as a token classification task, i.e. tagging each token as either part of a person's name or not. We leveraged the [BanglaBERT](https://github.com/csebuetnlp/banglabert) model, finetuning it for binary classification on a custom-prepared dataset. We have deployed the model on Hugging Face for easier access and use.
-# Datasets
-We used two datasets to train and evaluate our pipeline.
-1. [Bengali-NER/annotated data at master · Rifat1493/Bengali-NER](http://https://github.com/Rifat1493/Bengali-NER/tree/master/annotated%20data)
-2. [banglakit/bengali-ner-data](http://https://raw.githubusercontent.com/banglakit/bengali-ner-data/master/main.jsonl)
-The annotation formats for both datasets were quite different, so we had to preprocess both of them before merging them. Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/prepare-dataset.ipynb) for preparing the dataset as required.
-# Training and Evaluation
-We treated this problem as a token classification task.So it seemed perfect to finetune BanglaBERT model for our purpose. [BanglaBERT ](https://huggingface.co/csebuetnlp/banglabert)is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many of the NLP tasks in bengali.
-We mainly finetuned two checkpoints of BanglaBERT.
-1. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert)
-2. [BanglaEERT small](https://huggingface.co/csebuetnlp/banglabert_small)
-BanglaBERT performed better than BanglaBERT small ( 83% F1 score vs 79% F1 score on the test set) .
-Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Training%20Notebook%20%3A%20Person%20Name%20Extractor%20using%20BanglaBERT.ipynb) to see the training process.
-**Quantitative results**
-Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference%20and%20Evaluation%20Notebook.ipynb) to see the evaluation process.
-<br></br>
-![Markdown symbol](https://github.com/MBMMurad/asl-2d-to-3d/blob/master/Screenshot%20from%202023-07-13%2023-11-59.png)
 # How to use it?
 [This notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference_template.ipynb) contains an inference template for running the model on a sentence.
 <br></br>
@@ -71,7 +49,7 @@ print(f"Input Sentence : {sentence}")
 print(f"Person Name Entities : {pred}")
 ```
-**Output :**
 ```
 Input Sentence : আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।
 Person Name Entities : ['আব্দুর' 'রহিম']
@@ -83,4 +61,24 @@ Person Name Entities : ['দেলোয়ার' 'হোসেন' 'মজু
 Input Sentence : দলীয় নেতারা তাঁর বাসভবনে যেতে চাইলে আটক হন।
 Person Name Entities : []
-```

 # Bangla-Person-Name-Extractor
 This repository contains the implementation of a Bangla Person Name Extractor model that extracts person-name entities from a given sentence. We approached it as a token classification task, i.e. tagging each token as either part of a person's name or not. We leveraged the [BanglaBERT](https://github.com/csebuetnlp/banglabert) model, finetuning it for binary classification on a custom-prepared dataset. We have deployed the model on Hugging Face for easier access and use.
 # How to use it?
 [This notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference_template.ipynb) contains an inference template for running the model on a sentence.
 <br></br>
 print(f"Person Name Entities : {pred}")
 ```
+**Output:**
 ```
 Input Sentence : আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।
 Person Name Entities : ['আব্দুর' 'রহিম']
 Input Sentence : দলীয় নেতারা তাঁর বাসভবনে যেতে চাইলে আটক হন।
 Person Name Entities : []
+```
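The output above lists only the tokens tagged as person names. As a rough illustration of that last step, here is a minimal sketch that filters tokens by a binary label. It assumes one predicted label per whitespace-separated word (the actual model works on subword tokens and needs an alignment step), and the function name `extract_person_tokens` and the label values are hypothetical, not taken from the inference notebook.

```python
def extract_person_tokens(tokens, labels):
    """Return the tokens whose predicted label is 1 (person name)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [tok for tok, lab in zip(tokens, labels) if lab == 1]


sentence = "আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।"
tokens = sentence.split()

# Hypothetical model output: one binary label per word, 1 = person name.
predicted = [1, 1, 0, 0, 0, 0, 0, 0]

names = extract_person_tokens(tokens, predicted)
print(f"Person Name Entities : {names}")
```

The sketch returns a plain Python list, so the printed form uses commas, unlike the array-style output shown above.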
+# Datasets
+We used two datasets to train and evaluate our pipeline.
+1. [Rifat1493/Bengali-NER (annotated data)](https://github.com/Rifat1493/Bengali-NER/tree/master/annotated%20data)
+2. [banglakit/bengali-ner-data](https://raw.githubusercontent.com/banglakit/bengali-ner-data/master/main.jsonl)
+The annotation formats of the two datasets differ considerably, so we preprocessed both before merging them. Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/prepare-dataset.ipynb) for the dataset preparation steps.
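To make the merging step concrete, here is a minimal sketch of how two differently annotated NER sources could be normalized to binary person/non-person labels. The input layouts (tab-separated `token<TAB>tag` lines, and JSONL rows of the form `[sentence, [tags]]`), the tag names, and all function names are assumptions for illustration; the linked notebook is the authoritative reference for the real formats.

```python
import json

# Assumption: tags whose entity type is PER or PERSON mark person names,
# regardless of the prefix scheme (B-/I-/L-/U-). Real tag sets may differ.
PERSON_TYPES = {"PER", "PERSON"}


def to_binary(tag):
    """Map a single NER tag to 1 (person name) or 0 (anything else)."""
    entity_type = tag.split("-", 1)[-1].upper()
    return 1 if entity_type in PERSON_TYPES else 0


def from_token_tag_lines(lines):
    """Parse hypothetical 'token<TAB>tag' lines; blank lines separate sentences."""
    sentences, tokens, labels = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            if tokens:
                sentences.append({"tokens": tokens, "labels": labels})
                tokens, labels = [], []
            continue
        token, tag = line.split("\t")
        tokens.append(token)
        labels.append(to_binary(tag))
    if tokens:
        sentences.append({"tokens": tokens, "labels": labels})
    return sentences


def from_jsonl_lines(lines):
    """Parse hypothetical JSONL rows of the form [sentence, [tag, ...]]."""
    sentences = []
    for line in lines:
        if not line.strip():
            continue
        text, tags = json.loads(line)
        tokens = text.split()
        if len(tokens) != len(tags):
            continue  # skip rows where whitespace tokens and tags do not align
        sentences.append({"tokens": tokens, "labels": [to_binary(t) for t in tags]})
    return sentences


# Both sources end up in the same {"tokens", "labels"} shape and can be concatenated.
a = from_token_tag_lines(["রহিম\tB-PER", "ঢাকায়\tB-LOC", ""])
b = from_jsonl_lines(['["করিম এসেছে", ["U-PERSON", "O"]]'])
merged = a + b
```

Collapsing every tag to a single binary label matches the person/non-person framing described at the top of this README.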
+# Training and Evaluation
+We treated this problem as a token classification task, so finetuning the BanglaBERT model was a natural fit. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Models finetuned from this checkpoint achieve state-of-the-art results on many Bengali NLP tasks.
+We mainly finetuned two checkpoints of BanglaBERT.
+1. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert)
+2. [BanglaBERT small](https://huggingface.co/csebuetnlp/banglabert_small)
+BanglaBERT performed better than BanglaBERT small (83% vs. 79% F1 score on the test set).
+Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Training%20Notebook%20%3A%20Person%20Name%20Extractor%20using%20BanglaBERT.ipynb) to see the training process.
+**Quantitative results**
+Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference%20and%20Evaluation%20Notebook.ipynb) to see the evaluation process.
+<br></br>
+![Results](https://github.com/MBMMurad/asl-2d-to-3d/blob/master/Screenshot%20from%202023-07-13%2023-11-59.png)
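
For readers unfamiliar with the metric, the F1 scores quoted above combine precision and recall for the person-name class. The sketch below computes a token-level F1 from binary labels. This is an assumption about granularity: the evaluation notebook may score at the entity level or use a library instead. The labels in the example are made up.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Token-level precision, recall and F1 for the positive (person) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```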