MBMMurad committed on
Commit
dd598b6
1 Parent(s): 4e4157d

Update README.md

Files changed (1)
  1. README.md +22 -24
README.md CHANGED
@@ -8,28 +8,6 @@ pipeline_tag: token-classification
8
  # Bangla-Person-Name-Extractor
9
  This repository contains the implementation of a Bangla Person Name Extractor model, which extracts person-name entities from a given sentence. We approached it as a token classification task, i.e. tagging each token as either part of a person's name or not. We leveraged the [BanglaBERT](https://github.com/csebuetnlp/banglabert) model, finetuning it for binary classification on a custom-prepared dataset. We have deployed the model on Hugging Face for easier access and use.
10
 
11
-
12
- # Datasets
13
- We used two datasets to train and evaluate our pipeline.
14
- 1. [Bengali-NER/annotated data at master · Rifat1493/Bengali-NER](https://github.com/Rifat1493/Bengali-NER/tree/master/annotated%20data)
15
- 2. [banglakit/bengali-ner-data](https://raw.githubusercontent.com/banglakit/bengali-ner-data/master/main.jsonl)
16
-
17
- The two datasets use quite different annotation formats, so we had to preprocess each of them before merging. Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/prepare-dataset.ipynb) for the dataset preparation steps.
18
-
19
- # Training and Evaluation
20
- We treated this problem as a token classification task, so it seemed a natural fit to finetune BanglaBERT model for our purpose. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Models finetuned from this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.
21
- We mainly finetuned two checkpoints of BanglaBERT.
22
- 1. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert)
23
- 2. [BanglaBERT small](https://huggingface.co/csebuetnlp/banglabert_small)
24
-
25
- BanglaBERT performed better than BanglaBERT small (83% vs. 79% F1 score on the test set).
26
- Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Training%20Notebook%20%3A%20Person%20Name%20Extractor%20using%20BanglaBERT.ipynb) to see the training process.
27
-
28
- **Quantitative results**
29
- Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference%20and%20Evaluation%20Notebook.ipynb) to see the evaluation process.
30
- <br></br>
31
- ![Markdown symbol](https://github.com/MBMMurad/asl-2d-to-3d/blob/master/Screenshot%20from%202023-07-13%2023-11-59.png)
32
-
33
  # How to use it?
34
  [This notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference_template.ipynb) contains the inference template for running the model on a sentence.
35
  <br></br>
@@ -71,7 +49,7 @@ print(f"Input Sentence : {sentence}")
71
  print(f"Person Name Entities : {pred}")
72
  ```
73
 
74
- **Output :**
75
  ```
76
  Input Sentence : আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।
77
  Person Name Entities : ['আব্দুর' 'রহিম']
@@ -83,4 +61,24 @@ Person Name Entities : ['দেলোয়ার' 'হোসেন' 'মজু
83
 
84
  Input Sentence : দলীয় নেতারা তাঁর বাসভবনে যেতে চাইলে আটক হন।
85
  Person Name Entities : []
86
- ```
8
  # Bangla-Person-Name-Extractor
9
  This repository contains the implementation of a Bangla Person Name Extractor model, which extracts person-name entities from a given sentence. We approached it as a token classification task, i.e. tagging each token as either part of a person's name or not. We leveraged the [BanglaBERT](https://github.com/csebuetnlp/banglabert) model, finetuning it for binary classification on a custom-prepared dataset. We have deployed the model on Hugging Face for easier access and use.
10
 
11
  # How to use it?
12
  [This notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference_template.ipynb) contains the inference template for running the model on a sentence.
13
  <br></br>
 
49
  print(f"Person Name Entities : {pred}")
50
  ```
51
 
52
+ **Output:**
53
  ```
54
  Input Sentence : আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।
55
  Person Name Entities : ['আব্দুর' 'রহিম']
 
61
 
62
  Input Sentence : দলীয় নেতারা তাঁর বাসভবনে যেতে চাইলে আটক হন।
63
  Person Name Entities : []
64
+ ```
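As a rough illustration of the token-classification framing described at the top of this README, the sketch below turns per-token binary predictions into the kind of name-token list shown in the output above. The function name `extract_person_tokens`, the whitespace tokenization, the label convention (1 = part of a person's name, 0 = anything else), and the example labels are all assumptions for illustration. The actual inference template works on BanglaBERT subword tokens and may differ.

```python
from typing import List


def extract_person_tokens(tokens: List[str], labels: List[int]) -> List[str]:
    """Keep the tokens whose predicted label marks them as part of a person's name.

    Assumed convention (not taken from the repository): 1 = person name, 0 = other.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [tok for tok, lab in zip(tokens, labels) if lab == 1]


# Hypothetical predictions for the first example sentence (8 whitespace tokens).
sentence = "আব্দুর রহিম নামের কাস্টমারকে একশ টাকা বাকি দিলাম।"
tokens = sentence.split()
labels = [1, 1, 0, 0, 0, 0, 0, 0]
print(extract_person_tokens(tokens, labels))
```

A sentence with no positive labels yields an empty list, which matches the last example in the output block.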
65
+ # Datasets
66
+ We used two datasets to train and evaluate our pipeline.
67
+ 1. [Bengali-NER/annotated data at master · Rifat1493/Bengali-NER](https://github.com/Rifat1493/Bengali-NER/tree/master/annotated%20data)
68
+ 2. [banglakit/bengali-ner-data](https://raw.githubusercontent.com/banglakit/bengali-ner-data/master/main.jsonl)
69
+
70
+ The two datasets use quite different annotation formats, so we had to preprocess each of them before merging. Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/prepare-dataset.ipynb) for the dataset preparation steps.
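Merging two differently annotated NER corpora for this task essentially means collapsing each corpus's tag scheme into the same binary labels. The sketch below shows one way that could look. The tag names (`B-PER`, `I-PER`, `B-PERSON`, `U-PERSON`, and so on) and the helper `to_binary_labels` are assumptions for illustration; they are not taken from the preprocessing notebook or verified against either dataset.

```python
from typing import Iterable, List

# Assumed person-entity types; real corpora may spell these differently.
PERSON_TYPES = {"PER", "PERSON"}


def to_binary_labels(tags: Iterable[str]) -> List[int]:
    """Map scheme-specific NER tags to 1 (person name) or 0 (anything else).

    A tag such as "B-PER" or "U-PERSON" is split into a position prefix and an
    entity type; only the entity type decides the binary label.
    """
    labels = []
    for tag in tags:
        entity_type = tag.split("-", 1)[-1].upper()
        labels.append(1 if entity_type in PERSON_TYPES else 0)
    return labels


# Two hypothetical annotation styles collapse to the same binary labels.
print(to_binary_labels(["B-PER", "I-PER", "O", "B-LOC"]))        # [1, 1, 0, 0]
print(to_binary_labels(["B-PERSON", "L-PERSON", "O", "U-GPE"]))  # [1, 1, 0, 0]
```

Keying on the entity type rather than the full tag keeps the mapping independent of whether a corpus uses a BIO-style or BILUO-style prefix.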
71
+
72
+ # Training and Evaluation
73
+ We treated this problem as a token classification task, so it seemed a natural fit to finetune the BanglaBERT model for our purpose. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert) is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Models finetuned from this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.
74
+ We mainly finetuned two checkpoints of BanglaBERT.
75
+ 1. [BanglaBERT](https://huggingface.co/csebuetnlp/banglabert)
76
+ 2. [BanglaBERT small](https://huggingface.co/csebuetnlp/banglabert_small)
77
+
78
+ BanglaBERT performed better than BanglaBERT small (83% vs. 79% F1 score on the test set).
79
+ Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Training%20Notebook%20%3A%20Person%20Name%20Extractor%20using%20BanglaBERT.ipynb) to see the training process.
80
+
81
+ **Quantitative results**
82
+ Please refer to [this notebook](https://github.com/MBMMurad/Bangla-Person-Name-Extractor/blob/main/Inference%20and%20Evaluation%20Notebook.ipynb) to see the evaluation process.
83
+ <br></br>
84
+ ![Results](https://github.com/MBMMurad/asl-2d-to-3d/blob/master/Screenshot%20from%202023-07-13%2023-11-59.png)
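
For reference, an F1 score for the person class can be computed from binary labels as sketched below. This is a token-level calculation on hypothetical gold and predicted labels, and `person_f1` is an illustrative name. The evaluation notebook may compute the reported 83% and 79% figures differently, for example at the entity level.

```python
from typing import List


def person_f1(gold: List[int], pred: List[int]) -> float:
    """Token-level F1 for the positive (person-name) class, with labels in {0, 1}."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
print(round(person_f1(gold, pred), 3))
```

Because most tokens in a sentence are not names, F1 on the positive class is more informative here than plain accuracy.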