# Credit Risk Modelling

# About

An interactive tool demonstrating credit risk modelling.

Emphasis on:
* Building models
* Comparing techniques
* Interpreting results

## Built With

- [Streamlit](https://streamlit.io/)

### Hardware initially built on:
Processor: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 2803 MHz, 4 Core(s), 8 Logical Processor(s)

Memory (RAM): 16GB 

## Local setup
### Obtain the repo locally and open its root folder
#### To potentially contribute
git clone https://github.com/pkiage/tool-credit-risk-modelling.git

or

gh repo clone pkiage/tool-credit-risk-modelling

#### Just to deploy locally
Download ZIP

### (optional) Setup virtual environment:
python -m venv venv

### (optional) Activate virtual environment:
#### If using a Unix-based OS run the following in terminal:
source venv/bin/activate

#### If using Windows run the following in terminal:
.\venv\Scripts\activate

### Install requirements by running the following in terminal:
#### Required packages
pip install -r requirements.txt

#### Complete graphviz installation
https://graphviz.org/download/ 
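
To confirm the Graphviz installation before launching the app, a quick check such as the sketch below may help. It assumes the app needs the Graphviz `dot` executable on the system `PATH`; the helper name `graphviz_available` is hypothetical and not part of this repository.

```python
import shutil


def graphviz_available(executable="dot"):
    """Return True if the given Graphviz executable is found on PATH.

    `dot` is assumed to be the executable the app relies on.
    """
    return shutil.which(executable) is not None


if graphviz_available():
    print("Graphviz 'dot' found on PATH.")
else:
    print("Graphviz 'dot' not found; install it from "
          "https://graphviz.org/download/ and add it to PATH.")
```

If the check fails after installing Graphviz, reopening the terminal so that it picks up the updated `PATH` is usually worth trying first.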

### Run the streamlit app (app.py) by running the following in terminal:

streamlit run app.py

## Deployed setup details
For faster model building and testing (particularly with XGBoost), a local setup is recommended.

The app was deployed on Heroku's free [dyno type](https://devcenter.heroku.com/articles/dyno-types), which has the following specifications:


Memory (RAM): 512 MB

CPU Share: 1x

Compute: 1x-4x 

Dedicated: no

Sleeps: yes

# Roadmap
Models:
- [ ] Add LightGBM
- [ ] Add AdaBoost
- [ ] Add Random Forest

Visualization:
- [ ] Add decision surface plot(s)

Documentation:
- [x] Add getting started and usage documentation
- [ ] Add documentation on evaluating models
- [ ] Add design rationale(s)

Other:
- [x] Deploy app
- [ ] Add csv file data input
- [ ] Add tests
- [ ] Add test/code coverage badge
- [ ] Add continuous integration badge



# References

## Inspiration:

[Credit Risk Modeling in Python by Datacamp](https://www.datacamp.com/courses/credit-risk-modeling-in-python)

- General Methodology
- Data

[A Gentle Introduction to Threshold-Moving for Imbalanced Classification](https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/)

- Selecting optimal threshold using Youden's J statistic
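
As a rough illustration of the idea (not necessarily how this app implements it), Youden's J statistic is J = TPR - FPR, and the selected threshold is the one that maximises J. The sketch below uses only the standard library; the function name `youden_j_threshold` and the toy labels and scores are made up for illustration.

```python
def youden_j_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = TPR - FPR.

    y_true holds 0/1 labels; y_score holds predicted probabilities.
    A sample is predicted positive when its score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_threshold, best_j = None, float("-inf")
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
        j = tp / positives - fp / negatives  # TPR - FPR
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data (hypothetical): 1 = default, 0 = non-default.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
threshold, j = youden_j_threshold(labels, scores)
print(f"threshold={threshold}, J={j:.2f}")
```

On imbalanced data such as loan defaults, the threshold chosen this way is often well below the default of 0.5, because it weights the true positive rate and the false positive rate equally regardless of class frequencies.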

[Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/)

- Project structure

[GraphViz Buildpack](https://github.com/weibeld/heroku-buildpack-graphviz)
- Buildpack used for Heroku deployment

## Political, Economic, Social, Technological, Legal and Environmental (PESTLE):

[Europe fit for the Digital Age: Commission proposes new rules and actions for excellence and trust in Artificial Intelligence](https://ec.europa.eu/commission/presscorner/detail/en/ip_21_1682)

[LAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE (ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION LEGISLATIVE ACTS](https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206&from=EN)

> "(37) Another area in which the use of AI systems deserves special consideration is the access to and enjoyment of certain essential private and public services and benefits necessary for people to fully participate in society or to improve one’s standard of living. In particular, AI systems used to evaluate the credit score or creditworthiness of natural persons should be classified as high-risk AI systems, since they determine those persons’ access to financial resources or essential services such as housing, electricity, and telecommunication services. AI systems used for this purpose may lead to discrimination of persons or groups and perpetuate historical patterns of discrimination, for example based on racial or ethnic origins, disabilities, age, sexual orientation, or create new forms of discriminatory impacts. Considering the very limited scale of the impact and the available alternatives on the market, it is appropriate to exempt AI systems for the purpose of creditworthiness assessment and credit scoring when put into service by small-scale providers for their own use. Natural persons applying for or receiving public assistance benefits and services from public authorities are typically dependent on those benefits and services and in a vulnerable position in relation to the responsible authorities. If AI systems are used for determining whether such benefits and services should be denied, reduced, revoked or reclaimed by authorities, they may have a significant impact on persons’ livelihood and may infringe their fundamental rights, such as the right to social protection, non-discrimination, human dignity or an effective remedy. Those systems should therefore be classified as high-risk. 
Nonetheless, this Regulation should not hamper the development and use of innovative approaches in the public administration, which would stand to benefit from a wider use of compliant and safe AI systems, provided that those systems do not entail a high risk to legal and natural persons."

[Europe fit for the Digital Age: Commission proposes new rules and actions for excellence and trust in Artificial Intelligence](https://ec.europa.eu/commission/presscorner/detail/en/ip_21_1682)
> "High-risk AI systems will be subject to strict obligations before they can be put on the market:
>* Adequate risk assessment and mitigation systems;
>* High quality of the datasets feeding the system to minimise risks and discriminatory outcomes;
>* Logging of activity to ensure traceability of results;
>* Detailed documentation providing all information necessary on the system and its purpose for authorities to assess its compliance;
>* Clear and adequate information to the user;
>* Appropriate human oversight measures to minimise risk;
>* High level of robustness, security and accuracy."