Bank Customer Deposit Prediction

This Streamlit app is built on a classification model trained on data from a Portuguese banking institution to predict whether a bank customer will subscribe to a term deposit. The model was trained on the full data set, approximately 41.4k rows and 21 columns, and then prepared for cloud deployment. The full workflow is described below.

Streamlit Application

The model is deployed on Streamlit for public use to demonstrate the applicability and performance of the trained model. It uses 6 features, selected by the RFECV (recursive feature elimination with cross-validation) algorithm as the most informative predictors. The trained model makes predictions with approximately 95% accuracy using the following features:

Occupation
Last contact day of the week
Number of contacts performed during this campaign and for this client
Outcome of the previous marketing campaign
Consumer price index - monthly indicator
Euribor 3 month rate - daily indicator
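To make the six inputs concrete, here is a minimal sketch of how one customer's inputs could be collected and validated as a single feature row before prediction. The field names (`job`, `day_of_week`, `campaign`, `poutcome`, `cons_price_idx`, `euribor3m`) and the allowed category values are assumptions modeled on the public UCI Bank Marketing ("bank-additional") schema; `CustomerInput` and `to_feature_row` are hypothetical helpers, not the app's actual code.

```python
from dataclasses import asdict, dataclass

# Assumed category values, modeled on the public UCI "bank-additional" schema.
JOBS = {"admin.", "blue-collar", "entrepreneur", "housemaid", "management",
        "retired", "self-employed", "services", "student", "technician",
        "unemployed", "unknown"}
DAYS = {"mon", "tue", "wed", "thu", "fri"}
POUTCOMES = {"failure", "nonexistent", "success"}


@dataclass
class CustomerInput:
    """One customer's inputs; field names are hypothetical."""
    job: str               # occupation
    day_of_week: str       # last contact day of the week
    campaign: int          # contacts performed during this campaign for this client
    poutcome: str          # outcome of the previous marketing campaign
    cons_price_idx: float  # consumer price index, monthly indicator
    euribor3m: float       # Euribor 3 month rate, daily indicator


def to_feature_row(inp: CustomerInput) -> dict:
    """Validate the categorical and count inputs, then return one feature row."""
    if inp.job not in JOBS:
        raise ValueError(f"unknown job: {inp.job!r}")
    if inp.day_of_week not in DAYS:
        raise ValueError(f"unknown day_of_week: {inp.day_of_week!r}")
    if inp.poutcome not in POUTCOMES:
        raise ValueError(f"unknown poutcome: {inp.poutcome!r}")
    if inp.campaign < 1:
        # Assumption: the count includes the current contact, so it is at least 1.
        raise ValueError("campaign must be >= 1")
    return asdict(inp)
```

In a real app the returned row would still need the same encoding that was applied during training (for example, one-hot encoding of `job`, `day_of_week` and `poutcome`) before it is passed to the model; that step is assumed here and not shown.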

Model Workflow

Solving The Imbalance Issue

For model development, the responses, originally labeled "Yes" and "No", are converted to a binary target (1 and 0). Approximately 89% of the labels are "No" (0) and the remaining 11% are "Yes" (1). Two different approaches were therefore followed to find the model with the best predictive performance.
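The label conversion and the class-share check can be sketched as follows. This is a minimal illustration, not the notebook's code: `binarize` and `class_shares` are hypothetical helpers, and the 89/11 list below is toy data chosen to mirror the reported split.

```python
from collections import Counter


def binarize(labels):
    """Map "yes"/"no" labels (case and surrounding spaces ignored) to 1/0."""
    mapping = {"yes": 1, "no": 0}
    out = []
    for label in labels:
        key = label.strip().lower()
        if key not in mapping:
            raise ValueError(f"unexpected label: {label!r}")
        out.append(mapping[key])
    return out


def class_shares(y):
    """Return the fraction of samples in each class, keyed by class label."""
    counts = Counter(y)
    n = len(y)
    return {cls: counts[cls] / n for cls in sorted(counts)}


# Toy labels with the same 89/11 split as the target (illustrative only).
y = binarize(["No"] * 89 + ["Yes"] * 11)
print(class_shares(y))  # {0: 0.89, 1: 0.11}
```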

First, because the target is imbalanced, the under-represented "Yes" class is synthetically oversampled using SMOTE (Synthetic Minority Over-sampling Technique). With balanced classes, accuracy is used as the main performance metric. Second, the minority class is left as is, without resampling; since accuracy is misleading on imbalanced data, the main performance metrics for this approach are the precision-recall curve and its associated scores.
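The core idea of SMOTE is to create new minority samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The pure-Python sketch below shows that mechanism on numeric feature vectors; the function name `smote` and its parameters are illustrative assumptions. In practice a library implementation such as `imblearn.over_sampling.SMOTE` (via its `fit_resample` method) would normally be used, and the project's exact settings are not stated here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: build synthetic minority samples by interpolating
    between a randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to nb, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic
```

To balance the classes, `n_synthetic` would be set to the majority count minus the minority count. Oversampling should be applied to the training split only, so that synthetic points do not leak into the data used for evaluation.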

Moreover, feature selection is conducted using Recursive Feature Elimination (RFE), in its cross-validated form, RFECV. RFE fits the model and uses it to assign an importance score to each feature, ranks the features by that score, and prunes the least important ones; the process then repeats on the remaining features. RFECV additionally uses cross-validation to choose how many features to keep.
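The elimination loop can be sketched without any ML library by passing in a function that scores the current feature set. In real RFE that function would refit the model on the remaining features and read its importances; here `importance_fn`, `rfe_ranking` and the toy scores are hypothetical. The ranking follows the same convention as scikit-learn's `RFE`/`RFECV` `ranking_` attribute: kept features get rank 1, and features eliminated earlier get larger ranks.

```python
def rfe_ranking(features, importance_fn, n_keep=1, step=1):
    """Recursive feature elimination sketch.

    importance_fn(current_features) -> {feature: score}; it is re-evaluated
    every round, as a refitted model would be. Returns (selected, ranking).
    """
    if step < 1 or n_keep < 1:
        raise ValueError("step and n_keep must be >= 1")
    remaining = list(features)
    eliminated = []  # groups of features, in the order they were dropped
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)
        n_drop = min(step, len(remaining) - n_keep)
        worst = sorted(remaining, key=lambda f: scores[f])[:n_drop]
        eliminated.append(worst)
        remaining = [f for f in remaining if f not in worst]
    ranking = {f: 1 for f in remaining}
    # The last group dropped gets rank 2, the one before it rank 3, and so on.
    for rank, group in enumerate(reversed(eliminated), start=2):
        for f in group:
            ranking[f] = rank
    return remaining, ranking


# Toy importance scores for abstract features (illustrative only).
toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
selected, ranking = rfe_ranking(list("abcd"), lambda fs: {f: toy[f] for f in fs}, n_keep=2)
print(selected, ranking)  # ['a', 'b'] {'a': 1, 'b': 1, 'c': 2, 'd': 3}
```

RFECV wraps this loop in cross-validation: it scores each candidate number of features on held-out folds and keeps the size with the best score, instead of requiring `n_keep` up front. In scikit-learn this is `sklearn.feature_selection.RFECV(estimator, step=..., cv=..., scoring=...)`.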

Model Performance

ROC Curve

Confusion Matrix Report

Precision-Recall Curve

Confusion Matrix Report

Conclusion

The two approaches yielded almost identical predictive performance.

Model	Performance Metric	Score
Imbalanced XGBClassifier	Precision-Recall	0.95
Balanced XGBClassifier	Accuracy	0.943
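The choice of metric matters because, with an 89/11 split, a model that always predicts "No" already reaches 89% accuracy while never finding a single subscriber. The sketch below computes accuracy, precision and recall from confusion-matrix counts to show this; `confusion_counts` and `scores` are hypothetical helpers, and the labels are toy data. It does not reconstruct the precision-recall summary score reported in the table, which would come from a full precision-recall curve (for example, scikit-learn's `precision_recall_curve` or `average_precision_score`).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Accuracy, precision and recall; precision/recall are 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels with the same 89/11 split as the target (illustrative only).
y_true = [0] * 89 + [1] * 11
always_no = [0] * 100
print(scores(y_true, always_no))  # {'accuracy': 0.89, 'precision': 0.0, 'recall': 0.0}
```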

However, after feature selection, RFECV kept significantly fewer features as the most important ones for the balanced data. The imbalanced data, by contrast, required more features to reach a comparable score.

In conclusion, the balanced-data model is preferable: it achieves comparable performance while requiring fewer features. The Streamlit app is therefore built on the features that RFECV selected as most important on the balanced data.

dfavenfre/customer_deposit_classifier

Folders and files

Latest commit

History

Repository files navigation

Bank Customer Deposit Prediction

Streamlit Application

Model Workflow

Solving The Imbalance Issue

Model Performance

ROC Curve

Confusion Matrix Report

Precision + Recall Curve

Confusion Matrix Report

Conclusion

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages