Defining the use case.

Predicting Bank Loan Defaulters Using Machine Learning.

Problem Definition:

A data science approach to predicting loan defaults and understanding applicants’ profiles, with the aim of minimizing the risk of future defaults.

The dataset contains information about credit applicants. Banks worldwide use this kind of data to build models that help decide which loan applications to accept or refuse. After exploratory data analysis, cleansing, and handling the anomalies found along the way, the patterns that distinguish good applicants from bad ones are exposed so that machine learning models can learn them.

  • Machine Learning issue and objectives: This is a supervised binary classification problem. The goal is to train the model that best predicts default from past customers’ profiles, and so minimize the risk of future loan defaults.

  • Performance Metric: Models are evaluated with ROC AUC because the data is highly imbalanced. On such data, accuracy can look high for a model that simply predicts the majority class.

  • Project structure: The project is divided into four stages:

    • EDA: Exploratory Data Analysis

    • Data Preprocessing: Cleansing and Feature Selection

    • Feature store: Ingesting the preprocessed features into Feature Store for reusability.

    • Machine Learning: Predictive Modelling
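To make the choice of metric concrete, here is a minimal sketch of how ROC AUC behaves on imbalanced labels. The labels and scores are made up for illustration and do not come from the loan dataset. The helper name `roc_auc` is hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive (defaulter)
    receives a higher score than a randomly chosen negative; ties count half.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Hypothetical toy data: 8 non-defaulters (0) and 2 defaulters (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.70, 0.60, 0.90]

# A "model" that always predicts the majority class looks good on accuracy...
majority_accuracy = sum(1 for y in y_true if y == 0) / len(y_true)
# ...but its constant scores cannot rank defaulters above non-defaulters.
constant_auc = roc_auc(y_true, [0.0] * len(y_true))
model_auc = roc_auc(y_true, y_score)

print(f"majority-class accuracy: {majority_accuracy:.2f}")
print(f"constant-score ROC AUC:  {constant_auc:.2f}")
print(f"toy model ROC AUC:       {model_auc:.4f}")
```

The majority-class baseline reaches 80% accuracy here while its ROC AUC stays at 0.5, the chance level. This gap is why a ranking-based metric is a reasonable choice for imbalanced default data.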