We are a small startup focused on providing machine learning solutions to the European banking market. We work on a variety of problems, including fraud detection, sentiment classification, and customer intent prediction and classification.
We are interested in developing a robust machine learning system that leverages information coming from call center data.
Ultimately, we are looking for ways to improve the success rate of calls made to customers for any product that our clients offer. Toward this goal, we are designing an ever-evolving machine learning product that delivers high success rates while remaining interpretable enough for our clients to make informed decisions.
The data comes from the direct marketing efforts of a European banking institution. The marketing campaign involves phoning a customer, often multiple times, to sell a product subscription, in this case a term deposit. Term deposits are usually short-term deposits with maturities ranging from one month to a few years. When buying a term deposit, the customer must understand that they can withdraw their funds only after the term ends. All customer information that might reveal personal identity has been removed due to privacy concerns.
Attributes:
- age: age of customer (numeric)
- job: type of job (categorical)
- marital: marital status (categorical)
- education: education level (categorical)
- default: has credit in default? (binary)
- balance: average yearly balance, in euros (numeric)
- housing: has a housing loan? (binary)
- loan: has a personal loan? (binary)
- contact: contact communication type (categorical)
- day: last contact day of the month (numeric)
- month: last contact month of the year (categorical)
- duration: last contact duration, in seconds (numeric)
- campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
Output (desired target):
- y: has the client subscribed to a term deposit? (binary)
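
A minimal sketch of how the attributes above could be grouped and encoded, for example in `src/features/build_features.py`. The column names are assumed to match the headers of the raw file, and the specific encoding choices (scaling, one-hot) are illustrative rather than a project requirement:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups taken from the data dictionary above; names are assumed to
# match the headers in the raw CSV.
NUMERIC = ["age", "balance", "day", "duration", "campaign"]
CATEGORICAL = ["job", "marital", "education", "contact", "month"]
BINARY = ["default", "housing", "loan"]  # yes/no flags


def make_preprocessor() -> ColumnTransformer:
    """Scale numeric columns and one-hot encode categorical/binary ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL + BINARY),
        ]
    )
```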
Predict whether the customer will subscribe (yes/no) to a term deposit (variable y).
Target: 81% or higher accuracy, evaluated with 5-fold cross-validation and reported as the average score across folds.
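
A sketch of how that evaluation might be run with scikit-learn, assuming the raw dump sits at `data/raw/bank.csv` and is ';'-separated (both assumptions), with a gradient boosting classifier chosen purely for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed location and separator of the raw dump; adjust to the actual data.
df = pd.read_csv("data/raw/bank.csv", sep=";")

X = df.drop(columns=["y"])
y = (df["y"] == "yes").astype(int)  # binary target: subscribed or not

categorical = X.select_dtypes(include="object").columns.tolist()
pipeline = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )),
    ("clf", GradientBoostingClassifier(random_state=42)),
])

# 5-fold cross-validation; report the mean accuracy against the 81% target.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```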
Project organization:

├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── LICENSE
Run the project with `python src/models/train_model.py`.
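
A minimal sketch of what `src/models/train_model.py` could look like when invoked this way, assuming the same raw-data path and an illustrative classifier; the output file name `models/term_deposit_model.joblib` is also an assumption:

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def main() -> None:
    # Assumed raw-data path and separator; adjust to the actual dump.
    df = pd.read_csv("data/raw/bank.csv", sep=";")
    X = df.drop(columns=["y"])
    y = (df["y"] == "yes").astype(int)

    categorical = X.select_dtypes(include="object").columns.tolist()
    model = Pipeline([
        ("encode", ColumnTransformer(
            [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
            remainder="passthrough",
        )),
        ("clf", GradientBoostingClassifier(random_state=42)),
    ])

    # Fit on the full training data and persist the pipeline so that
    # predict_model.py can load it later.
    model.fit(X, y)
    joblib.dump(model, "models/term_deposit_model.joblib")


if __name__ == "__main__":
    main()
```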
Project based on the cookiecutter data science project template. #cookiecutterdatascience