Consumers today are highly selective when choosing a bank and its services, and they expect smooth, hassle-free transactions. When selecting a bank or migrating from one bank to another, they therefore often depend heavily on positive word of mouth and online reviews to evaluate the bank's services. I propose to investigate the United States Consumer Complaint Database and apply machine learning algorithms to the dataset to predict whether consumers were satisfied or raised a dispute after registering complaints about specific services of banks across the United States.
Consumers will find this solution useful when selecting a bank or considering a change of bank. Banking companies, in turn, need to know what their consumers think of them in order to improve their services. Companies will also be able to identify loopholes, for example by finding out why a particular area has a high number of credit card fraud detections. By focusing on these complaints and their resolution, companies can retain customers and maximize profits.
The dataset I will be working with is the Financial Services United States Consumer Complaint data, available at data.gov and on Kaggle as a comma-separated values (CSV) file. The data consists of 1,048,575 instances with 18 attributes.
PROPOSED METHODOLOGY:
Exploratory data analysis will answer questions such as (to name a few):
A) Which product receives the most complaints?
B) Which area generates the most complaints?
C) Which bank receives the most complaints, and how does that compare with whether its consumers were satisfied or not?
For predicting whether consumers are satisfied or disputed over the bank services, the following models will be applied:
D) Logistic Regression, Random Forest, Gradient Boosting
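The methodology above could be prototyped roughly as follows. This is only a minimal sketch under stated assumptions, not the final implementation: the column names `product`, `state`, `company` and `disputed` are hypothetical stand-ins for the real attribute names, the data is a small synthetic table rather than the actual complaint file, and ROC AUC is an assumed evaluation metric that the proposal does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["product", "state", "company"]  # hypothetical column names


def make_toy_complaints(n=600, seed=0):
    """Build a synthetic stand-in for the complaint CSV (all values are made up)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "product": rng.choice(["Mortgage", "Credit card", "Debt collection"],
                              size=n, p=[0.5, 0.3, 0.2]),
        "state": rng.choice(["CA", "TX", "NY", "FL"], size=n),
        "company": rng.choice(["Bank A", "Bank B", "Bank C"], size=n),
    })
    # Make the dispute probability depend on the product so the models have signal.
    base = df["product"].map(
        {"Mortgage": 0.6, "Credit card": 0.3, "Debt collection": 0.15}
    ).to_numpy()
    df["disputed"] = (rng.random(n) < base).astype(int)
    return df


def top_category(df, column):
    """Questions A and B: which value of `column` has the most complaints?"""
    counts = df[column].value_counts()
    return counts.idxmax(), int(counts.max())


def dispute_rate_by(df, column):
    """Question C: share of disputed complaints per value of `column`."""
    return df.groupby(column)["disputed"].mean().sort_values(ascending=False)


def compare_models(df, seed=0):
    """Fit the three proposed classifiers and return a test-set ROC AUC for each."""
    X, y = df[FEATURES], df["disputed"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # One-hot encode the categorical features, then fit the classifier.
        encoder = make_column_transformer(
            (OneHotEncoder(handle_unknown="ignore"), FEATURES)
        )
        pipeline = make_pipeline(encoder, model)
        pipeline.fit(X_train, y_train)
        probabilities = pipeline.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, probabilities)
    return scores
```

With the real data, `make_toy_complaints()` would be replaced by `pd.read_csv(...)` on the downloaded file and the column names adjusted to match it. Calling `top_category(df, "product")`, `dispute_rate_by(df, "company")` and `compare_models(df)` then covers questions A to C and the model comparison in D.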
The project will be delivered as a presentation containing all visualizations and results. In addition, I will provide Python Jupyter notebooks and a report detailing the approach used, from data wrangling through to the machine learning algorithms applied.