Chukwuemeka Okoli
Practicum by Yandex Project 2
April 16, 2021
Project description
You're an analyst at Crankshaft List. Hundreds of free advertisements for vehicles are published on your site every day.
You need to study data collected over the last few years and determine which factors influence the price of a vehicle.
Guiding Question
What factors influence the price of a vehicle?
The objectives of this project are to:
- Determine which factors influence the price of a vehicle.
- Apply Exploratory Data Analysis to a real-life analytical case study.
Description of the data
The dataset contains the following fields:
- price
- model_year
- model
- condition
- cylinders
- fuel: gas, diesel, etc.
- odometer: the vehicle's mileage when the ad was published
- transmission
- paint_color
- is_4wd: whether the vehicle has 4-wheel drive (Boolean type)
- date_posted: the date the ad was published
- days_listed: from publication to removal
- Python
- Jupyter Notebook
- Pandas
- Numpy
- Matplotlib
- Seaborn
- Plotly
- Open the data file and study the general information
- Data preprocessing
- Processing missing values
- Data type replacement
- Make calculations and add them to the table
- Carry out exploratory data analysis
- Overall Conclusion
Introduction
As a data scientist working for a car listing site, how do you determine which factors influence vehicle prices? The answer is to look at the data. In this project, Crankshaft List, a car listing company that publishes free advertisements every day, hopes you will use your analytical skills to study data collected over the last few years and support business decision-making. The goal is to determine which factors influence the price of a vehicle.
Methods
I first inspected the data with the pandas library to obtain general information about it. I processed the missing values and converted columns to the appropriate data types. I then made calculations and added new features to the data. I investigated the following parameters: price, the vehicle's age when the ad was placed, mileage, number of cylinders, and condition. I plotted a histogram for each of these parameters. Before analyzing the data, I determined the upper limits for outliers and removed the values above them. I used the filtered data to plot new histograms and compared them with the earlier ones. In analyzing the data, I studied how many days advertisements were displayed (days_listed). I plotted a histogram and calculated summary statistics to describe the typical lifetime of an ad. I then determined when ads were removed quickly and when they were listed for an abnormally long time.
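The outlier step can be illustrated with a small, dependency-free sketch. The text does not say which rule was used to set the upper limits, so the Q3 + 1.5 * IQR fence below is an assumption, and the function names and toy listings are hypothetical rather than taken from the project notebook.

```python
from statistics import quantiles


def upper_outlier_limit(values, k=1.5):
    """Return Q3 + k * IQR, a common upper fence for outliers (assumed rule)."""
    # method="inclusive" interpolates linearly between the observed values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def drop_upper_outliers(rows, column, k=1.5):
    """Keep only the rows whose value in `column` is at or below the fence."""
    limit = upper_outlier_limit([row[column] for row in rows], k)
    return [row for row in rows if row[column] <= limit]


# Hypothetical toy listings, not real data from the project.
ads = [{"price": p} for p in (4000, 5000, 6000, 7000, 8000, 9000, 10000, 250000)]

price_limit = upper_outlier_limit([ad["price"] for ad in ads])
filtered_ads = drop_upper_outliers(ads, "price")
```

With these toy prices the single extreme listing is dropped and the other seven are kept. The same filter could be applied in turn to vehicle age, mileage, and the other parameters before re-plotting the histograms.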
I then analyzed the number of ads and the average price for each type of vehicle. I studied whether the price depends on age, mileage, condition, transmission type, and color. I plotted box-and-whisker charts for the categorical variables and created scatterplots for the rest using the Matplotlib and Seaborn libraries. Analyzing the data was important in answering some of the business needs.
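As a rough illustration of the per-type summary described above, the sketch below counts ads and averages prices for each vehicle type using only the standard library. The "type" key is an assumption, since the field list above does not name a vehicle-type column, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_type(rows, type_key="type", price_key="price"):
    """Return {vehicle type: (number of ads, mean price)}, most ads first."""
    prices = defaultdict(list)
    for row in rows:
        prices[row[type_key]].append(row[price_key])
    summary = {vtype: (len(p), mean(p)) for vtype, p in prices.items()}
    # Sort by ad count so the most frequently listed types come first.
    return dict(sorted(summary.items(), key=lambda item: item[1][0], reverse=True))


# Hypothetical toy listings; "type" is an assumed column name.
ads = [
    {"type": "SUV", "price": 12000},
    {"type": "sedan", "price": 6000},
    {"type": "SUV", "price": 10000},
    {"type": "truck", "price": 15000},
    {"type": "SUV", "price": 14000},
    {"type": "sedan", "price": 8000},
]

summary = summarize_by_type(ads)
```

Sorting by ad count makes it easy to pick the most heavily advertised types for the closer look at price against age, mileage, condition, transmission type, and color.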
Key Findings
Deployment and Application
Future Development