This is a case study developed as the final project of a Data Science Bootcamp, intended to evaluate progress in the course.
The main goal was to predict the profit of a new startup based on certain features and to decide whether or not one should invest in a particular startup.
The project is divided into six sections and includes many notes on the author's interpretation and her first steps in data analysis. It also includes the .csv file and a directory where the plots were saved. The sections and their topics are:
- Project Definition:
  - Study object
  - Objective
  - Additional Information
  - Method
- Exploratory Data Analysis (EDA):
  - Importing/collecting data
  - Data cleaning
  - Data exploration
  - Data preparation
- Model Building:
  - Splitting data
  - Predictive modeling
  - Decision Tree Regression Model
  - Random Forest Regression Model
  - Models score
- Graphical Analysis:
  - Data Visualization
  - Original data interpretation
  - Predicted data interpretation
  - Original vs. Predicted data
  - Models evaluation
- Linear Regression:
  - Linear Regression Model
  - Model score and evaluation
  - Original vs. Predicted data
  - Graphical Data Analysis
- Conclusion:
  - Comparative table
  - Model's scores and errors measurement
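As a rough illustration of the workflow outlined above (split the data, fit the three regressors, compare scores and errors), here is a minimal sketch. It is not the code from the notebooks: the data is synthetic, the feature count and coefficients are made up for the example, and the choice of R² and mean absolute error as metrics is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the project's .csv): three numeric
# features and a profit-like target that depends on them linearly.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 5.0 + X @ np.array([0.8, 0.1, 0.3]) + rng.normal(0, 2.0, size=200)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three model families named in the outline.
models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Linear Regression": LinearRegression(),
}

# Fit each model and collect a small comparative table.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
    }

for name, metrics in results.items():
    print(f"{name}: R2={metrics['r2']:.3f}  MAE={metrics['mae']:.2f}")
```

Because the synthetic target is linear in the features, Linear Regression is expected to do well here; that says nothing about how the models rank on the real startup data.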
This project was built with the following technologies:
- Python --version: 3.7
- Jupyter Notebook --version: 6.2.0
- Pandas --version: 1.2.1
- NumPy --version: 1.19.2
- Matplotlib --version: 3.3.2
- Seaborn --version: 0.11.1
- Scikit-learn --version: 0.24.1
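If you want to see how the versions already installed on your machine compare with this list, a small script like the one below can help. It is only a sketch: the names `EXPECTED`, `installed_version` and `compare_versions` are made up for this example, and it assumes each package exposes a `__version__` attribute (the five listed here do).

```python
import importlib

# Versions listed in this README, keyed by import name (not pip name).
EXPECTED = {
    "pandas": "1.2.1",
    "numpy": "1.19.2",
    "matplotlib": "3.3.2",
    "seaborn": "0.11.1",
    "sklearn": "0.24.1",  # installed with pip as scikit-learn
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(expected):
    """Map each module name to (expected, found, status)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            status = "missing"
        elif found == wanted:
            status = "match"
        else:
            status = "different"
        report[name] = (wanted, found, status)
    return report


for name, (wanted, found, status) in compare_versions(EXPECTED).items():
    print(f"{name:<12} expected {wanted:<8} found {found or '-':<8} {status}")
```

A "different" status does not necessarily mean the notebooks will fail; it only tells you which packages to look at first if something goes wrong.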
Your machine needs some basic configuration before you can run the project.
💡 You will not be able to access or modify the project unless Jupyter Notebook is running on your local server.
Before you get started, you will need to have Python and Jupyter Notebook installed on your machine. You do not need any specific code editor or IDE to work on the code; you can do it directly in the notebooks.
Also, if you only want to snoop around the code and have no interest in changing it, you can skip directly to the topic Executing via cmd and forget about the Installation.
Since it is not externally deployed, you will need to download the startup-success-analysis project to your computer to be able to run it locally.
If you have different versions of the technologies used to create this project, no need to panic! It won't be necessary to uninstall everything. You can always try running it on your machine with the versions you already have and see if it works. If something goes wrong (packages and functions are constantly being updated), you can easily create a virtual environment and then install only what you need.
You can follow the steps below if you need to create and configure a virtual environment. These are for the Windows command line; if you are a Linux or macOS user, you will need to look up the equivalent commands. On the Command Prompt (cmd):
# Open the project directory (full path here)
$ cd C:\..\startup-success-analysis-master
# Create the virtual environment (choosing python 3.7)
$ virtualenv venv --python=python3.7
# Access the virtual environment
$ venv\Scripts\activate.bat
After that, you will see a (venv) prefix on the command line, indicating that you are now inside the virtual environment. It should look like this:
$ (venv) C:\..\startup-success-analysis-master>_
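If you are unsure whether the environment is really active, you can also ask Python itself. The snippet below is a small sketch (the function name `in_virtual_environment` is made up for this example): it compares the interpreter's prefix paths, which differ when Python runs from inside a virtual environment.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter runs inside a venv/virtualenv."""
    # Inside an environment, sys.prefix points at the environment while
    # sys.base_prefix points at the interpreter it was created from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("Python executable:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

When the environment is active, the executable path printed should point inside the venv directory of the project.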
Finally, you can install the technologies you need, in whatever versions you want, inside your environment. This will not affect any package installed on your operating system. Use pip install for that:
$ pip install pandas==1.2.1
$ pip install numpy==1.19.2
$ pip install matplotlib==3.3.2
$ pip install seaborn==0.11.1
$ pip install scikit-learn==0.24.1
Just a friendly reminder here!
If you have ever installed Anaconda on your computer, you probably already have all of these packages available. Check their versions and, if you have any trouble executing the notebooks, create the virtual environment and install the packages using conda instead of pip.
With these configurations, your machine is now ready to run and manipulate the project.
To run the project and have access to the Notebooks:
# Go to the project directory (full path here)
$ cd C:\..\startup-success-analysis-master
# Open the directory on the Jupyter Notebook
$ jupyter notebook
Just be patient here if your computer takes a little while without seeming to do anything at all. It will process your request and then open a page in your default browser, with the server running on your localhost. You don't need to enter anything else on the command line until this page has opened.
Raquel Câmara Porto 🍁