Analysis of the Bay Wheels data scraped from https://s3.amazonaws.com/baywheels-data/index.html. The project covers:

- Exploring the data to see what's possible.
- Creating new features and wrangling the data to open up new analytical possibilities (e.g., interpolating the cost of a ride from the pricing information on the Bay Wheels website); a rough sketch of this idea follows the list.
- Predicting the revenue of a ride or stop by gender and/or age, time, etc.
- Determining the best place to position the bikes/stops for maximum revenue.
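As a minimal sketch of the cost-interpolation idea, the snippet below estimates a ride's price from its duration using a simple pay-per-ride tariff. The rates are placeholders rather than Bay Wheels' actual pricing, and the `duration_sec` column name is assumed from the trip CSV schema.

```python
# Illustrative only: interpolate an approximate ride cost from its duration.
# The fee constants are placeholders, not Bay Wheels' real prices.
import pandas as pd

UNLOCK_FEE = 3.00        # flat fee per ride (placeholder)
PER_MINUTE_RATE = 0.30   # charge per minute past the included time (placeholder)
INCLUDED_MINUTES = 30    # minutes covered by the unlock fee (placeholder)

def estimate_ride_cost(duration_sec: float) -> float:
    """Estimate a ride's cost from its duration in seconds."""
    minutes = duration_sec / 60
    overage = max(0.0, minutes - INCLUDED_MINUTES)
    return UNLOCK_FEE + PER_MINUTE_RATE * overage

trips = pd.DataFrame({"duration_sec": [600, 2400, 5400]})
trips["est_cost"] = trips["duration_sec"].apply(estimate_ride_cost)
print(trips)
```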
You will need Python 3.7 and pip, the package manager. Create and activate a virtual environment, then install the dependencies listed below:
- pandas
- bs4 (BeautifulSoup)
- Jupyter Notebook
The other packages used are part of the Python 3.x standard library and should work with the import statements in the code as-is.
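A quick way to confirm the environment is set up correctly is to import the two third-party packages (plus one standard-library module as an example) inside the activated virtual environment; exact versions are not pinned.

```python
# Sanity check that the third-party dependencies are installed.
import urllib.request   # standard library, available with Python 3.x
import pandas as pd
import bs4

print("pandas", pd.__version__)
print("beautifulsoup4", bs4.__version__)
```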
All Bay Wheels CSVs are scraped from the link at the top of this README.
The data are scraped and compiled into the `baywheels-data` folder (not included here because of repo size limits).
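For orientation, here is a hypothetical sketch of what the scraping step boils down to, assuming the public S3 bucket returns its standard XML object listing; the key filter and local save paths are assumptions, and the exact logic lives in the scraping notebook below.

```python
# Hypothetical sketch: read the public S3 bucket's XML object listing and
# download each trip-data file. Filtering on "tripdata" is an assumption.
from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup

BUCKET_URL = "https://s3.amazonaws.com/baywheels-data"

listing = urlopen(BUCKET_URL).read()            # S3 returns an XML listing
soup = BeautifulSoup(listing, "html.parser")    # html.parser lowercases tag names
keys = [k.text for k in soup.find_all("key") if "tripdata" in k.text]

for key in keys:
    urlretrieve(f"{BUCKET_URL}/{key}", key)     # save each file locally
```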
To access the data:

- Run through `scraping to get all the CSVs.ipynb` and save the CSVs to your local directory of choice.
- From there you can run `master_df_cleaning.ipynb` if you have lots of RAM; otherwise try `dev_df_data_cleaning.ipynb`. Note: make sure to update the directory variables to the appropriate path on your computer! A rough sketch of this step appears after this list.
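The sketch below shows roughly what the cleaning step starts from: stacking the downloaded monthly CSVs into one master DataFrame. The directory name is a placeholder, and the real notebooks do considerably more wrangling.

```python
# Rough sketch only: combine the downloaded monthly CSVs into one DataFrame.
# DATA_DIR is a placeholder; point it at the directory you saved the CSVs to.
from pathlib import Path
import pandas as pd

DATA_DIR = Path("baywheels-data")   # update to your local path

frames = [pd.read_csv(path) for path in sorted(DATA_DIR.glob("*.csv"))]
master_df = pd.concat(frames, ignore_index=True, sort=False)
print(master_df.shape)
```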
This notebook requires the full dataset, which was not uploaded to the repo. You can reconstruct it by running the `master_df_cleaning` notebook mentioned above. Alternatively, if you don't want to run any cells, you can view `Full Pipeline Preview (No Coding Required).pdf`.
The code in here is intended to be standalone.