Park Visits Prediction Using Linear Regression

Description

This project involves building a linear regression model to predict the number of visitors to a park based on various weather conditions. The dataset comprises 167 features, including different weather parameters recorded over time. The primary objective is to perform time series analysis and feature selection to develop an accurate predictive model.

Installation

To run this project, ensure you have Python installed on your system. Follow these steps to set up the environment:

Clone the repository:

git clone https://github.com/ShayanHodai/park-visitation.git
cd park-visitation

Create a virtual environment and activate it:

./env.sh  # creates a virtual environment and installs the required packages (for Ubuntu 20.04)
source venv/bin/activate

Usage

To run the project, open the notebook in JupyterLab:

jupyter-lab park\ visitation.ipynb

Dataset Description

The dataset contains weather data and the number of visitors to a park over different days. It includes 167 features such as temperature, humidity, wind speed, and more. The data is structured as follows:

DATE_CALENDAR: The date of record
ESTIMATED_VISITS: The number of visitors to the park

Visitor distribution

Number of visitors over time

The final dataset used in this project is created by joining two separate datasets on the date: weather.csv, which contains the weather data, and visitation.csv, which contains the number of visitors to the park on each day. The merged dataset includes a total of 167 features, such as temperature, humidity, wind speed, and other weather parameters recorded over time, and serves as the basis for the time series analysis and predictive modeling.
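The date-based join can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the two small frames stand in for weather.csv and visitation.csv, the weather column names are made up, and it assumes both files share the DATE_CALENDAR column.

```python
import pandas as pd

# Hypothetical miniature stand-ins for weather.csv and visitation.csv.
weather = pd.DataFrame({
    "DATE_CALENDAR": ["2022-05-10", "2022-05-11", "2022-05-12"],
    "temperature": [14.2, 19.8, 16.1],  # assumed column name
    "wind_speed": [22.0, 6.5, 15.3],    # assumed column name
})
visits = pd.DataFrame({
    "DATE_CALENDAR": ["2022-05-11", "2022-05-12", "2022-05-13"],
    "ESTIMATED_VISITS": [7100, 5400, 4900],
})

# Parse the key as real dates so both sides compare on the same type.
for frame in (weather, visits):
    frame["DATE_CALENDAR"] = pd.to_datetime(frame["DATE_CALENDAR"])

# An inner join keeps only days present in both sources.
merged = (
    weather.merge(visits, on="DATE_CALENDAR", how="inner")
    .sort_values("DATE_CALENDAR")
    .set_index("DATE_CALENDAR")
)
print(merged)
```

An inner join is one reasonable choice here because a day without a visit count cannot be used for training; a left join would instead keep every weather record and leave the missing visit counts as NaN.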

Feature Selection

Given the high dimensionality of the dataset with 167 features, feature selection is a crucial step. I employ various techniques to identify the most significant features that contribute to the prediction accuracy. The methods used include:

Correlation Analysis: Identifying and removing highly correlated features to reduce multicollinearity.
Number of visitors vs. the most positively correlated feature and the most negatively correlated feature
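One way to carry out this kind of correlation analysis is sketched below. It is an illustration under assumptions, not the notebook's method: the helper `select_features`, the 0.9 threshold, and the synthetic columns are all hypothetical. Features are ranked by their correlation with the target, and a feature is skipped when it is nearly collinear with one already kept.

```python
import numpy as np
import pandas as pd

def select_features(df, target, threshold=0.9):
    """Keep features ranked by |correlation with target|, skipping near-duplicates."""
    features = df.drop(columns=[target])
    target_corr = features.corrwith(df[target])
    # Visit features from the strongest to the weakest link with the target.
    order = target_corr.abs().sort_values(ascending=False).index
    pairwise = features.corr().abs()
    kept = []
    for name in order:
        # Skip a feature that is almost collinear with one already kept.
        if all(pairwise.loc[name, other] < threshold for other in kept):
            kept.append(name)
    return kept, target_corr

# Synthetic data (hypothetical columns): temp_f is a rescaled copy of temp_c.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sunshine": rng.normal(size=n),
    "wind": rng.normal(size=n),
    "temp_c": rng.normal(size=n),
})
df["temp_f"] = df["temp_c"] * 9 / 5 + 32
df["ESTIMATED_VISITS"] = 50 * df["sunshine"] - 30 * df["wind"] + rng.normal(scale=5, size=n)

kept, target_corr = select_features(df, "ESTIMATED_VISITS")
print("kept features:", kept)
print("most positive:", target_corr.idxmax())
print("most negative:", target_corr.idxmin())
```

The `idxmax` and `idxmin` calls pick out the most positively and most negatively correlated features, which are the two plotted against the number of visitors.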

FINDINGS!

People visit the park a lot more on sunny days!

And people are reluctant to go for a walk when it's windy, rainy, or snowy.

Model Training

The linear regression model is trained using the selected features from the feature selection step. The training process includes:

Splitting the dataset into training and testing sets.
Normalizing the data.
Training the linear regression model.
Hyperparameter tuning to optimize the model performance.
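The four steps above could look roughly like this with scikit-learn. The sketch rests on several assumptions that the README does not confirm: the data is synthetic, the split is chronological (no shuffling) because the data is a time series, and Ridge (a regularized linear regression) stands in for the model so that there is a hyperparameter, `alpha`, to tune.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected features and the visit counts.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 4000 + 800 * X[:, 0] - 500 * X[:, 1] + rng.normal(scale=50, size=300)

# 1. Split chronologically: the last 20% of the rows form the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# 2. + 3. Normalize and fit inside one pipeline, so the scaler only ever
# sees training data.
pipeline = make_pipeline(StandardScaler(), Ridge())

# 4. Tune the regularization strength with time-ordered cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

model = search.best_estimator_
r2 = model.score(X_test, y_test)
print("best alpha:", search.best_params_["ridge__alpha"])
print("R^2 on the test set:", round(r2, 3))
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, which avoids leaking statistics from the validation rows into training.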

Results

The results of the model training and evaluation are documented in the results directory. This includes:

Model performance metrics (RMSE)
Plots of predicted vs actual values
Feature importance analysis
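For reference, RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same unit as the target (visits). A minimal sketch with made-up numbers, where the helper name `rmse` is hypothetical:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up visit counts, for illustration only.
actual = [5000, 6200, 3100, 4500]
predicted = [5100, 6000, 3300, 4400]

error = rmse(actual, predicted)
print(f"RMSE: {error:.2f} visits")
```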

Actual number of visitors vs predicted number of visitors on the test set

Prediction for the other days:

FINDINGS!

On 2022-05-11, the park is predicted to reach its maximum of 7160.44 visits.

The 11th of May, a perfect spring day with lots of sunshine, brings people out to enjoy walking, playing, laughing, and spending time with puppies.

On 2023-01-08, the park is predicted to reach its minimum of 1887.7 visits.

The 8th of January: it's cold!
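Given a series of predictions indexed by date, the busiest and quietest days can be looked up as in the sketch below. The dates and values here are invented for illustration and are not the project's predictions.

```python
import pandas as pd

# Hypothetical predictions indexed by date (not the project's output).
dates = pd.date_range("2022-03-01", periods=5, freq="D")
predicted = pd.Series(
    [3120.5, 4480.0, 2950.25, 5210.75, 3890.0],
    index=dates,
    name="predicted_visits",
)

# idxmax / idxmin return the index label (the date) of the extreme value.
busiest = predicted.idxmax()
quietest = predicted.idxmin()

print(f"maximum: {busiest.date()} with {predicted[busiest]:.2f} visits")
print(f"minimum: {quietest.date()} with {predicted[quietest]:.2f} visits")
```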

Contributing

Contributions to this project are welcome. To contribute, follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Make your changes and commit them (git commit -m 'Add new feature').
Push to the branch (git push origin feature-branch).
Create a new Pull Request.

Please ensure your code adheres to the project's coding standards and includes

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or suggestions, please contact: [email protected]
