This project scrapes comprehensive weather data from Climate Data (https://en.climate-data.org/). The original objective was to gather insights into weather patterns for 350 destinations worldwide over a 30-year span (1991-2021); as of the most recent update, the dataset covers 3,833 destinations across 6 continents.
The project is organized as follows:
- `data/`: Contains data files generated during the scraping and cleaning process.
  - `full_continents_url.json`: JSON file with URLs for each city, organized by continent, country, and city (see the loading sketch after this list).
  - `corrected_data.csv`: Cleaned CSV file with weather data for 3,833 destinations, including Min. Temperature, Max. Temperature, Precipitation/Rainfall, Average Sun Hours, Rainy Days, and more.
  - `extract_clean_data.csv`: Older version of the CSV, which does not contain any negative temperature values.
- `notebooks/`: Contains Jupyter notebooks for analysis.
  - `analysis.ipynb`: Notebook for analyzing the extracted and cleaned weather data.
- `scripts/`: Contains Python scripts for web scraping and data processing.
  - `extract_urls.py`: Initial scraping to obtain URLs for each city, organized in a nested dictionary format (`full_continents_url.json`).
  - `extract_clean_data.py`: Secondary scraping that accesses the URLs and extracts data from tables (see the scraping sketch after this list). The extracted data includes weather variables such as Min. Temperature, Max. Temperature, Precipitation/Rainfall, Average Sun Hours, Rainy Days, and more. The output is a cleaned CSV file (`corrected_data.csv`). An older version is also available as `extract_clean_data.csv`.
- `.gitignore`: Specifies files and directories that should be ignored by version control (e.g., `__pycache__`, `*.pyc`, `*.csv`).
- `README.md`: Project documentation.
- `requirements.txt`: List of Python packages and versions required for the project.
To get started with the project, follow these steps:
1. Clone the Repository:

   ```bash
   git clone https://github.com/allan-gadelha/weather-data-scraping.git
   cd weather-data-scraping
   ```
2. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```
3. Run Scripts and Notebooks:

   - Run `scripts/extract_urls.py` to perform the initial scraping and obtain city URLs.
   - Run `scripts/extract_clean_data.py` to perform the secondary scraping, extracting and cleaning weather data.
   - Explore and analyze the extracted data using `notebooks/analysis.ipynb`.
4. Explore Cleaned Data:

   - Find the cleaned weather data in the `data/corrected_data.csv` file (a loading sketch follows this list).
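As a quick start for exploring the output, here is a minimal pandas sketch for loading the cleaned dataset; the exact column names are assumptions based on the variables listed above.

```python
import pandas as pd

# Load the cleaned dataset (3,833 destinations).
df = pd.read_csv("data/corrected_data.csv")

# Inspect the shape and the available columns (e.g., Min. Temperature,
# Max. Temperature, Precipitation/Rainfall -- exact names may differ).
print(df.shape)
print(df.columns.tolist())
print(df.head())
```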
The project relies on the following Python libraries:

- Selenium
- BeautifulSoup
- Pandas
- pprint
- json
For any questions or issues, feel free to contact the project owner:
- Email: Allan Gadelha
- GitHub: [allan-gadelha](https://github.com/allan-gadelha)