This project builds a Film Industry database from several API services, websites, and other data sources. Movie, TV show, and actor data and images are collected to support the application. Researchers and data teams can use the APIs to fetch this data and these images programmatically.
This ETL (Extract, Transform, Load) project aims to create a Film Industry database using data from:
- IMDB API
- OMDB API
- Kaggle
The database comprises several dataframes:
- movie_votes
- revenue
- actors
- movie_budget
- genre
Each dataframe was created by a different team member, who was responsible for extracting and transforming the data to prepare it for loading into a common SQL database. Each member exported their cleaned dataframe into a CSV file to be loaded into SQL via a common pandas script. The dataframes are connected through primary and foreign keys, typically using the IMDB or OMDB code or the movie name.
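The load step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it uses the standard-library `csv` and `sqlite3` modules instead of pandas to stay dependency-free, and the schema, column names (`imdb_id`, `votes`, `revenue_usd`), and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE movie_votes (
    imdb_id TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    votes   INTEGER
);
CREATE TABLE revenue (
    imdb_id     TEXT PRIMARY KEY REFERENCES movie_votes(imdb_id),
    revenue_usd INTEGER
);
"""


def load_csv(conn, table, csv_file):
    """Insert every row of a CSV file (with a header row) into `table`.

    `table` and the header names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


# In-memory stand-ins for the CSV files each team member exported.
votes_csv = io.StringIO(
    "imdb_id,title,votes\n"
    "tt0000001,Example Movie A,1200\n"
    "tt0000002,Example Movie B,350\n"
)
revenue_csv = io.StringIO("imdb_id,revenue_usd\ntt0000001,5000000\n")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_csv(conn, "movie_votes", votes_csv)
load_csv(conn, "revenue", revenue_csv)

# Join the two tables on the shared key.
joined = conn.execute(
    "SELECT v.title, r.revenue_usd "
    "FROM movie_votes AS v JOIN revenue AS r ON r.imdb_id = v.imdb_id"
).fetchall()
print(joined)
```

Keying every table on the same identifier column is what makes the join possible; joining on the movie name is more fragile because titles are not unique.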
The following database structure illustrates the relationships between the tables. The model is relational and is expected to change over time as columns and tables are added or removed.
Sources used in this project:
- IMDB
- OMDB
- Kaggle
Data from Kaggle was extracted as flat files, while data from IMDB and OMDB was extracted via their APIs.
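As an illustration of the API-based extraction, the sketch below builds an OMDb request by IMDb ID and keeps a few fields from the JSON response. The helper names are hypothetical, and the response field names (`Title`, `imdbID`, `Genre`, `imdbVotes`, `Response`) reflect our understanding of the OMDb API and should be checked against its documentation. The API key is a placeholder you must supply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_url(imdb_id, api_key):
    """Build a lookup URL for one title, identified by its IMDb ID."""
    return OMDB_URL + "?" + urlencode({"i": imdb_id, "apikey": api_key})


def parse_omdb_response(payload):
    """Keep only the fields of interest; return None if the lookup failed.

    The field names used here are assumptions about the response format.
    """
    data = json.loads(payload)
    if data.get("Response") == "False":
        return None
    return {
        "imdb_id": data.get("imdbID"),
        "title": data.get("Title"),
        "genre": data.get("Genre"),
        "votes": data.get("imdbVotes"),
    }


def fetch_movie(imdb_id, api_key):
    """Fetch and parse one title (performs a network request)."""
    with urlopen(build_omdb_url(imdb_id, api_key), timeout=10) as resp:
        return parse_omdb_response(resp.read().decode("utf-8"))
```

Separating URL construction and parsing from the network call keeps both pieces testable without an API key.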
In the transform step, we:
- Cleaned the data
- Reformatted the data
- Saved the transformed data
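The three steps above might look like the following sketch. The cleaning rules shown here (trimming whitespace, dropping duplicate IDs, converting currency strings to integers, treating `N/A` as missing) are assumptions for illustration, as are the column names and sample rows. The project's actual rules may differ.

```python
import csv
import io


def parse_int(text):
    """Convert strings like '$1,500,000' to int; blanks and 'N/A' become None."""
    cleaned = (text or "").strip().replace("$", "").replace(",", "")
    if cleaned == "" or cleaned.upper() == "N/A":
        return None
    return int(cleaned)


def transform(rows):
    """Clean and reformat raw rows, keeping the first row seen per imdb_id."""
    seen = set()
    out = []
    for row in rows:
        imdb_id = row["imdb_id"].strip()
        if not imdb_id or imdb_id in seen:
            continue  # skip rows with no key and duplicate keys
        seen.add(imdb_id)
        out.append({
            "imdb_id": imdb_id,
            "title": row["title"].strip(),
            "budget_usd": parse_int(row["budget"]),
        })
    return out


def save_csv(rows, out_file):
    """Write the cleaned rows as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=["imdb_id", "title", "budget_usd"])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical raw extract with stray whitespace, a duplicate, and a missing value.
raw = io.StringIO("\n".join([
    "imdb_id,title,budget",
    'tt0000001, Example Movie A ,"$1,500,000"',
    'tt0000001,Example Movie A,"$1,500,000"',
    "tt0000002,Example Movie B,N/A",
]) + "\n")

cleaned = transform(csv.DictReader(raw))
out = io.StringIO()
save_csv(cleaned, out)
print(out.getvalue())
```

Missing budgets are written as empty cells, so they can be loaded as NULL rather than as the literal text `N/A`.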
After cleaning and saving the data as CSV files, we pushed everything to GitHub and established connections in SQL.
During the load step, we verified that all files synced correctly and that the keys connecting the tables matched.
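One way to check the connections between tables is to look for rows whose key has no match in the table they reference. The sketch below is an assumption about how such a check could be done, not the procedure the team used. The function name, table names, and sample data are hypothetical.

```python
import sqlite3


def find_orphans(conn, child, parent, key):
    """Return key values present in `child` but missing from `parent`.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    sql = (
        f"SELECT c.{key} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON p.{key} = c.{key} "
        f"WHERE p.{key} IS NULL ORDER BY c.{key}"
    )
    return [row[0] for row in conn.execute(sql)]


# Small in-memory example: one revenue row points at a movie that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_votes (imdb_id TEXT PRIMARY KEY);
CREATE TABLE revenue (imdb_id TEXT, revenue_usd INTEGER);
INSERT INTO movie_votes VALUES ('tt0000001'), ('tt0000002');
INSERT INTO revenue VALUES ('tt0000001', 5000000), ('tt0000003', 120000);
""")

orphans = find_orphans(conn, "revenue", "movie_votes", "imdb_id")
print(orphans)
```

An empty result means every row in the child table joins to a parent row. Any values returned point to IDs that were dropped or formatted differently during cleaning.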
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
We use SemVer for versioning. For available versions, see the tags on this repository.
- Samuel Roiz - Data Cleaning, API Keys - GitHub
- LaQuita Williams - Data Modeling - GitHub
- Leo Lima - Data Cleaning, API Keys - GitHub
- Kevin Perez - Data Cleaning - GitHub
See the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details.