A group project of four creating data for the film industry. Used Python, SQL, API keys, and data cleaning.

samuelroiz/Film_Industry_Research_Project

Film Industry Research Project

This project builds a comprehensive film industry database from several API services, websites, and other data sources. It draws on data and images for movies, TV shows, and actors. Working through the APIs lets researchers and data teams fetch and use this data and imagery programmatically.

Project Overview

This ETL (Extract, Transform, Load) project aims to create a Film Industry database using data from:

  • IMDB API
  • OMDB API
  • Kaggle

The database comprises several dataframes:

  • movie_votes
  • revenue
  • actors
  • movie_budget
  • genre

Each dataframe was created by a different team member, who was responsible for extracting and transforming the data to prepare it for loading into a common SQL database. Each member exported their cleaned dataframe into a CSV file to be loaded into SQL via a common pandas script. The dataframes are connected through primary and foreign keys, typically using the IMDB or OMDB code or the movie name.
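As a rough illustration of how tables can be tied together through a shared movie code, here is a minimal sketch using Python's built-in `sqlite3` module. The table names come from the list above; the column names (`imdb_id`, `title`, `budget`, `votes`, `genre`) and the sample row are assumptions made for the example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: table names follow the README, column names are assumed.
SCHEMA = """
CREATE TABLE movie_budget (
    imdb_id TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    budget  INTEGER
);
CREATE TABLE movie_votes (
    imdb_id TEXT PRIMARY KEY REFERENCES movie_budget (imdb_id),
    votes   INTEGER
);
CREATE TABLE genre (
    imdb_id TEXT NOT NULL REFERENCES movie_budget (imdb_id),
    genre   TEXT NOT NULL,
    PRIMARY KEY (imdb_id, genre)
);
"""


def create_schema(conn):
    """Create the example tables with foreign-key enforcement switched on."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Placeholder rows, not real film data.
conn.execute("INSERT INTO movie_budget VALUES ('id_1', 'Example Film', 1000000)")
conn.execute("INSERT INTO genre VALUES ('id_1', 'Drama')")
```

With enforcement on, inserting a `genre` row whose `imdb_id` is missing from `movie_budget` raises `sqlite3.IntegrityError`, which is one way to catch mismatched keys early.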

Data Gathering

Database Structure

[Image: database structure diagram]

The database structure diagram illustrates the relationships between the tables. Like any relational model, it evolves over time as variables and tables change.

API Keys

Sources used in this project:

  • IMDB
  • OMDB
  • Kaggle

Data from Kaggle was extracted as flat files, while data from IMDB and OMDB was extracted via their APIs.
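A minimal sketch of what an API extraction step might look like for OMDB, using only the standard library. The `apikey` and `i` (IMDb ID) query parameters belong to the public OMDb API; the function names and the `OMDB_API_KEY` environment variable are hypothetical and not taken from the project's code.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_url(api_key, imdb_id):
    """Build a lookup URL for a single title, identified by its IMDb ID."""
    return OMDB_URL + "?" + urlencode({"apikey": api_key, "i": imdb_id})


def fetch_movie(api_key, imdb_id):
    """Request one title and return the decoded JSON response as a dict."""
    with urlopen(build_omdb_url(api_key, imdb_id), timeout=10) as response:
        return json.load(response)


def api_key_from_env(name="OMDB_API_KEY"):
    """Read the key from an environment variable (assumed name) so it stays out of the repository."""
    return os.environ.get(name, "")
```

Calling `fetch_movie(api_key_from_env(), some_imdb_id)` performs the network request. Reading the key from the environment rather than hard-coding it is a design choice for this sketch, since keys committed to a public repository are exposed to everyone.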

Extraction of API Keys

Data Transformation

In this step, we:

  • Cleaned the data
  • Reformatted the data
  • Saved the transformed data
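The three steps above can be sketched as follows. This is an illustrative example only: it relies on the standard library's `csv` module so it runs without dependencies, and the column names and sample rows are assumptions, not the project's actual data or cleaning rules.

```python
import csv
import io


def clean_rows(rows):
    """Trim whitespace, parse budget strings to integers, and drop duplicate or ID-less rows."""
    seen = set()
    cleaned = []
    for row in rows:
        imdb_id = (row.get("imdb_id") or "").strip()
        if not imdb_id or imdb_id in seen:
            continue  # skip rows without a key and repeated keys
        seen.add(imdb_id)
        budget_text = (row.get("budget") or "").replace("$", "").replace(",", "").strip()
        cleaned.append({
            "imdb_id": imdb_id,
            "title": (row.get("title") or "").strip(),
            # Missing or non-numeric budgets become None (written as an empty cell).
            "budget": int(budget_text) if budget_text.isdigit() else None,
        })
    return cleaned


def write_csv(rows, handle):
    """Save the cleaned rows as CSV with a fixed column order."""
    writer = csv.DictWriter(handle, fieldnames=["imdb_id", "title", "budget"])
    writer.writeheader()
    writer.writerows(rows)


# Placeholder input standing in for a raw extract.
raw = io.StringIO(
    "imdb_id,title,budget\n"
    'id_1, Example Film ,"$1,000,000"\n'
    'id_1, Example Film ,"$1,000,000"\n'
    "id_2,Second Example,\n"
)
cleaned = clean_rows(csv.DictReader(raw))
out = io.StringIO()
write_csv(cleaned, out)
```

In practice `out` would be a file opened with `newline=""`, as the `csv` module documentation recommends.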

Transformation Code

Data Loading

After cleaning the data and saving it as CSV files, we pushed the files to GitHub and loaded them into the shared SQL database, where the tables are connected through their keys.
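A minimal sketch of loading a cleaned CSV into a SQL table. It uses the standard library's `csv` and `sqlite3` modules as a dependency-free stand-in for the project's pandas script, which is not reproduced here. The function name, table name, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, handle):
    """Create `table` from the CSV header if needed, insert every row, and return the row count.

    The table and column names are interpolated into the SQL text, so they must
    come from a trusted source; the cell values are bound as parameters.
    """
    reader = csv.reader(handle)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
# Placeholder CSV content standing in for one exported file.
sample = io.StringIO("imdb_id,genre\nid_1,Drama\nid_2,Comedy\n")
row_count = load_csv(conn, "genre", sample)
```

Because the columns are created without declared types, every value is stored as text. A real load would normally declare column types and keys up front.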

Loading Code

Data Analysis

During this step, we verified that all files synced correctly and ensured smooth connections between tables.
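One way to check that two tables connect cleanly is to look for keys in one table that have no match in the other. The sketch below does this with a `LEFT JOIN` in `sqlite3`. The `imdb_id` column, the helper name, and the sample rows are assumptions for illustration, not the project's actual analysis code.

```python
import sqlite3


def orphaned_keys(conn, child, parent, key="imdb_id"):
    """Return key values that appear in `child` but have no matching row in `parent`."""
    query = (
        f'SELECT c."{key}" FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{key}" = p."{key}" '
        f'WHERE p."{key}" IS NULL'
    )
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_budget (imdb_id TEXT, budget INTEGER);
CREATE TABLE movie_votes (imdb_id TEXT, votes INTEGER);
INSERT INTO movie_budget VALUES ('id_1', 1000000), ('id_2', 2000000);
INSERT INTO movie_votes VALUES ('id_1', 10), ('id_3', 20);
""")
unmatched = orphaned_keys(conn, "movie_votes", "movie_budget")
```

An empty result means every row in the child table joins to a parent row. Here `unmatched` contains the one key that exists only in `movie_votes`.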

Analysis Code

Data Table Examples in SQL

Genre Table Example

[Image: genre table example]

Merge Genre and Movie Budget Table Example

[Image: merged genre and movie budget table example]

Movie Budget and Movie Votes Table Example

[Image: movie budget and movie votes table example]

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Versioning

We use SemVer for versioning. For available versions, see the tags on this repository.

Authors

  • Samuel Roiz - Data Cleaning, API Keys - GitHub
  • LaQuita Williams - Data Modeling - GitHub
  • Leo Lima - Data Cleaning, API Keys - GitHub
  • Kevin Perez - Data Cleaning - GitHub

See the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments
