Crowdfunding_ETL

For this project, an ETL pipeline was built using Python, Pandas, and Python dictionary methods in order to extract and transform the data. CSV files were created from the transformed data, providing details for creating an ERD and a table schema. This CSV file data was imported into a Postgres database.

To complete this project, the following steps were taken:

Create the Category and Subcategory DataFrames
Create the Campaign DataFrame
Create the Contacts DataFrame
Create the Crowdfunding Database

Create the Category and Subcategory DataFrames

Using the providied Excel file, "crowdfunding.xlsx", a DataFrame was created with the two column names, "category_id" and "category". This new DataFrame was then exported to a new csv called, "category.csv". These steps were repeated for the "subcategory_id" and "subcategory" columns, creating a "subcategory.csv" file.

Campaign DataFrame

Using the same providied Excel file, "crowdfunding.xlsx", a new DataFrame was created with the following columns (including conversions):

cf_id
contact_id
company_name
"blurb", renamed to "description"
"goal", converted to a float
"pledged", converted to a float
outcome
backers_count
country
currency
"launched_at" column, renamed to "launch_date" and the UTC times converted to the datetime format
"deadline" column, renamed to "end_date" and the UTC times converted to the datetime format
"category_id", with unique identification numbers matching those in the "category_id" column of the category DataFrame
"subcategory_id" column, with the unique identification numbers matching those in the "subcategory_id" column of the subcategory DataFrame

This new DataFrame was then converted to a new csv called, "campaign.csv".

Contacts DataFrame

Using Pythons Dictionary methods, data was extracted and transformed from the "contacts.xlsx" Excel file into a new DataFrame. Each row was converted into a dictionary by iteration. The dictionary values from the keys were extracted by using Python list comprehensions. These values were then added to to a new list. A new DataFrame was created contaiing the extracted data and the "name" column was split to show "first_name" and "last_name" columns. Finally, the DataFrame was exported as "contacts.csv".

Crowdfunding Database

Once all the csv files were made, an Entity Relationship Diagram (ERD) was created to connect the Primary Keys and Foreign Keys along with other constraints. This database schema was saved as a Postgres file called "crowdfunding_db_schema.sql". The correlating tables were created in a Postgres database called "crowdfunding_db". The corresponding CSV files were then imported to the SQL tables.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
Resources		Resources
ETL_Mini_Project_KEpley_KOkray.ipynb		ETL_Mini_Project_KEpley_KOkray.ipynb
README.md		README.md
crowdfunding_db.sql		crowdfunding_db.sql
crowdfunding_db_schema.jpg		crowdfunding_db_schema.jpg
crowdfunding_db_schema.sql		crowdfunding_db_schema.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crowdfunding_ETL

Create the Category and Subcategory DataFrames

Campaign DataFrame

Contacts DataFrame

Crowdfunding Database

About

Releases

Packages

Contributors 2

Languages

kadyepley/Crowdfunding_ETL

Folders and files

Latest commit

History

Repository files navigation

Crowdfunding_ETL

Create the Category and Subcategory DataFrames

Campaign DataFrame

Contacts DataFrame

Crowdfunding Database

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages