- Project Overview
- Objectives
- Tech Stack
- Data Pipeline Design
- Key Features
- Reporting Dashboard
- dbt overview
- References
This project is designed to build an end-to-end data pipeline that generates a data mart for sales reporting using Mage.ai. The goal is to provide daily, weekly, and year-to-date (YTD) sales insights, including critical business metrics such as gross revenue, discounts, cost of goods sold (COGS), and net profit.
The project serves as a take-home test for a Data Engineer Analyst position, focusing on developing efficient data pipelines, applying best practices for data quality, and transforming raw sales data into meaningful reports for business users.
The main objectives of this project are to:
- Ingest and clean raw sales data.
- Create staging and transformation layers to ensure data is structured and ready for analysis.
- Implement daily, weekly, and YTD sales trends for business reporting.
- Present key metrics including gross revenue, total discount applied, COGS, and net profit through a dashboard.
- Mage.ai: For pipeline orchestration and data transformation.
- dbt: For data transformation.
- BigQuery: For data storage and warehousing.
- Looker: For data visualizing.
The pipeline processes raw data, applies transformations, and loads the data into a data mart for reporting. It consists of the following steps:
- Data Ingestion: Fetching raw data from various sources (sheet, google cloud storage, postgres).
- Data Cleaning and Deduplication: Handling missing values, data validation, and deduplication to ensure data quality.
- Staging Area: Loading cleaned data into a staging area for transformation.
- Data Transformation: Using dbt to transform the data for daily, weekly, and YTD reports.
- Data Loading: Loading the transformed data into a data warehouse.
- Reporting: Creating a simple dashboard to present the sales metrics, including:
- Gross Revenue
- Total Discounts Applied
- COGS
- Net Profit
- Daily Sales Trends: Monitor day-to-day sales performance.
- Weekly Sales Trends: Analyze trends over the course of a week.
- YTD Sales: Keep track of year-to-date performance.
- Data Backfilling: Ensures that historical data is accurately processed.
A dashboard is created to present the results of the processed data. Metrics such as gross revenue, total discounts, and net profit are displayed, providing a comprehensive view of the sales performance.