This repository contains an end-to-end data engineering project built on Apache Flink, focused on sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing Apache Flink's batch-processing capabilities.
The project reads sales and product data from CSV files, joins the two datasets, computes total sales per category, sorts the results, and writes them back to a CSV file. This serves as a practical demonstration of using Apache Flink for complex data transformations and analytics.
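The core transformation (join on product ID, sum per category, sort descending) can be sketched in plain Java to illustrate the logic that the Flink job expresses with DataSet operators. The field names, sample values, and class name below are illustrative assumptions, not taken from the project's actual data:

```java
import java.util.*;
import java.util.stream.*;

public class CategoryTotalsSketch {
    public static void main(String[] args) {
        // Stands in for the product CSV: productId -> category (assumed fields)
        Map<String, String> productCategory = Map.of(
                "p1", "Electronics",
                "p2", "Clothing",
                "p3", "Electronics");

        // Stands in for the sales CSV: each row is {productId, amount} (assumed fields)
        List<Object[]> sales = List.of(
                new Object[]{"p1", 100.0},
                new Object[]{"p2", 40.0},
                new Object[]{"p3", 60.0});

        // "Join" each sale to its category, then sum amounts per category
        Map<String, Double> totals = sales.stream()
                .collect(Collectors.groupingBy(
                        row -> productCategory.get((String) row[0]),
                        Collectors.summingDouble(row -> (Double) row[1])));

        // Sort by total sales descending and emit CSV-style lines,
        // mirroring the sort-then-write step of the Flink job
        totals.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + "," + e.getValue()));
    }
}
```

In the actual job, the same steps map onto Flink's batch operators: `readCsvFile` for ingestion, a `join` keyed on the product ID, a grouped aggregation for the per-category totals, and a sink that writes the sorted result.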
- Data ingestion from CSV files
- Use of POJOs for data representation
- DataSet joins and aggregations
- Custom output formats for writing data
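Flink's serializer treats a class as a POJO when it is public, has a public no-argument constructor, and exposes its fields either publicly or through getters and setters. A minimal sketch of such a data class, with field names that are illustrative rather than taken from the project:

```java
// Sketch of a sales POJO in the shape Flink's POJO serializer expects;
// the class and field names here are hypothetical examples.
public class Sale {
    // Public fields (private fields with getters/setters also work)
    public String productId;
    public double amount;

    // Public no-arg constructor is required for Flink's POJO serializer
    public Sale() {}

    public Sale(String productId, double amount) {
        this.productId = productId;
        this.amount = amount;
    }

    public static void main(String[] args) {
        Sale s = new Sale("p1", 99.5);
        System.out.println(s.productId + " " + s.amount);
    }
}
```

Classes that meet these rules are serialized efficiently by Flink without falling back to generic Kryo serialization.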
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Apache Flink
- Java Development Kit (JDK)
- Maven or SBT (for building the project)
- Clone the repository:
  `git clone https://github.com/airscholar/ApacheFlink-SalesAnalytics.git`
- Navigate to the project directory:
  `cd ApacheFlink-SalesAnalytics`
- Build the project:
  `mvn clean install`
- Start your Apache Flink cluster.
- Submit the Flink job:
  `flink run -c salesAnalysis.DataBatchJob target/SalesAnalysis-1.0-SNAPSHOT.jar`
- Check the output. The processed data will be written to the specified output file.