Hello world! My name is Pierre-Alexandre, and I'm excited to share my still-developing data engineering portfolio. In this repository, you'll find a catalog of projects from various data engineering and analytics courses, as well as group projects, each covering essential techniques and skills.
- Brief overview: The goal of this project was to learn how Apache Kafka streams work. For training purposes, I used a free Azure B1s virtual machine and simulated the data stream to avoid memory problems on the Azure instance. Each event in the simulated stream was created as a sample JSON file from an existing dataset. The streamed data was uploaded to Azure Blob Storage and analyzed by a Data Factory.
- Technology used: Azure (VM, Blob Storage, Azure Functions, Azure DevOps, Azure Data Lake Analytics, Azure Synapse), Kafka, Docker, Python
- Outcome: Stream processing of live flight data with Spark, uploaded to Azure Blob Storage for analytics with Azure Synapse
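The simulated stream described above (one JSON event per row of an existing dataset) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `simulate_stream`, the sample columns, and the delay parameter are all assumptions, and the Kafka producer is deliberately left out.

```python
import csv
import io
import json
import time
from typing import Iterator


def simulate_stream(csv_text: str, delay_seconds: float = 0.0) -> Iterator[str]:
    """Yield one JSON-encoded event per dataset row, pausing between events."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Each row becomes a standalone JSON event, as in the simulated stream.
        yield json.dumps(row)
        time.sleep(delay_seconds)


# Hypothetical sample rows; the real dataset and its columns are not shown here.
SAMPLE = "flight_id,origin,destination\nAF123,CDG,JFK\nLH456,FRA,LHR\n"

# In a real setup each event would be handed to a Kafka producer
# instead of being collected in a list.
events = list(simulate_stream(SAMPLE))
```

Throttling the generator with `delay_seconds` is one simple way to keep memory and throughput low on a small instance, since only one event is held at a time.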
- Brief overview: Apache Airflow was used to schedule and orchestrate an ETL pipeline from three public APIs to BigQuery, transform the data using dbt, and display it in a Streamlit web app. Infrastructure as Code using Pulumi was developed to deploy GCP instances automatically.
- Technology used: GCP (VM, Buckets, BigQuery, Cloud Functions), Apache Airflow, dbt Core, Pulumi, Python
- Outcome: Batch processing of data from three APIs into BigQuery every 10 minutes, with traffic displayed on a Streamlit dashboard
- Brief overview: In this group project, we used front-end languages to develop a chatbot as a Chrome extension for the Doctolib website. It answers basic user questions using keywords.
- Technology used: JavaScript, CSS, HTML, Google Developer (web extensions)
- Final result: Chrome extension that launches only on the Doctolib website and injects HTML into the web page for chatbot interaction
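The keyword-based answering mentioned above can be illustrated with a small sketch. The extension itself is written in JavaScript; the version below is in Python only to show the idea, and the keyword table, placeholder replies, and function name `answer` are assumptions rather than the project's real content.

```python
import re

# Hypothetical keyword-to-reply table; the extension's real keywords are not shown here.
RESPONSES = {
    "appointment": "Placeholder reply about appointments.",
    "cancel": "Placeholder reply about cancellations.",
}
FALLBACK = "Sorry, I did not understand the question."


def answer(question: str) -> str:
    """Return the reply of the first table keyword that appears as a word in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for keyword, reply in RESPONSES.items():
        if keyword in words:
            return reply
    # No known keyword was found in the question.
    return FALLBACK
```

For example, `answer("How do I cancel?")` returns the reply stored under `"cancel"`, while a question with no known keyword returns the fallback message.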
- Brief overview: With my teammates, we predicted the chances of survival of patients admitted to a hospital intensive care unit.
- Methodology: data cleaning, data analysis, machine learning, visualization, drawing conclusions
- Technology used: Python, pandas, scikit-learn, Matplotlib, NumPy, LaTeX
- Final results: analysis & visualization