Skip to content

DanielFroehlich/jumpstart-library

 
 

Repository files navigation

Jumpstart Library

This repo contains documentation, source code, workshop contents, howtos…​ for different base techniques, aka "patterns", that can used in Data Science, Data Engineering, AI/ML, Analytics…​

The goal is to provide concrete elements that people will be able to learn from, reproduce, adapt and modify to fit their own needs and requirements.

The library currently features 15 different patterns, as well as 2 demos that are full implementation of scenarios where several patterns are combined.

Patterns

In this folder, you will find the following patterns

  • Pattern 1

  • Pattern 2

  • Pattern 3

Demo 1 - XRay analysis automated pipeline

In this demo, we show how to implement an automated data pipeline for chest Xray analysis:

  • Ingest chest Xrays into an object store based on Ceph.

  • The Object store sends notifications to a Kafka topic.

  • A KNative Eventing Listener to the topic triggers a KNative Serving function.

  • An ML-trained model running in a container makes a risk of Pneumonia assessment for incoming images.

  • A Grafana dashboard displays the pipeline in real time, along with images incoming, processed and anonymized, as well as full metrics.

Demo 2 - Smart City scenario

In this demo, we show how to implement this scenario:

  • Using a trained ML model, licence plates are recognized at toll location.

  • Data (plate number, location, timestamp) is send from toll locations (edge) to the core using Kafka mirroring to handle communication issues and recovery.

  • Incoming data is screened real-time to trigger alerts for wanted vehicles (Amber Alert).

  • Data is aggregated and stored into object storage.

  • A central database contains other information coming from licence registry system: car model, color,…​

  • Data analysis leveraging Presto and Superset is done against stored data (database and object storage).

Demo 3 - Industrial Condition Based Monitoring

In this demo, we show how to implement industrial condition based monitoring using hybrid cloud ML approach. Machine inference-based anomaly detection on metric time-series sensor data at the edge, with a central data lake and ML model retraining. It also shows how hybrid deployments (cluster at the edge and in the cloud) can be managed, and how the CI/CD pipelines and Model Training/Execution flows can be implemented.

Contributing

We recommend to use a virtual python environment to develop for this repository. To create such an environment use

python3 -m venv .venv
source .venv/bin/activate

Then install the dependencies with:

pip -r requirements.txt
Note
This file is generated with pip3 freeze > requirements.txt

Then install the pre-commit hook with

pre-commit install

Now you’re ready to contribute code.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 98.5%
  • Python 1.3%
  • Other 0.2%