Streaming Data Curriculum Offerings

Richard Hightower edited this page Jul 14, 2017 · 7 revisions

Part 1: Streaming Data Foundations

Topics:

  • Introduce real-time streaming dashboard as preview of final project
  • Understand the tools, processes, and topics
  • Guided walkthrough using notebook environment
  • Accessing data streams

Hands On:

  • Set up environment
  • Clone repo
  • Install libraries
  • Run sample code
  • Jupyter/Databricks notebook (Docker Image?)

Part 2: Creation of Topics, Producers, and Consumers

Topics:

  • Broker Overview
  • Creating a topic
      • Replication (high level)
      • Partitioning (high level)
  • Producing to a topic
      • Data quality
      • Error handling
      • Data registration
  • Consuming messages from a topic
      • Data quality
      • Error handling
      • Data registration
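
As a rough illustration of the partitioning topic above, the sketch below maps a message key to a partition by hashing it. It is only a stand-in: Kafka's Java client hashes keys with murmur2, while this example uses `zlib.crc32` from the Python standard library so that it runs without a broker. The function names `choose_partition` and `spread` are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Assumption: crc32 stands in for the hash a real client would use.
    The point is only that the same key always lands on the same partition.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def spread(keys, num_partitions):
    """Count how many of the given keys fall on each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[choose_partition(key, num_partitions)] += 1
    return counts


if __name__ == "__main__":
    sample_keys = [f"sensor-{i}".encode() for i in range(12)]
    print(spread(sample_keys, 3))
```

Because the partition depends only on the key and the partition count, changing the number of partitions on a topic generally changes where existing keys land, which is worth keeping in mind during the hands-on work.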

Hands On:

  • kafka-python and confluent-kafka-python libraries
  • Working with an existing topic
      • Change the replication factor on the existing topic and compare speed
      • Modify the retention period on the topic and watch messages drop off
      • Modify the security policy on who can publish to the topic
  • Creating a new topic
      • Create a new topic from scratch
      • Set replication factor
      • Set partitioning
  • Producing to a topic
      • Handling inbound data quality issues
      • Throwing curveballs and handling the resulting errors
      • Registering the producer of a topic with the metadata management tool (Nebula)
  • Consuming messages from a topic
      • Dealing with incomplete data
      • Handling inconsistent data (timestamps)
      • Registering the consumer of a topic with the metadata management tool (Nebula)
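
The consumer-side exercises on incomplete data and inconsistent timestamps could look roughly like the sketch below. It is library-free so that it runs without a broker: `clean_message` takes the raw bytes a consumer would receive. The field names in `REQUIRED_FIELDS`, the JSON encoding, and the rule for telling epoch seconds from epoch milliseconds are all assumptions made for illustration, not part of the curriculum.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: the curriculum does not define the message fields.
REQUIRED_FIELDS = ("id", "timestamp", "value")


def parse_timestamp(raw):
    """Normalize epoch seconds, epoch milliseconds, or ISO 8601 text to UTC."""
    if isinstance(raw, bool):
        raise ValueError(f"unsupported timestamp: {raw!r}")
    if isinstance(raw, (int, float)):
        # Assumption: anything above 1e11 is milliseconds rather than seconds.
        seconds = raw / 1000 if raw > 1e11 else raw
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    if isinstance(raw, str):
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unsupported timestamp: {raw!r}")


def clean_message(payload: bytes):
    """Return (record, None) for a usable message, else (None, reason)."""
    try:
        record = json.loads(payload)
    except ValueError as exc:  # covers JSON and byte-decoding errors
        return None, f"invalid JSON: {exc}"
    if not isinstance(record, dict):
        return None, "payload is not a JSON object"
    missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
    if missing:
        return None, "missing fields: " + ", ".join(missing)
    try:
        record["timestamp"] = parse_timestamp(record["timestamp"])
    except (ValueError, OverflowError, OSError) as exc:
        return None, f"bad timestamp: {exc}"
    return record, None


if __name__ == "__main__":
    for raw in (b'{"id": 1, "timestamp": 1500000000, "value": 3.5}',
                b'{"id": 2, "value": 1.0}',
                b"not json"):
        print(clean_message(raw))
```

Returning a reason instead of raising lets the consuming loop decide whether to skip, log, or route a bad message elsewhere.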

Part 3: Transformation

Topics:

  • Filtering
  • Aggregation
  • Replication
  • Partitioning
  • API interaction

Hands On:

  • Creating a new topic
      • Create a new topic from scratch
      • Advanced work with replication/partitioning
  • Filtering/aggregation (Kafka Streams?)
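
The outline leaves the tool for filtering and aggregation open, so the following is a library-free sketch of the two ideas rather than Kafka Streams code. It filters events by a threshold and counts the survivors per key in fixed (tumbling) time windows. The event shape (`ts`, `key`, `value`) and both function names are hypothetical.

```python
from collections import defaultdict


def filter_events(events, min_value):
    """Yield only the events whose value meets the threshold."""
    for event in events:
        if event["value"] >= min_value:
            yield event


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in non-overlapping windows."""
    counts = defaultdict(int)
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["key"])] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"ts": 0, "key": "a", "value": 5},
        {"ts": 30, "key": "a", "value": 1},
        {"ts": 59, "key": "b", "value": 7},
        {"ts": 60, "key": "a", "value": 9},
        {"ts": 125, "key": "a", "value": 4},
    ]
    print(tumbling_window_counts(filter_events(sample, min_value=4), 60))
```

A real stream never ends, so a production version would emit each window's counts once the window closes instead of returning one dictionary at the end.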

Part 4: Optimization

Topics:

  • Streaming vs. micro-batching
  • Configuration
  • Troubleshooting
  • Monitoring
  • Cluster management/scaling
  • CI/CD
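
To make the streaming vs. micro-batching topic concrete, here is a minimal sketch that contrasts the two processing styles over the same source. It assumes size-bounded batches to keep the example deterministic; micro-batch engines typically bound batches by a time interval instead. The names `process_streaming`, `micro_batches`, and `process_micro_batched` are hypothetical.

```python
def process_streaming(source, handle):
    """Call the handler once per record, as soon as the record arrives."""
    for record in source:
        handle(record)


def micro_batches(source, max_size):
    """Group records into lists of at most max_size items."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    batch = []
    for record in source:
        batch.append(record)
        if len(batch) >= max_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def process_micro_batched(source, handle_batch, max_size):
    """Call the handler once per batch instead of once per record."""
    for batch in micro_batches(source, max_size):
        handle_batch(batch)


if __name__ == "__main__":
    process_streaming(range(7), lambda r: print("record", r))
    process_micro_batched(range(7), lambda b: print("batch", b), max_size=3)
```

The trade-off the sketch hints at: per-record handling keeps latency low, while batching amortizes per-call overhead at the cost of waiting for a batch to fill.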

Hands On:

  • Adding a new node to a cluster
  • Building an interactive real-time dashboard