# Streaming Data Curriculum Offerings
Richard Hightower edited this page Jul 14, 2017 · 7 revisions
- Introduce real-time streaming dashboard as preview of final project
- Understand the tools, processes, and topics
- Guided walkthrough using notebook environment
- Accessing data streams
- Set up the environment
- Clone repo
- Install libraries
- Run sample code
- Jupyter/Databricks notebook (Docker Image?)
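The setup steps above might look like the following shell session. The repository URL is a placeholder, since the wiki does not name the repo, and the library list assumes the clients mentioned later on this page:

```shell
# Clone the course repo (placeholder URL -- substitute the real repository)
git clone https://github.com/example-org/streaming-data-course.git
cd streaming-data-course

# Create an isolated Python environment
python3 -m venv venv
source venv/bin/activate

# Install the Kafka client libraries and the notebook environment
pip install kafka-python confluent-kafka jupyter

# Launch the notebook server to run the sample code
jupyter notebook
```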
### Topics:
- Broker Overview
- Creating a topic
- Replication (high level)
- Partitioning (high level)
- Producing to a topic
- Data quality
- Error handling
- Data registration
- Consuming messages from a topic
- Data quality
- Error handling
- Data registration
- kafka-python and confluent-kafka Python libraries
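The partitioning idea above can be sketched in plain Python: a keyed message always maps to the same partition, which is what preserves per-key ordering. Kafka's default partitioner actually uses murmur2 hashing; the `md5`-based hash here is just a deterministic stand-in for illustration.

```python
import hashlib

def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always lands on the same partition."""
    # Kafka's default partitioner uses murmur2; md5 is a deterministic
    # stand-in used here only for illustration.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for one key go to one partition, preserving their relative order.
partitions = [choose_partition("user-42", 6) for _ in range(3)]
print(partitions)  # three identical partition numbers
```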
- Working with an existing topic:
- Change replication factor on existing topic, compare speed
- Modify retention period on topic, see message drop off
- Modify the security policy governing who can publish to a topic
- Creating a new topic
- Create a new topic from scratch
- Set replication factor
- Set partitioning
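The topic exercises above map to the CLI tools that ship with Kafka. These commands assume a 2017-era broker reachable through ZooKeeper on `localhost:2181`, and a topic name of `events`; changing the replication factor of an existing topic requires the separate `kafka-reassign-partitions.sh` tool, not shown here:

```shell
# Create a new topic from scratch with an explicit replication factor
# and partition count
kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic events --replication-factor 3 --partitions 6

# Shorten retention on an existing topic to 10 minutes and watch
# older messages drop off
kafka-configs.sh --alter --zookeeper localhost:2181 \
  --entity-type topics --entity-name events \
  --add-config retention.ms=600000

# Partition count can be increased (but never decreased) in place
kafka-topics.sh --alter --zookeeper localhost:2181 \
  --topic events --partitions 12
```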
- Producing to a topic
- Handling inbound data quality issues
- Throwing curveballs and handling the resulting errors
- Registering the produced topic with the metadata management tool (Nebula)
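The producer-side data-quality and error-handling steps above can be sketched without a broker: validate each record before it would be sent, and divert bad records to an error handler instead of the topic. The field names and callbacks are illustrative, not part of any Kafka API.

```python
from datetime import datetime

REQUIRED_FIELDS = {"event_id", "timestamp", "payload"}  # illustrative schema

def validate(record: dict) -> list:
    """Return a list of data-quality problems; an empty list means the record is clean."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    ts = record.get("timestamp")
    if ts is not None:
        try:
            datetime.fromisoformat(ts)
        except (TypeError, ValueError):
            errors.append(f"bad timestamp: {ts!r}")
    return errors

def produce(record: dict, send, dead_letter):
    """Send clean records; route curveballs to an error handler."""
    problems = validate(record)
    if problems:
        dead_letter(record, problems)   # e.g. log, or publish to an errors topic
    else:
        send(record)                    # e.g. producer.send("events", record)

sent, rejected = [], []
produce({"event_id": "1", "timestamp": "2017-07-14T12:00:00", "payload": {}},
        sent.append, lambda r, p: rejected.append((r, p)))
produce({"event_id": "2", "timestamp": "not-a-time", "payload": {}},
        sent.append, lambda r, p: rejected.append((r, p)))
print(len(sent), len(rejected))  # 1 1
```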
- Consuming messages from a topic
- Dealing with incomplete data
- Handling inconsistent data (timestamps)
- Registering the consumed topic with the metadata management tool (Nebula)
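Handling incomplete and inconsistent data on the consumer side, as listed above, can be sketched as a normalization step: accept either epoch-seconds or ISO-8601 timestamps and fill defaults for missing fields. The record shape is illustrative.

```python
from datetime import datetime, timezone

def normalize(record: dict) -> dict:
    """Coerce inconsistent timestamps to UTC datetimes and fill gaps in the record."""
    out = dict(record)
    ts = out.get("timestamp")
    if isinstance(ts, (int, float)):      # epoch seconds
        out["timestamp"] = datetime.fromtimestamp(ts, tz=timezone.utc)
    elif isinstance(ts, str):             # ISO-8601 string
        out["timestamp"] = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    else:                                 # missing: flag for later repair
        out["timestamp"] = None
    out.setdefault("payload", {})         # incomplete record: default the payload
    return out

a = normalize({"timestamp": 1500000000})               # epoch form
b = normalize({"timestamp": "2017-07-14T02:40:00"})    # ISO form
print(a["timestamp"].year, b["timestamp"].year)  # 2017 2017
```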
- Filtering
- Aggregation
- Replication
- Partitioning
- API interaction
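The filtering and aggregation items above follow the same filter/group/count pattern used by Kafka Streams; it can be sketched over a plain Python iterable of events (the event shape is illustrative):

```python
from collections import Counter

events = [
    {"user": "a", "action": "click"},
    {"user": "b", "action": "view"},
    {"user": "a", "action": "click"},
    {"user": "c", "action": "click"},
]

# Filtering: keep only clicks (cf. KStream.filter in Kafka Streams)
clicks = (e for e in events if e["action"] == "click")

# Aggregation: count clicks per user (cf. groupByKey().count())
counts = Counter(e["user"] for e in clicks)
print(dict(counts))  # {'a': 2, 'c': 1}
```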
- Creating a new topic
- Create a new topic from scratch
- Advanced work with replication/partitioning
- Filtering/aggregation (Kafka Streams?)
- Streaming vs. micro-batching
- Configuration
- Troubleshooting
- Monitoring
- Cluster management/scaling
- CI/CD
- Adding a new node to a cluster
- Building an interactive real-time dashboard
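The contrast between per-message streaming and micro-batching, and the dashboard that closes the curriculum, can both be sketched with the standard library: group the stream into small fixed-size batches, then refresh a text "dashboard" of running counts after each batch. All names here are illustrative; a real dashboard would redraw a chart in the notebook instead of printing.

```python
from collections import Counter
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield fixed-size batches; pure streaming would instead handle one record at a time."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def run_dashboard(stream, batch_size=2):
    totals = Counter()
    for batch in micro_batches(stream, batch_size):
        totals.update(e["action"] for e in batch)    # aggregate the batch
        print("dashboard:", dict(totals))            # refresh the display
    return totals

events = [{"action": a} for a in ["click", "view", "click", "click", "view"]]
totals = run_dashboard(events)
print(dict(totals))  # {'click': 3, 'view': 2}
```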