Skip to content

Latest commit

 

History

History
184 lines (136 loc) · 6.34 KB

File metadata and controls

184 lines (136 loc) · 6.34 KB

Documentation   📑    Hands-on Tutorials   🎯    RisingWave Cloud   🚀    Get Instant Help

Stream processing with RisingWave

In this hands-on workshop, we’ll learn how to process real-time streaming data using SQL in RisingWave. The system we’ll use is RisingWave, an open-source SQL database for processing and managing streaming data. You may not feel unfamiliar with RisingWave’s user experience, as it’s fully wire compatible with PostgreSQL.

RisingWave

We’ll cover the following topics in this Workshop:

  • Why Stream Processing?
  • Stateless computation (Filters, Projections)
  • Stateful Computation (Aggregations, Joins)
  • Data Ingestion and Delivery

RisingWave in 10 Minutes: https://tutorials.risingwave.com/docs/intro

Workshop video:

Project Repository

Homework

Please setup the environment in Getting Started and for the Homework first.

Question 0

This question is just a warm-up to introduce dynamic filter, please attempt it before viewing its solution.

What are the dropoff taxi zones at the latest dropoff times?

For this part, we will use the dynamic filter pattern.

Solution
CREATE MATERIALIZED VIEW latest_dropoff_time AS
    WITH t AS (
        SELECT MAX(tpep_dropoff_datetime) AS latest_dropoff_time
        FROM trip_data
    )
    SELECT taxi_zone.Zone as taxi_zone, latest_dropoff_time
    FROM t,
            trip_data
    JOIN taxi_zone
        ON trip_data.DOLocationID = taxi_zone.location_id
    WHERE trip_data.tpep_dropoff_datetime = t.latest_dropoff_time;

--    taxi_zone    | latest_dropoff_time
-- ----------------+---------------------
--  Midtown Center | 2022-01-03 17:24:54
-- (1 row)

Question 1

Create a materialized view to compute the average, min and max trip time between each taxi zone.

Note that we consider the do not consider a->b and b->a as the same trip pair. So as an example, you would consider the following trip pairs as different pairs:

Yorkville East -> Steinway
Steinway -> Yorkville East

From this MV, find the pair of taxi zones with the highest average trip time. You may need to use the dynamic filter pattern for this.

Bonus (no marks): Create an MV which can identify anomalies in the data. For example, if the average trip time between two zones is 1 minute, but the max trip time is 10 minutes and 20 minutes respectively.

Options:

  1. Yorkville East, Steinway
  2. Murray Hill, Midwood
  3. East Flatbush/Farragut, East Harlem North
  4. Midtown Center, University Heights/Morris Heights

p.s. The trip time between taxi zones does not take symmetricity into account, i.e. A -> B and B -> A are considered different trips. This applies to subsequent questions as well.

Question 2

Recreate the MV(s) in question 1, to also find the number of trips for the pair of taxi zones with the highest average trip time.

Options:

  1. 5
  2. 3
  3. 10
  4. 1

Question 3

From the latest pickup time to 17 hours before, what are the top 3 busiest zones in terms of number of pickups? For example if the latest pickup time is 2020-01-01 17:00:00, then the query should return the top 3 busiest zones from 2020-01-01 00:00:00 to 2020-01-01 17:00:00.

HINT: You can use dynamic filter pattern to create a filter condition based on the latest pickup time.

NOTE: For this question 17 hours was picked to ensure we have enough data to work with.

Options:

  1. Clinton East, Upper East Side North, Penn Station
  2. LaGuardia Airport, Lincoln Square East, JFK Airport
  3. Midtown Center, Upper East Side South, Upper East Side North
  4. LaGuardia Airport, Midtown Center, Upper East Side North

Submitting the solutions

Rewards 🥳

Everyone who completes the homework will get a pen and a sticker, and 5 lucky winners will receive a Tshirt and other secret surprises! We encourage you to share your achievements with this workshop on your socials and look forward to your submissions 😁

See you around!

Solution