Pattern Matching with MATCH_RECOGNIZE

This session discusses how to write SQL queries that detect patterns in streaming data.

Specifically, you will learn about

The MATCH_RECOGNIZE clause defined in SQL:2016
Supported semantics and limitations of Flink's implementation
Examples of how to perform pattern matching on append tables

Slides

Pattern Matching with MATCH_RECOGNIZE

Hands-On Exercise

This exercises will teach you how to detect a pattern in data streams and to perform a computation on the matching set of rows.

Rides and Rest

In this exercise, we want to detect how long taxis rest before they start a new ride.

Use the MATCH_RECOGNIZE clause to detect a pattern of:

a start event,
an arbitrary number of intermediate events for the same taxi but from different rides,
an end event,
and the next start event.

Compute the resting times in minutes.

Click here for hints.

Use a AFTER MATCH SKIP TO LAST variable strategy to include the last start event in the next pattern matching.
Use the built-in TIMESTAMPDIFF function to calculate the difference in minutes.

The output should look similar to:

                   taxiId                ride_start                  ride_end           next_ride_start           minutes_of_rest
                2013000002     2013-01-01 00:00:00.0     2013-01-01 00:06:00.0     2013-01-01 00:16:00.0                        10
                2013000004     2013-01-01 00:00:00.0     2013-01-01 00:08:00.0     2013-01-01 00:13:00.0                         5
                2013000032     2013-01-01 00:00:00.0     2013-01-01 00:05:00.0     2013-01-01 00:06:00.0                         1
                2013000128     2013-01-01 00:01:00.0     2013-01-01 00:04:00.0     2013-01-01 00:12:00.0                         8
                2013000256     2013-01-01 00:02:00.0     2013-01-01 00:10:00.0     2013-01-01 00:10:00.0                         0
                2013000512     2013-01-01 00:03:25.0     2013-01-01 00:04:51.0     2013-01-01 00:10:00.0                         5
                2013000512     2013-01-01 00:10:00.0     2013-01-01 00:13:31.0     2013-01-01 00:14:19.0                         0
                2013001028     2013-01-01 00:05:52.0     2013-01-01 00:13:20.0     2013-01-01 00:14:12.0                         0
                2013000258     2013-01-01 00:02:00.0     2013-01-01 00:08:00.0     2013-01-01 00:11:00.0                         3
                2013002070     2013-01-01 00:08:57.0     2013-01-01 00:12:26.0     2013-01-01 00:13:46.0                         1

Click to see the solution.

SELECT * FROM Rides
MATCH_RECOGNIZE(
  PARTITION BY taxiId
  ORDER BY rideTime
  MEASURES
    START_RIDE.rideTime AS ride_start,
    END_RIDE.rideTime AS ride_end,
    NEXT_RIDE.rideTime AS next_ride_start,
    TIMESTAMPDIFF(MINUTE, END_RIDE.rideTime, NEXT_RIDE.rideTime) AS minutes_of_rest
  AFTER MATCH SKIP TO LAST NEXT_RIDE
  PATTERN (START_RIDE M* END_RIDE NEXT_RIDE)
  DEFINE
    START_RIDE AS START_RIDE.isStart = true,
    M AS M.rideId <> START_RIDE.rideId,
    END_RIDE AS END_RIDE.isStart = false,
    NEXT_RIDE AS NEXT_RIDE.isStart = true
);

The query matches the pattern mentioned above. The variable START_RIDE detects the start event. M defines a greedy set of ride events that don't belong to the same ride of the start event. The END_RIDE detects the end event. The NEXT_RIDE variable defines the following start event for computing the resting time. Once a match has been detected, we compute the difference in minutes and return it.

Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.

Welcome

Setting up the Training Environment

Sessions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pattern Matching with MATCH_RECOGNIZE

Slides

Hands-On Exercise

Rides and Rest

Clone this wiki locally