-
Notifications
You must be signed in to change notification settings - Fork 229
Pattern Matching with MATCH_RECOGNIZE
This session discusses how to write SQL queries that detect patterns in streaming data.
Specifically, you will learn about
- The
MATCH_RECOGNIZE
clause defined in SQL:2016 - Supported semantics and limitations of Flink's implementation
- Examples of how to perform pattern matching on append tables
This exercises will teach you how to detect a pattern in data streams and to perform a computation on the matching set of rows.
In this exercise, we want to detect how long taxis rest before they start a new ride.
Use the MATCH_RECOGNIZE
clause to detect a pattern of:
- a start event,
- an arbitrary number of intermediate events for the same taxi but from different rides,
- an end event,
- and the next start event.
Compute the resting times in minutes.
Click here for hints.
- Use a
AFTER MATCH SKIP TO LAST variable
strategy to include the last start event in the next pattern matching. - Use the built-in
TIMESTAMPDIFF
function to calculate the difference in minutes.
The output should look similar to:
taxiId ride_start ride_end next_ride_start minutes_of_rest
2013000002 2013-01-01 00:00:00.0 2013-01-01 00:06:00.0 2013-01-01 00:16:00.0 10
2013000004 2013-01-01 00:00:00.0 2013-01-01 00:08:00.0 2013-01-01 00:13:00.0 5
2013000032 2013-01-01 00:00:00.0 2013-01-01 00:05:00.0 2013-01-01 00:06:00.0 1
2013000128 2013-01-01 00:01:00.0 2013-01-01 00:04:00.0 2013-01-01 00:12:00.0 8
2013000256 2013-01-01 00:02:00.0 2013-01-01 00:10:00.0 2013-01-01 00:10:00.0 0
2013000512 2013-01-01 00:03:25.0 2013-01-01 00:04:51.0 2013-01-01 00:10:00.0 5
2013000512 2013-01-01 00:10:00.0 2013-01-01 00:13:31.0 2013-01-01 00:14:19.0 0
2013001028 2013-01-01 00:05:52.0 2013-01-01 00:13:20.0 2013-01-01 00:14:12.0 0
2013000258 2013-01-01 00:02:00.0 2013-01-01 00:08:00.0 2013-01-01 00:11:00.0 3
2013002070 2013-01-01 00:08:57.0 2013-01-01 00:12:26.0 2013-01-01 00:13:46.0 1
Click to see the solution.
SELECT * FROM Rides
MATCH_RECOGNIZE(
PARTITION BY taxiId
ORDER BY rideTime
MEASURES
START_RIDE.rideTime AS ride_start,
END_RIDE.rideTime AS ride_end,
NEXT_RIDE.rideTime AS next_ride_start,
TIMESTAMPDIFF(MINUTE, END_RIDE.rideTime, NEXT_RIDE.rideTime) AS minutes_of_rest
AFTER MATCH SKIP TO LAST NEXT_RIDE
PATTERN (START_RIDE M* END_RIDE NEXT_RIDE)
DEFINE
START_RIDE AS START_RIDE.isStart = true,
M AS M.rideId <> START_RIDE.rideId,
END_RIDE AS END_RIDE.isStart = false,
NEXT_RIDE AS NEXT_RIDE.isStart = true
);
The query matches the pattern mentioned above. The variable START_RIDE
detects the start event. M
defines a greedy set of ride events that don't belong to the same ride of the start event. The END_RIDE
detects the end event. The NEXT_RIDE
variable defines the following start event for computing the resting time. Once a match has been detected, we compute the difference in minutes and return it.
Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.