StreamingQueryManager — Streaming Query Management

Note
StreamingQueryManager is an experimental feature of Spark 2.0.0.

StreamingQueryManager is the management API for the continuous queries of a SQLContext.

Note
There is a single StreamingQueryManager instance per SQLContext session.

You can access the StreamingQueryManager for the current SQLContext using the SQLContext.streams method. It is created lazily, i.e. on first access rather than when the SQLContext instance starts.

val queries = spark.streams

Initialization

StreamingQueryManager manages the following instances:

  • StateStoreCoordinatorRef (as stateStoreCoordinator)

  • StreamingQueryListenerBus (as listenerBus)

  • activeQueries which is a mutable mapping between query names and StreamingQuery objects.

StreamingQueryListenerBus

Caution
FIXME

startQuery

startQuery(name: String,
  checkpointLocation: String,
  df: DataFrame,
  sink: Sink,
  trigger: Trigger = ProcessingTime(0)): StreamingQuery

startQuery is a private[sql] method to start a StreamingQuery.

Note
It is called exclusively by DataFrameWriter.startStream.

Note
By default, trigger is ProcessingTime(0).

startQuery makes sure that the activeQueries internal registry does not already contain a query under name. It throws an IllegalArgumentException if it does.

It transforms the LogicalPlan of the input DataFrame df so that every StreamingRelation "node" becomes a StreamingExecutionRelation. For each such node it creates the source using DataSource.createSource(metadataPath), where metadataPath is $checkpointLocation/sources/$nextSourceId. All other nodes of the LogicalPlan are left untouched.
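The rewrite described above can be sketched with a toy plan tree. This is a minimal illustration, not Spark's code: the case classes below are simplified stand-ins that only borrow the names of the Spark nodes, Filter and Join are hypothetical extra nodes, and incrementing nextSourceId after each source is an assumption about how the ids stay unique.

```scala
object PlanRewriteSketch {
  // Toy stand-ins for plan nodes; NOT Spark's classes, only the names are borrowed.
  sealed trait Plan
  case class StreamingRelation(sourceName: String) extends Plan
  case class StreamingExecutionRelation(sourceName: String, metadataPath: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class Join(left: Plan, right: Plan) extends Plan

  // Replaces every StreamingRelation with a StreamingExecutionRelation that
  // carries its own metadata path; every other node is rebuilt unchanged.
  def materializeSources(plan: Plan, checkpointLocation: String): Plan = {
    var nextSourceId = 0L // assumption: one increasing id per materialized source
    def rewrite(p: Plan): Plan = p match {
      case StreamingRelation(name) =>
        val metadataPath = s"$checkpointLocation/sources/$nextSourceId"
        nextSourceId += 1
        StreamingExecutionRelation(name, metadataPath)
      case Filter(condition, child) => Filter(condition, rewrite(child))
      case Join(left, right)        => Join(rewrite(left), rewrite(right))
      case other                    => other // already materialized: left untouched
    }
    rewrite(plan)
  }
}
```

With two streaming leaves, each one ends up with a distinct path under the (hypothetical) checkpoint directory, e.g. /tmp/cp/sources/0 and /tmp/cp/sources/1.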

Finally, it creates a StreamExecution and starts it. It also registers the StreamExecution instance in the activeQueries internal registry.
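The name check and registration steps of startQuery can be modelled with a small registry. This is a sketch under stated assumptions: ToyQuery and ToyManager are hypothetical names, ToyQuery merely records that it was started, and the exception message is invented for illustration.

```scala
import scala.collection.mutable

object QueryRegistrySketch {
  // Hypothetical stand-in for StreamExecution; it only remembers that it was started.
  final class ToyQuery(val name: String) {
    @volatile var started = false
    def start(): Unit = { started = true }
  }

  final class ToyManager {
    // Mutable mapping between query names and query objects.
    private val activeQueries = mutable.HashMap.empty[String, ToyQuery]

    def startQuery(name: String): ToyQuery = activeQueries.synchronized {
      // Refuse to start a second query under an already used name.
      if (activeQueries.contains(name)) {
        // Message wording is illustrative, not Spark's.
        throw new IllegalArgumentException(s"A query named $name is already active")
      }
      val query = new ToyQuery(name)
      query.start()
      activeQueries.put(name, query)
      query
    }

    // Snapshot of the currently registered queries.
    def active: Array[ToyQuery] = activeQueries.synchronized(activeQueries.values.toArray)
  }
}
```

Guarding both methods with the same lock keeps the check and the registration atomic, so two concurrent callers cannot both register the same name in this sketch.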

Returning All Active Continuous Queries per SQLContext

active: Array[StreamingQuery]

active method returns a collection of StreamingQuery instances for the current SQLContext.

Getting Active Continuous Query By Name

get(name: String): StreamingQuery

get method returns a StreamingQuery by name.

It may throw an IllegalArgumentException when no StreamingQuery exists for the name.

java.lang.IllegalArgumentException: There is no active query with name hello
  at org.apache.spark.sql.StreamingQueryManager$$anonfun$get$1.apply(StreamingQueryManager.scala:59)
  at org.apache.spark.sql.StreamingQueryManager$$anonfun$get$1.apply(StreamingQueryManager.scala:59)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at scala.collection.AbstractMap.getOrElse(Map.scala:59)
  at org.apache.spark.sql.StreamingQueryManager.get(StreamingQueryManager.scala:58)
  ... 49 elided
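The stack trace suggests the lookup goes through getOrElse on the registry map, with the exception thrown from the default branch. A minimal sketch of that pattern follows; the map content and the String value type are assumptions made to keep the example self-contained.

```scala
object GetByNameSketch {
  // Hypothetical registry content; in Spark the values would be StreamingQuery objects.
  private val activeQueries = Map("events" -> "query-1")

  // getOrElse takes its default by name, so the throw only runs for a missing key.
  def get(name: String): String =
    activeQueries.getOrElse(
      name,
      throw new IllegalArgumentException(s"There is no active query with name $name"))
}
```

A caller that cannot be sure the query is still active may want to wrap the call in scala.util.Try or catch IllegalArgumentException explicitly.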

StreamingQueryListener Management - Adding or Removing Listeners

  • addListener(listener: StreamingQueryListener): Unit adds listener to the internal listenerBus.

  • removeListener(listener: StreamingQueryListener): Unit removes listener from the internal listenerBus.

postListenerEvent

postListenerEvent(event: StreamingQueryListener.Event): Unit

postListenerEvent posts a StreamingQueryListener.Event to listenerBus.
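How addListener, removeListener and postListenerEvent fit together can be sketched with a toy bus. Everything here is hypothetical (ToyListenerBus, Listener, and the three event classes), and the sketch delivers events synchronously on the caller's thread for simplicity, which is not necessarily how StreamingQueryListenerBus behaves.

```scala
import java.util.concurrent.CopyOnWriteArrayList

object ListenerBusSketch {
  // Hypothetical life cycle events: start, progress and termination.
  sealed trait Event
  case class QueryStarted(name: String) extends Event
  case class QueryProgress(name: String) extends Event
  case class QueryTerminated(name: String) extends Event

  // Hypothetical listener interface with a single callback.
  trait Listener { def onEvent(event: Event): Unit }

  final class ToyListenerBus {
    // Copy-on-write list: safe to iterate while listeners are added or removed.
    private val listeners = new CopyOnWriteArrayList[Listener]()

    def addListener(listener: Listener): Unit = { listeners.add(listener); () }
    def removeListener(listener: Listener): Unit = { listeners.remove(listener); () }

    // Delivers the event to every registered listener, in registration order.
    def post(event: Event): Unit = {
      val it = listeners.iterator()
      while (it.hasNext) it.next().onEvent(event)
    }
  }
}
```

A removed listener simply stops receiving events; events posted before it was added are not replayed in this sketch.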

StreamingQueryListener

Caution
FIXME

StreamingQueryListener is an interface for listening to query life cycle events, i.e. query start, progress and termination events.

lastTerminatedQuery - internal barrier

Caution
FIXME Why is lastTerminatedQuery needed?

Used in:

  • awaitAnyTermination

  • awaitAnyTermination(timeoutMs: Long)

Both wait 10 milliseconds before checking whether lastTerminatedQuery is non-null.

It is set in:

  • resetTerminated() resets lastTerminatedQuery, i.e. sets it to null.

  • notifyQueryTermination(terminatedQuery: StreamingQuery) sets lastTerminatedQuery to be terminatedQuery and notifies all the threads that wait on awaitTerminationLock.

    It is called from StreamExecution.runBatches.
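The interplay of the methods above can be sketched with a plain monitor. This is an illustrative model only: TerminationBarrier is a hypothetical class, a query name (String) stands in for StreamingQuery, and the deadline handling in the timed variant is an assumption.

```scala
object TerminationBarrierSketch {
  final class TerminationBarrier {
    private val awaitTerminationLock = new Object
    // A query name stands in for the terminated StreamingQuery; null means "none yet".
    private var lastTerminatedQuery: String = null

    def resetTerminated(): Unit = awaitTerminationLock.synchronized {
      lastTerminatedQuery = null
    }

    def notifyQueryTermination(terminatedQuery: String): Unit =
      awaitTerminationLock.synchronized {
        lastTerminatedQuery = terminatedQuery
        awaitTerminationLock.notifyAll() // wake up every waiting thread
      }

    // Blocks until some query has terminated, re-checking every 10 ms.
    def awaitAnyTermination(): Unit = awaitTerminationLock.synchronized {
      while (lastTerminatedQuery == null) awaitTerminationLock.wait(10)
    }

    // Timed variant: returns true if a termination was seen before the deadline.
    def awaitAnyTermination(timeoutMs: Long): Boolean = awaitTerminationLock.synchronized {
      val deadline = System.currentTimeMillis() + timeoutMs
      while (lastTerminatedQuery == null && System.currentTimeMillis() < deadline) {
        awaitTerminationLock.wait(10)
      }
      lastTerminatedQuery != null
    }
  }
}
```

In this model the field acts as a latch: a termination reported before a thread starts waiting is still observed, because the waiter checks lastTerminatedQuery instead of relying on the notification alone. Whether that is the reason the field exists in Spark is not confirmed by the text above.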