StateStoreCoordinator — Tracking Locations of StateStores for StateStoreRDD

StateStoreCoordinator keeps track of StateStores loaded in Spark executors (across the nodes in a Spark cluster).

The main purpose of StateStoreCoordinator is to let StateStoreRDD get the preferred locations of its partitions (based on where the state stores are loaded).
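As a condensed sketch of that interaction (modeled on StateStoreRDD.getPreferredLocations in the Spark sources; simplified, not the exact implementation), the RDD asks the coordinator which executor already hosts a partition's state store:

[source, scala]
----
// Sketch: StateStoreRDD reports the executor that already hosts the
// partition's state store as the preferred location for the partition.
override def getPreferredLocations(partition: Partition): Seq[String] = {
  val stateStoreProviderId = StateStoreProviderId(
    StateStoreId(checkpointLocation, operatorId, partition.index),
    queryRunId)
  // storeCoordinator: Option[StateStoreCoordinatorRef]; getLocation sends
  // the GetLocation RPC message described in Table 1 below.
  storeCoordinator.flatMap(_.getLocation(stateStoreProviderId)).toSeq
}
----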

StateStoreCoordinator uses the instances internal registry that maps StateStoreProviderIds to ExecutorCacheTaskLocations.

StateStoreCoordinator is a ThreadSafeRpcEndpoint RPC endpoint that manipulates the instances registry through RPC messages.
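A condensed sketch of the endpoint and its registry (modeled on the Spark sources; ThreadSafeRpcEndpoint and ExecutorCacheTaskLocation are Spark-internal, so treat this as illustrative rather than something to compile against):

[source, scala]
----
import scala.collection.mutable
import org.apache.spark.rpc.{RpcEnv, ThreadSafeRpcEndpoint}
import org.apache.spark.scheduler.ExecutorCacheTaskLocation

class StateStoreCoordinator(override val rpcEnv: RpcEnv)
  extends ThreadSafeRpcEndpoint {

  // The instances registry: which executor currently hosts the state store
  // provider with a given id.
  private val instances =
    new mutable.HashMap[StateStoreProviderId, ExecutorCacheTaskLocation]
}
----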

Table 1. StateStoreCoordinator RPC Endpoint’s Messages and Message Handlers (in alphabetical order)

DeactivateInstances

Removes the StateStoreProviderIds (from instances) whose queryRunId is the given runId.

You should see the following DEBUG message in the logs:

Deactivating instances related to checkpoint location [runId]: [comma-separated storeIdsToRemove]

GetLocation

Gives the location of a StateStoreProviderId (from instances), i.e. the host and the executor ID on that host.

You should see the following DEBUG message in the logs:

Got location of the state store [id]: [executorId]

ReportActiveInstance

Registers a StateStoreProviderId as active on an executor (given the host and executor ID).

You should see the following DEBUG message in the logs:

Reported state store [id] is active at [executorId]

StopCoordinator

Stops the StateStoreCoordinator RPC Endpoint.

You should see the following INFO message in the logs:

StateStoreCoordinator stopped

VerifyIfInstanceActive

Verifies whether the given StateStoreProviderId is registered (in instances) on the given executorId.

You should see the following DEBUG message in the logs:

Verified that state store [id] is active: [response]
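
Taken together, the handlers are little more than lookups and updates of the instances registry. A condensed sketch (modeled on the Spark sources and reusing the class skeleton above, with the logging shown in Table 1 elided):

[source, scala]
----
override def receive: PartialFunction[Any, Unit] = {
  case ReportActiveInstance(id, host, executorId) =>
    // Remember which executor loaded the state store.
    instances.put(id, ExecutorCacheTaskLocation(host, executorId))
}

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case VerifyIfInstanceActive(id, execId) =>
    // Active only if the store is registered on exactly that executor.
    context.reply(instances.get(id).exists(_.executorId == execId))

  case GetLocation(id) =>
    // Reply with the task-location string of the hosting executor, if any.
    context.reply(instances.get(id).map(_.toString))

  case DeactivateInstances(runId) =>
    // Drop every store that belongs to the given query run.
    val storeIdsToRemove = instances.keys.filter(_.queryRunId == runId).toSeq
    instances --= storeIdsToRemove
    context.reply(true)

  case StopCoordinator =>
    stop()  // deregister this endpoint from the RpcEnv
    context.reply(true)
}
----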
Tip

Enable INFO or DEBUG logging level for org.apache.spark.sql.execution.streaming.state.StateStoreCoordinator to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.streaming.state.StateStoreCoordinator=DEBUG

Refer to Logging.