StreamingSymmetricHashJoinExec
is a binary physical operator that represents a Join
logical operator of two streaming queries at execution time.
Note
|
A binary physical operator ( Read up on BinaryExecNode (and physical operators in general) in The Internals of Spark SQL book. |
Note
|
Read up on Join Logical Operator in The Internals of Spark SQL book. |
StreamingSymmetricHashJoinExec
requires that the join type be Inner
, LeftOuter
, or RightOuter
with the same data types of the left and the right keys.
StreamingSymmetricHashJoinExec
is created exclusively when StreamingJoinStrategy execution planning strategy is requested to plan a logical query plan with a Join
logical operator of two streaming queries with equality predicates (EqualTo
and EqualNullSafe
)
StreamingSymmetricHashJoinExec
is a stateful physical operator that writes to a state store.
The output schema of StreamingSymmetricHashJoinExec
is…FIXME
The output partitioning of StreamingSymmetricHashJoinExec
is…FIXME
StreamingSymmetricHashJoinExec
takes the following to be created:
StreamingSymmetricHashJoinExec
initializes the internal properties.
StreamingSymmetricHashJoinExec
uses the performance metrics of StateStoreWriter.
shouldRunAnotherBatch(newMetadata: OffsetSeqMetadata): Boolean
Note
|
shouldRunAnotherBatch is part of the StateStoreWriter Contract to check whether MicroBatchExecution should run another batch (based on the updated metadata).
|
shouldRunAnotherBatch
…FIXME
Executing Physical Operator (Generating Recipe for Distributed Computation as RDD[InternalRow]) — doExecute
Method
doExecute(): RDD[InternalRow]
Note
|
doExecute is part of SparkPlan Contract to generate the runtime representation of an physical operator as a recipe for distributed computation over internal binary rows on Apache Spark (RDD[InternalRow] ).
|
doExecute
first requests the StreamingQueryManager
for the StateStoreCoordinatorRef to the StateStoreCoordinator
RPC endpoint (for the driver).
doExecute
then requests the SymmetricHashJoinStateManager
for the names of the state stores for the left and right side of the streaming join.
In the end, doExecute
requests the left and right physical operators to execute (generate an RDD) and then stateStoreAwareZipPartitions with processPartitions (and with the StateStoreCoordinatorRef
and the state stores).
processPartitions(
leftInputIter: Iterator[InternalRow],
rightInputIter: Iterator[InternalRow]): Iterator[InternalRow]
processPartitions
…FIXME
Note
|
processPartitions is used exclusively when StreamingSymmetricHashJoinExec physical operator is requested to execute.
|
Name | Description |
---|---|
|
Used exclusively to create a SymmetricHashJoinStateManager |
|
Used when |
|
|
|
|
|
Used exclusively to create a SymmetricHashJoinStateManager |