ContinuousDataSourceRDD
is a specialized RDD
(RDD[InternalRow]
) that is used exclusively for the only input RDD (with the input rows) of DataSourceV2ScanExec
leaf physical operator with a ContinuousReader.
ContinuousDataSourceRDD
is created exclusively when DataSourceV2ScanExec
leaf physical operator is requested for the input RDDs (which there is only one actually).
ContinuousDataSourceRDD
uses spark.sql.streaming.continuous.executorQueueSize configuration property for the size of the data queue.
ContinuousDataSourceRDD
uses spark.sql.streaming.continuous.executorPollIntervalMs configuration property for the epochPollIntervalMs.
ContinuousDataSourceRDD
takes the following to be created:
ContinuousDataSourceRDD
uses InputPartition
(of a ContinuousDataSourceRDDPartition
) for preferred host locations (where the input partition reader can run faster).
compute(
split: Partition,
context: TaskContext): Iterator[InternalRow]
Note
|
compute is part of the RDD Contract to compute a given partition.
|
compute
…FIXME
getPartitions: Array[Partition]
Note
|
getPartitions is part of the RDD Contract to specify the partitions to compute.
|
getPartitions
…FIXME