Receiver Input Streams (ReceiverInputDStreams
) are specialized input streams that use receivers to receive data (and hence the name which stands for an InputDStream
with a receiver).
Note
|
Receiver input streams run receivers as long-running tasks that occupy a core per stream. |
ReceiverInputDStream
abstract class defines the following abstract method that custom implementations use to create receivers:
def getReceiver(): Receiver[T]
The receiver is then sent to and run on workers (when ReceiverTracker is started).
Note
|
A fine example of a very minimalistic yet still useful implementation of
|
ReceiverInputDStream
uses ReceiverRateController
when spark.streaming.backpressure.enabled is enabled.
Note
|
Both, Read ReceiverTrackerEndpoint.startReceiver for more details. |
The source code of ReceiverInputDStream
is here at GitHub.
The abstract compute(validTime: Time): Option[RDD[T]]
method (from DStream) uses start time of DStreamGraph, i.e. the start time of StreamingContext, to check whether validTime
input parameter is really valid.
If the time to generate RDDs (validTime
) is earlier than the start time of StreamingContext, an empty BlockRDD
is generated.
Otherwise, ReceiverTracker is requested for all the blocks that have been allocated to this stream for this batch (using ReceiverTracker.getBlocksOfBatch
).
The number of records received for the batch for the input stream (as StreamInputInfo
aka input blocks information) is registered to InputInfoTracker (using InputInfoTracker.reportInfo
).
If all BlockIds have WriteAheadLogRecordHandle
, a WriteAheadLogBackedBlockRDD
is generated. Otherwise, a BlockRDD
is.
Caution
|
FIXME |
Back pressure for input dstreams with receivers can be configured using spark.streaming.backpressure.enabled setting.
Note
|
Back pressure is disabled by default. |