ReceiverInputDStreams - Input Streams with Receivers

Receiver Input Streams (ReceiverInputDStreams) are specialized input streams that use receivers to receive data (and hence the name which stands for an InputDStream with a receiver).

Note	Receiver input streams run receivers as long-running tasks that occupy a core per stream.

ReceiverInputDStream abstract class defines the following abstract method that custom implementations use to create receivers:

def getReceiver(): Receiver[T]

The receiver is then sent to and run on workers (when ReceiverTracker is started).

Note

A fine example of a very minimalistic yet still useful implementation of ReceiverInputDStream class is the pluggable input stream org.apache.spark.streaming.dstream.PluggableInputDStream (the sources on GitHub). It requires a Receiver to be given (by a developer) and simply returns it in getReceiver.

PluggableInputDStream is used by StreamingContext.receiverStream() method.

ReceiverInputDStream uses ReceiverRateController when spark.streaming.backpressure.enabled is enabled.

Note	Both, `start()` and `stop` methods are implemented in `ReceiverInputDStream`, but do nothing. `ReceiverInputDStream` management is left to ReceiverTracker. Read ReceiverTrackerEndpoint.startReceiver for more details.

The source code of ReceiverInputDStream is here at GitHub.

Generate RDDs (using compute method)

The abstract compute(validTime: Time): Option[RDD[T]] method (from DStream) uses start time of DStreamGraph, i.e. the start time of StreamingContext, to check whether validTime input parameter is really valid.

If the time to generate RDDs (validTime) is earlier than the start time of StreamingContext, an empty BlockRDD is generated.

Otherwise, ReceiverTracker is requested for all the blocks that have been allocated to this stream for this batch (using ReceiverTracker.getBlocksOfBatch).

The number of records received for the batch for the input stream (as StreamInputInfo aka input blocks information) is registered to InputInfoTracker (using InputInfoTracker.reportInfo).

If all BlockIds have WriteAheadLogRecordHandle, a WriteAheadLogBackedBlockRDD is generated. Otherwise, a BlockRDD is.

Back Pressure

Caution

FIXME

Back pressure for input dstreams with receivers can be configured using spark.streaming.backpressure.enabled setting.

Note	Back pressure is disabled by default.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spark-streaming-receiverinputdstreams.adoc

spark-streaming-receiverinputdstreams.adoc

ReceiverInputDStreams - Input Streams with Receivers

Generate RDDs (using compute method)

Back Pressure

Files

spark-streaming-receiverinputdstreams.adoc

Latest commit

History

spark-streaming-receiverinputdstreams.adoc

File metadata and controls

ReceiverInputDStreams - Input Streams with Receivers

Generate RDDs (using compute method)

Back Pressure