Skip to content

Latest commit

 

History

History
42 lines (24 loc) · 2.56 KB

spark-streaming-receivedblockhandlers.adoc

File metadata and controls

42 lines (24 loc) · 2.56 KB

ReceivedBlockHandlers

ReceivedBlockHandler represents how to handle the storage of blocks received by receivers.

Note
It is used by ReceiverSupervisorImpl (as the internal receivedBlockHandler).

ReceivedBlockHandler Contract

ReceivedBlockHandler is a private[streaming] trait. It comes with two methods:

  • storeBlock(blockId: StreamBlockId, receivedBlock: ReceivedBlock): ReceivedBlockStoreResult to store a received block as blockId.

  • cleanupOldBlocks(threshTime: Long) to clean up blocks older than threshTime.

Note
cleanupOldBlocks implies that there is a relation between blocks and the time they arrived.

Implementations of ReceivedBlockHandler Contract

There are two implementations of ReceivedBlockHandler contract:

BlockManagerBasedBlockHandler

BlockManagerBasedBlockHandler is the default ReceivedBlockHandler in Spark Streaming.

It uses BlockManager and a receiver’s StorageLevel.

cleanupOldBlocks is not used as blocks are cleared by some other means (FIXME)

putResult returns BlockManagerBasedStoreResult. It uses BlockManager.putIterator to store ReceivedBlock.

WriteAheadLogBasedBlockHandler

WriteAheadLogBasedBlockHandler is used when spark.streaming.receiver.writeAheadLog.enable is true.

It uses BlockManager, a receiver’s streamId and StorageLevel, SparkConf for additional configuration settings, Hadoop Configuration, the checkpoint directory.