ContextCleaner cleans up shuffles, RDDs, and broadcasts.
Caution: FIXME What does the above sentence really mean?
It uses a daemon thread named Spark Context Cleaner that cleans RDD, shuffle, and broadcast states (using the keepCleaning method).
Caution: FIXME Review keepCleaning
ShuffleDependencies register themselves for cleanup using the ContextCleaner.registerShuffleForCleanup method.
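The pattern described above (objects register themselves, and a daemon thread later performs the cleanup) can be sketched in plain Python. This is only an illustration of the general idea, not Spark's implementation: ToyCleaner, ToyShuffleDependency, register_for_cleanup, and wait_until_idle are hypothetical names chosen to mirror the text. The sketch also assumes CPython, where an object is reclaimed as soon as its last reference is dropped.

```python
import queue
import threading
import weakref


class ToyCleaner:
    """Hypothetical stand-in for a context cleaner; not Spark's API."""

    def __init__(self):
        self._tasks = queue.Queue()  # pending cleanup tasks
        self.cleaned = []            # record of completed cleanups, for inspection
        # Daemon thread: it never keeps the process alive on its own.
        self._thread = threading.Thread(
            target=self._keep_cleaning, name="Toy Context Cleaner", daemon=True
        )
        self._thread.start()

    def register_for_cleanup(self, obj, kind, obj_id):
        # weakref.finalize does not keep obj alive. Its callback runs once
        # obj has been reclaimed and enqueues a cleanup task for the thread.
        weakref.finalize(obj, self._tasks.put, (kind, obj_id))

    def _keep_cleaning(self):
        while True:
            kind, obj_id = self._tasks.get()  # blocks until a task arrives
            self.cleaned.append((kind, obj_id))
            self._tasks.task_done()

    def wait_until_idle(self):
        # Block until every enqueued task has been processed.
        self._tasks.join()


class ToyShuffleDependency:
    """Hypothetical object that registers itself on construction."""

    def __init__(self, cleaner, shuffle_id):
        self.shuffle_id = shuffle_id
        cleaner.register_for_cleanup(self, "shuffle", shuffle_id)


if __name__ == "__main__":
    cleaner = ToyCleaner()
    dep = ToyShuffleDependency(cleaner, shuffle_id=0)
    del dep                    # last reference dropped; a task is enqueued
    cleaner.wait_until_idle()  # the daemon thread has processed the task
    print(cleaner.cleaned)
```

The key design point is that registration holds only a weak reference, so the cleaner itself never prevents the tracked object from being reclaimed.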
ContextCleaner uses a Spark context.
- spark.cleaner.referenceTracking (default: true) controls whether ContextCleaner is enabled when a Spark context initializes.
- spark.cleaner.referenceTracking.blocking (default: true) controls whether the cleaning thread blocks on cleanup tasks (other than shuffle, which is controlled by the spark.cleaner.referenceTracking.blocking.shuffle parameter). It is true as a workaround to SPARK-3015 Removing broadcast in quick successions causes Akka timeout.
- spark.cleaner.referenceTracking.blocking.shuffle (default: false) controls whether the cleaning thread blocks on shuffle cleanup tasks. It is false as a workaround to SPARK-3139 Akka timeouts from ContextCleaner when cleaning shuffles.
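
How the two blocking settings divide responsibility can be made concrete with a small Python sketch. The helper names (effective_settings, blocks_on, to_submit_args) are hypothetical and exist only for this illustration. The property names and defaults come from the list above, and --conf key=value is the standard spark-submit option for passing a property.

```python
# Defaults as listed in the settings above.
CLEANER_DEFAULTS = {
    "spark.cleaner.referenceTracking": "true",
    "spark.cleaner.referenceTracking.blocking": "true",
    "spark.cleaner.referenceTracking.blocking.shuffle": "false",
}


def effective_settings(overrides=None):
    """Merge user overrides on top of the defaults (hypothetical helper)."""
    settings = dict(CLEANER_DEFAULTS)
    settings.update(overrides or {})
    return settings


def blocks_on(kind, settings):
    """Return True if cleanup of `kind` ('rdd', 'shuffle', 'broadcast') blocks.

    Shuffle cleanup has its own setting; every other kind falls back to the
    general blocking setting.
    """
    if kind == "shuffle":
        key = "spark.cleaner.referenceTracking.blocking.shuffle"
    else:
        key = "spark.cleaner.referenceTracking.blocking"
    return settings[key].lower() == "true"


def to_submit_args(settings):
    """Render settings as a list of spark-submit --conf arguments."""
    args = []
    for key in sorted(settings):
        args += ["--conf", f"{key}={settings[key]}"]
    return args
```

With the defaults, blocks_on("broadcast", effective_settings()) is True and blocks_on("shuffle", effective_settings()) is False, matching the two workarounds noted in the list.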