CompactibleFileStreamLog Contract — Compactible Metadata Logs

CompactibleFileStreamLog is the extension of the HDFSMetadataLog contract for compactible metadata logs that are compacted (using compactLogs) every defaultCompactInterval batches.

Table 1. CompactibleFileStreamLog Contract (Abstract Methods Only)
Method Description

compactLogs

compactLogs(logs: Seq[T]): Seq[T]

Used when CompactibleFileStreamLog is requested to compact and to list allFiles

defaultCompactInterval

defaultCompactInterval: Int

Used when…​FIXME

fileCleanupDelayMs

fileCleanupDelayMs: Long

Used when…​FIXME

isDeletingExpiredLog

isDeletingExpiredLog: Boolean

Used when…​FIXME
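The four abstract members in Table 1 can be pictured as a small Scala trait. The following is a minimal sketch only: `CompactibleLogContract` and `DedupStringLog` are hypothetical names, the trait is a stand-in rather than Spark's class, and the values and the "drop duplicates" compaction rule are assumptions made for illustration.

```scala
// Stand-in for the abstract members listed in Table 1; NOT Spark's class.
trait CompactibleLogContract[T] {
  def compactLogs(logs: Seq[T]): Seq[T]
  def defaultCompactInterval: Int
  def fileCleanupDelayMs: Long
  def isDeletingExpiredLog: Boolean
}

// Hypothetical implementation: compaction keeps the first occurrence of every entry.
object DedupStringLog extends CompactibleLogContract[String] {
  override def compactLogs(logs: Seq[String]): Seq[String] = logs.distinct
  override val defaultCompactInterval: Int = 10            // assumed value
  override val fileCleanupDelayMs: Long = 10 * 60 * 1000L  // assumed value (10 minutes)
  override val isDeletingExpiredLog: Boolean = true        // assumed value
}
```

A concrete log decides what "compacting" means for its entries; everything else in the sketch is plain configuration.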

Table 2. CompactibleFileStreamLogs
CompactibleFileStreamLog Description

FileStreamSinkLog

FileStreamSourceLog

CompactibleFileStreamLog uses spark.sql.streaming.minBatchesToRetain configuration property for…​FIXME

CompactibleFileStreamLog takes the following to be created:

  • Version

  • SparkSession

  • Path of the metadata log directory

Note
CompactibleFileStreamLog is a Scala abstract class and cannot be created directly. It is created indirectly for the concrete CompactibleFileStreamLogs.

batchIdToPath Method

batchIdToPath(batchId: Long): Path
Note
batchIdToPath is part of the HDFSMetadataLog Contract to…​FIXME.

batchIdToPath…​FIXME

pathToBatchId Method

pathToBatchId(path: Path): Long
Note
pathToBatchId is part of the HDFSMetadataLog Contract to…​FIXME.

pathToBatchId…​FIXME

isBatchFile Method

isBatchFile(path: Path): Boolean
Note
isBatchFile is part of the HDFSMetadataLog Contract to…​FIXME.

isBatchFile…​FIXME
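One possible reading of the batch-to-file mapping is sketched below. It is not taken from this page: the `.compact` suffix and the `(batchId + 1) % compactInterval == 0` rule are assumptions, the helpers work on plain file names rather than Hadoop `Path` objects, and the names `BatchFileNamesSketch`, `batchIdToFileName`, `fileNameToBatchId` and `isBatchFileName` are hypothetical.

```scala
import scala.util.Try

object BatchFileNamesSketch {
  // Assumed suffix that marks the file of a compaction batch.
  val CompactFileSuffix = ".compact"

  // Assumed rule for deciding whether a batch is a compaction batch.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Batch ID to file name: compaction batches get the suffix.
  def batchIdToFileName(batchId: Long, compactInterval: Int): String =
    if (isCompactionBatch(batchId, compactInterval)) s"$batchId$CompactFileSuffix"
    else batchId.toString

  // File name to batch ID: drop the suffix (if any) and parse the rest.
  def fileNameToBatchId(fileName: String): Long =
    fileName.stripSuffix(CompactFileSuffix).toLong

  // A file name denotes a batch if a batch ID can be parsed from it.
  def isBatchFileName(fileName: String): Boolean =
    Try(fileNameToBatchId(fileName)).isSuccess
}
```

Under these assumptions and a compact interval of 3, batch 2 maps to `2.compact` while batch 3 maps to `3`, and both names parse back to their batch IDs.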

serialize Method

serialize(logData: Array[T], out: OutputStream): Unit
Note
serialize is part of the HDFSMetadataLog Contract to…​FIXME.

serialize…​FIXME

deserialize Method

deserialize(in: InputStream): Array[T]
Note
deserialize is part of the HDFSMetadataLog Contract to…​FIXME.

deserialize…​FIXME
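The serialize and deserialize signatures suggest a stream-based, entry-per-record format. The sketch below shows one such format under stated assumptions: a version marker on the first line followed by one entry per line. The marker `v1`, the plain `String` entries (instead of a generic `T`) and the names `LineLogCodecSketch` and `roundTrip` are all hypothetical, and entries must not contain newlines.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

object LineLogCodecSketch {
  // Assumed version marker written as the first line of every log file.
  val Version = "v1"

  // Writes the version line, then every entry on a line of its own.
  def serialize(logData: Array[String], out: OutputStream): Unit = {
    out.write(Version.getBytes(UTF_8))
    logData.foreach { entry =>
      out.write('\n')
      out.write(entry.getBytes(UTF_8))
    }
  }

  // Reads the version line, checks it, and returns the remaining lines as entries.
  def deserialize(in: InputStream): Array[String] = {
    val lines = Source.fromInputStream(in, UTF_8.name()).getLines()
    if (!lines.hasNext) throw new IllegalStateException("Incomplete log file")
    val version = lines.next()
    require(version == Version, s"Unsupported log version: $version")
    lines.toArray
  }

  // Helper for trying the two methods together in memory.
  def roundTrip(entries: Array[String]): Array[String] = {
    val out = new ByteArrayOutputStream()
    serialize(entries, out)
    deserialize(new ByteArrayInputStream(out.toByteArray))
  }
}
```

Putting the version first lets a reader reject a file before parsing any entry.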

Storing Metadata For Batch — add Method

add(batchId: Long, logs: Array[T]): Boolean
Note
add is part of the HDFSMetadataLog Contract to store metadata for a batch.

add…​FIXME

allFiles Method

allFiles(): Array[T]

allFiles…​FIXME

Note
allFiles is used when…​FIXME

compact Internal Method

compact(batchId: Long, logs: Array[T]): Boolean

compact…​FIXME

Note
compact is used exclusively when CompactibleFileStreamLog is requested to add.
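Since compact is only reached through add, the relationship between the two can be sketched with an in-memory model. This is an illustration under assumptions, not Spark's implementation: `InMemoryCompactibleLog`, `store` and `get` are hypothetical, the compaction-batch rule and the range of merged batches are assumed, missing earlier batches are silently treated as empty, and `compactLogs` simply drops duplicates.

```scala
import scala.collection.mutable

// In-memory stand-in for a compactible log of String entries; NOT Spark's class.
class InMemoryCompactibleLog(compactInterval: Int) {
  private val batches = mutable.TreeMap.empty[Long, Seq[String]]

  // Assumed rule: the last batch of every group of compactInterval batches.
  private def isCompactionBatch(batchId: Long): Boolean =
    (batchId + 1) % compactInterval == 0

  // Hypothetical compaction rule: keep the first occurrence of every entry.
  def compactLogs(logs: Seq[String]): Seq[String] = logs.distinct

  // Stores a batch once; returns false if the batch was already stored.
  private def store(batchId: Long, logs: Seq[String]): Boolean =
    if (batches.contains(batchId)) false
    else { batches(batchId) = logs; true }

  // Merges the entries of the preceding batches with the new ones and stores the result.
  private def compact(batchId: Long, logs: Seq[String]): Boolean = {
    val from = math.max(0L, batchId - compactInterval)
    val previous = (from until batchId).flatMap(id => batches.getOrElse(id, Seq.empty[String]))
    store(batchId, compactLogs(previous ++ logs))
  }

  // Compaction batches go through compact; all other batches are stored as given.
  def add(batchId: Long, logs: Seq[String]): Boolean =
    if (isCompactionBatch(batchId)) compact(batchId, logs) else store(batchId, logs)

  def get(batchId: Long): Option[Seq[String]] = batches.get(batchId)
}
```

Because the merged range starts at the previous compaction batch, every compaction batch carries the state of all earlier batches, so a reader needs only the latest compaction batch plus the batches after it.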

deleteExpiredLog Internal Method

deleteExpiredLog(currentBatchId: Long): Unit

deleteExpiredLog…​FIXME

Note
deleteExpiredLog is used exclusively when CompactibleFileStreamLog is requested to store metadata for a batch.

getValidBatchesBeforeCompactionBatch Object Method

getValidBatchesBeforeCompactionBatch(
  compactionBatchId: Long,
  compactInterval: Int): Seq[Long]

getValidBatchesBeforeCompactionBatch…​FIXME

Note
getValidBatchesBeforeCompactionBatch is used when…​FIXME
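The signature suggests that the method lists the batches a compaction batch has to merge. A minimal sketch under that assumption follows; the range (from the previous compaction batch up to, but excluding, the given one) and the `(batchId + 1) % compactInterval == 0` rule are not stated on this page, and `ValidBatchesSketch` is a hypothetical name.

```scala
object ValidBatchesSketch {
  // Assumed rule for deciding whether a batch is a compaction batch.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  // Assumed behaviour: the batch IDs between the previous compaction batch
  // (inclusive) and the given compaction batch (exclusive), never below 0.
  def getValidBatchesBeforeCompactionBatch(
      compactionBatchId: Long,
      compactInterval: Int): Seq[Long] = {
    require(
      isCompactionBatch(compactionBatchId, compactInterval),
      s"$compactionBatchId is not a compaction batch")
    math.max(0L, compactionBatchId - compactInterval) until compactionBatchId
  }
}
```

With a compact interval of 3, compaction batch 2 would merge batches 0 and 1, and compaction batch 5 would merge batches 2, 3 and 4.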

isCompactionBatch Object Method

isCompactionBatch(batchId: Long, compactInterval: Int): Boolean

isCompactionBatch…​FIXME
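A plausible rule, given zero-based batch IDs, is that the last batch of every group of compactInterval batches is the compaction batch. The sketch below encodes exactly that assumption; the formula is not stated on this page, and `CompactionBatchSketch` is a hypothetical name.

```scala
object CompactionBatchSketch {
  // Assumed rule: with zero-based batch IDs, batch (compactInterval - 1) and
  // every compactInterval-th batch after it are compaction batches.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0
}
```

Under this rule and a compact interval of 3, batches 2, 5 and 8 are compaction batches, while batches 0, 1, 3, 4, 6 and 7 are not.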

Note

isCompactionBatch is used when:

Internal Properties

Name Description

compactInterval