spark-sql-streaming-CommitLog.adoc

CommitLog — HDFSMetadataLog for Offset Commit Log

CommitLog is an HDFSMetadataLog with CommitMetadata metadata.

CommitLog is created exclusively for the offset commit log of StreamExecution.

CommitMetadata has a single attribute, nextBatchWatermarkMs (of type Long, with the default 0).

CommitLog writes commit metadata to files whose names are batch IDs (one file per committed batch).

$ ls -tr [checkpoint-directory]/commits
0 1 2 3 4 5 6 7 8 9

$ cat [checkpoint-directory]/commits/8
v1
{"nextBatchWatermarkMs": 0}

CommitLog uses 1 for the version.
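The directory layout shown above can be inspected with plain file APIs. The following is a minimal sketch, assuming the checkpoint directory is on the local filesystem; CommitDirSketch, committedBatchIds and latestCommittedBatchId are hypothetical names used for illustration only and are not part of Spark (CommitLog itself goes through HDFSMetadataLog).

```scala
import java.io.File
import scala.util.Try

// Hypothetical helper (not Spark API): inspects a local commits directory.
object CommitDirSketch {
  // Batch IDs of the commit files in `commitsDir`, in ascending order.
  // Files whose names do not parse as a Long (e.g. checksum side files) are skipped.
  def committedBatchIds(commitsDir: File): Seq[Long] = {
    // listFiles() returns null when the path is not a readable directory.
    val files = Option(commitsDir.listFiles()).getOrElse(Array.empty[File])
    files.toSeq
      .filter(_.isFile)
      .flatMap(f => Try(f.getName.toLong).toOption)
      .sorted
  }

  // The highest batch ID that has a commit file, if any.
  def latestCommittedBatchId(commitsDir: File): Option[Long] =
    committedBatchIds(commitsDir).lastOption
}
```

For the listing above, committedBatchIds would give the batch IDs 0 through 9 and latestCommittedBatchId would give Some(9).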

CommitLog (like the parent HDFSMetadataLog) takes the following to be created:

  • SparkSession

  • Path of the metadata log directory

Serializing Metadata (Writing Metadata to Persistent Storage) — serialize Method

serialize(
  metadata: CommitMetadata,
  out: OutputStream): Unit
Note
serialize is part of the HDFSMetadataLog Contract to write metadata in a serialized format.

serialize writes out the version prefixed with v on a single line (e.g. v1) followed by the given CommitMetadata in JSON format.
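The two-part layout (version line, then JSON) can be sketched as follows. This is an illustration under stated assumptions, not Spark's source: CommitMetadataSketch is a hypothetical stand-in for CommitMetadata, the JSON is hand-formatted to keep the example free of library dependencies, and render is a helper added only to make the output easy to inspect.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for CommitMetadata (same attribute name and default as described above).
final case class CommitMetadataSketch(nextBatchWatermarkMs: Long = 0)

object CommitLogSerializeSketch {
  val Version = 1

  // Writes "v<version>", a newline, then the metadata as a one-field JSON object.
  // Assumption: the JSON is formatted by hand here rather than with a JSON library.
  def serialize(metadata: CommitMetadataSketch, out: OutputStream): Unit = {
    out.write(s"v$Version".getBytes(UTF_8))
    out.write('\n')
    out.write(s"""{"nextBatchWatermarkMs":${metadata.nextBatchWatermarkMs}}""".getBytes(UTF_8))
  }

  // Illustration-only helper: serializes into memory and returns the text.
  def render(metadata: CommitMetadataSketch): String = {
    val out = new ByteArrayOutputStream()
    serialize(metadata, out)
    new String(out.toByteArray, UTF_8)
  }
}
```

Calling CommitLogSerializeSketch.render(CommitMetadataSketch()) yields the v1 line followed by the JSON line, matching the shape of the commits/8 file shown earlier (apart from whitespace inside the JSON).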

Deserializing Metadata — deserialize Method

deserialize(in: InputStream): CommitMetadata
Note
deserialize is part of the HDFSMetadataLog Contract to deserialize metadata (from an InputStream).

deserialize reads two lines from the given InputStream: the first with the version and the second with the JSON-encoded CommitMetadata (the nextBatchWatermarkMs attribute).
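Reading the two lines back can be sketched as below. As before, this is a simplified illustration rather than Spark's implementation: CommitMetadataSketch is a hypothetical stand-in for CommitMetadata, the attribute is extracted with a regular expression instead of a JSON parser, and the error handling (IllegalStateException with these messages) is an assumption of the sketch.

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical stand-in for CommitMetadata (same attribute name and default as described above).
final case class CommitMetadataSketch(nextBatchWatermarkMs: Long = 0)

object CommitLogDeserializeSketch {
  // First line: "v" followed by the version number.
  private val VersionLine = "v(\\d+)".r
  // Second line: a JSON object carrying the nextBatchWatermarkMs attribute.
  private val Watermark = "\"nextBatchWatermarkMs\"\\s*:\\s*(-?\\d+)".r.unanchored

  def deserialize(in: InputStream): CommitMetadataSketch = {
    val lines = Source.fromInputStream(in, UTF_8.name()).getLines()
    if (!lines.hasNext) throw new IllegalStateException("Incomplete log file: missing version line")
    lines.next().trim match {
      case VersionLine(v) if v.toInt == 1 => () // the only version this sketch accepts
      case other => throw new IllegalStateException(s"Unsupported version line: $other")
    }
    if (!lines.hasNext) throw new IllegalStateException("Incomplete log file: missing metadata line")
    lines.next() match {
      case Watermark(ms) => CommitMetadataSketch(ms.toLong)
      case other => throw new IllegalStateException(s"Malformed metadata line: $other")
    }
  }
}
```

Given the contents of the commits/8 file shown earlier, this sketch returns CommitMetadataSketch(0); a file starting with any other version line is rejected.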

add Method

add(batchId: Long): Unit

add…​FIXME

Note
add is used when…​FIXME

add Method

add(batchId: Long, metadata: String): Boolean
Note
add is part of MetadataLog Contract to…​FIXME.

add…​FIXME