spark-sql-streaming-StreamMetadata.adoc


StreamMetadata

StreamMetadata is the metadata associated with a StreamingQuery (indirectly, through StreamExecution).

StreamMetadata takes a single id to be created.

StreamMetadata is created exclusively when StreamExecution is created (with a randomly-generated 128-bit universally unique identifier (UUID)).
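For illustration, the randomly-generated identifier mentioned above can be produced with java.util.UUID (a sketch of the pattern, not StreamExecution's actual code):

```scala
import java.util.UUID

// A random 128-bit UUID rendered in the canonical 36-character form,
// e.g. "550e8400-e29b-41d4-a716-446655440000"
val id: String = UUID.randomUUID().toString
```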

StreamMetadata can be persisted to and unpersisted from a JSON file. StreamMetadata uses json4s-jackson library for JSON persistence.

import org.apache.spark.sql.execution.streaming.StreamMetadata
import org.apache.hadoop.fs.Path
val metadataPath = new Path("metadata")

scala> :type spark
org.apache.spark.sql.SparkSession

val hadoopConf = spark.sessionState.newHadoopConf()
val sm = StreamMetadata.read(metadataPath, hadoopConf)

scala> :type sm
Option[org.apache.spark.sql.execution.streaming.StreamMetadata]

Unpersisting StreamMetadata (from JSON File) — read Factory Method

read(
  metadataFile: Path,
  hadoopConf: Configuration): Option[StreamMetadata]

read unpersists StreamMetadata from the given metadataFile, if available.

read returns Some(StreamMetadata) if the metadata file was available and its content could be read in the JSON format. Otherwise, read returns None.

Note
read uses org.json4s.jackson.Serialization.read for JSON deserialization.
Note
read is used exclusively when StreamExecution is created (and tries to read the metadata checkpoint file).
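To make the contract concrete, here is a minimal pure-Scala sketch of the same unpersist pattern, using plain JDK I/O and a naive JSON parse in place of Hadoop and json4s (StreamMetadataSketch and its single id field are assumptions for illustration, not Spark's actual classes):

```scala
import java.nio.file.{Files, Path, Paths}
import scala.util.Try

// Hypothetical stand-in for Spark's StreamMetadata: a single id field.
case class StreamMetadataSketch(id: String)

// Mirrors the contract of StreamMetadata.read: Some(metadata) when the
// file exists and its content parses as JSON with an "id" field,
// None otherwise.
def read(metadataFile: Path): Option[StreamMetadataSketch] =
  if (!Files.exists(metadataFile)) None
  else Try {
    val json = new String(Files.readAllBytes(metadataFile), "UTF-8")
    // Naive extraction of the "id" field from {"id":"..."}
    val IdPattern = """.*"id"\s*:\s*"([^"]+)".*""".r
    json.trim match { case IdPattern(id) => StreamMetadataSketch(id) }
  }.toOption
```

A missing file or unparseable content both collapse to None, which is why the caller (StreamExecution in the real code) can treat "no metadata yet" and "unreadable metadata" uniformly.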

Persisting Metadata — write Object Method

write(
  metadata: StreamMetadata,
  metadataFile: Path,
  hadoopConf: Configuration): Unit

write persists the given StreamMetadata to the given metadataFile in JSON format.

Note
write uses org.json4s.jackson.Serialization.write for JSON serialization.
Note
write is used exclusively when StreamExecution is created (and the metadata checkpoint file is not available).
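A corresponding sketch of the persist side, again with plain JDK I/O standing in for Hadoop and json4s (the StreamMetadataSketch case class is an assumption for illustration):

```scala
import java.nio.file.{Files, Path}

// Hypothetical stand-in for Spark's StreamMetadata: a single id field.
case class StreamMetadataSketch(id: String)

// Mirrors the shape of StreamMetadata.write: render the metadata as JSON
// and persist it to the given file.
def write(metadata: StreamMetadataSketch, metadataFile: Path): Unit = {
  val json = s"""{"id":"${metadata.id}"}"""
  Files.write(metadataFile, json.getBytes("UTF-8"))
}
```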