ForeachBatchSink

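ForeachBatchSink is a streaming sink that is used for the DataStreamWriter.foreachBatch streaming operator.

ForeachBatchSink is created exclusively when DataStreamWriter is requested to start execution of the streaming query (and only when foreachBatch was used to define the output).

ForeachBatchSink uses ForeachBatchSink as its name, which is what the progress of a streaming query reports as the sink description.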

import org.apache.spark.sql.Dataset
val q = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .foreachBatch { (output: Dataset[_], batchId: Long) => // <-- registers the batch writer
    println(s"Batch ID: $batchId")
    output.show()
  }
  .start() // <-- creates a ForeachBatchSink
// q.stop()

scala> println(q.lastProgress.sink.description)
ForeachBatchSink

Note	`ForeachBatchSink` was added in Spark 2.4.0 as part of SPARK-24565 Add API for in Structured Streaming for exposing output rows of each microbatch as a DataFrame.

Creating ForeachBatchSink Instance

ForeachBatchSink takes the following when created:

Batch writer ((Dataset[T], Long) ⇒ Unit)
Encoder (ExpressionEncoder[T])

Adding Batch — `addBatch` Method

addBatch(batchId: Long, data: DataFrame): Unit

Note	`addBatch` is a part of Sink Contract to "add" a batch of data to the sink.

addBatch…FIXME
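
What addBatch does is left open above. As a rough, Spark-free illustration only, the sketch below models the two facts this page does state: the sink is created with a batch writer function, and the Sink contract hands it one batch at a time through addBatch. Everything in it is hypothetical: ToySink, ToyForeachBatchSink and ToyForeachBatchSinkDemo are made-up names, Seq[T] stands in for Dataset[T], the encoder is left out, and the direct delegation to the batch writer is an assumption, not Spark's implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the Sink contract (NOT Spark's Sink trait).
trait ToySink[T] {
  def addBatch(batchId: Long, data: Seq[T]): Unit
}

// Hypothetical stand-in for ForeachBatchSink.
// Seq[T] plays the role of Dataset[T]; the encoder parameter is omitted.
final class ToyForeachBatchSink[T](batchWriter: (Seq[T], Long) => Unit)
    extends ToySink[T] {

  // Assumption: the sink passes the whole batch and its ID to the batch writer.
  override def addBatch(batchId: Long, data: Seq[T]): Unit =
    batchWriter(data, batchId)

  // Fixed name, modeled on the sink description shown earlier on this page.
  override def toString: String = "ForeachBatchSink"
}

object ToyForeachBatchSinkDemo {
  // Feeds two batches through the sink and returns what the batch writer recorded.
  def run(): Seq[String] = {
    val seen = ArrayBuffer.empty[String]
    val sink = new ToyForeachBatchSink[Int]({ (rows, batchId) =>
      seen += s"batch $batchId: ${rows.sum}"
      ()
    })
    sink.addBatch(0L, Seq(1, 2, 3))
    sink.addBatch(1L, Seq(10, 20))
    seen.toList
  }
}
```

Calling ToyForeachBatchSinkDemo.run() returns the strings recorded by the batch writer, one per batch, in the order the batches were added. The point of the model is only the shape of the interaction: user code never calls addBatch itself, it only supplies the function that receives each batch.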
