Caution: FIXME
When you execute the where operator right after loading data (into a Dataset), Spark SQL pushes the where predicate down to the source as a corresponding SQL query with a WHERE clause (or whatever the proper query language for the source is).
This optimization is called predicate pushdown: the filtering is pushed down to the data source engine, rather than loading the entire dataset into Spark's memory and filtering records out afterwards.
Given the following code:
// Person is assumed to be a case class with a name field, e.g.
// case class Person(id: Long, name: String)
import spark.implicits._ // Person encoder and the $"name" column syntax

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:...")
  .option("dbtable", "people")
  .load()
  .as[Person]
  .where($"name" === "Jacek") // a Column-based predicate can be pushed down
                              // (a typed filter with a Scala function cannot)
Spark translates it to a SQL query against the source equivalent to:
SELECT * FROM people WHERE name = 'Jacek'
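Until the database logs are in place, you can confirm the pushdown from Spark itself by inspecting the physical plan. A minimal sketch, assuming the df defined above; the exact plan output (and the PushedFilters formatting) varies across Spark versions:

// Print the logical and physical plans; the JDBC scan node should list
// the pushed predicate, e.g. PushedFilters: [*IsNotNull(name), *EqualTo(name,Jacek)]
df.explain(true)

If the predicate does not appear under PushedFilters, it is being evaluated in Spark after the rows have been fetched from the source.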
Caution: FIXME Show the database logs with the query.