
== Predicate Pushdown

CAUTION: FIXME

When you execute the `where` operator right after loading data (into a Dataset), Spark SQL pushes the predicate down to the source as a `WHERE` clause of the corresponding SQL query (or whatever the equivalent construct is for that source).

This optimization is called predicate pushdown: the filtering is delegated to the data source engine, rather than loading the entire dataset into Spark's memory and filtering records out afterwards.

Given the following code:

[source, scala]
----
// Assumes a Person case class matching the table, e.g. case class Person(name: String)
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:...")
  .option("dbtable", "people")
  .load()
  .as[Person]
  .where($"name" === "Jacek") // a Column-based predicate, so it can be pushed down
----

Spark translates it to the following SQL query:

[source, sql]
----
SELECT * FROM people WHERE name = 'Jacek'
----
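The effect can be illustrated outside Spark with any SQL-speaking source. The sketch below is plain Python against an in-memory SQLite database (not Spark code; the table and values are made up for the example): it contrasts fetching every row and filtering in application memory with pushing the predicate into the query so the source engine does the filtering.

```python
import sqlite3

# A toy "data source" with a people table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("Jacek",), ("Agata",), ("Maksym",)])

# Without pushdown: load the whole table, then filter in memory.
all_rows = conn.execute("SELECT * FROM people").fetchall()
filtered_late = [row for row in all_rows if row[0] == "Jacek"]

# With pushdown: the predicate becomes a WHERE clause, so only
# matching rows ever leave the source engine.
filtered_early = conn.execute(
    "SELECT * FROM people WHERE name = ?", ("Jacek",)).fetchall()

# Both strategies return the same rows; only where the work happens differs.
assert filtered_late == filtered_early == [("Jacek",)]
```

The result is identical either way; the payoff of pushdown is that the unfiltered rows never cross the wire or occupy Spark's memory.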
CAUTION: FIXME Show the database logs with the query.