Caution: FIXME
When you execute the where operator right after loading data (into a Dataset), Spark SQL pushes the where predicate down to the source as a corresponding SQL query with a WHERE clause (or whatever the proper query language for the source is).
This optimization is called predicate pushdown: the filtering is pushed down to the data source engine, rather than loading the entire dataset into Spark's memory and filtering records out afterwards.
Given the following code:
// Person is assumed to be a case class with a name field, e.g.
// case class Person(id: Long, name: String)
import spark.implicits._ // Person encoder and the $"name" column syntax

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:...")
  .option("dbtable", "people")
  .load()
  .as[Person]
  .where($"name" === "Jacek") // a Column-based predicate can be pushed down
                              // (a typed filter with a Scala function cannot)
Spark translates it to a SQL query against the source equivalent to:
SELECT * FROM people WHERE name = 'Jacek'
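Until the database logs are in place, you can confirm the pushdown from Spark itself by inspecting the physical plan. A minimal sketch, assuming the df defined above; the exact plan output (and the PushedFilters formatting) varies across Spark versions:

// Print the logical and physical plans; the JDBC scan node should list
// the pushed predicate, e.g. PushedFilters: [*IsNotNull(name), *EqualTo(name,Jacek)]
df.explain(true)

If the predicate does not appear under PushedFilters, it is being evaluated in Spark after the rows have been fetched from the source.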
Caution: FIXME Show the database logs with the query.