Catalog

Catalog is the interface for managing a metastore (i.e. a metadata catalog) of relational entities, e.g. databases, tables, functions, table columns, and temporary views. It is available as the catalog attribute of SparkSession.

scala> spark
res1: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@4daee083

scala> spark.catalog
res2: org.apache.spark.sql.catalog.Catalog = org.apache.spark.sql.internal.CatalogImpl@1b42eb0f

scala> spark.catalog.listTables.show
+------------------+--------+-----------+---------+-----------+
|              name|database|description|tableType|isTemporary|
+------------------+--------+-----------+---------+-----------+
|my_permanent_table| default|       null|  MANAGED|      false|
|              strs|    null|       null|TEMPORARY|       true|
+------------------+--------+-----------+---------+-----------+
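
As a hedged illustration (not part of the original session), entries like the two above could have been produced in spark-shell as follows: a temporary view via createOrReplaceTempView and a managed table via saveAsTable.

// Hedged sketch (spark-shell): how rows like the ones above can arise.
import spark.implicits._

// TEMPORARY view (isTemporary = true, database = null)
Seq("hello", "world").toDF("s").createOrReplaceTempView("strs")

// MANAGED table in the default database (isTemporary = false)
Seq(1, 2, 3).toDF("n").write.saveAsTable("my_permanent_table")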

The one and only implementation of the Catalog contract is CatalogImpl.

Catalog Contract

package org.apache.spark.sql.catalog

abstract class Catalog {
  // Current database management
  def currentDatabase: String
  def setCurrentDatabase(dbName: String): Unit

  // Listing relational entities
  def listDatabases(): Dataset[Database]
  def listTables(): Dataset[Table]
  def listTables(dbName: String): Dataset[Table]
  def listFunctions(): Dataset[Function]
  def listFunctions(dbName: String): Dataset[Function]
  def listColumns(tableName: String): Dataset[Column]
  def listColumns(dbName: String, tableName: String): Dataset[Column]

  // Creating external tables from data sources
  def createExternalTable(tableName: String, path: String): DataFrame
  def createExternalTable(tableName: String, path: String, source: String): DataFrame
  def createExternalTable(
      tableName: String,
      source: String,
      options: Map[String, String]): DataFrame
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame

  // Temporary views and caching
  def dropTempView(viewName: String): Unit
  def isCached(tableName: String): Boolean
  def cacheTable(tableName: String): Unit
  def uncacheTable(tableName: String): Unit
  def clearCache(): Unit

  // Refreshing cached metadata and data
  def refreshTable(tableName: String): Unit
  def refreshByPath(path: String): Unit
}
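
To see the contract in action, here is a minimal sketch (assuming spark-shell or a local Spark 2.x SparkSession; the names CatalogDemo and nums are illustrative) that registers a temporary view and exercises a few of the methods above.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("CatalogDemo")
  .getOrCreate()

// Register a temporary view so the catalog has something to manage
spark.range(5).createOrReplaceTempView("nums")

spark.catalog.currentDatabase          // default
spark.catalog.listTables.show          // nums appears as TEMPORARY
spark.catalog.listColumns("nums").show

// Cache the view through the catalog and verify
spark.catalog.cacheTable("nums")
spark.catalog.isCached("nums")         // true
spark.catalog.uncacheTable("nums")

spark.catalog.dropTempView("nums")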

CatalogImpl

CatalogImpl is the one and only implementation of the Catalog contract. It relies on a per-session SessionCatalog (available through SparkSession) to fulfil the contract.

Figure 1. CatalogImpl uses SessionCatalog (through SparkSession)

CatalogImpl lives in the org.apache.spark.sql.internal package.
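
As an illustration of the createExternalTable variants that CatalogImpl implements, the following hedged sketch (the table name and file path are hypothetical) registers an external CSV-backed table and returns it as a DataFrame.

// Hedged sketch: table name and path are hypothetical.
val people = spark.catalog.createExternalTable(
  "people_csv",                        // name to register in the metastore
  "csv",                               // data source format
  Map(
    "path"   -> "/tmp/people.csv",     // hypothetical location of the data
    "header" -> "true"))

people.printSchema()
spark.catalog.listTables.show          // people_csv shows up as EXTERNAL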