Row

Row is a data abstraction of an ordered collection of fields that can be accessed by an ordinal / an index (aka generic access by ordinal), through typed getters such as getInt (aka native primitive access), by a field name (when the Row has a schema), or using Scala’s pattern matching. A Row instance may or may not have a schema.

The traits of Row:

  • length or size - Row knows the number of elements (columns).

  • schema - Row knows its schema (when one is defined).

Row belongs to the org.apache.spark.sql package.

import org.apache.spark.sql.Row
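
The two traits listed above can be checked directly on a Row created by hand. This is a minimal sketch that assumes only the Spark SQL library on the classpath (as in spark-shell); the value names are illustrative.

```scala
import org.apache.spark.sql.Row

// A Row built directly through the companion object
val row = Row(1, "hello")

// length and size both report the number of fields
val n = row.length
val m = row.size

// A Row created this way carries no schema, so schema is null
val noSchema = row.schema == null
```

Here n and m are both 2, and noSchema is true because no schema was attached to the Row.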

Field Access

Fields of a Row instance can be accessed by index (starting from 0) using apply or get.

scala> val row = Row(1, "hello")
row: org.apache.spark.sql.Row = [1,hello]

scala> row(1)
res0: Any = hello

scala> row.get(1)
res1: Any = hello
Note
Generic access by ordinal (using apply or get) returns a value of type Any.

You can query for fields with their proper types using getAs with an index.

val row = Row(1, "hello")

scala> row.getAs[Int](0)
res1: Int = 1

scala> row.getAs[String](1)
res2: String = hello
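
Besides getAs, Row offers typed getters (getInt, getString and so on) together with isNullAt for null checks. The following is a sketch of how they might be used on a hand-built Row; the third, null field is added here purely for illustration.

```scala
import org.apache.spark.sql.Row

// The trailing null field is an illustrative addition
val row = Row(1, "hello", null)

// Typed getters return the value with its concrete type
val id: Int = row.getInt(0)
val text: String = row.getString(1)

// Check for null before using a primitive getter on a field that may be null
val missing: Boolean = row.isNullAt(2)
```

Checking isNullAt first matters for primitive getters, since a primitive type such as Int cannot represent null.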
Note

FIXME

row.getAs[String](null)

Schema

A Row instance can have a schema defined.

Note
Unless you are instantiating Row yourself (using Row Object), a Row always has a schema.
Note
RowEncoder takes care of assigning a schema to a Row when you call toDF on a Dataset or create a DataFrame through DataFrameReader.
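
To see what a schema adds, the sketch below attaches one to a Row by hand and then accesses fields by name. It assumes GenericRowWithSchema from the org.apache.spark.sql.catalyst.expressions package, which is an internal class and is used here only to avoid starting a SparkSession; the field names id and text are hypothetical.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical two-field schema for illustration
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("text", StringType, nullable = true)))

// Internal class, used here only to build a Row that carries a schema
val row: Row = new GenericRowWithSchema(Array[Any](1, "hello"), schema)

// With a schema in place, fields can be looked up by name
val idx = row.fieldIndex("text")
val text = row.getAs[String]("text")
val id = row.getAs[Int]("id")
```

On a Row without a schema, fieldIndex and the name-based getAs throw an UnsupportedOperationException, which is why access by name is only available when a schema is defined.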

Row Object

The Row companion object offers factory methods to create Row instances from a variable number of elements (apply), a sequence of elements (fromSeq) and tuples (fromTuple).

scala> Row(1, "hello")
res0: org.apache.spark.sql.Row = [1,hello]

scala> Row.fromSeq(Seq(1, "hello"))
res1: org.apache.spark.sql.Row = [1,hello]

scala> Row.fromTuple((0, "hello"))
res2: org.apache.spark.sql.Row = [0,hello]

The Row object can also merge Row instances.

scala> Row.merge(Row(1), Row("hello"))
res3: org.apache.spark.sql.Row = [1,hello]

It can also return an empty Row instance.

scala> Row.empty == Row()
res4: Boolean = true

Pattern Matching on Row

Row can be used in pattern matching (since Row Object comes with unapplySeq).

scala> Row.unapplySeq(Row(1, "hello"))
res5: Some[Seq[Any]] = Some(WrappedArray(1, hello))

Row(1, "hello") match {
  case Row(key: Int, value: String) => key -> value
}
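
The same pattern can be applied to a collection of rows. The sketch below uses collect with a partial function so that rows of a different shape are skipped instead of causing a MatchError; the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.Row

// Sample rows; the last one has a different shape on purpose
val rows = Seq(Row(1, "hello"), Row(2, "world"), Row("oops"))

// Keep only the rows that match the (Int, String) pattern
val pairs: Seq[(Int, String)] = rows.collect {
  case Row(key: Int, value: String) => key -> value
}
```

The pattern checks both the number of fields and their runtime types, so Row("oops") is filtered out and pairs holds the two remaining key-value tuples.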