Row is a data abstraction of an ordered collection of fields that can be accessed by an ordinal / an index (aka generic access by ordinal), by typed getters such as getInt (aka native primitive access), by name, or using Scala’s pattern matching. A Row instance may or may not have a schema.
The traits of Row:
- length or size: Row knows the number of elements (columns).
- schema: Row knows the schema.
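A minimal sketch of these traits, assuming a spark-shell session (or any Scala REPL with Spark SQL on the classpath). The Row here is built directly with Row(...), so it carries no schema and schema returns null.

```scala
import org.apache.spark.sql.Row

// A Row created directly through the Row object (no schema attached)
val row = Row(1, "hello")

// length and size both report the number of fields
val numFields = row.length
val sameNumFields = row.size

// schema is null for a Row that was created without a schema
val hasSchema = row.schema != null
```

The value names (numFields, sameNumFields, hasSchema) are only illustrative.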
Row belongs to the org.apache.spark.sql package.
import org.apache.spark.sql.Row
Fields of a Row instance can be accessed by index (starting from 0) using apply or get.
scala> val row = Row(1, "hello")
row: org.apache.spark.sql.Row = [1,hello]
scala> row(1)
res0: Any = hello
scala> row.get(1)
res1: Any = hello
Note: Generic access by ordinal (using apply or get) returns a value of type Any.
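Besides generic access, Row also offers typed getters such as getInt and getString (what the Spark API documentation calls native primitive access), plus isNullAt for null checks. A small sketch, assuming a spark-shell session; the third, null field is added here only to illustrate isNullAt.

```scala
import org.apache.spark.sql.Row

// Third field is null on purpose, to demonstrate isNullAt
val row = Row(1, "hello", null)

// Typed getters return the declared type instead of Any
val id: Int = row.getInt(0)
val text: String = row.getString(1)

// Check for null before calling a primitive getter on a nullable field
val missing: Boolean = row.isNullAt(2)

// Generic access by ordinal still returns Any
val generic: Any = row(0)
```

Calling a primitive getter such as getInt on a null field fails at runtime, which is why isNullAt is worth checking first.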
You can query for fields with their proper types using getAs with an index.
val row = Row(1, "hello")
scala> row.getAs[Int](0)
res1: Int = 1
scala> row.getAs[String](1)
res2: String = hello
Note: FIXME row.getAs[String](null)
A Row instance can have a schema defined.
Note: Unless you are instantiating Row yourself (using the Row object), a Row always has a schema.
Note: RowEncoder takes care of assigning a schema to a Row when you call toDF on a Dataset or create a DataFrame through DataFrameReader.
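The difference between a Row with and without a schema can be sketched as follows. This assumes a local SparkSession (in spark-shell the spark value already exists and the builder call can be skipped); the column names id and text are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Local session for the sketch; spark-shell already provides one as `spark`
val spark = SparkSession.builder().master("local[*]").appName("row-schema-sketch").getOrCreate()
import spark.implicits._

// Rows that come out of a DataFrame carry the DataFrame's schema
val df = Seq((1, "hello")).toDF("id", "text")
val rowWithSchema = df.head()
val fieldNames = rowWithSchema.schema.fieldNames

// With a schema in place, fields can be looked up by name
val text = rowWithSchema.getAs[String]("text")
val textIndex = rowWithSchema.fieldIndex("text")

// A Row created by hand has no schema
val plainRow = Row(1, "hello")
val plainHasSchema = plainRow.schema != null
```

Name-based lookups such as getAs[String]("text") or fieldIndex only work on rows that have a schema; on a hand-made Row they throw an exception.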
The Row companion object offers factory methods to create Row instances from a collection of elements (apply), a sequence of elements (fromSeq) and tuples (fromTuple).
scala> Row(1, "hello")
res0: org.apache.spark.sql.Row = [1,hello]
scala> Row.fromSeq(Seq(1, "hello"))
res1: org.apache.spark.sql.Row = [1,hello]
scala> Row.fromTuple((0, "hello"))
res2: org.apache.spark.sql.Row = [0,hello]
The Row object can merge Row instances.
scala> Row.merge(Row(1), Row("hello"))
res3: org.apache.spark.sql.Row = [1,hello]
It can also return an empty Row instance.
scala> Row.empty == Row()
res4: Boolean = true
Row can be used in pattern matching (since the Row object comes with unapplySeq).
scala> Row.unapplySeq(Row(1, "hello"))
res5: Some[Seq[Any]] = Some(WrappedArray(1, hello))
Row(1, "hello") match {
  case Row(key: Int, value: String) => key -> value
}
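Pattern matching is also handy when only some rows have the expected shape. The sketch below, assuming a spark-shell session, uses collect with a partial function so that rows that do not match are skipped instead of raising a MatchError; the sample rows are invented for illustration.

```scala
import org.apache.spark.sql.Row

// The last row has a different shape (one String field) and will not match
val rows = Seq(Row(1, "hello"), Row(2, "world"), Row("oops"))

// Keep only rows made of an Int followed by a String
val pairs = rows.collect {
  case Row(key: Int, value: String) => key -> value
}
```

With a plain match expression, a default case (case _ => ...) plays the same role as the partial function here.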