Data Sources in Spark

Spark can read data from many data sources, including the Hadoop Distributed File System (HDFS), Cassandra, HBase, Amazon S3, and many more.

Spark offers different APIs for reading data, depending on its content (what the data looks like) and its storage (where the data lives).

Based on content, data falls into two groups:

  • binary

  • text

You can also group data by the storage: