$ HADOOP_HOME=YourHadoopDir HADOOP_CLASSPATH=emr-tablestore-1.4.2-SNAPSHOT.jar:tablestore-4.1.0-jar-with-dependencies.jar:joda-time-2.9.4.jar bin/hive
$ bin/spark-sql --master local --jars emr-tablestore-1.4.2-SNAPSHOT.jar,tablestore-4.1.0-jar-with-dependencies.jar
Let us prepare a `pet` table as an example (picked from the MySQL tutorial):
| name | owner | species | sex | birth | death |
|---|---|---|---|---|---|
| Fluffy | Harold | cat | f | 1993-02-04 | |
| Claws | Gwen | cat | m | 1994-03-17 | |
| Buffy | Harold | dog | f | 1989-05-13 | |
| Fang | Benny | dog | m | 1990-08-27 | |
| Bowser | Diane | dog | m | 1979-08-31 | 1995-07-29 |
| Chirpy | Gwen | bird | f | 1998-09-11 | |
| Whistler | Gwen | bird | | 1997-12-09 | |
| Slim | Benny | snake | m | 1996-04-29 | |
| Puffball | Diane | hamster | f | 1999-03-30 | |
First, we have to create an empty `pet` table in TableStore whose primary key is `name`, of string type.
Second, we create an external table in Hive/SparkSQL.
CREATE EXTERNAL TABLE pet
(name STRING, owner STRING, species STRING, sex STRING, birth STRING, death STRING)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES(
"tablestore.columns.mapping"="name,owner,species,sex,birth,death")
TBLPROPERTIES (
"tablestore.endpoint"="YourEndpoint",
"tablestore.access_key_id"="YourAccessKeyId",
"tablestore.access_key_secret"="YourAccessKeySecret",
"tablestore.table.name"="pet");
Third, we insert data into this external table.
INSERT INTO pet VALUES("Fluffy", "Harold", "cat", "f", "1993-02-04", null);
INSERT INTO pet VALUES("Claws", "Gwen", "cat", "m", "1994-03-17", null);
INSERT INTO pet VALUES("Buffy", "Harold", "dog", "f", "1989-05-13", null);
INSERT INTO pet VALUES("Fang", "Benny", "dog", "m", "1990-08-27", null);
INSERT INTO pet VALUES("Bowser", "Diane", "dog", "m", "1979-08-31", "1995-07-29");
INSERT INTO pet VALUES("Chirpy", "Gwen", "bird", "f", "1998-09-11", null);
INSERT INTO pet VALUES("Whistler", "Gwen", "bird", null, "1997-12-09", null);
INSERT INTO pet VALUES("Slim", "Benny", "snake", "m", "1996-04-29", null);
INSERT INTO pet VALUES("Puffball", "Diane", "hamster", "f", "1999-03-30", null);
Insertion into TableStore updates the matching rows rather than replacing them. One can therefore map some columns of a TableStore table to one Hive/SparkSQL external table and the other columns to another, as shown in the sketch below.
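For example, here is a minimal sketch of that pattern; `pet_profile` and `pet_dates` are hypothetical table names, and each mapping keeps the primary key `name` so that rows can be addressed:

```sql
-- Two external tables over the same TableStore table `pet`,
-- each mapping a different subset of its attribute columns.
CREATE EXTERNAL TABLE pet_profile
(name STRING, owner STRING, species STRING)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES(
"tablestore.columns.mapping"="name,owner,species")
TBLPROPERTIES (
"tablestore.endpoint"="YourEndpoint",
"tablestore.access_key_id"="YourAccessKeyId",
"tablestore.access_key_secret"="YourAccessKeySecret",
"tablestore.table.name"="pet");

CREATE EXTERNAL TABLE pet_dates
(name STRING, birth STRING, death STRING)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES(
"tablestore.columns.mapping"="name,birth,death")
TBLPROPERTIES (
"tablestore.endpoint"="YourEndpoint",
"tablestore.access_key_id"="YourAccessKeyId",
"tablestore.access_key_secret"="YourAccessKeySecret",
"tablestore.table.name"="pet");

-- Writing through pet_dates updates birth/death of the matching row
-- without touching owner/species written through pet_profile.
INSERT INTO pet_dates VALUES("Fluffy", "1993-02-04", null);
```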
TableStore is an internet-scale NoSQL database. Like most large-scale NoSQL databases, TableStore does not support transactions. Because of this lack of transactions, Hive's `DELETE` and `UPDATE` statements are not applicable to TableStore tables.
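As a workaround (a sketch that follows from the update-on-insert semantics above; the new `death` value is a hypothetical illustration), an existing row can be modified by inserting it again under the same primary key:

```sql
-- No UPDATE/DELETE on TableStore tables; re-inserting a row with an
-- existing primary key overwrites its attribute columns instead.
INSERT INTO pet VALUES("Fang", "Benny", "dog", "m", "1990-08-27", "2002-01-15");
```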
Then we can query this `pet` table.
> SELECT * FROM pet;
Bowser Diane dog m 1979-08-31 1995-07-29
Buffy Harold dog f 1989-05-13 NULL
Chirpy Gwen bird f 1998-09-11 NULL
Claws Gwen cat m 1994-03-17 NULL
Fang Benny dog m 1990-08-27 NULL
Fluffy Harold cat f 1993-02-04 NULL
Puffball Diane hamster f 1999-03-30 NULL
Slim Benny snake m 1996-04-29 NULL
Whistler Gwen bird NULL 1997-12-09 NULL
Note that `birth` is stored as a STRING, so the comparison below is lexicographic; it agrees with chronological order because the dates are zero-padded ISO dates.
> SELECT * FROM pet WHERE birth > "1995-01-01";
Chirpy Gwen bird f 1998-09-11 NULL
Puffball Diane hamster f 1999-03-30 NULL
Slim Benny snake m 1996-04-29 NULL
Whistler Gwen bird NULL 1997-12-09 NULL
- `WITH SERDEPROPERTIES`
  - `tablestore.columns.mapping` (optional): By default, the field names of the external table (lower-cased, by the rules of Hive) are the same as the column names (names of primary-key columns or attribute columns) in TableStore. Sometimes we need them to be different, for example to deal with case sensitivity or charsets. In that case, set `tablestore.columns.mapping` to a comma-separated string in which each item is a TableStore column name, in the same order as the fields. Please note: do not leave blanks before or after the commas. See the example after this list.
- `TBLPROPERTIES`
  - `tablestore.endpoint` (required): the endpoint of the TableStore instance.
  - `tablestore.instance` (optional): the instance name. If missing, it is derived from the endpoint as the part between `http://` (or `https://`) and the first dot.
  - `tablestore.table.name` (required): the table name in TableStore.
  - `tablestore.access_key_id` and `tablestore.access_key_secret` (required), `tablestore.security_token` (optional): the credential and, when needed, a Security Token.
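For example, here is a sketch exercising the optional properties; the mixed-case TableStore column names (`Name`, `Owner`), the endpoint, and the instance name are hypothetical placeholders:

```sql
-- The mapping pairs each Hive field, in order, with its actual
-- TableStore column; no blanks before or after the commas.
CREATE EXTERNAL TABLE pet_mapped
(pet_name STRING, pet_owner STRING)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES(
"tablestore.columns.mapping"="Name,Owner")
TBLPROPERTIES (
"tablestore.endpoint"="https://YourInstance.cn-hangzhou.ots.aliyuncs.com",
"tablestore.instance"="YourInstance",
"tablestore.table.name"="pet",
"tablestore.access_key_id"="YourAccessKeyId",
"tablestore.access_key_secret"="YourAccessKeySecret",
"tablestore.security_token"="YourSecurityToken");
```

Here `tablestore.instance` could be omitted, since it equals the part of the endpoint between `https://` and the first dot.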
TableStore and Hive support different sets of data types. Here are the conversion rules (from TableStore rows to Hive columns):
| TableStore \ Hive | TINYINT | SMALLINT | INT | BIGINT | FLOAT | DOUBLE | BOOLEAN | STRING | BINARY |
|---|---|---|---|---|---|---|---|---|---|
| INTEGER | Yes, loss of precision | Yes, loss of precision | Yes, loss of precision | Yes | Yes, loss of precision | Yes, loss of precision | | | |
| DOUBLE | Yes, loss of precision | Yes, loss of precision | Yes, loss of precision | Yes, loss of precision | Yes, loss of precision | Yes | | | |
| BOOLEAN | | | | | | | Yes | | |
| STRING | | | | | | | | Yes | |
| BINARY | | | | | | | | | Yes |
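In practice this means the Hive column types should be chosen per the table above. For instance, here is a sketch with a hypothetical `age` attribute stored as a TableStore INTEGER; declaring the field as BIGINT avoids the loss of precision noted above:

```sql
-- TableStore INTEGER is a 64-bit value; BIGINT is the only Hive type
-- in the conversion table that holds it without loss of precision.
CREATE EXTERNAL TABLE pet_age
(name STRING, age BIGINT)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES(
"tablestore.columns.mapping"="name,age")
TBLPROPERTIES (
"tablestore.endpoint"="YourEndpoint",
"tablestore.access_key_id"="YourAccessKeyId",
"tablestore.access_key_secret"="YourAccessKeySecret",
"tablestore.table.name"="pet");
```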