Adds documentation for HiveServer2 support

apache · Nov 19, 2024 · 7cf3c28
1 parent 4f87279

Showing 22 changed files with 818 additions and 304 deletions.
diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
@@ -19,6 +19,7 @@
 1. Proxy: Add query parameters and check for mysql kill processId - [#33274](https://github.com/apache/shardingsphere/pull/33274)
 1. Agent: Simplify the use of Agent's Docker Image - [#33356](https://github.com/apache/shardingsphere/pull/33356)
 1. Build: Avoid using `-proc:full` when compiling ShardingSphere with OpenJDK23 - [#33681](https://github.com/apache/shardingsphere/pull/33681)
+1. Doc: Adds documentation for HiveServer2 support - [#33717](https://github.com/apache/shardingsphere/pull/33717)
 
 ### Bug Fixes
 

diff --git a/...ument/content/user-manual/shardingsphere-jdbc/graalvm-native-image/_index.cn.md b/...ument/content/user-manual/shardingsphere-jdbc/graalvm-native-image/_index.cn.md
@@ -289,86 +289,9 @@ Caused by: java.io.UnsupportedEncodingException: Codepage Cp1252 is not supporte
 
 ClickHouse 不支持 ShardingSphere 集成级别的本地事务，XA 事务和 Seata AT 模式事务，更多讨论位于 https://github.com/ClickHouse/clickhouse-docs/issues/2300 。
 
-7. 当需要通过 ShardingSphere JDBC 使用 Hive 方言时，受 https://issues.apache.org/jira/browse/HIVE-28445 影响，
-用户不应该使用 `classifier` 为 `standalone` 的 `org.apache.hive:hive-jdbc:4.0.1`，以避免依赖冲突。
-可能的配置例子如下，
-
-```xml
-<project>
-    <dependencies>
-       <dependency>
-         <groupId>org.apache.shardingsphere</groupId>
-         <artifactId>shardingsphere-jdbc</artifactId>
-         <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-            <groupId>org.apache.shardingsphere</groupId>
-            <artifactId>shardingsphere-infra-database-hive</artifactId>
-            <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-          <groupId>org.apache.shardingsphere</groupId>
-          <artifactId>shardingsphere-parser-sql-hive</artifactId>
-          <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-          <groupId>org.apache.hive</groupId>
-          <artifactId>hive-jdbc</artifactId>
-          <version>4.0.1</version>
-       </dependency>
-       <dependency>
-          <groupId>org.apache.hive</groupId>
-          <artifactId>hive-service</artifactId>
-          <version>4.0.1</version>
-       </dependency>
-       <dependency>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-client-api</artifactId>
-          <version>3.3.6</version>
-       </dependency>
-    </dependencies>
-</project>
-```
-
-这会导致大量的依赖冲突。
-如果用户不希望手动解决潜在的数千行的依赖冲突，可以使用 HiveServer2 JDBC Driver 的 `Thin JAR` 的第三方构建。
-可能的配置例子如下，
-
-```xml
-<project>
-    <dependencies>
-       <dependency>
-         <groupId>org.apache.shardingsphere</groupId>
-         <artifactId>shardingsphere-jdbc</artifactId>
-         <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-            <groupId>org.apache.shardingsphere</groupId>
-            <artifactId>shardingsphere-infra-database-hive</artifactId>
-            <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-          <groupId>org.apache.shardingsphere</groupId>
-          <artifactId>shardingsphere-parser-sql-hive</artifactId>
-          <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-          <groupId>io.github.linghengqian</groupId>
-          <artifactId>hive-server2-jdbc-driver-thin</artifactId>
-          <version>1.5.0</version>
-          <exclusions>
-             <exclusion>
-                <groupId>com.fasterxml.woodstox</groupId>
-                <artifactId>woodstox-core</artifactId>
-             </exclusion>
-          </exclusions>
-       </dependency>
-    </dependencies>
-</project>
-```
-
-受 https://github.com/grpc/grpc-java/issues/10601 影响，用户如果在项目中引入了 `org.apache.hive:hive-jdbc`，
+7. 受 https://github.com/grpc/grpc-java/issues/10601 影响，用户如果在项目中引入了 `org.apache.hive:hive-jdbc`，
 则需要在项目的 classpath 的 `META-INF/native-image/io.grpc/grpc-netty-shaded` 文件夹下创建包含如下内容的文件 `native-image.properties`，
+
 ```properties
 Args=--initialize-at-run-time=\
     io.grpc.netty.shaded.io.netty.channel.ChannelHandlerMask,\
@@ -400,55 +323,6 @@ Args=--initialize-at-run-time=\
     io.grpc.netty.shaded.io.netty.util.AttributeKey
 ```
 
-为了能够使用 `delete` 等 DML SQL 语句，当连接到 HiveServer2 时，
-用户应当考虑在 ShardingSphere JDBC 中仅使用支持 ACID 的表。`apache/hive` 提供了多种事务解决方案。
-
-第1种选择是使用 ACID 表，可能的建表流程如下。
-由于其过时的基于目录的表格式，用户可能不得不在 DML 语句执行前后进行等待，以让 HiveServer2 完成低效的 DML 操作。
-
-```sql
-set metastore.compactor.initiator.on=true;
-set metastore.compactor.cleaner.on=true;
-set metastore.compactor.worker.threads=5;
-
-set hive.support.concurrency=true;
-set hive.exec.dynamic.partition.mode=nonstrict;
-set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-
-CREATE TABLE IF NOT EXISTS t_order
-(
-    order_id   BIGINT,
-    order_type INT,
-    user_id    INT    NOT NULL,
-    address_id BIGINT NOT NULL,
-    status     VARCHAR(50),
-    PRIMARY KEY (order_id) disable novalidate
-) CLUSTERED BY (order_id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
-```
-
-第2种选择是使用 Iceberg 表，可能的建表流程如下。
-Apache Iceberg 表格式有望在未来几年取代传统的 Hive 表格式，
-参考 https://blog.cloudera.com/from-hive-tables-to-iceberg-tables-hassle-free/ 。
-
-```sql
-set iceberg.mr.schema.auto.conversion=true;
-
-CREATE TABLE IF NOT EXISTS t_order
-(
-    order_id   BIGINT,
-    order_type INT,
-    user_id    INT    NOT NULL,
-    address_id BIGINT NOT NULL,
-    status     VARCHAR(50),
-    PRIMARY KEY (order_id) disable novalidate
-) STORED BY ICEBERG STORED AS ORC TBLPROPERTIES ('format-version' = '2');
-```
-
-由于 HiveServer2 JDBC Driver 未实现 `java.sql.DatabaseMetaData#getURL()`，
-ShardingSphere 做了模糊处理，因此用户暂时仅可通过 HikariCP 连接 HiveServer2。
-
-HiveServer2 不支持 ShardingSphere 集成级别的本地事务，XA 事务和 Seata AT 模式事务，更多讨论位于 https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions 。
-
 8. 由于 https://github.com/oracle/graal/issues/7979 的影响，
 对应 `com.oracle.database.jdbc:ojdbc8` Maven 模块的 Oracle JDBC Driver 无法在 GraalVM Native Image 下使用。
 

diff --git a/...ument/content/user-manual/shardingsphere-jdbc/graalvm-native-image/_index.en.md b/...ument/content/user-manual/shardingsphere-jdbc/graalvm-native-image/_index.en.md
@@ -302,88 +302,10 @@ Possible configuration examples are as follows,
 ClickHouse does not support local transactions, XA transactions, and Seata AT mode transactions at the ShardingSphere integration level. 
 More discussion is at https://github.com/ClickHouse/clickhouse-docs/issues/2300 .
 
-7. When using the Hive dialect through ShardingSphere JDBC, affected by https://issues.apache.org/jira/browse/HIVE-28445 ,
-   users should not use `org.apache.hive:hive-jdbc:4.0.1` with `classifier` as `standalone` to avoid dependency conflicts.
-   Possible configuration examples are as follows,
-
-```xml
-<project>
-   <dependencies>
-      <dependency>
-         <groupId>org.apache.shardingsphere</groupId>
-         <artifactId>shardingsphere-jdbc</artifactId>
-         <version>${shardingsphere.version}</version>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.shardingsphere</groupId>
-         <artifactId>shardingsphere-infra-database-hive</artifactId>
-         <version>${shardingsphere.version}</version>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.shardingsphere</groupId>
-         <artifactId>shardingsphere-parser-sql-hive</artifactId>
-         <version>${shardingsphere.version}</version>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.hive</groupId>
-         <artifactId>hive-jdbc</artifactId>
-         <version>4.0.1</version>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.hive</groupId>
-         <artifactId>hive-service</artifactId>
-         <version>4.0.1</version>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.hadoop</groupId>
-         <artifactId>hadoop-client-api</artifactId>
-         <version>3.3.6</version>
-      </dependency>
-   </dependencies>
-</project>
-```
-
-This can lead to a large number of dependency conflicts.
-If the user does not want to manually resolve potentially thousands of lines of dependency conflicts, 
-a third-party build of the HiveServer2 JDBC Driver `Thin JAR` can be used.
-An example of a possible configuration is as follows,
-
-```xml
-<project>
-    <dependencies>
-       <dependency>
-         <groupId>org.apache.shardingsphere</groupId>
-         <artifactId>shardingsphere-jdbc</artifactId>
-         <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-            <groupId>org.apache.shardingsphere</groupId>
-            <artifactId>shardingsphere-infra-database-hive</artifactId>
-            <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-          <groupId>org.apache.shardingsphere</groupId>
-          <artifactId>shardingsphere-parser-sql-hive</artifactId>
-          <version>${shardingsphere.version}</version>
-       </dependency>
-       <dependency>
-          <groupId>io.github.linghengqian</groupId>
-          <artifactId>hive-server2-jdbc-driver-thin</artifactId>
-          <version>1.5.0</version>
-          <exclusions>
-             <exclusion>
-                <groupId>com.fasterxml.woodstox</groupId>
-                <artifactId>woodstox-core</artifactId>
-             </exclusion>
-          </exclusions>
-       </dependency>
-    </dependencies>
-</project>
-```
-
-Affected by https://github.com/grpc/grpc-java/issues/10601 , should users incorporate `org.apache.hive:hive-service` into their project,
+7. Affected by https://github.com/grpc/grpc-java/issues/10601 , should users incorporate `org.apache.hive:hive-jdbc` into their project,
 it is imperative to create a file named `native-image.properties` within the directory `META-INF/native-image/io.grpc/grpc-netty-shaded` of the classpath,
 containing the following content,
+
 ```properties
 Args=--initialize-at-run-time=\
     io.grpc.netty.shaded.io.netty.channel.ChannelHandlerMask,\
@@ -415,57 +337,6 @@ Args=--initialize-at-run-time=\
     io.grpc.netty.shaded.io.netty.util.AttributeKey
 ```
 
-In order to be able to use DML SQL statements such as `delete`, when connecting to HiveServer2,
-users should consider using only ACID-supported tables in ShardingSphere JDBC. `apache/hive` provides a variety of transaction solutions.
-
-The first option is to use ACID tables, and the possible table creation process is as follows.
-Due to its outdated catalog-based table format, 
-users may have to wait before and after DML statement execution to let HiveServer2 complete the inefficient DML operations.
-
-```sql
-set metastore.compactor.initiator.on=true;
-set metastore.compactor.cleaner.on=true;
-set metastore.compactor.worker.threads=5;
-
-set hive.support.concurrency=true;
-set hive.exec.dynamic.partition.mode=nonstrict;
-set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-
-CREATE TABLE IF NOT EXISTS t_order
-(
-    order_id   BIGINT,
-    order_type INT,
-    user_id    INT    NOT NULL,
-    address_id BIGINT NOT NULL,
-    status     VARCHAR(50),
-    PRIMARY KEY (order_id) disable novalidate
-) CLUSTERED BY (order_id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
-```
-
-The second option is to use Iceberg table. The possible table creation process is as follows.
-Apache Iceberg table format is poised to replace the traditional Hive table format in the coming years, 
-see https://blog.cloudera.com/from-hive-tables-to-iceberg-tables-hassle-free/ .
-
-```sql
-set iceberg.mr.schema.auto.conversion=true;
-
-CREATE TABLE IF NOT EXISTS t_order
-(
-    order_id   BIGINT,
-    order_type INT,
-    user_id    INT    NOT NULL,
-    address_id BIGINT NOT NULL,
-    status     VARCHAR(50),
-    PRIMARY KEY (order_id) disable novalidate
-) STORED BY ICEBERG STORED AS ORC TBLPROPERTIES ('format-version' = '2');
-```
-
-Since HiveServer2 JDBC Driver does not implement `java.sql.DatabaseMetaData#getURL()`, 
-ShardingSphere has done some obfuscation, so users can only connect to HiveServer2 through HikariCP for now.
-
-HiveServer2 does not support local transactions, XA transactions, and Seata AT mode transactions at the ShardingSphere integration level. 
-More discussion is available at https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions .
-
 8. Due to https://github.com/oracle/graal/issues/7979 , 
 the Oracle JDBC Driver corresponding to the `com.oracle.database.jdbc:ojdbc8` Maven module cannot be used under GraalVM Native Image.