Enable parquet suites from Spark UT (#11366)
* add parquet column index UT

Signed-off-by: fejiang <[email protected]>

* change

Signed-off-by: fejiang <[email protected]>

* added parquet suite

Signed-off-by: fejiang <[email protected]>

* pom changed

Signed-off-by: fejiang <[email protected]>

* DeltaEncoding Suite

Signed-off-by: fejiang <[email protected]>

* enable more suites

Signed-off-by: fejiang <[email protected]>

* remove ignored case

Signed-off-by: fejiang <[email protected]>

* format

Signed-off-by: fejiang <[email protected]>

* added ignored cases

Signed-off-by: fejiang <[email protected]>

* change to parquet hadoop version

Signed-off-by: fejiang <[email protected]>

* remove parquet.version

Signed-off-by: fejiang <[email protected]>

* adding scope and classifier

Signed-off-by: fejiang <[email protected]>

* pom remove unused

Signed-off-by: fejiang <[email protected]>

* pom change 2.13

Signed-off-by: fejiang <[email protected]>

* add schema suite

Signed-off-by: fejiang <[email protected]>

* remove dataframe

Signed-off-by: fejiang <[email protected]>

* RapidsParquetThriftCompatibilitySuite

Signed-off-by: fejiang <[email protected]>

* ThriftCompaSuite added

Signed-off-by: fejiang <[email protected]>

* more suites but the RowIndexSuite one

Signed-off-by: fejiang <[email protected]>

* formatting issues

Signed-off-by: fejiang <[email protected]>

* exclude SPARK-36803

Signed-off-by: fejiang <[email protected]>

* setting change

Signed-off-by: fejiang <[email protected]>

* setting change

Signed-off-by: fejiang <[email protected]>

* adjust order

Signed-off-by: fejiang <[email protected]>

* adjust settings

Signed-off-by: fejiang <[email protected]>

* adjust settings

Signed-off-by: fejiang <[email protected]>

* RapidsParquetThriftCompatibilitySuite settings

* known issue added

Signed-off-by: fejiang <[email protected]>

* format new line

Signed-off-by: fejiang <[email protected]>

* known issue added

Signed-off-by: fejiang <[email protected]>

* RapidsParquetDeltaByteArrayEncodingSuite

Signed-off-by: fejiang <[email protected]>

* RapidsParquetAvroCompatibilitySuite

Signed-off-by: fejiang <[email protected]>

* ParquetFieldIdSchemaSuite and Avro suite added

* pom Avro suite modified

* ParquetFileFormatSuite added

* RapidsParquetRebaseDatetimeSuite and QuerySuite added

* RapidsParquetSchemaPruningSuite added

* setting adjust

Signed-off-by: fejiang <[email protected]>

* setting adjust

Signed-off-by: fejiang <[email protected]>

* UT adjust, exclude added

Signed-off-by: fejiang <[email protected]>

* RapidsParquetThriftCompatibilitySuite adjust setting

Signed-off-by: fejiang <[email protected]>

* comment Create parquet table with compression

Signed-off-by: fejiang <[email protected]>

* SPARK_HOME NOT FOUND issue solved.

Signed-off-by: fejiang <[email protected]>

* enabling more suite

Signed-off-by: fejiang <[email protected]>

* remove exclude from RapidsParquetFieldIdIOSuite

Signed-off-by: fejiang <[email protected]>

* format and remove parquet files

Signed-off-by: fejiang <[email protected]>

* comment setting

Signed-off-by: fejiang <[email protected]>

* pom modified and remove unnecessary case

Signed-off-by: fejiang <[email protected]>

---------

Signed-off-by: fejiang <[email protected]>
Co-authored-by: fejiang <[email protected]>
Feng-Jiang28 and fejiang authored Sep 24, 2024
1 parent 510ee83 commit a34f33e
Showing 21 changed files with 565 additions and 1 deletion.
13 changes: 13 additions & 0 deletions scala2.13/tests/pom.xml
@@ -156,6 +156,19 @@
          <version>3.1.0.0-RC2</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.parquet</groupId>
          <artifactId>parquet-column</artifactId>
          <version>${parquet.hadoop.version}</version>
          <scope>test</scope>
          <classifier>tests</classifier>
        </dependency>
        <dependency>
          <groupId>org.apache.parquet</groupId>
          <artifactId>parquet-avro</artifactId>
          <version>${parquet.hadoop.version}</version>
          <scope>test</scope>
        </dependency>
      </dependencies>
    </profile>
  </profiles>
13 changes: 13 additions & 0 deletions tests/pom.xml
@@ -156,6 +156,19 @@
          <version>3.1.0.0-RC2</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.parquet</groupId>
          <artifactId>parquet-column</artifactId>
          <version>${parquet.hadoop.version}</version>
          <scope>test</scope>
          <classifier>tests</classifier>
        </dependency>
        <dependency>
          <groupId>org.apache.parquet</groupId>
          <artifactId>parquet-avro</artifactId>
          <version>${parquet.hadoop.version}</version>
          <scope>test</scope>
        </dependency>
      </dependencies>
    </profile>
  </profiles>
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetAvroCompatibilitySuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetAvroCompatibilitySuite
  extends ParquetAvroCompatibilitySuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,26 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetColumnIndexSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetColumnIndexSuite extends ParquetColumnIndexSuite with RapidsSQLTestsBaseTrait {
}
@@ -0,0 +1,28 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetCompressionCodecPrecedenceSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetCompressionCodecPrecedenceSuite
  extends ParquetCompressionCodecPrecedenceSuite
  with RapidsSQLTestsBaseTrait {
}
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetDeltaByteArrayEncodingSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetDeltaByteArrayEncodingSuite
  extends ParquetDeltaByteArrayEncodingSuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,31 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.{ParquetDeltaEncodingInteger, ParquetDeltaEncodingLong}
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetDeltaEncodingInteger
  extends ParquetDeltaEncodingInteger
  with RapidsSQLTestsBaseTrait {}

class RapidsParquetDeltaEncodingLong
  extends ParquetDeltaEncodingLong
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetDeltaLengthByteArrayEncodingSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetDeltaLengthByteArrayEncodingSuite
  extends ParquetDeltaLengthByteArrayEncodingSuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,25 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetFieldIdIOSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetFieldIdIOSuite extends ParquetFieldIdIOSuite with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetFieldIdSchemaSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetFieldIdSchemaSuite
  extends ParquetFieldIdSchemaSuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatSuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetFileFormatSuite
  extends ParquetFileFormatSuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,27 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.ParquetInteroperabilitySuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetInteroperabilitySuite
  extends ParquetInteroperabilitySuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,35 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.execution.datasources.parquet.{ParquetPartitionDiscoverySuite, ParquetV1PartitionDiscoverySuite, ParquetV2PartitionDiscoverySuite}
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetPartitionDiscoverySuite
  extends ParquetPartitionDiscoverySuite
  with RapidsSQLTestsBaseTrait {}

class RapidsParquetV1PartitionDiscoverySuite
  extends ParquetV1PartitionDiscoverySuite
  with RapidsSQLTestsBaseTrait {}

class RapidsParquetV2PartitionDiscoverySuite
  extends ParquetV2PartitionDiscoverySuite
  with RapidsSQLTestsBaseTrait {}
@@ -0,0 +1,32 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*** spark-rapids-shim-json-lines
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.rapids.suites

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.parquet.ParquetProtobufCompatibilitySuite
import org.apache.spark.sql.rapids.utils.RapidsSQLTestsBaseTrait

class RapidsParquetProtobufCompatibilitySuite
  extends ParquetProtobufCompatibilitySuite
  with RapidsSQLTestsBaseTrait {
  override protected def readResourceParquetFile(name: String): DataFrame = {
    spark.read.parquet(testFile(name))
  }
}
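
Reviewer note: every suite added in this commit follows the same shape, a class that extends an upstream Spark Parquet suite and mixes in `RapidsSQLTestsBaseTrait`, usually with an empty body. The definition of that trait is not part of this diff, so the sketch below only illustrates the general Scala mechanism under the assumption that the trait overrides environment hooks while the test cases are inherited unchanged. All names in it (`UpstreamSuite`, `GpuBackend`, `locate`, and so on) are hypothetical stand-ins, not Spark or spark-rapids APIs.

```scala
// Hypothetical stand-ins: none of these names exist in Spark or spark-rapids.
abstract class UpstreamSuite {
  // Engine the tests run on; this is the upstream default.
  protected def backend: String = "cpu"
  // Hook for locating test data, overridable in the way readResourceParquetFile is.
  protected def locate(name: String): String = "classpath:" + name
  // Stands in for the inherited test cases.
  def runAll(): Seq[String] = {
    val file = locate("sample.parquet")
    Seq(s"read $file on $backend")
  }
}

// Mixin that changes the engine without touching the inherited tests.
trait GpuBackend extends UpstreamSuite {
  override protected def backend: String = "gpu"
}

class ConcreteUpstreamSuite extends UpstreamSuite

// Empty body: the tests come from the parent, the engine from the trait.
class GpuSuite extends ConcreteUpstreamSuite with GpuBackend {}

// Same wrapper plus one overridden hook, as in the Protobuf suite above.
class GpuSuiteWithLocalData extends ConcreteUpstreamSuite with GpuBackend {
  override protected def locate(name: String): String = "file:/tmp/" + name
}

object Demo {
  def main(args: Array[String]): Unit = {
    println(new ConcreteUpstreamSuite().runAll())
    println(new GpuSuite().runAll())
    println(new GpuSuiteWithLocalData().runAll())
  }
}
```

Because the trait comes last in the linearization, its override wins over the parent's default. That is why a wrapper only needs a body when a hook must differ from what the mixin provides.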