
GH-45073: Update encryption test files to correct invalid repetition levels #65

Merged
1 commit merged on Jan 3, 2025


adamreeve
Contributor

See apache/arrow#45073. This fixes the encryption test files generated by Arrow C++ so that the repetition levels are correct. I tried to avoid changing other properties of these files like the compression and encodings used.
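For readers less familiar with repetition levels, here is a minimal sketch of what correct levels look like for a top-level `repeated` primitive column in which every row holds the same number of values. `Levels` and `MakeRepeatedLevels` are hypothetical names used only for illustration, not Arrow C++ APIs. The sketch assumes at least one non-null value per row, because empty lists would need extra handling. Level arrays of this shape are what Arrow C++'s low-level `TypedColumnWriter::WriteBatch` takes as its definition- and repetition-level arguments.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for one column chunk's levels (illustration only).
struct Levels {
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
};

// Levels for a top-level `repeated` primitive column, where
// max_def_level = 1 and max_rep_level = 1. Assumes every row holds
// exactly `values_per_row` (>= 1) non-null values.
Levels MakeRepeatedLevels(int num_rows, int values_per_row) {
  Levels levels;
  for (int row = 0; row < num_rows; ++row) {
    for (int j = 0; j < values_per_row; ++j) {
      // Every value is present, so it sits at the maximum definition level.
      levels.def_levels.push_back(1);
      // 0 starts a new record; 1 appends to the current record's list.
      levels.rep_levels.push_back(j == 0 ? 0 : 1);
    }
  }
  return levels;
}
```

For three rows of two values each, the repetition levels come out as 0, 1, 0, 1, 0, 1. Every record boundary is marked by a 0, so the first level in a column chunk is always 0.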

After applying the change to fix the repetition levels (apache/arrow#45074), I generated the files by running the parquet-encryption-test tests with the following changes:

diff --git a/cpp/src/parquet/encryption/read_configurations_test.cc b/cpp/src/parquet/encryption/read_configurations_test.cc
index f450f9274c..67eb284272 100644
--- a/cpp/src/parquet/encryption/read_configurations_test.cc
+++ b/cpp/src/parquet/encryption/read_configurations_test.cc
@@ -253,7 +253,8 @@ TEST_P(TestDecryptionConfiguration, TestDecryption) {
   const char* param_file_name = std::get<1>(GetParam());
   // Decrypt parquet file that was generated in write_configurations_test.cc test.
   std::string tmp_file_name = "tmp_" + std::string(param_file_name);
-  std::string file_name = temp_dir->path().ToString() + tmp_file_name;
+  //std::string file_name = temp_dir->path().ToString() + tmp_file_name;
+  std::string file_name = "/home/adam/dev/arrow/cpp/submodules/parquet-testing/fixed-data/" + tmp_file_name;
   if (!fexists(file_name)) {
     std::stringstream ss;
     ss << "File " << file_name << " is missing from temporary dir.";
@@ -267,7 +268,7 @@ TEST_P(TestDecryptionConfiguration, TestDecryption) {
     CheckResults(file_name, decryption_config_num, encryption_config_num);
   }
   // Delete temporary test file.
-  ASSERT_EQ(std::remove(file_name.c_str()), 0);
+  //ASSERT_EQ(std::remove(file_name.c_str()), 0);
 
   // Decrypt parquet file that resides in parquet-testing/data directory.
   file_name = data_file(param_file_name);
diff --git a/cpp/src/parquet/encryption/test_encryption_util.cc b/cpp/src/parquet/encryption/test_encryption_util.cc
index cf863da60a..22537f6abd 100644
--- a/cpp/src/parquet/encryption/test_encryption_util.cc
+++ b/cpp/src/parquet/encryption/test_encryption_util.cc
@@ -207,7 +207,9 @@ void FileEncryptor::EncryptFile(
     std::string file,
     std::shared_ptr<parquet::FileEncryptionProperties> encryption_configurations) {
   WriterProperties::Builder prop_builder;
-  prop_builder.compression(parquet::Compression::UNCOMPRESSED);
+  prop_builder.version(ParquetVersion::PARQUET_2_4);
+  prop_builder.encoding(parquet::Encoding::RLE);
+  prop_builder.compression(parquet::Compression::SNAPPY);
   prop_builder.encryption(encryption_configurations);
   prop_builder.enable_write_page_index();
   std::shared_ptr<WriterProperties> writer_properties = prop_builder.build();
diff --git a/cpp/src/parquet/encryption/test_encryption_util.h b/cpp/src/parquet/encryption/test_encryption_util.h
index 9bfc774278..e0941a62f5 100644
--- a/cpp/src/parquet/encryption/test_encryption_util.h
+++ b/cpp/src/parquet/encryption/test_encryption_util.h
@@ -106,7 +106,7 @@ class FileEncryptor {
  private:
   std::shared_ptr<schema::GroupNode> SetupEncryptionSchema();
 
-  int num_rowgroups_ = 5;
+  int num_rowgroups_ = 1;
   int rows_per_rowgroup_ = 50;
   std::shared_ptr<schema::GroupNode> schema_;
 };
diff --git a/cpp/src/parquet/encryption/write_configurations_test.cc b/cpp/src/parquet/encryption/write_configurations_test.cc
index f27da82694..349113dffa 100644
--- a/cpp/src/parquet/encryption/write_configurations_test.cc
+++ b/cpp/src/parquet/encryption/write_configurations_test.cc
@@ -84,7 +84,8 @@ class TestEncryptionConfiguration : public ::testing::Test {
   void EncryptFile(
       std::shared_ptr<parquet::FileEncryptionProperties> encryption_configurations,
       std::string file_name) {
-    std::string file = temp_dir->path().ToString() + file_name;
+    //std::string file = temp_dir->path().ToString() + file_name;
+    std::string file = "/home/adam/dev/arrow/cpp/submodules/parquet-testing/fixed-data/" + file_name;
     encryptor_.EncryptFile(file, encryption_configurations);
   }
 };

@wgtmac
Member

wgtmac commented Dec 19, 2024

Have you changed the compression type? It seems that all files have at least 80% size reduction.

@adamreeve
Contributor Author

> Have you changed the compression type? It seems that all files have at least 80% size reduction.

No, they were using Snappy before, so I changed the writing code to keep using Snappy. I'm not sure why the files are smaller; I'll see if I can figure out why.

@adamreeve
Contributor Author

It looks like the size difference is caused by this change, which stopped writing column metadata at the end of chunks: apache/arrow#43428

If I revert that commit then the file sizes match exactly.

@wgtmac
Member

wgtmac commented Dec 19, 2024

Thanks for the investigation! That makes sense.

@wgtmac
Member

wgtmac commented Dec 19, 2024

BTW, these files are used by parquet-java: https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java. We need to make sure that these tests won't fail after migrating to the new files.

@adamreeve
Contributor Author

adamreeve commented Dec 19, 2024

I tested updating parquet-java to reference my fork of parquet-testing with the new commit SHA and it looks like it worked fine: https://github.com/adamreeve/parquet-java/actions/runs/12407983545

Relevant lines from the log:

2024-12-19T08:12:59.4785089Z [INFO] Running org.apache.parquet.hadoop.ITTestEncryptionOptions
2024-12-19T08:12:59.6635701Z [main] INFO org.apache.parquet.hadoop.InterOpTester - Download interOp file: https://github.com/adamreeve/parquet-testing/raw//ec7e69c/data/uniform_encryption.parquet.encrypted
2024-12-19T08:13:00.0637962Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER
2024-12-19T08:13:00.0640212Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/uniform_encryption.parquet.encrypted UNIFORM_ENCRYPTION
2024-12-19T08:13:00.1636551Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:00.1637374Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:00.1637974Z [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.snappy]
2024-12-19T08:13:00.1638638Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 2 ms. row count = 50
2024-12-19T08:13:00.1639634Z [main] INFO org.apache.parquet.hadoop.InterOpTester - Download interOp file: https://github.com/adamreeve/parquet-testing/raw//ec7e69c/data/encrypt_columns_and_footer.parquet.encrypted
2024-12-19T08:13:00.2638376Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER
2024-12-19T08:13:00.2640507Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER
2024-12-19T08:13:00.3639003Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:00.3640020Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:00.3640918Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 2 ms. row count = 50
2024-12-19T08:13:00.3642429Z [main] INFO org.apache.parquet.hadoop.InterOpTester - Download interOp file: https://github.com/adamreeve/parquet-testing/raw//ec7e69c/data/encrypt_columns_plaintext_footer.parquet.encrypted
2024-12-19T08:13:00.4641416Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER
2024-12-19T08:13:00.4643628Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_plaintext_footer.parquet.encrypted ENCRYPT_COLUMNS_PLAINTEXT_FOOTER
2024-12-19T08:13:00.4645541Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:00.4646870Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:00.4648116Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 2 ms. row count = 50
2024-12-19T08:13:00.5644818Z [main] INFO org.apache.parquet.hadoop.InterOpTester - Download interOp file: https://github.com/adamreeve/parquet-testing/raw//ec7e69c/data/encrypt_columns_and_footer_aad.parquet.encrypted
2024-12-19T08:13:00.6643518Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER
2024-12-19T08:13:00.6645145Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_aad.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_AAD
2024-12-19T08:13:00.6646433Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:00.6647341Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:00.6648182Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 1 ms. row count = 50
2024-12-19T08:13:00.6649429Z [main] INFO org.apache.parquet.hadoop.InterOpTester - Download interOp file: https://github.com/adamreeve/parquet-testing/raw//ec7e69c/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted
2024-12-19T08:13:00.8645206Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER
2024-12-19T08:13:00.8647007Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE
2024-12-19T08:13:00.8648598Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: AAD prefix used for file encryption, but not stored in file and not supplied in decryption properties
2024-12-19T08:13:00.8650129Z [main] INFO org.apache.parquet.hadoop.InterOpTester - Download interOp file: https://github.com/adamreeve/parquet-testing/raw//ec7e69c/data/encrypt_columns_and_footer_ctr.parquet.encrypted
2024-12-19T08:13:01.0648089Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER
2024-12-19T08:13:01.0651309Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_ctr.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_CTR
2024-12-19T08:13:01.0653408Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.0654724Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.0655936Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 2 ms. row count = 50
2024-12-19T08:13:01.1670932Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER_AAD
2024-12-19T08:13:01.1688853Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/uniform_encryption.parquet.encrypted UNIFORM_ENCRYPTION
2024-12-19T08:13:01.1691227Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: AAD Prefix set in decryption properties, but was not used for file encryption
2024-12-19T08:13:01.1693178Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER_AAD
2024-12-19T08:13:01.1694960Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER
2024-12-19T08:13:01.1696922Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: AAD Prefix set in decryption properties, but was not used for file encryption
2024-12-19T08:13:01.1698581Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER_AAD
2024-12-19T08:13:01.1700420Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_plaintext_footer.parquet.encrypted ENCRYPT_COLUMNS_PLAINTEXT_FOOTER
2024-12-19T08:13:01.1702742Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: AAD Prefix set in decryption properties, but was not used for file encryption
2024-12-19T08:13:01.1704430Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER_AAD
2024-12-19T08:13:01.1706255Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_aad.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_AAD
2024-12-19T08:13:01.1708080Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.1709420Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.1710660Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 1 ms. row count = 50
2024-12-19T08:13:01.1720595Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER_AAD
2024-12-19T08:13:01.1723229Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE
2024-12-19T08:13:01.1725243Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.1726565Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.1727807Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 1 ms. row count = 50
2024-12-19T08:13:01.1729180Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_KEY_RETRIEVER_AAD
2024-12-19T08:13:01.1730950Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_ctr.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_CTR
2024-12-19T08:13:01.1733172Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: AAD Prefix set in decryption properties, but was not used for file encryption
2024-12-19T08:13:01.1734783Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_EXPLICIT_KEYS
2024-12-19T08:13:01.1736374Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/uniform_encryption.parquet.encrypted UNIFORM_ENCRYPTION
2024-12-19T08:13:01.1738042Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.1739348Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.1740570Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 0 ms. row count = 50
2024-12-19T08:13:01.1742366Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_EXPLICIT_KEYS
2024-12-19T08:13:01.1744132Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER
2024-12-19T08:13:01.1745921Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.1747231Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.1748443Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 0 ms. row count = 50
2024-12-19T08:13:01.1749773Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_EXPLICIT_KEYS
2024-12-19T08:13:01.1751567Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_plaintext_footer.parquet.encrypted ENCRYPT_COLUMNS_PLAINTEXT_FOOTER
2024-12-19T08:13:01.2645740Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.2647074Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.2648335Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 1 ms. row count = 50
2024-12-19T08:13:01.2649733Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_EXPLICIT_KEYS
2024-12-19T08:13:01.2651526Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_aad.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_AAD
2024-12-19T08:13:01.2653755Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.2655483Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.2656733Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 0 ms. row count = 50
2024-12-19T08:13:01.2658108Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_EXPLICIT_KEYS
2024-12-19T08:13:01.2660050Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE
2024-12-19T08:13:01.2662701Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: AAD prefix used for file encryption, but not stored in file and not supplied in decryption properties
2024-12-19T08:13:01.2664453Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration DECRYPT_WITH_EXPLICIT_KEYS
2024-12-19T08:13:01.2666251Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_ctr.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_CTR
2024-12-19T08:13:01.2668106Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.2669442Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.2670709Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 0 ms. row count = 50
2024-12-19T08:13:01.2672419Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration NO_DECRYPTION
2024-12-19T08:13:01.2674017Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/uniform_encryption.parquet.encrypted UNIFORM_ENCRYPTION
2024-12-19T08:13:01.2676142Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: Trying to read file with encrypted footer. No keys available
2024-12-19T08:13:01.2677649Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration NO_DECRYPTION
2024-12-19T08:13:01.2679328Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER
2024-12-19T08:13:01.2681232Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: Trying to read file with encrypted footer. No keys available
2024-12-19T08:13:01.2682848Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration NO_DECRYPTION
2024-12-19T08:13:01.2684577Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_plaintext_footer.parquet.encrypted ENCRYPT_COLUMNS_PLAINTEXT_FOOTER
2024-12-19T08:13:01.2686483Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - RecordReader initialized will read a total of 50 records.
2024-12-19T08:13:01.2687823Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
2024-12-19T08:13:01.2689061Z [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 0 ms. row count = 50
2024-12-19T08:13:01.2690335Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration NO_DECRYPTION
2024-12-19T08:13:01.2692129Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_aad.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_AAD
2024-12-19T08:13:01.2694008Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: Trying to read file with encrypted footer. No keys available
2024-12-19T08:13:01.2695459Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration NO_DECRYPTION
2024-12-19T08:13:01.2697671Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE
2024-12-19T08:13:01.2699791Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: Trying to read file with encrypted footer. No keys available
2024-12-19T08:13:01.2701256Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - ==> Decryption configuration NO_DECRYPTION
2024-12-19T08:13:01.2703167Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - --> Read file target/parquet-testing/data/encrypt_columns_and_footer_ctr.parquet.encrypted ENCRYPT_COLUMNS_AND_FOOTER_CTR
2024-12-19T08:13:01.2705125Z [main] INFO org.apache.parquet.hadoop.TestEncryptionOptions - Exception as expected: Trying to read file with encrypted footer. No keys available

Member

@wgtmac wgtmac left a comment


Thanks for the verification!

cc @pitrou @mapleFU

Member

@mapleFU mapleFU left a comment


Would you mind change a "corrupt" file to the bad_file?

Also cc @ggershinsky

@adamreeve
Contributor Author

> Would you mind change a "corrupt" file to the bad_file?

Just to confirm I've understood: do you mean adding a file with invalid repetition levels to the bad_data directory? I guess an unencrypted file would be best, as this problem isn't really related to encryption?

@mapleFU
Member

mapleFU commented Jan 3, 2025

You're right, let's move forward.

@mapleFU mapleFU merged commit c7cf137 into apache:master Jan 3, 2025
@adamreeve
Contributor Author

OK, thanks. I'm happy to add a bad_data file too, as a separate pull request. Would it also make sense to add some validation of the repetition levels to Arrow C++ and raise an error for such files?

@adamreeve adamreeve deleted the encryption-repetition-fix branch January 3, 2025 02:54
@mapleFU
Member

mapleFU commented Jan 3, 2025

> Would it also make sense to add some validation of the repetition levels to Arrow C++ and raise an error for such files?

I think so; we can do it separately.
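The repetition-level validation discussed above could start from checks like the following. This is a minimal, hypothetical sketch: `ValidateRepetitionLevels` is not an existing Arrow C++ function. It operates on the decoded repetition levels of a single column chunk and checks only two rules. First, every level must lie within `[0, max_rep_level]`. Second, the first level must be 0, because records do not span row groups. A real reader-side check would also need to take definition levels and page-level constraints into account.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical check, not an existing Arrow C++ API. Returns an empty
// string if `rep_levels` (all levels of one column chunk) pass two basic
// structural rules, otherwise a message describing the first violation.
std::string ValidateRepetitionLevels(const std::vector<int16_t>& rep_levels,
                                     int16_t max_rep_level) {
  for (std::size_t i = 0; i < rep_levels.size(); ++i) {
    const int16_t level = rep_levels[i];
    // Rule 1: every level must lie in [0, max_rep_level].
    if (level < 0 || level > max_rep_level) {
      return "repetition level " + std::to_string(level) + " at index " +
             std::to_string(i) + " is outside [0, " +
             std::to_string(max_rep_level) + "]";
    }
    // Rule 2: records do not span row groups, so a column chunk must
    // begin a new record and its first level therefore has to be 0.
    if (i == 0 && level != 0) {
      return "first repetition level is " + std::to_string(level) +
             ", expected 0";
    }
  }
  return "";
}
```

For example, a chunk whose levels are `1, 0, 1, 0` (record starts marked with 1 instead of 0) is rejected by the second rule. Returning a message rather than throwing keeps the sketch independent of any particular error type. An Arrow C++ implementation would more likely report a `Status`.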
