From f2a62b4d75e67664f3a864ed79d63c4f36403c7b Mon Sep 17 00:00:00 2001 From: Adam Reeve Date: Fri, 3 Jan 2025 15:52:27 +1300 Subject: [PATCH 1/2] Add example file with bad repetition levels --- bad_data/ARROW-GH-45185.parquet | Bin 0 -> 1225 bytes bad_data/README.md | 2 ++ 2 files changed, 2 insertions(+) create mode 100644 bad_data/ARROW-GH-45185.parquet diff --git a/bad_data/ARROW-GH-45185.parquet b/bad_data/ARROW-GH-45185.parquet new file mode 100644 index 0000000000000000000000000000000000000000..5c06d5a0805b1e6ae98f7ffdf47bde1bb49a0df8 GIT binary patch literal 1225 zcmY+^4@?_X90%~%QVL^?mC1EI+OcDkuFMv$8)d+R9f(jnD94{UoQf^92$Mlq7=Jc5 zn@A#2+`6SR=$2XCa3B~OP1HFE-B4GjF5TQ8Hj~+C{KEk0vWOx6iQh9(^yOVXckf;9 z{c?HV$LCk*DGB9|MK7TOB8sdai%r&l^(WzpF%4c8L3VEcM`Fl%Do#c~K6_oS1X^zY z))omJ&j+8BLYJn$C`IS@UZ+{H1o~F46~#dR=&w_PLF*S+mco$qjkB>ZJhbQ3 zGDylzFUJAdd{X;z$Q-*@BZDj#D2|8R_s3bGPX5K+^nK8hUL;O{j;o2&g08N)uPEqF z`gl|hy%T-?E1|Dp*FgpJFDvMvVennKB@u?UPaBmm{QJ0G1<9M9#8p66+!&DrnVH-f zLH6*9YpWrb`QsP^`LEBul?*K|_n|e=kz(&$3tbnpt@lHBOPnbMdK13epoYFPA1K#B ze@(AM1A`IPS;5eN;d&|zm&S~*hveSv(+>by-+OK?WO94Dbdde_iP|*Cy`bNk4tcFK zgN2sMw^VxQXdj8(0A0%Np9S6i>KhxO*RJztKwpe_&;b1-*AHjHVAasxO)#|ZTwN9n zpQtLyhNLjnkOO2>f2`UJnS)oP4?=e1={Z4ea^GYw;B*=zYJV^dabbyd<*_`bqNYhheb)m#88bGF=P{hUZS)EQaK0>vjM~IQfm6=UnsQzb4z!ux*)cLX z-2VCO-tmyHQwR(lFQyej!v1c;?LQ@oX>5F2CBG9(dBj2nC<(1K5QVh0uED;`N{KO$ zB2g@DY;McUc&fh9-eeob`ASarC zYgim5v=)a95f&x*VK_qMqsu9U{LeTlj|iW%siOg+t4VcOoxAtg+jMmfhtBGBwsbJ< ic4up2OEaU-PGi$pU8OmnuQZlb(JCRTH*oPd#Qp-P1w8lw literal 0 HcmV?d00001 diff --git a/bad_data/README.md b/bad_data/README.md index 52a4818..0a030a0 100644 --- a/bad_data/README.md +++ b/bad_data/README.md @@ -31,3 +31,5 @@ These are files used for reproducing various bugs that have been reported. * ARROW-GH-41317.parquet: test case of https://github.com/apache/arrow/issues/41317 where all columns have not the same size. * ARROW-GH-43605.parquet: dictionary index page uses rle encoding but 0 as rle bit-width. +* ARROW-GH-45185.parquet: test case of https://github.com/apache/arrow/issues/45185 + where repetition levels start with a 1 instead of 0. From 273c9444e38c24df97c191938eb984a31c4bca00 Mon Sep 17 00:00:00 2001 From: Adam Reeve Date: Thu, 9 Jan 2025 13:33:23 +1300 Subject: [PATCH 2/2] Simplify and improve test file * Reduce row count * Use int32 values * Disable dictionary encoding and statistics * Use correct list structure with logical type annotation --- bad_data/ARROW-GH-45185.parquet | Bin 1225 -> 264 bytes 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/bad_data/ARROW-GH-45185.parquet b/bad_data/ARROW-GH-45185.parquet index 5c06d5a0805b1e6ae98f7ffdf47bde1bb49a0df8..dea95fb1d9d92b89f4185a82fb1f58d1813aca27 100644 GIT binary patch literal 264 zcmXw#!Ab)$5QZl)P1oMUon-?#EW$1dZfOq%1uwm|u;^mz1C)|OOYN@ds(p+;h{ryl ze*_`l%q0KJFq3-vI%eW6Z|^xqpjlfYx&Xa5A>m>HL&1bw5ESSE4f-Ggmw=V2CX_1Q zG#yJqD2^;yS5FWpDpkA9AqXsUO8ai--rc^-Z>pc`>`IS@UZ+{H1o~F46~#dR=&w_PLF*S+mco$qjkB>ZJhbQ3 zGDylzFUJAdd{X;z$Q-*@BZDj#D2|8R_s3bGPX5K+^nK8hUL;O{j;o2&g08N)uPEqF z`gl|hy%T-?E1|Dp*FgpJFDvMvVennKB@u?UPaBmm{QJ0G1<9M9#8p66+!&DrnVH-f zLH6*9YpWrb`QsP^`LEBul?*K|_n|e=kz(&$3tbnpt@lHBOPnbMdK13epoYFPA1K#B ze@(AM1A`IPS;5eN;d&|zm&S~*hveSv(+>by-+OK?WO94Dbdde_iP|*Cy`bNk4tcFK zgN2sMw^VxQXdj8(0A0%Np9S6i>KhxO*RJztKwpe_&;b1-*AHjHVAasxO)#|ZTwN9n zpQtLyhNLjnkOO2>f2`UJnS)oP4?=e1={Z4ea^GYw;B*=zYJV^dabbyd<*_`bqNYhheb)m#88bGF=P{hUZS)EQaK0>vjM~IQfm6=UnsQzb4z!ux*)cLX z-2VCO-tmyHQwR(lFQyej!v1c;?LQ@oX>5F2CBG9(dBj2nC<(1K5QVh0uED;`N{KO$ zB2g@DY;McUc&fh9-eeob`ASarC zYgim5v=)a95f&x*VK_qMqsu9U{LeTlj|iW%siOg+t4VcOoxAtg+jMmfhtBGBwsbJ< ic4up2OEaU-PGi$pU8OmnuQZlb(JCRTH*oPd#Qp-P1w8lw