Add tests to check compatibility with fastparquet #9366

Merged Oct 23, 2023 (23 commits)
Changes from 19 commits

Commits
74d31f1
WIP: Initial stab at fastparquet tests.
mythrocks Sep 29, 2023
d3394dc
Date tests. Plus minor refactor.
mythrocks Sep 29, 2023
27ab6d9
Date/Time tests.
mythrocks Sep 29, 2023
b309918
Added tests for reading data written with fastparquet.
mythrocks Oct 2, 2023
45000f9
Tests for reading GPU-written files.
mythrocks Oct 2, 2023
cad94bb
Added failing tests for arrays, struct.
mythrocks Oct 5, 2023
725c316
Clarification of failure conditions.
mythrocks Oct 5, 2023
1b141c2
Workaround tests for timestamps.
mythrocks Oct 5, 2023
641fa14
Workaround tests for dates.
mythrocks Oct 5, 2023
ef9c5f1
Miscellaneous fixes:
mythrocks Oct 5, 2023
782e076
Test descriptions.
mythrocks Oct 5, 2023
411612e
Workaround tests for STRUCT, ARRAY, etc.
mythrocks Oct 5, 2023
3624fac
Added xfails for struct/array.
mythrocks Oct 5, 2023
6bcff7a
Updated with concrete fastparquet version.
mythrocks Oct 5, 2023
4a67ef0
Fixed up some xfail messages.
mythrocks Oct 6, 2023
b6b7d19
Fixed another xfail message.
mythrocks Oct 9, 2023
c9f7e2c
Extend date/time margins to Pandas.Timestamp.min and Pandas.Timestamp…
mythrocks Oct 9, 2023
fceafc2
Added dependency to CI scripts, Docker images.
mythrocks Oct 9, 2023
b10a321
Change in tack: Install fastparquet explicitly.
mythrocks Oct 9, 2023
f34eec7
Per #8789, reverted change for Centos Dockerfile.
mythrocks Oct 10, 2023
126b9f4
Removed fastparquet from UDF tests.
mythrocks Oct 10, 2023
fa356f8
Optionally skips fastparquet tests.
mythrocks Oct 10, 2023
c072dc5
Merge remote-tracking branch 'origin/branch-23.12' into fastparquet-c…
mythrocks Oct 10, 2023
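
Commit fa356f8 makes the fastparquet tests optional, but the test file is not rendered on this page, so the mechanism it uses is not visible here. As a minimal sketch of one common approach (not the PR's actual code), the helper below checks whether a module can be found and returns a skip reason if it cannot. The function name `fastparquet_unavailable_reason` is hypothetical.

```python
import importlib.util


def fastparquet_unavailable_reason(module_name="fastparquet"):
    """Return a skip reason if the top-level module cannot be found, else None.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # find_spec returns None when a top-level module is not installed,
    # without actually importing it.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name} is not installed; skipping compatibility tests"
    return None


if __name__ == "__main__":
    print(fastparquet_unavailable_reason() or "fastparquet is available")
```

With pytest, a reason like this could drive a `pytest.mark.skipif` marker so that machines without fastparquet report the tests as skipped rather than failed.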
3 changes: 3 additions & 0 deletions integration_tests/README.md
@@ -101,6 +101,9 @@ For manual installation, you need to setup your environment:
 tests across multiple CPUs to speed up test execution
 - findspark
 : Adds pyspark to sys.path at runtime
+- [fastparquet](https://fastparquet.readthedocs.io)
+: A Python library (independent of Apache Spark) for reading/writing Parquet. Used in the
+integration tests for checking Parquet read/write compatibility with the RAPIDS plugin.

 You can install all the dependencies using `pip` by running the following command:

3 changes: 2 additions & 1 deletion integration_tests/requirements.txt
@@ -16,4 +16,5 @@ sre_yield
 pandas
 pyarrow
 pytest-xdist >= 2.0.0
-findspark
+findspark
+fastparquet >= 2023.8.0
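
The new requirement pins a minimum version (`fastparquet >= 2023.8.0`). As a rough illustration of what that floor means, the sketch below compares date-style version strings numerically rather than as text. Both function names are hypothetical, and the parser assumes three purely numeric dotted components; it is not how `pip` resolves requirements.

```python
def parse_version(text):
    """Turn '2023.8.0' into (2023, 8, 0).

    Assumes purely numeric dotted components (an illustration-only
    simplification; real version strings can carry suffixes).
    """
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="2023.8.0"):
    """True if `installed` is at least `minimum`, compared component-wise."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("2023.7.1", "2023.8.0", "2023.10.0"):
        print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers matters here: as plain strings, "2023.10.0" would sort before "2023.8.0". The installed version string itself could be obtained with `importlib.metadata.version("fastparquet")` on Python 3.8 or later.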
331 changes: 331 additions & 0 deletions integration_tests/src/main/python/fastparquet_compatibility_test.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.integration.centos
@@ -61,7 +61,7 @@ RUN export CUDA_VER=`echo ${CUDA_VER} | cut -d '.' -f 1,2` && \
 mamba install -y -c conda-forge sre_yield && \
 conda clean -ay
 # install pytest plugins for xdist parallel run
-RUN python -m pip install findspark pytest-xdist pytest-order
+RUN python -m pip install findspark pytest-xdist pytest-order fastparquet

 # Set default java as 1.8.0
 ENV JAVA_HOME "/usr/lib/jvm/java-1.8.0-openjdk"
2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.integration.rocky
@@ -56,7 +56,7 @@ RUN export CUDA_VER=`echo ${CUDA_VER} | cut -d '.' -f 1,2` && \
 conda install -y -c conda-forge sre_yield && \
 conda clean -ay
 # install pytest plugins for xdist parallel run
-RUN python -m pip install findspark pytest-xdist pytest-order
+RUN python -m pip install findspark pytest-xdist pytest-order fastparquet

 # Set default java as 1.8.0
 ENV JAVA_HOME "/usr/lib/jvm/java-1.8.0-openjdk"
2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.integration.ubuntu
@@ -68,7 +68,7 @@ RUN export CUDA_VER=`echo ${CUDA_VER} | cut -d '.' -f 1,2` && \
 conda install -y -c conda-forge sre_yield && \
 conda clean -ay
 # install pytest plugins for xdist parallel run
-RUN python -m pip install findspark pytest-xdist pytest-order
+RUN python -m pip install findspark pytest-xdist pytest-order fastparquet

 RUN apt install -y inetutils-ping expect

2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.ubuntu
@@ -58,7 +58,7 @@ RUN update-java-alternatives --set /usr/lib/jvm/java-1.8.0-openjdk-amd64

 RUN ln -sfn /usr/bin/python3.8 /usr/bin/python
 RUN ln -sfn /usr/bin/python3.8 /usr/bin/python3
-RUN python -m pip install pytest sre_yield requests pandas pyarrow findspark pytest-xdist pre-commit pytest-order
+RUN python -m pip install pytest sre_yield requests pandas pyarrow findspark pytest-xdist pre-commit pytest-order fastparquet

 # libnuma1 and libgomp1 are required by ucx packaging
 RUN apt install -y inetutils-ping expect wget libnuma1 libgomp1
1 change: 1 addition & 0 deletions jenkins/databricks/init_cudf_udf.sh
@@ -60,6 +60,7 @@ REQUIRED_PACKAGES=(
 pytest-xdist
 requests
 sre_yield
+fastparquet
 )

 ${base}/envs/cudf-udf/bin/mamba install -y \
2 changes: 1 addition & 1 deletion jenkins/databricks/setup.sh
@@ -49,4 +49,4 @@ PYTHON_VERSION=$(${PYSPARK_PYTHON} -c 'import sys; print("python{}.{}".format(sy
 # Set the path of python site-packages, and install packages here.
 PYTHON_SITE_PACKAGES="$HOME/.local/lib/${PYTHON_VERSION}/site-packages"
 # Use "python -m pip install" to make sure pip matches with python.
-$PYSPARK_PYTHON -m pip install --target $PYTHON_SITE_PACKAGES pytest sre_yield requests pandas pyarrow findspark pytest-xdist pytest-order
+$PYSPARK_PYTHON -m pip install --target $PYTHON_SITE_PACKAGES pytest sre_yield requests pandas pyarrow findspark pytest-xdist pytest-order fastparquet
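
The comment in `setup.sh` explains why pip is invoked as `python -m pip install`: it ties pip to the interpreter that will later import the packages. The same idea can be sketched in Python by building the command from a chosen interpreter path. The function name `pip_install_command` is hypothetical, and the sketch only constructs the argument list; it does not run anything.

```python
import sys


def pip_install_command(packages, target=None, python=sys.executable):
    """Build a 'python -m pip install' command for a specific interpreter.

    Hypothetical helper for illustration. `target` maps to pip's --target
    option, as used in the setup script.
    """
    cmd = [python, "-m", "pip", "install"]
    if target is not None:
        cmd += ["--target", target]
    return cmd + list(packages)


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(pip_install_command(["pytest", "fastparquet"], target="/tmp/site-packages"))
```

The resulting list could be passed to `subprocess.check_call` to perform the install. Using an explicit interpreter path avoids picking up a `pip` executable that belongs to a different Python installation.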