
[BUG] Some tests failed locally for array indexing with invalid and zero indices #9411

Closed
thirtiseven opened this issue Oct 10, 2023 · 2 comments
Labels
bug (Something isn't working) · duplicate (This issue or pull request already exists) · test (Only impacts tests)

Comments

@thirtiseven (Collaborator)

Describe the bug
test_array_item_ansi_fail_invalid_index, test_array_element_at_ansi_fail_invalid_index, and test_array_element_at_zero_index_fail fail when running

./integration_tests/run_pyspark_from_build.sh -k 'array_test.py'

locally with the latest code against Spark 3.4.1.

My TEST_PARALLEL is 4.

Running the failing cases individually, or running with TEST_PARALLEL=1, passes.
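
For reference, the failing configuration can be reproduced in one line (assuming, as here, that run_pyspark_from_build.sh picks TEST_PARALLEL up from the environment):

TEST_PARALLEL=4 ./integration_tests/run_pyspark_from_build.sh -k 'array_test.py'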

Steps/Code to reproduce bug

./integration_tests/run_pyspark_from_build.sh -k 'array_test.py'
==================================================================== FAILURES =====================================================================
__________________________________________________ test_array_item_ansi_fail_invalid_index[100] ___________________________________________________
[gw1] linux -- Python 3.8.3 /home/haoyangl/.pyenv/versions/3.8.3/bin/python

index = 100

    @pytest.mark.parametrize('index', [-2, 100, array_neg_index_gen, array_out_index_gen], ids=idfn)
    def test_array_item_ansi_fail_invalid_index(index):
        message = "SparkArrayIndexOutOfBoundsException" if (is_databricks104_or_later() or is_spark_330_or_later()) else "java.lang.ArrayIndexOutOfBoundsException"
        if isinstance(index, int):
            test_func = lambda spark: unary_op_df(spark, ArrayGen(int_gen)).select(col('a')[index]).collect()
        else:
            test_func = lambda spark: two_col_df(spark, ArrayGen(int_gen), index).selectExpr('a[b]').collect()
>       assert_gpu_and_cpu_error(
            test_func,
            conf=ansi_enabled_conf,
            error_message=message)

../../src/main/python/array_test.py:154:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:631: in assert_gpu_and_cpu_error
    assert_spark_exception(lambda: with_cpu_session(df_fun, conf), error_message)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function assert_gpu_and_cpu_error.<locals>.<lambda> at 0x7fa219696430>, error_message = 'SparkArrayIndexOutOfBoundsException'

    def assert_spark_exception(func, error_message):
        """
        Assert that a specific Java exception is thrown
        :param func: a function to be verified
        :param error_message: a string such as the one produced by java.lang.Exception.toString
        :return: Assertion failure if no exception matching error_message has occurred.
        """
        with pytest.raises(Exception) as excinfo:
>           func()
E           Failed: DID NOT RAISE <class 'Exception'>

../../src/main/python/asserts.py:618: Failed
_______________________________________________ test_array_element_at_ansi_fail_invalid_index[100] ________________________________________________
[gw1] linux -- Python 3.8.3 /home/haoyangl/.pyenv/versions/3.8.3/bin/python

index = 100

    @pytest.mark.parametrize('index', [100, array_out_index_gen], ids=idfn)
    def test_array_element_at_ansi_fail_invalid_index(index):
        message = "ArrayIndexOutOfBoundsException" if is_before_spark_330() else "SparkArrayIndexOutOfBoundsException"
        if isinstance(index, int):
            test_func = lambda spark: unary_op_df(spark, ArrayGen(int_gen)).select(
                element_at(col('a'), index)).collect()
        else:
            test_func = lambda spark: two_col_df(spark, ArrayGen(int_gen), index).selectExpr(
                'element_at(a, b)').collect()
        # For 3.3.0+ strictIndexOperator should not affect element_at
        # `strictIndexOperator` has been removed in Spark3.4+ and Databricks11.3+
        test_conf = ansi_enabled_conf if (is_spark_340_or_later() or is_databricks113_or_later()) else \
            copy_and_update(ansi_enabled_conf, {'spark.sql.ansi.strictIndexOperator': 'false'})
>       assert_gpu_and_cpu_error(
            test_func,
            conf=test_conf,
            error_message=message)

../../src/main/python/array_test.py:266:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:631: in assert_gpu_and_cpu_error
    assert_spark_exception(lambda: with_cpu_session(df_fun, conf), error_message)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function assert_gpu_and_cpu_error.<locals>.<lambda> at 0x7fa24d993c10>, error_message = 'SparkArrayIndexOutOfBoundsException'

    def assert_spark_exception(func, error_message):
        """
        Assert that a specific Java exception is thrown
        :param func: a function to be verified
        :param error_message: a string such as the one produced by java.lang.Exception.toString
        :return: Assertion failure if no exception matching error_message has occurred.
        """
        with pytest.raises(Exception) as excinfo:
>           func()
E           Failed: DID NOT RAISE <class 'Exception'>

../../src/main/python/asserts.py:618: Failed
_________________________________________________ test_array_element_at_zero_index_fail[False-0] __________________________________________________
[gw1] linux -- Python 3.8.3 /home/haoyangl/.pyenv/versions/3.8.3/bin/python

index = 0, ansi_enabled = False

    @pytest.mark.parametrize('index', [0, array_zero_index_gen], ids=idfn)
    @pytest.mark.parametrize('ansi_enabled', [False, True], ids=idfn)
    def test_array_element_at_zero_index_fail(index, ansi_enabled):
        if is_spark_340_or_later():
            message = "org.apache.spark.SparkRuntimeException: [INVALID_INDEX_OF_ZERO] The index 0 is invalid"
        elif is_databricks113_or_later():
            message = "org.apache.spark.SparkRuntimeException: [ELEMENT_AT_BY_INDEX_ZERO] The index 0 is invalid"
        else:
            message = "SQL array indices start at 1"

        if isinstance(index, int):
            test_func = lambda spark: unary_op_df(spark, ArrayGen(int_gen)).select(
                element_at(col('a'), index)).collect()
        else:
            test_func = lambda spark: two_col_df(spark, ArrayGen(int_gen), index).selectExpr(
                'element_at(a, b)').collect()
>       assert_gpu_and_cpu_error(
            test_func,
            conf={'spark.sql.ansi.enabled':ansi_enabled},
            error_message=message)

../../src/main/python/array_test.py:298:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:631: in assert_gpu_and_cpu_error
    assert_spark_exception(lambda: with_cpu_session(df_fun, conf), error_message)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function assert_gpu_and_cpu_error.<locals>.<lambda> at 0x7fa212bc3820>
error_message = 'org.apache.spark.SparkRuntimeException: [INVALID_INDEX_OF_ZERO] The index 0 is invalid'

    def assert_spark_exception(func, error_message):
        """
        Assert that a specific Java exception is thrown
        :param func: a function to be verified
        :param error_message: a string such as the one produced by java.lang.Exception.toString
        :return: Assertion failure if no exception matching error_message has occurred.
        """
        with pytest.raises(Exception) as excinfo:
>           func()
E           Failed: DID NOT RAISE <class 'Exception'>

../../src/main/python/asserts.py:618: Failed

Expected behavior
They should pass.
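
For context, a plain CPU Spark session with ANSI enabled does raise for the invalid index, which is what the tests assert. A minimal sketch against stock PySpark (not the test harness; column name and data invented for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

df = spark.createDataFrame([([1, 2, 3],)], ["a"])
try:
    # With ANSI on, a[100] raises an array-index-out-of-bounds error
    # instead of returning null; the tests match on the exception class
    # name in the captured message.
    df.selectExpr("a[100]").collect()
except Exception as e:
    print(e)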

Environment details
Latest code of branch-23.10, Spark 3.4.1

@thirtiseven added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Oct 10, 2023
@pxLi added the test (Only impacts tests) label on Oct 10, 2023
@thirtiseven changed the title from "[BUG] Some tests failures locally for array indexing with invalid and zero indices" to "[BUG] Some tests failed locally for array indexing with invalid and zero indices" on Oct 10, 2023
pxLi (Collaborator) commented Oct 10, 2023

This seems to be a side effect of other test cases affecting the test configs.

I can repro locally with parallelism set to 4 when running array_test only; the CI run did not expose this thanks to a specific (lucky) test ordering.
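
A minimal, hypothetical sketch (plain PySpark, not the actual harness; the leaked config and data are invented for illustration) of how that kind of config leakage on a shared session produces the DID NOT RAISE failures above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Hypothetical leak: an earlier test disables ANSI on the shared session
# and never restores it.
spark.conf.set("spark.sql.ansi.enabled", "false")

# A later test expects a[100] to raise under ANSI, but with the leaked
# setting the lookup quietly returns null, so pytest.raises sees no
# exception and reports "DID NOT RAISE".
df = spark.createDataFrame([([1, 2, 3],)], ["a"])
print(df.selectExpr("a[100]").collect())  # [Row(a[100]=None)] with ANSI off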

jlowe (Member) commented Oct 10, 2023

This is a duplicate of #8652

@jlowe added the duplicate (This issue or pull request already exists) label on Oct 10, 2023
@mattahrens removed the ? - Needs Triage (Need team to review and classify) label on Oct 10, 2023
@mattahrens closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 10, 2023