[VL] timestamp scan error #8069

Open
FelixYBW opened this issue Nov 28, 2024 · 3 comments
Labels
bug Something isn't working triage

Comments

@FelixYBW
Contributor

Backend

VL (Velox)

Bug description

E20241125 02:26:31.518885 94417 Exceptions.h:66] Line: /gluten/ep/build-velox/build/velox_ep/./velox/dwio/common/IntDecoder.h:448, Function:readInt, Expression:  , Source: RUNTIME, ErrorCode: NOT_IMPLEMENTED
E20241125 02:26:31.526342 94415 Exceptions.h:66] Line: /gluten/ep/build-velox/build/velox_ep/./velox/dwio/common/IntDecoder.h:448, Function:readInt, Expression:  , Source: RUNTIME, ErrorCode: NOT_IMPLEMENTED
E20241125 02:26:31.533787 94415 Exceptions.h:66] Line: /gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp:598, Function:operator(), Expression:  Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: NOT_IMPLEMENTED
Retriable: False
Context: Split [Hive: s3://pinterest-dosquebradas/retention-7days/bistaging/content_supply_extended/partitioned_core_content_full_base_recent_v5/0000/0111/1110/00011000/hash_key=3/dt=2024-11-17/00032-871-5b2a5f36-fd8d-4aeb-a033-a4afcd7db7ba-00004.parquet 4 - 139029] Task Gluten_Stage_1_TID_2_VTID_2
Additional Context: Operator: ValueStream[0] 0
Function: readInt
File: /gluten/ep/build-velox/build/velox_ep/./velox/dwio/common/IntDecoder.h
Line: 448
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorENS1_22CompileTimeEmptyStringEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox4dwio6common10IntDecoderILb1EE7readIntInEET_v
# 4  _ZN8facebook5velox7parquet10PageReader11callDecoderINS0_4dwio6common13ColumnVisitorInNS0_6common10AlwaysTrueENS5_15ExtractToReaderELb1EEELi0EEEvPKmRbT_
# 5  _ZN8facebook5velox7parquet10PageReader15readWithVisitorINS0_4dwio6common13ColumnVisitorInNS0_6common10AlwaysTrueENS5_15ExtractToReaderELb1EEEEEvRT_
# 6  _ZN8facebook5velox4dwio6common28SelectiveIntegerColumnReader10readHelperINS0_7parquet19IntegerColumnReaderENS0_6common10AlwaysTrueELb1ENS2_15ExtractToReaderEEEvPNS7_6FilterERKN5folly5RangeIPKiEET2_
# 7  _ZN8facebook5velox7parquet21TimestampColumnReader4readElRKN5folly5RangeIPKiEEPKm
# 8  _ZN8facebook5velox4dwio6common12ColumnLoader12loadInternalEN5folly5RangeIPKiEEPNS0_9ValueHookEiPSt10shared_ptrINS0_10BaseVectorEE
# 9  _ZN8facebook5velox12VectorLoader4loadEN5folly5RangeIPKiEEPNS0_9ValueHookEiPSt10shared_ptrINS0_10BaseVectorEE
# 10 _ZNK8facebook5velox10LazyVector18loadVectorInternalEv
# 11 _ZN8facebook5velox10LazyVector12loadedVectorEv
# 12 _ZN6gluten24WholeStageResultIterator4nextEv
# 13 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 14 0x000000003b6e2427

@rui-mo Do you know the reason? Should I create a sample file?

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

FelixYBW added the bug and triage labels on Nov 28, 2024
@rui-mo
Contributor

rui-mo commented Nov 28, 2024

@FelixYBW This looks like the INT64 timestamp case, which is not supported yet. Can we check the physical type of the timestamp column, perhaps with parquet-tools?
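
As an alternative to parquet-tools, the physical type can also be read from the file footer in a few lines of Python. This is only a sketch: it assumes the third-party pyarrow package is installed and that the file has been copied locally, and the function names are made up for illustration. It treats a column as an INT64 timestamp when its physical type is INT64 and its logical type prints as a timestamp.

```python
from typing import List, Tuple

# (column path, physical type, logical type as text)
ColumnInfo = Tuple[str, str, str]


def read_column_types(path: str) -> List[ColumnInfo]:
    """List the physical and logical type of every leaf column in a Parquet file."""
    # pyarrow is a third-party package (assumed installed). It is imported
    # here so the pure helper below can be used without it.
    import pyarrow.parquet as pq

    schema = pq.ParquetFile(path).schema
    columns = (schema.column(i) for i in range(len(schema.names)))
    return [(c.path, c.physical_type, str(c.logical_type)) for c in columns]


def int64_timestamp_columns(columns: List[ColumnInfo]) -> List[str]:
    """Return the columns stored as INT64 and annotated as timestamps."""
    return [
        name
        for name, physical, logical in columns
        if physical == "INT64" and logical.upper().startswith("TIMESTAMP")
    ]
```

Calling `int64_timestamp_columns(read_column_types("sample.parquet"))`, where `sample.parquet` stands for a local copy of the failing split, would list the columns that take the INT64 path. Columns written as INT96 show up with physical type `INT96` and are not reported.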

@FelixYBW
Contributor Author

FelixYBW commented Nov 28, 2024

Yes, but can we fall back if a Parquet file has INT64 timestamps?

@zml1206
Contributor

zml1206 commented Nov 29, 2024

> Yes, but can we fall back if a Parquet file has INT64 timestamps?

At present, we can only fall back for Parquet timestamp scans as a whole; we cannot distinguish whether a timestamp column is INT64 or not. To enable that fallback:
set spark.gluten.sql.parquet.timestampType.scan.fallback.enabled=true
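
For jobs launched from PySpark, the same setting could be passed when the session is created. A minimal sketch, assuming pyspark is installed and Gluten is already configured for the cluster; only the configuration key comes from this thread, while the helper names and the application name are hypothetical. Whether the key can also be changed on a running session is not confirmed here, so the sketch sets it at build time.

```python
from typing import Dict

# Configuration key quoted in the comment above; everything else is illustrative.
FALLBACK_KEY = "spark.gluten.sql.parquet.timestampType.scan.fallback.enabled"


def timestamp_fallback_conf(enabled: bool = True) -> Dict[str, str]:
    """Build the Spark conf entry that toggles the Parquet timestamp scan fallback."""
    return {FALLBACK_KEY: "true" if enabled else "false"}


def build_session(app_name: str = "timestamp-fallback-demo"):
    """Create a SparkSession with the fallback enabled (app name is hypothetical)."""
    # pyspark is a third-party package (assumed installed); imported lazily.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in timestamp_fallback_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Because the fallback applies to Parquet timestamp scans as a whole, scans that do not hit the INT64 case would fall back as well.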

3 participants