
Issue with reading timestamps from spark Delta tables #3155

Closed

kejtos opened this issue Jan 23, 2025 · 1 comment

Labels
bug Something isn't working

Comments

kejtos commented Jan 23, 2025

Environment

Delta-rs version: 0.24.0

Binding: Python

Environment:

  • Cloud provider: Azure Databricks
  • Runtime: DBR 12.2 LTS
  • Driver: Standard_DS3_v2

Bug

What happened:

The following code on Databricks:

from deltalake import DeltaTable

DeltaTable('path', storage_options={'allow_unsafe_rename': 'true'}).to_pyarrow_table()

results in the following error:
ComputeError: ArrowInvalid: Casting from timestamp[ns] to timestamp[us, tz=UTC] would lose data: -number
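
The underlying cast failure can be reproduced in isolation with PyArrow alone (a minimal sketch; the literal 1 below is just a stand-in value):

import pyarrow as pa

# One nanosecond past the epoch cannot be represented exactly in microseconds,
# so Arrow's default safe cast refuses and raises ArrowInvalid.
arr = pa.array([1], type=pa.timestamp('ns'))
try:
    arr.cast(pa.timestamp('us'))
except pa.ArrowInvalid as e:
    print(e)  # Casting from timestamp[ns] to timestamp[us] would lose data: 1

# safe=False truncates the sub-microsecond part instead of raising.
print(arr.cast(pa.timestamp('us'), safe=False))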

What you expected to happen:

Reading the timestamps in their original format, or coercing the cast (I do not need that much precision anyway).

How to reproduce it:
Run any code against a Databricks-written DeltaTable that reads timestamp[ns] columns. E.g.,

from deltalake import DeltaTable

DeltaTable('path', storage_options={'allow_unsafe_rename': 'true'}).to_pyarrow_table()

or

from deltalake import DeltaTable
import duckdb

delta_table = DeltaTable('path', storage_options={'allow_unsafe_rename': 'true'}).to_pyarrow_dataset()
quack = duckdb.arrow(delta_table)
print(quack.select("*"))

or

import polars as pl

delta_table = pl.read_delta('path', storage_options={'allow_unsafe_rename': 'true'}) # use_pyarrow=True did not help

kejtos added the bug (Something isn't working) label Jan 23, 2025
ion-elgreco changed the title from "Issue with reading timestamps from databricks Delta tables" to "Issue with reading timestamps from spark Delta tables" Jan 23, 2025

ion-elgreco (Collaborator) commented Jan 23, 2025

@kejtos this is because Spark still writes timestamps using an old Parquet type (INT96), which is discouraged. You should set a proper Spark config to prevent your tables from getting the wrong timestamp type.

spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")

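For new writes, this can also be set when the session is built, so every table written by the session stores microsecond-precision timestamps (a minimal sketch; the output path is a placeholder):

from pyspark.sql import SparkSession

# Build a session that writes Parquet timestamps as TIMESTAMP_MICROS
# instead of the legacy INT96 representation.
spark = (
    SparkSession.builder
    .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
    .getOrCreate()
)

# Tables written by this session are readable by delta-rs without a lossy cast.
spark.sql("SELECT current_timestamp() AS ts").write.format("delta").save("/tmp/example_table")  # placeholder path
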
To read existing INT96 values as microseconds instead, use parquet_read_options:

parquet_read_options: Optional read options for Parquet. Use this to handle INT96 to timestamp conversion for edge cases like 0001-01-01 or 9999-12-31

from deltalake import DeltaTable
from pyarrow.dataset import ParquetReadOptions

DeltaTable(tmp_path).to_pyarrow_dataset(
    parquet_read_options=ParquetReadOptions(coerce_int96_timestamp_unit="us")
)

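Combined with the duckdb repro above, the workaround would look roughly like this (a sketch; 'path' is the same placeholder as in the report):

import duckdb
from deltalake import DeltaTable
from pyarrow.dataset import ParquetReadOptions

# Coerce INT96 values to microseconds at scan time so the
# timestamp[ns] -> timestamp[us, tz=UTC] cast no longer loses data.
dataset = DeltaTable(
    'path', storage_options={'allow_unsafe_rename': 'true'}
).to_pyarrow_dataset(
    parquet_read_options=ParquetReadOptions(coerce_int96_timestamp_unit='us')
)

print(duckdb.arrow(dataset).select("*"))
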
ion-elgreco closed this as not planned Jan 23, 2025