Commit

Merge branch 'alamb/parquet_coverter' of github.com:alamb/arrow-rs into alamb/parquet_coverter
alamb committed Dec 6, 2024
2 parents cec4f8d + 1535b42 commit 1841283
Showing 1 changed file with 9 additions and 6 deletions.
15 changes: 9 additions & 6 deletions parquet/src/arrow/schema/mod.rs
@@ -236,7 +236,7 @@ pub(crate) fn add_encoded_arrow_schema_to_metadata(schema: &Schema, props: &mut
/// let parquet_schema = ArrowToParquetSchemaConverter::new(&arrow_schema)
/// .build()
/// .unwrap();
-/// //
+///
/// let expected_parquet_schema = SchemaDescriptor::new(
/// Arc::new(
/// Type::group_type_builder("arrow_schema")
@@ -280,13 +280,13 @@ impl<'a> ArrowToParquetSchemaConverter<'a> {
}
}

-/// Should arrow types be coerced into parquet native types (default false).
+/// Should arrow types be coerced into parquet native types (default `false`).
///
/// Setting this option to `true` will result in parquet files that can be
/// read by more readers, but may lose precision for arrow types such as
/// [`DataType::Date64`] which have no direct corresponding Parquet type.
///
-/// By default, does not coerce to native parquet types. Enabling type
+/// By default, this converter does not coerce to native parquet types. Enabling type
/// coercion allows for meaningful representations that do not require
/// downstream readers to consider the embedded Arrow schema, and can allow
/// for greater compatibility with other Parquet implementations. However,
@@ -297,11 +297,14 @@ impl<'a> ArrowToParquetSchemaConverter<'a> {
/// Some Arrow types such as `Date64`, `Timestamp` and `Interval` have no
/// corresponding Parquet logical type. Thus, they can not be losslessly
/// round-tripped when stored using the appropriate Parquet logical type.
+///
/// For example, some Date64 values may be truncated when stored with
-/// parquet's native 32 bit date type. For [`List`] and [`Map`] types, some
+/// parquet's native 32 bit date type.
+///
+/// For [`List`] and [`Map`] types, some
/// Parquet readers expect certain schema elements to have specific names
-/// (earlier versions of the spec was somewhat ambiguous on this point).
+/// (earlier versions of the spec were somewhat ambiguous on this point).
/// Type coercion will use the names prescribed by the Parquet specification,
/// potentially losing naming metadata from the Arrow schema.
///
/// [`List`]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists
/// [`Map`]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
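
The `Date64` truncation mentioned in the doc comment can be illustrated with a small standalone sketch. It assumes only that Arrow's `Date64` stores milliseconds since the Unix epoch and that Parquet's native date type stores whole days in 32 bits. The helper names (`date64_to_days`, `days_to_date64`, `round_trips`) are hypothetical and are not part of the `parquet` crate, and the flooring division used here is an assumption; the actual writer may round differently.

```rust
/// Milliseconds in one day. Arrow `Date64` counts milliseconds since the Unix epoch.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: reduce a Date64 value to whole days since the epoch,
/// as a 32-bit date type would store it. Uses flooring division (an assumption)
/// and returns `None` when the day count does not fit in an `i32`.
fn date64_to_days(millis: i64) -> Option<i32> {
    let days = millis.div_euclid(MILLIS_PER_DAY);
    if days < i64::from(i32::MIN) || days > i64::from(i32::MAX) {
        None
    } else {
        Some(days as i32)
    }
}

/// Hypothetical helper: expand a day count back into Date64 milliseconds.
fn days_to_date64(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}

/// True when the value survives the days-only representation unchanged.
fn round_trips(millis: i64) -> bool {
    date64_to_days(millis).map(days_to_date64) == Some(millis)
}

fn main() {
    // A value that falls exactly on a day boundary is preserved.
    let midnight = 19_000 * MILLIS_PER_DAY;
    // One hour past midnight: the time-of-day part is dropped on conversion.
    let with_time = midnight + 3_600_000;

    println!("midnight round-trips: {}", round_trips(midnight));
    println!("with_time round-trips: {}", round_trips(with_time));
}
```

Only values that are an exact multiple of one day survive the conversion in this sketch, which is the precision loss the comment warns about when coercion is enabled.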