Skip to content

Commit

Permalink
Merge remote-tracking branch 'upstream/main' into feature/record-batc…
Browse files Browse the repository at this point in the history
…h-flatten
  • Loading branch information
nglime committed Dec 5, 2024
2 parents d1b3260 + 83dda05 commit a12082c
Show file tree
Hide file tree
Showing 220 changed files with 3,771 additions and 1,499 deletions.
2 changes: 1 addition & 1 deletion .asf.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ github:
features:
issues: true
protected_branches:
master:
main:
required_status_checks:
# require branches to be up-to-date before merging
strict: true
Expand Down
2 changes: 1 addition & 1 deletion .github/actions/setup-builder/action.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ runs:
using: "composite"
steps:
- name: Cache Cargo
uses: actions/cache@v3
uses: actions/cache@v4
with:
# these represent dependencies downloaded by cargo
# and thus do not depend on the OS, arch nor rust version.
Expand Down
4 changes: 2 additions & 2 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,14 @@ updates:
schedule:
interval: daily
open-pull-requests-limit: 10
target-branch: master
target-branch: main
labels: [ auto-dependencies, arrow ]
- package-ecosystem: cargo
directory: "/object_store"
schedule:
interval: daily
open-pull-requests-limit: 10
target-branch: master
target-branch: main
labels: [ auto-dependencies, object_store ]
- package-ecosystem: "github-actions"
directory: "/"
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,6 @@
The CI is structured so most tests are run in specific workflows:
`arrow.yml` for `arrow`, `parquet.yml` for `parquet` and so on.

The basic idea is to run all tests on pushes to master (to ensure we
keep master green) but run only the individual workflows on PRs that
The basic idea is to run all tests on pushes to main (to ensure we
keep main green) but run only the individual workflows on PRs that
change files that could affect them.
2 changes: 1 addition & 1 deletion .github/workflows/arrow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ on:
# always trigger
push:
branches:
- master
- main
pull_request:
paths:
- .github/**
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/arrow_flight.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- arrow-array/**
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/audit.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- '**/Cargo.toml'
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs and changes to master
# trigger for all PRs and changes to main
on:
push:
branches:
- master
- main
pull_request:

env:
Expand Down
10 changes: 5 additions & 5 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs and changes to master
# trigger for all PRs and changes to main
on:
push:
branches:
- master
- main
pull_request:

jobs:
Expand Down Expand Up @@ -70,8 +70,8 @@ jobs:
path: target/doc

deploy:
# Only deploy if a push to master
if: github.ref_name == 'master' && github.event_name == 'push'
# Only deploy if a push to main
if: github.ref_name == 'main' && github.event_name == 'push'
needs: docs
permissions:
contents: write
Expand All @@ -90,7 +90,7 @@ jobs:
cp .asf.yaml ./website/build/.asf.yaml
- name: Deploy to gh-pages
uses: peaceiris/[email protected]
if: github.event_name == 'push' && github.ref_name == 'master'
if: github.event_name == 'push' && github.ref_name == 'main'
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: website/build
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/integration.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- .github/**
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/miri.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- .github/**
Expand Down
8 changes: 4 additions & 4 deletions .github/workflows/object_store.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- object_store/**
Expand Down Expand Up @@ -138,7 +138,7 @@ jobs:

- name: Setup LocalStack (AWS emulation)
run: |
echo "LOCALSTACK_CONTAINER=$(docker run -d -p 4566:4566 localstack/localstack:3.8.1)" >> $GITHUB_ENV
echo "LOCALSTACK_CONTAINER=$(docker run -d -p 4566:4566 localstack/localstack:4.0.3)" >> $GITHUB_ENV
echo "EC2_METADATA_CONTAINER=$(docker run -d -p 1338:1338 amazon/amazon-ec2-metadata-mock:v1.9.2 --imdsv2)" >> $GITHUB_ENV
aws --endpoint-url=http://localhost:4566 s3 mb s3://test-bucket
aws --endpoint-url=http://localhost:4566 dynamodb create-table --table-name test-table --key-schema AttributeName=path,KeyType=HASH AttributeName=etag,KeyType=RANGE --attribute-definitions AttributeName=path,AttributeType=S AttributeName=etag,AttributeType=S --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
Expand All @@ -164,7 +164,7 @@ jobs:
- name: Run object_store tests (AWS native conditional put)
run: cargo test --features=aws
env:
AWS_CONDITIONAL_PUT: etag-put-if-not-exists
AWS_CONDITIONAL_PUT: etag
AWS_COPY_IF_NOT_EXISTS: multipart

- name: GCS Output
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/parquet.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- arrow/**
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/parquet_derive.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs that touch certain files and changes to master
# trigger for all PRs that touch certain files and changes to main
on:
push:
branches:
- master
- main
pull_request:
paths:
- parquet/**
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/rust.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,11 @@ concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

# trigger for all PRs and changes to master
# trigger for all PRs and changes to main
on:
push:
branches:
- master
- main
pull_request:

jobs:
Expand Down
63 changes: 63 additions & 0 deletions CHANGELOG-old.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,69 @@

# Historical Changelog

## [53.2.0](https://github.com/apache/arrow-rs/tree/53.2.0) (2024-10-21)

[Full Changelog](https://github.com/apache/arrow-rs/compare/53.1.0...53.2.0)

**Implemented enhancements:**

- Implement arrow\_json encoder for Decimal128 & Decimal256 DataTypes [\#6605](https://github.com/apache/arrow-rs/issues/6605) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]
- Support DataType::FixedSizeList in make\_builder within struct\_builder.rs [\#6594](https://github.com/apache/arrow-rs/issues/6594) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]
- Support DataType::Dictionary in `make_builder` within struct\_builder.rs [\#6589](https://github.com/apache/arrow-rs/issues/6589) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]
- Interval parsing from string - accept "mon" and "mons" token [\#6548](https://github.com/apache/arrow-rs/issues/6548) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]
- `AsyncArrowWriter` API to get the total size of a written parquet file [\#6530](https://github.com/apache/arrow-rs/issues/6530) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)]
- `append_many` for Dictionary builders [\#6529](https://github.com/apache/arrow-rs/issues/6529) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]
- Missing tonic `GRPC_STATUS` with tonic 0.12.1 [\#6515](https://github.com/apache/arrow-rs/issues/6515) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)]
- Add example of how to use parquet metadata reader APIs for a local cache [\#6504](https://github.com/apache/arrow-rs/issues/6504) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)]
- Remove reliance on `raw-entry` feature of Hashbrown [\#6498](https://github.com/apache/arrow-rs/issues/6498) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)]
- Improve page index metadata loading in `SerializedFileReader::new_with_options` [\#6491](https://github.com/apache/arrow-rs/issues/6491) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)]
- Release arrow-rs / parquet minor version `53.1.0` \(October 2024\) [\#6340](https://github.com/apache/arrow-rs/issues/6340) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]

**Fixed bugs:**

- Compilation fail where `c_char = u8` [\#6571](https://github.com/apache/arrow-rs/issues/6571) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]
- Arrow flight CI test failing on `master` [\#6568](https://github.com/apache/arrow-rs/issues/6568) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)]

**Documentation updates:**

- Minor: Document SIMD rationale and tips [\#6554](https://github.com/apache/arrow-rs/pull/6554) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([alamb](https://github.com/alamb))

**Closed issues:**

- Casting to and from unions [\#6247](https://github.com/apache/arrow-rs/issues/6247) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)]

**Merged pull requests:**

- Minor: more comments for `RecordBatch.get_array_memory_size()` [\#6607](https://github.com/apache/arrow-rs/pull/6607) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([2010YOUY01](https://github.com/2010YOUY01))
- Implement arrow\_json encoder for Decimal128 & Decimal256 [\#6606](https://github.com/apache/arrow-rs/pull/6606) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([phillipleblanc](https://github.com/phillipleblanc))
- Add support for building FixedSizeListBuilder in struct\_builder's mak… [\#6595](https://github.com/apache/arrow-rs/pull/6595) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([kszlim](https://github.com/kszlim))
- Add limited support for dictionary builders in `make_builders` for stru… [\#6593](https://github.com/apache/arrow-rs/pull/6593) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([kszlim](https://github.com/kszlim))
- Fix CI with new valid certificates and add script for future usage [\#6585](https://github.com/apache/arrow-rs/pull/6585) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)] ([itsjunetime](https://github.com/itsjunetime))
- Update proc-macro2 requirement from =1.0.87 to =1.0.88 [\#6579](https://github.com/apache/arrow-rs/pull/6579) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)] ([dependabot[bot]](https://github.com/apps/dependabot))
- Fix clippy complaints [\#6573](https://github.com/apache/arrow-rs/pull/6573) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)] ([itsjunetime](https://github.com/itsjunetime))
- Use c\_char instead of i8 to compile on platforms where c\_char = u8 [\#6572](https://github.com/apache/arrow-rs/pull/6572) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([itsjunetime](https://github.com/itsjunetime))
- Bump pyspark from 3.3.1 to 3.3.2 in /parquet/pytest [\#6564](https://github.com/apache/arrow-rs/pull/6564) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([dependabot[bot]](https://github.com/apps/dependabot))
- `unsafe` improvements [\#6551](https://github.com/apache/arrow-rs/pull/6551) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([ssbr](https://github.com/ssbr))
- Update README.md [\#6550](https://github.com/apache/arrow-rs/pull/6550) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)] ([Abdullahsab3](https://github.com/Abdullahsab3))
- Fix string '0' cast to decimal with scale 0 [\#6547](https://github.com/apache/arrow-rs/pull/6547) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([findepi](https://github.com/findepi))
- Add finish to `AsyncArrowWriter::finish` [\#6543](https://github.com/apache/arrow-rs/pull/6543) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([etseidl](https://github.com/etseidl))
- Add append\_nulls to dictionary builders [\#6542](https://github.com/apache/arrow-rs/pull/6542) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([adriangb](https://github.com/adriangb))
- Improve UnionArray::is\_nullable [\#6540](https://github.com/apache/arrow-rs/pull/6540) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([tustvold](https://github.com/tustvold))
- Allow to read parquet binary column as UTF8 type [\#6539](https://github.com/apache/arrow-rs/pull/6539) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([goldmedal](https://github.com/goldmedal))
- Use HashTable instead of raw\_entry\_mut [\#6537](https://github.com/apache/arrow-rs/pull/6537) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([tustvold](https://github.com/tustvold))
- Add append\_many to dictionary arrays to allow adding repeated values [\#6534](https://github.com/apache/arrow-rs/pull/6534) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([adriangb](https://github.com/adriangb))
- Adds documentation and example recommending Vec\<ArrayRef\> over ChunkedArray [\#6527](https://github.com/apache/arrow-rs/pull/6527) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([efredine](https://github.com/efredine))
- Update proc-macro2 requirement from =1.0.86 to =1.0.87 [\#6526](https://github.com/apache/arrow-rs/pull/6526) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)] ([dependabot[bot]](https://github.com/apps/dependabot))
- Add `ColumnChunkMetadataBuilder` clear APIs [\#6523](https://github.com/apache/arrow-rs/pull/6523) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([alamb](https://github.com/alamb))
- Update sysinfo requirement from 0.31.2 to 0.32.0 [\#6521](https://github.com/apache/arrow-rs/pull/6521) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([dependabot[bot]](https://github.com/apps/dependabot))
- Update Tonic to 0.12.3 [\#6517](https://github.com/apache/arrow-rs/pull/6517) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] [[arrow-flight](https://github.com/apache/arrow-rs/labels/arrow-flight)] ([cisaacson](https://github.com/cisaacson))
- Detect missing page indexes while reading Parquet metadata [\#6507](https://github.com/apache/arrow-rs/pull/6507) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([etseidl](https://github.com/etseidl))
- Use ParquetMetaDataReader to load page indexes in `SerializedFileReader::new_with_options` [\#6506](https://github.com/apache/arrow-rs/pull/6506) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([etseidl](https://github.com/etseidl))
- Improve parquet `MetadataFetch` and `AsyncFileReader` docs [\#6505](https://github.com/apache/arrow-rs/pull/6505) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([alamb](https://github.com/alamb))
- fix arrow-json encoding with dictionary including nulls [\#6503](https://github.com/apache/arrow-rs/pull/6503) [[arrow](https://github.com/apache/arrow-rs/labels/arrow)] ([samuelcolvin](https://github.com/samuelcolvin))
- Update brotli requirement from 6.0 to 7.0 [\#6499](https://github.com/apache/arrow-rs/pull/6499) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([dependabot[bot]](https://github.com/apps/dependabot))
- Benchmark both scenarios, with records skipped and without skipping, for delta-bin-packed primitive arrays with half nulls. [\#6489](https://github.com/apache/arrow-rs/pull/6489) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([wiedld](https://github.com/wiedld))
- Add round trip tests for reading/writing parquet metadata [\#6463](https://github.com/apache/arrow-rs/pull/6463) [[parquet](https://github.com/apache/arrow-rs/labels/parquet)] ([alamb](https://github.com/alamb))
## [53.1.0](https://github.com/apache/arrow-rs/tree/53.1.0) (2024-10-02)

[Full Changelog](https://github.com/apache/arrow-rs/compare/53.0.0...53.1.0)
Expand Down
Loading

0 comments on commit a12082c

Please sign in to comment.