DataFusion Python Changelog

42.0.0 (2024-10-06)

This release consists of 20 commits from 6 contributors. See credits at the end of this changelog for more information.

Implemented enhancements:

  • feat: expose between #868 (mesejo)
  • feat: make register_csv accept a list of paths #883 (mesejo)
  • feat: expose http object store #885 (mesejo)

Fixed bugs:

  • fix: Calling count on a pyarrow dataset results in an error #843 (Michael-J-Ward)

Other:

  • Upgrade datafusion #867 (emgeee)
  • Feature/aggregates as windows #871 (timsaucer)
  • Fix regression on register_udaf #878 (timsaucer)
  • build(deps): upgrade setup-protoc action and protoc version number #873 (Michael-J-Ward)
  • build(deps): bump prost-types from 0.13.2 to 0.13.3 #881 (dependabot[bot])
  • build(deps): bump prost from 0.13.2 to 0.13.3 #882 (dependabot[bot])
  • chore: remove XFAIL from passing tests #884 (Michael-J-Ward)
  • Add user defined window function support #880 (timsaucer)
  • build(deps): bump syn from 2.0.77 to 2.0.79 #886 (dependabot[bot])
  • fix example of reading parquet from s3 #896 (sir-sigurd)
  • release-testing #889 (Michael-J-Ward)
  • chore(bench): fix create_tables.sql for tpch benchmark #897 (Michael-J-Ward)
  • Add physical and logical plan conversion to and from protobuf #892 (timsaucer)
  • Feature/instance udfs #890 (timsaucer)
  • chore(ci): remove Mambaforge variant from CI #894 (Michael-J-Ward)
  • Use OnceLock to store TokioRuntime #895 (Michael-J-Ward)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     7	Michael J Ward
     5	Tim Saucer
     3	Daniel Mesejo
     3	dependabot[bot]
     1	Matt Green
     1	Sergey Fedoseev

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.
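
The per-contributor breakdown in a Credits section can be cross-checked against the entry lines of the same release. The following is a minimal sketch, not the project's own changelog tooling: it assumes every entry has the shape `title #NNN (handle)`, and the names `ENTRY_RE`, `parse_entry`, and `commits_per_author` are hypothetical helpers introduced only for this illustration.

```python
import re
from collections import Counter

# Assumed entry shape: "<title> #<PR number> (<GitHub handle>)",
# optionally preceded by a bullet character.
ENTRY_RE = re.compile(
    r"^\s*(?:[•*-]\s+)?(?P<title>.+?)\s+#(?P<pr>\d+)\s+\((?P<author>[^()]+)\)\s*$"
)


def parse_entry(line):
    """Return (title, pr_number, author) for an entry line, or None."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return match["title"], int(match["pr"]), match["author"]


def commits_per_author(lines):
    """Tally PRs per author, counting a PR once even if it is listed twice."""
    seen = set()
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # headings, blank lines, and prose are skipped
        _, pr, author = entry
        if pr in seen:
            continue  # some releases list a PR under two headings
        seen.add(pr)
        counts[author] += 1
    return counts


sample = [
    "  • feat: expose between #868 (mesejo)",
    "  • feat: make register_csv accept a list of paths #883 (mesejo)",
    "  • Upgrade datafusion #867 (emgeee)",
    "  • build(deps): bump syn from 2.0.77 to 2.0.79 #886 (dependabot[bot])",
    "Fixed bugs:",
]
print(commits_per_author(sample).most_common())
```

The tally is keyed by GitHub handle, whereas the Credits sections list display names, so matching the two still requires a manual mapping that this sketch does not attempt.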

41.0.0 (2024-09-09)

This release consists of 19 commits from 6 contributors. See credits at the end of this changelog for more information.

Implemented enhancements:

  • feat: enable list of paths for read_csv #824 (mesejo)
  • feat: better exception and message for table not found #851 (mesejo)
  • feat: make cast accept built-in Python types #858 (mesejo)

Other:

  • chore: Prepare for 40.0.0 release #801 (andygrove)
  • Add typing-extensions dependency to pyproject #805 (timsaucer)
  • Upgrade deps to datafusion 41 #802 (Michael-J-Ward)
  • Fix SessionContext init with only SessionConfig #827 (jcrist)
  • build(deps): upgrade actions/{upload,download}-artifact@v3 to v4 #829 (Michael-J-Ward)
  • Run ruff format in CI #837 (timsaucer)
  • Add PyCapsule support for Arrow import and export #825 (timsaucer)
  • Feature/expose when function #836 (timsaucer)
  • Add Window Functions for use with function builder #808 (timsaucer)
  • chore: fix typos #844 (mesejo)
  • build(ci): use proper mac runners #841 (Michael-J-Ward)
  • Set of small features #839 (timsaucer)
  • chore: fix docstrings, typos #852 (mesejo)
  • chore: Use datafusion re-exported dependencies #856 (emgeee)
  • add guidelines on separating python and rust code #860 (Michael-J-Ward)
  • Update Aggregate functions to take builder parameters #859 (timsaucer)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     7  Tim Saucer
     5  Daniel Mesejo
     4  Michael J Ward
     1  Andy Grove
     1  Jim Crist-Harif
     1  Matt Green

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

40.0.0 (2024-08-09)

This release consists of 18 commits from 4 contributors. See credits at the end of this changelog for more information.

  • Update changelog for 39.0.0 #742 (andygrove)
  • build(deps): bump uuid from 1.8.0 to 1.9.1 #744 (dependabot[bot])
  • build(deps): bump mimalloc from 0.1.42 to 0.1.43 #745 (dependabot[bot])
  • build(deps): bump syn from 2.0.67 to 2.0.68 #746 (dependabot[bot])
  • Tsaucer/find window fn #747 (timsaucer)
  • Python wrapper classes for all user interfaces #750 (timsaucer)
  • Expose array sort #764 (timsaucer)
  • Upgrade protobuf and remove GH Action googletest-installer #773 (Michael-J-Ward)
  • Upgrade Datafusion 40 #771 (Michael-J-Ward)
  • Bugfix: Calling count with None arguments #768 (timsaucer)
  • Add in user example that compares two different approaches to UDFs #770 (timsaucer)
  • Add missing exports for wrapper modules #782 (timsaucer)
  • Add PyExpr to_variant conversions #793 (Michael-J-Ward)
  • Add missing expressions to wrapper export #795 (timsaucer)
  • Doc/cross reference #791 (timsaucer)
  • Re-Enable num_centroids to approx_percentile_cont #798 (Michael-J-Ward)
  • UDAF process all state variables #799 (timsaucer)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

     9	Tim Saucer
     4	Michael J Ward
     3	dependabot[bot]
     2	Andy Grove

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.

39.0.0 (2024-06-25)

Merged pull requests:

  • ci: add substrait feature to linux builds #720 (Michael-J-Ward)
  • Docs deploy action #721 (Michael-J-Ward)
  • update deps #723 (Michael-J-Ward)
  • Upgrade maturin #725 (Michael-J-Ward)
  • Upgrade datafusion 39 #728 (Michael-J-Ward)
  • use ScalarValue::to_pyarrow to convert to python object #731 (Michael-J-Ward)
  • Pyo3 Bound<'py, T> api #734 (Michael-J-Ward)
  • github test action: drop python 3.7, add python 3.12 #736 (Michael-J-Ward)
  • Pyarrow filter pushdowns #735 (Michael-J-Ward)
  • build(deps): bump syn from 2.0.66 to 2.0.67 #738 (dependabot[bot])
  • Pyo3 refactorings #740 (Michael-J-Ward)
  • UDAF sum workaround #741 (Michael-J-Ward)

38.0.1 (2024-05-25)

Implemented enhancements:

  • feat: add python bindings for ends_with function #693 (richtia)
  • feat: expose named_struct in python #700 (Michael-J-Ward)

Merged pull requests:

  • Add document about basics of working with expressions #668 (timsaucer)
  • chore: Update Python release process now that DataFusion is TLP #674 (andygrove)
  • Fix Docs #676 (Michael-J-Ward)
  • Add examples from TPC-H #666 (timsaucer)
  • fix conda nightly builds, attempt 2 #689 (Michael-J-Ward)
  • Upgrade to datafusion 38 #691 (Michael-J-Ward)
  • chore: update to maturin's recommended project layout for rust/python… #695 (Michael-J-Ward)
  • chore: update cargo deps #698 (Michael-J-Ward)
  • feat: add python bindings for ends_with function #693 (richtia)
  • feat: expose named_struct in python #700 (Michael-J-Ward)
  • Website fixes #702 (Michael-J-Ward)

37.1.0 (2024-05-08)

Implemented enhancements:

  • feat: add execute_stream and execute_stream_partitioned #610 (mesejo)

Documentation updates:

  • docs: update docs CI to install python-311 requirements #661 (Michael-J-Ward)

Merged pull requests:

  • Switch to Ruff for Python linting #529 (andygrove)
  • Remove sql-on-pandas/polars/cudf examples #602 (andygrove)
  • build(deps): bump object_store from 0.9.0 to 0.9.1 #611 (dependabot[bot])
  • More missing array funcs #605 (judahrand)
  • feat: add execute_stream and execute_stream_partitioned #610 (mesejo)
  • build(deps): bump uuid from 1.7.0 to 1.8.0 #615 (dependabot[bot])
  • Bind SQLOptions and relative ctx method #567 #588 (giacomorebecchi)
  • bugfix: no panic on empty table #613 (mesejo)
  • Expose register_listing_table #618 (henrifroese)
  • Expose unnest feature #641 (timsaucer)
  • Update domain names and paths in asf yaml #643 (andygrove)
  • use python 3.11 to publish docs #645 (andygrove)
  • docs: update docs CI to install python-311 requirements #661 (Michael-J-Ward)
  • Upgrade Datafusion to v37.1.0 #669 (Michael-J-Ward)

36.0.0 (2024-03-02)

Implemented enhancements:

  • feat: Add flatten array function #562 (mobley-trent)

Documentation updates:

  • docs: Add ASF attribution #580 (simicd)

Merged pull requests:

  • Allow PyDataFrame to be used from other projects #582 (andygrove)
  • docs: Add ASF attribution #580 (simicd)
  • Add array functions #560 (ongchi)
  • feat: Add flatten array function #562 (mobley-trent)

35.0.0 (2024-01-20)

Merged pull requests:

  • build(deps): bump syn from 2.0.41 to 2.0.43 #559 (dependabot[bot])
  • build(deps): bump tokio from 1.35.0 to 1.35.1 #558 (dependabot[bot])
  • build(deps): bump async-trait from 0.1.74 to 0.1.77 #556 (dependabot[bot])
  • build(deps): bump pyo3 from 0.20.0 to 0.20.2 #557 (dependabot[bot])

34.0.0 (2023-12-28)

Merged pull requests:

  • Adjust visibility of crate private members & Functions #537 (jdye64)
  • Update json.rst #538 (ray-andrew)
  • Enable mimalloc local_dynamic_tls feature #540 (jdye64)
  • Enable substrait feature to be built by default in CI, for nightlies … #544 (jdye64)

33.0.0 (2023-11-16)

Merged pull requests:

  • First pass at getting architectured builds working #350 (charlesbluca)
  • Remove libprotobuf dep #527 (jdye64)

32.0.0 (2023-10-21)

Implemented enhancements:

  • feat: expose PyWindowFrame #509 (dlovell)
  • add Binary String Functions;encode,decode #494 (jiangzhx)
  • add bit_and,bit_or,bit_xor,bool_add,bool_or #496 (jiangzhx)
  • add first_value last_value #498 (jiangzhx)
  • add regr_* functions #499 (jiangzhx)
  • Add random missing bindings #522 (jdye64)
  • Allow for multiple input files per table instead of a single file #519 (jdye64)
  • Add support for window function bindings #521 (jdye64)

Merged pull requests:

  • Prepare 31.0.0 release #500 (andygrove)
  • Improve release process documentation #505 (andygrove)
  • add Binary String Functions;encode,decode #494 (jiangzhx)
  • build(deps): bump mimalloc from 0.1.38 to 0.1.39 #502 (dependabot[bot])
  • build(deps): bump syn from 2.0.32 to 2.0.35 #503 (dependabot[bot])
  • build(deps): bump syn from 2.0.35 to 2.0.37 #506 (dependabot[bot])
  • Use latest DataFusion #511 (andygrove)
  • add bit_and,bit_or,bit_xor,bool_add,bool_or #496 (jiangzhx)
  • use DataFusion 32 #515 (andygrove)
  • add first_value last_value #498 (jiangzhx)
  • build(deps): bump regex-syntax from 0.7.5 to 0.8.1 #517 (dependabot[bot])
  • build(deps): bump pyo3-build-config from 0.19.2 to 0.20.0 #516 (dependabot[bot])
  • add regr_* functions #499 (jiangzhx)
  • Add random missing bindings #522 (jdye64)
  • build(deps): bump rustix from 0.38.18 to 0.38.19 #523 (dependabot[bot])
  • Allow for multiple input files per table instead of a single file #519 (jdye64)
  • Add support for window function bindings #521 (jdye64)
  • Small clippy fix #524 (andygrove)

31.0.0 (2023-09-12)

Full Changelog

Implemented enhancements:

  • feat: add case function (#447) #448 (mesejo)
  • feat: add compression options #456 (mesejo)
  • feat: add register_json #458 (mesejo)
  • feat: add basic compression configuration to write_parquet #459 (mesejo)
  • feat: add example of reading parquet from s3 #460 (mesejo)
  • feat: add register_avro and read_table #461 (mesejo)
  • feat: add missing scalar math functions #465 (mesejo)

Documentation updates:

  • docs: include pre-commit hooks section in contributor guide #455 (mesejo)

Merged pull requests:

  • Build Linux aarch64 wheel #443 (gokselk)
  • feat: add case function (#447) #448 (mesejo)
  • enhancement(docs): Add user guide (#432) #445 (mesejo)
  • docs: include pre-commit hooks section in contributor guide #455 (mesejo)
  • feat: add compression options #456 (mesejo)
  • Upgrade to DF 28.0.0-rc1 #457 (andygrove)
  • feat: add register_json #458 (mesejo)
  • feat: add basic compression configuration to write_parquet #459 (mesejo)
  • feat: add example of reading parquet from s3 #460 (mesejo)
  • feat: add register_avro and read_table #461 (mesejo)
  • feat: add missing scalar math functions #465 (mesejo)
  • build(deps): bump arduino/setup-protoc from 1 to 2 #452 (dependabot[bot])
  • Revert "build(deps): bump arduino/setup-protoc from 1 to 2 (#452)" #474 (viirya)
  • Minor: fix wrongly copied function description #497 (viirya)
  • Upgrade to Datafusion 31.0.0 #491 (judahrand)
  • Add isnan and iszero #495 (judahrand)

30.0.0

  • Skipped due to a breaking change in DataFusion

29.0.0

  • Skipped

28.0.0 (2023-07-25)

Implemented enhancements:

  • feat: expose offset in python API #437 (cpcloud)

Merged pull requests:

  • File based input utils #433 (jdye64)
  • Upgrade to 28.0.0-rc1 #434 (andygrove)
  • Introduces utility for obtaining SqlTable information from a file like location #398 (jdye64)
  • feat: expose offset in python API #437 (cpcloud)
  • Use DataFusion 28 #439 (andygrove)

27.0.0 (2023-07-03)

Merged pull requests:

  • LogicalPlan.to_variant() make public #412 (jdye64)
  • Prepare 27.0.0 release #423 (andygrove)

26.0.0 (2023-06-11)

Full Changelog

Merged pull requests:

  • Add Expr::Case when_then_else support to rex_call_operands function #388 (jdye64)
  • Introduce BaseSessionContext abstract class #390 (jdye64)
  • CRUD Schema support for BaseSessionContext #392 (jdye64)
  • CRUD Table support for BaseSessionContext #394 (jdye64)

25.0.0 (2023-05-23)

Full Changelog

Merged pull requests:

  • Prepare 24.0.0 Release #376 (andygrove)
  • build(deps): bump uuid from 1.3.1 to 1.3.2 #359 (dependabot[bot])
  • build(deps): bump mimalloc from 0.1.36 to 0.1.37 #361 (dependabot[bot])
  • build(deps): bump regex-syntax from 0.6.29 to 0.7.1 #334 (dependabot[bot])
  • upgrade maturin to 0.15.1 #379 (Jimexist)
  • Expand Expr to include RexType basic support #378 (jdye64)
  • Add Python script for generating changelog #383 (andygrove)

24.0.0 (2023-05-09)

Full Changelog

Documentation updates:

  • Fix link to user guide #354 (andygrove)

Merged pull requests:

  • Add interface to serialize Substrait plans to Python Bytes. #344 (kylebrooks-8451)
  • Add partition_count property to ExecutionPlan. #346 (kylebrooks-8451)
  • Remove unsendable from all Rust pyclass types. #348 (kylebrooks-8451)
  • Fix link to user guide #354 (andygrove)
  • Fix SessionContext execute. #353 (kylebrooks-8451)
  • Pub mod expr in lib.rs #357 (jdye64)
  • Add benchmark derived from TPC-H #355 (andygrove)
  • Add db-benchmark #365 (andygrove)
  • First pass of documentation in mdBook #364 (MrPowers)
  • Add 'pub' and '#[pyo3(get, set)]' to DataTypeMap #371 (jdye64)
  • Fix db-benchmark #369 (andygrove)
  • Docs explaining how to view query plans #373 (andygrove)
  • Improve db-benchmark #372 (andygrove)
  • Make expr member of PyExpr public #375 (jdye64)

23.0.0 (2023-04-23)

Full Changelog

Merged pull requests:

  • Improve API docs, README, and examples for configuring context #321 (andygrove)
  • Osx build linker args #330 (jdye64)
  • Add requirements file for python 3.11 #332 (r4ntix)
  • mac arm64 build #338 (andygrove)
  • Add conda.yaml baseline workflow file #281 (jdye64)
  • Prepare for 23.0.0 release #335 (andygrove)
  • Reuse the Tokio Runtime #341 (kylebrooks-8451)

22.0.0 (2023-04-10)

Full Changelog

Merged pull requests:

  • Fix invalid build yaml #308 (andygrove)
  • Try fix release build #309 (andygrove)
  • Fix release build #310 (andygrove)
  • Enable datafusion-substrait protoc feature, to remove compile-time dependency on protoc #312 (andygrove)
  • Fix Mac/Win release builds in CI #313 (andygrove)
  • install protoc in docs workflow #314 (andygrove)
  • Fix documentation generation in CI #315 (andygrove)
  • Source wheel fix #319 (andygrove)

21.0.0 (2023-03-30)

Full Changelog

Merged pull requests:

  • minor: Fix minor warning on unused import #289 (viirya)
  • feature: Implement describe() method #293 (simicd)
  • fix: Printed results not visible in debugger & notebooks #296 (simicd)
  • add package.include and remove wildcard dependency #295 (andygrove)
  • Update main branch name in docs workflow #303 (andygrove)
  • Upgrade to DF 21 #301 (andygrove)

20.0.0 (2023-03-17)

Full Changelog

Implemented enhancements:

  • Empty relation bindings #208 (jdye64)
  • wrap display_name and canonical_name functions #214 (jdye64)
  • Add PyAlias bindings #216 (jdye64)
  • Add bindings for scalar_variable #218 (jdye64)
  • Bindings for LIKE type expressions #220 (jdye64)
  • Bool expr bindings #223 (jdye64)
  • Between bindings #229 (jdye64)
  • Add bindings for GetIndexedField #227 (jdye64)
  • Add bindings for case, cast, and trycast #232 (jdye64)
  • add remaining expr bindings #233 (jdye64)
  • feature: Additional export methods #236 (simicd)
  • Add Python wrapper for LogicalPlan::Union #240 (iajoiner)
  • feature: Create dataframe from pandas, polars, dictionary, list or pyarrow Table #242 (simicd)
  • Add Python wrappers for LogicalPlan::Join and LogicalPlan::CrossJoin #246 (iajoiner)
  • feature: Set table name from ctx functions #260 (simicd)
  • Explain bindings #264 (jdye64)
  • Extension bindings #266 (jdye64)
  • Subquery alias bindings #269 (jdye64)
  • Create memory table #271 (jdye64)
  • Create view bindings #273 (jdye64)
  • Re-export Datafusion dependencies #277 (jdye64)
  • Distinct bindings #275 (jdye64)
  • Drop table bindings #283 (jdye64)
  • Bindings for LogicalPlan::Repartition #285 (jdye64)
  • Expand Rust return type support for Arrow DataTypes in ScalarValue #287 (jdye64)

Documentation updates:

  • docs: Example of calling Python UDF & UDAF in SQL #258 (simicd)

Merged pull requests:

  • Minor docs updates #210 (andygrove)
  • Empty relation bindings #208 (jdye64)
  • wrap display_name and canonical_name functions #214 (jdye64)
  • Add PyAlias bindings #216 (jdye64)
  • Add bindings for scalar_variable #218 (jdye64)
  • Bindings for LIKE type expressions #220 (jdye64)
  • Bool expr bindings #223 (jdye64)
  • Between bindings #229 (jdye64)
  • Add bindings for GetIndexedField #227 (jdye64)
  • Add bindings for case, cast, and trycast #232 (jdye64)
  • add remaining expr bindings #233 (jdye64)
  • Pre-commit hooks #228 (jdye64)
  • Implement new release process #149 (andygrove)
  • feature: Additional export methods #236 (simicd)
  • Add Python wrapper for LogicalPlan::Union #240 (iajoiner)
  • feature: Create dataframe from pandas, polars, dictionary, list or pyarrow Table #242 (simicd)
  • Fix release instructions #238 (andygrove)
  • Add Python wrappers for LogicalPlan::Join and LogicalPlan::CrossJoin #246 (iajoiner)
  • docs: Example of calling Python UDF & UDAF in SQL #258 (simicd)
  • feature: Set table name from ctx functions #260 (simicd)
  • Upgrade to DataFusion 19 #262 (andygrove)
  • Explain bindings #264 (jdye64)
  • Extension bindings #266 (jdye64)
  • Subquery alias bindings #269 (jdye64)
  • Create memory table #271 (jdye64)
  • Create view bindings #273 (jdye64)
  • Re-export Datafusion dependencies #277 (jdye64)
  • Distinct bindings #275 (jdye64)
  • build(deps): bump actions/checkout from 2 to 3 #244 (dependabot[bot])
  • build(deps): bump actions/upload-artifact from 2 to 3 #245 (dependabot[bot])
  • build(deps): bump actions/download-artifact from 2 to 3 #243 (dependabot[bot])
  • Use DataFusion 20 #278 (andygrove)
  • Drop table bindings #283 (jdye64)
  • Bindings for LogicalPlan::Repartition #285 (jdye64)
  • Expand Rust return type support for Arrow DataTypes in ScalarValue #287 (jdye64)

0.8.0 (2023-02-22)

Full Changelog

Implemented enhancements:

  • Add support for cuDF physical execution engine #202
  • Make it easier to create a Pandas dataframe from DataFusion query results #139

Fixed bugs:

  • Build error: could not compile thiserror due to 2 previous errors #69

Closed issues:

  • Integrate with the new object_store crate #22

Merged pull requests:

0.8.0-rc1 (2023-02-17)

Full Changelog

Implemented enhancements:

  • Add bindings for datafusion_common::DFField #184
  • Add bindings for DFSchema/DFSchemaRef #181
  • Add bindings for datafusion_expr Projection #179
  • Add bindings for TableScan struct from datafusion_expr::TableScan #177
  • Add a "mapping" struct for types #172
  • Improve string representation of datafusion classes (dataframe, context, expression, ...) #158
  • Add DataFrame count method #151
  • [REQUEST] Github Actions Improvements #146
  • Change default branch name from master to main #144
  • Bump pyo3 to 0.18.0 #140
  • Add script for Python linting #134
  • Add Python bindings for substrait module #132
  • Expand unit tests for built-in functions #128
  • support creating arrow-datafusion-python conda environment #122
  • Build Python source distribution in GitHub workflow #81
  • EPIC: Add all functions to python binding functions #72

Fixed bugs:

  • Build is broken #161
  • Out of memory when sorting #157
  • window_lead test appears to be non-deterministic #135
  • Reading csv does not work #130
  • Github actions produce a lot of warnings #94
  • ASF source release tarball has wrong directory name #90
  • Python Release Build failing after upgrading to maturin 14.2 #87
  • Maturin build hangs on Linux ARM64 #84
  • Cannot install on Mac M1 from source tarball from testpypi #82
  • ImportPathMismatchError when running pytest locally #77

Closed issues:

  • Publish documentation for Python bindings #39
  • Add Python binding for approx_median #32
  • Release version 0.7.0 #7

0.7.0-rc2 (2022-11-26)

Full Changelog

Merged pull requests:

0.5.1 (2022-03-15)

Full Changelog

0.5.1-rc1 (2022-03-15)

Full Changelog

0.5.0 (2022-03-10)

Full Changelog

0.5.0-rc2 (2022-03-10)

Full Changelog

Closed issues:

  • Add support for Ballista #37
  • Implement DataFrame.explain #35

0.5.0-rc1 (2022-03-09)

Full Changelog

Closed issues:

  • Investigate exposing additional optimizations #28
  • Use custom allocator in Python build #27
  • Why is pandas a requirement? #24
  • Unable to build #18
  • Setup CI against multiple Python version #6