
Releases: xarray-contrib/flox

v0.8.5

30 Nov 23:07
12cbef9

Add scipy as required dependency

See v0.8.4 release notes

v0.8.4

30 Nov 21:50
666d45e

What's Changed

Another round of absolutely massive performance improvements for method="cohorts". This should greatly improve many Xarray groupby workloads (which use cohorts by default). Resampling in particular should be much better.

Benchmark improvements undersell the changes, since the core loop is approximately quadratic. Graph construction time for the example in https://xarray.dev/blog/flox with "cohorts" specified drops from 30s to 3s 😱

#### Larger cohorts, lower tasks

| Before [15324a7a] <v0.8.3>   | After [666d45e9] <main>   |   Ratio | Benchmark (Parameter)                                   |
|------------------------------|---------------------------|---------|---------------------------------------------------------|
| 5180                         | 3600                      |    0.69 | cohorts.NWMMidwest.track_num_tasks                      |
| 4891                         | 3385                      |    0.69 | cohorts.NWMMidwest.track_num_tasks_optimized            |
| 505                          | 345                       |    0.68 | cohorts.NWMMidwest.track_num_layers                     |

#### Much faster algorithm for detecting cohorts, that should scale better.

| Before [15324a7a] <v0.8.3>   | After [666d45e9] <main>   |   Ratio | Benchmark (Parameter)                                   |
|------------------------------|---------------------------|---------|---------------------------------------------------------|
| 3.19±0.07ms                  | 2.42±0.05ms               |    0.76 | cohorts.ERA5Google.time_find_group_cohorts              |
| 1.04±0.01ms                  | 782±70μs                  |    0.75 | cohorts.PerfectMonthly.time_find_group_cohorts          |
| 1.06±0.01ms                  | 781±70μs                  |    0.73 | cohorts.PerfectMonthlyRechunked.time_find_group_cohorts |
| 29.7±2ms                     | 12.6±0.9ms                |    0.43 | cohorts.ERA5DayOfYear.time_find_group_cohorts           |
| 7.76±1ms                     | 2.90±0.2ms                |    0.37 | cohorts.ERA5MonthHour.time_find_group_cohorts           |
| 8.17±0.8ms                   | 2.75±0.2ms                |    0.34 | cohorts.ERA5MonthHourRechunked.time_find_group_cohorts  |
| 242±5ms                      | 47.3±2ms                  |    0.2  | cohorts.NWMMidwest.time_find_group_cohorts              |
| 28.8±3ms                     | 4.11±0.3ms                |    0.14 | cohorts.ERA5DayOfYearRechunked.time_find_group_cohorts  |

#### Total time is not too different, we have some overhead in constructing the graphs

| Before [15324a7a] <v0.8.3>   | After [666d45e9] <main>   |   Ratio | Benchmark (Parameter)                                   |
|------------------------------|---------------------------|---------|---------------------------------------------------------|
| 162±5ms                      | 144±9ms                   |    0.89 | cohorts.ERA5DayOfYearRechunked.time_graph_construct     |
| 20.7±0.2ms                   | 18.3±0.4ms                |    0.89 | cohorts.ERA5Google.time_graph_construct                 |
| 3.21±0.2ms                   | 2.40±0.04ms               |    0.75 | cohorts.PerfectMonthly.time_graph_construct             |
| 181±10ms                     | 129±10ms                  |    0.71 | cohorts.NWMMidwest.time_graph_construct                 |
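
For readers unfamiliar with what "detecting cohorts" means in the benchmarks above, here is a toy sketch. It assumes a cohort is a set of group labels that all appear in exactly the same chunks of the array; the function name `find_cohorts` and its `chunk_size` parameter are hypothetical and are not flox's API, and flox's actual algorithm is more involved than this.

```python
from collections import defaultdict


def find_cohorts(labels, chunk_size):
    """Toy cohort detection (illustrative only, not flox's implementation).

    Maps each distinct set of chunk indices to the sorted list of labels
    that occur in exactly those chunks.
    """
    # Record which chunks each label appears in.
    chunks_of_label = defaultdict(set)
    for position, label in enumerate(labels):
        chunks_of_label[label].add(position // chunk_size)

    # Labels sharing an identical chunk set form one cohort.
    cohorts = defaultdict(list)
    for label, chunks in chunks_of_label.items():
        cohorts[frozenset(chunks)].append(label)

    return {tuple(sorted(chunks)): sorted(members) for chunks, members in cohorts.items()}


# Two years of monthly labels (1-12), stored in chunks of 4 elements.
labels = list(range(1, 13)) * 2
cohorts = find_cohorts(labels, chunk_size=4)
for chunks, members in sorted(cohorts.items()):
    print(f"chunks {chunks}: labels {members}")
```

In this example months 1 to 4 only ever appear in chunks 0 and 3, so they can be reduced together without touching the other chunks. That locality is what a cohort-based strategy exploits.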

Full Changelog: v0.8.3...v0.8.4

v0.8.3

24 Nov 23:11
15324a7

What's Changed

  • Fix reordering of dataarray dimensions inside dataset by @eendebakpt in #289

Full Changelog: v0.8.2...v0.8.3

v0.8.2

09 Nov 04:30
19db5b3

Major performance improvements (yet again!). Thanks to @max-sixty for prompting these.

What's Changed

  • Properly dispatch to numbagg when we can by @dcherian in #282
  • Actually optimize out multiple "nanlen" by @dcherian in #283
  • Set order='F' when raveling group_idx after broadcast by @dcherian in #286

Full Changelog: v0.8.1...v0.8.2

v0.8.1

15 Oct 15:35
c15572e

Fix packaging of v0.8.0

See v0.8.0 release notes for all changes

v0.8.0

15 Oct 04:51
fecd9a6

What's Changed

Major performance improvements!!!

  1. Support numbagg through engine="numbagg" for many common nan-skipping reductions in #72. Using numbagg appears to be a major speedup (2x-3x in general, 6x for nanmean). Special thanks to @max-sixty for major work on numbagg's grouped aggregations! Here are timings for reducing a 2D array along the last axis with ordered group labels.

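    | func    | engine  | time        |
    |---------|---------|-------------|
    | nansum  | flox    | 70.3±0.2ms  |
    | nansum  | numpy   | 122±0.2ms   |
    | nansum  | numbagg | 18.4±0.04ms |
    | nanmean | flox    | 144±0.4ms   |
    | nanmean | numpy   | 196±0.5ms   |
    | nanmean | numbagg | 23.7±0.2ms  |
    | nanmax  | flox    | 93.4±0.8ms  |
    | nanmax  | numpy   | 953±2ms     |
    | nanmax  | numbagg | 20.3±0.2ms  |
    | count   | flox    | 59.8±1ms    |
    | count   | numpy   | 114±0.2ms   |
    | count   | numbagg | 29.3±0.1ms  |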
  2. Support engine=None in #266. This will:

    • Use numbagg if available.
    • If not, use flox if the group labels are sorted.
    • Fall back to numpy otherwise.
      Thanks to @mathause for kicking off this work.
  3. Significant speedup in detecting "cohorts" of groups in #272
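
The engine=None fallback order described in item 2 above can be sketched in a few lines. This is only an illustration of the stated rules: the function `choose_engine` and its `numbagg_available` parameter are hypothetical names, not flox's internals.

```python
import importlib.util


def choose_engine(labels, numbagg_available=None):
    """Pick an engine name following the three rules listed for engine=None.

    `numbagg_available` is a hypothetical override for illustration; when it
    is None we check whether the numbagg package can be imported.
    """
    if numbagg_available is None:
        numbagg_available = importlib.util.find_spec("numbagg") is not None

    # Rule 1: prefer numbagg whenever it is installed.
    if numbagg_available:
        return "numbagg"

    # Rule 2: use flox when the group labels are already sorted.
    is_sorted = all(a <= b for a, b in zip(labels, labels[1:]))
    if is_sorted:
        return "flox"

    # Rule 3: otherwise fall back to numpy.
    return "numpy"


print(choose_engine([0, 0, 1, 2], numbagg_available=False))  # flox
print(choose_engine([2, 0, 1], numbagg_available=False))     # numpy
print(choose_engine([2, 0, 1], numbagg_available=True))      # numbagg
```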

Other Major Changes

  1. Test and support Python 3.12 (note that numba does not support 3.12 yet).
  2. Bump minimum numpy version to 1.22.
  3. New aggregations: support quantile, median, and mode with method="blockwise", by @dcherian in #269
  4. Add multidimensional bins demo notebook by @dcherian in #203. This is useful for prediction/forecasting problems.

Full Changelog: v0.7.2...v0.8.0

v0.7.2

11 May 20:34
096c080

New reductions: nanfirst, nanlast, nanargmax, nanargmin.

Please test before using in production

Full Changelog: v0.7.1...v0.7.2

v0.7.1

08 May 17:57
4164712

What's Changed

Bugfix!

Full Changelog: v0.7.0...v0.7.1

v0.7.0

05 May 12:51
622ddb2

What's Changed

Optimizations and performance improvements.

Full Changelog: v0.6.10...v0.7.0

v0.6.10

26 Mar 20:14
24dc7fd

Small performance improvement.

Full Changelog: v0.6.9...v0.6.10