Releases · xarray-contrib/flox

30 Nov 23:07

dcherian

v0.8.5

12cbef9

v0.8.5

Add scipy as required dependency

See 0.8.4 release notes

30 Nov 21:50

dcherian

v0.8.4

666d45e

v0.8.4

What's Changed

Another round of absolutely massive performance improvements for method="cohorts". This should greatly improve many Xarray groupby workloads (which use cohorts by default). Resampling in particular should be much better.

Benchmark improvements undersell the changes, since the core loop is approximately quadratic. Graph construction time for the example in https://xarray.dev/blog/flox with "cohorts" specified now drops from 30s to 3s 😱

#### Larger cohorts, fewer tasks

| Before [15324a7a] <v0.8.3> | After [666d45e9] <main> | Ratio | Benchmark (Parameter) |
|----------------------------|-------------------------|-------|-----------------------|
| 5180 | 3600 | 0.69 | cohorts.NWMMidwest.track_num_tasks |
| 4891 | 3385 | 0.69 | cohorts.NWMMidwest.track_num_tasks_optimized |
| 505 | 345 | 0.68 | cohorts.NWMMidwest.track_num_layers |

#### Much faster algorithm for detecting cohorts that should scale better

| Before [15324a7a] <v0.8.3> | After [666d45e9] <main> | Ratio | Benchmark (Parameter) |
|----------------------------|-------------------------|-------|-----------------------|
| 3.19±0.07ms | 2.42±0.05ms | 0.76 | cohorts.ERA5Google.time_find_group_cohorts |
| 1.04±0.01ms | 782±70μs | 0.75 | cohorts.PerfectMonthly.time_find_group_cohorts |
| 1.06±0.01ms | 781±70μs | 0.73 | cohorts.PerfectMonthlyRechunked.time_find_group_cohorts |
| 29.7±2ms | 12.6±0.9ms | 0.43 | cohorts.ERA5DayOfYear.time_find_group_cohorts |
| 7.76±1ms | 2.90±0.2ms | 0.37 | cohorts.ERA5MonthHour.time_find_group_cohorts |
| 8.17±0.8ms | 2.75±0.2ms | 0.34 | cohorts.ERA5MonthHourRechunked.time_find_group_cohorts |
| 242±5ms | 47.3±2ms | 0.2 | cohorts.NWMMidwest.time_find_group_cohorts |
| 28.8±3ms | 4.11±0.3ms | 0.14 | cohorts.ERA5DayOfYearRechunked.time_find_group_cohorts |

#### Total time is not too different; we have some overhead in constructing the graphs

| Before [15324a7a] <v0.8.3> | After [666d45e9] <main> | Ratio | Benchmark (Parameter) |
|----------------------------|-------------------------|-------|-----------------------|
| 162±5ms | 144±9ms | 0.89 | cohorts.ERA5DayOfYearRechunked.time_graph_construct |
| 20.7±0.2ms | 18.3±0.4ms | 0.89 | cohorts.ERA5Google.time_graph_construct |
| 3.21±0.2ms | 2.40±0.04ms | 0.75 | cohorts.PerfectMonthly.time_graph_construct |
| 181±10ms | 129±10ms | 0.71 | cohorts.NWMMidwest.time_graph_construct |

Changes

More cohorts speedups by @dcherian in #290
Typing fixes by @dcherian in #292
Use set containment instead of perfect subsets by @dcherian in #291
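
The notes give only the PR title for #291, not the algorithm. As a general illustration of the difference between a strict subset test and a containment score, here is a minimal sketch. The `containment` helper, the label-to-chunk mapping, and the 0.7 threshold are all hypothetical and are not taken from flox's source.

```python
def containment(a: set, b: set) -> float:
    """Fraction of the elements of `a` that are also in `b` (|a & b| / |a|)."""
    if not a:
        return 1.0  # convention chosen here: the empty set is contained in any set
    return len(a & b) / len(a)


# Hypothetical data: the chunk indices in which each group label appears.
chunks_for_label = {
    "jan": {0, 1, 2, 3},
    "feb": {0, 1, 2, 4},
    "mar": {7, 8},
}

jan = chunks_for_label["jan"]
feb = chunks_for_label["feb"]
mar = chunks_for_label["mar"]

# A perfect-subset test is all-or-nothing: one stray chunk makes it fail.
strict = jan <= feb

# A containment score degrades gracefully and can be compared to a threshold.
THRESHOLD = 0.7  # hypothetical value, for illustration only
score = containment(jan, feb)
similar_enough = score >= THRESHOLD
unrelated = containment(mar, jan)
```

Here `jan` and `feb` share three of four chunks, so the subset test rejects the pair while the containment score still marks them as candidates for grouping together.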

Full Changelog: v0.8.3...v0.8.4

Contributors

dcherian

24 Nov 23:11

dcherian

v0.8.3

15324a7

v0.8.3

What's Changed

Fix reordering of dataarray dimensions inside dataset by @eendebakpt in #289

New Contributors

@eendebakpt made their first contribution in #289

Full Changelog: v0.8.2...v0.8.3

Contributors

eendebakpt

09 Nov 04:30

dcherian

v0.8.2

19db5b3

v0.8.2

Major performance improvements (yet again!). Thanks to @max-sixty for prompting these.

What's Changed

Properly dispatch to numbagg when we can by @dcherian in #282
Actually optimize out multiple "nanlen" by @dcherian in #283
Set order='F' when raveling group_idx after broadcast by @dcherian in #286

Full Changelog: v0.8.1...v0.8.2

Contributors

dcherian and max-sixty

15 Oct 15:35

dcherian

v0.8.1

c15572e

v0.8.1

Fix packaging of v0.8.0

See v0.8.0 release notes for all changes

15 Oct 04:51

dcherian

v0.8.0

fecd9a6

v0.8.0

What's Changed

Major performance improvements!!!

Support numbagg through engine="numbagg" for many common nan-skipping reductions in #72. Using numbagg appears to be a major speedup (2x-3x in general, 6x for nanmean). Special thanks to @max-sixty for major work on numbagg's grouped aggregations! Here are timings for reducing a 2D array along the last axis with ordered group labels.

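| func | engine | time |
|------|--------|------|
| nansum | flox | 70.3±0.2ms |
| nansum | numpy | 122±0.2ms |
| nansum | numbagg | 18.4±0.04ms |
| nanmean | flox | 144±0.4ms |
| nanmean | numpy | 196±0.5ms |
| nanmean | numbagg | 23.7±0.2ms |
| nanmax | flox | 93.4±0.8ms |
| nanmax | numpy | 953±2ms |
| nanmax | numbagg | 20.3±0.2ms |
| count | flox | 59.8±1ms |
| count | numpy | 114±0.2ms |
| count | numbagg | 29.3±0.1ms |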

Support engine=None in #266. This will
- Use numbagg if available.
- If not, use flox if the group labels are sorted.
- Fall back to numpy otherwise.

Thanks to @mathause for kicking off this work.

Significant speedup in detecting "cohorts" of groups in #272.
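
The engine=None fallback order listed above can be sketched as follows. This is a minimal illustration of the decision order only, not flox's actual implementation: `choose_engine` and `is_sorted` are hypothetical helpers, and the `numbagg_available` parameter exists only so the logic can be exercised without numbagg installed.

```python
from importlib.util import find_spec
from typing import Optional, Sequence


def is_sorted(labels: Sequence[int]) -> bool:
    """Return True if the group labels are in non-decreasing order."""
    return all(a <= b for a, b in zip(labels, labels[1:]))


def choose_engine(
    labels: Sequence[int], numbagg_available: Optional[bool] = None
) -> str:
    """Hypothetical helper that mirrors the fallback order in the notes."""
    if numbagg_available is None:
        # find_spec returns None when the package cannot be found.
        numbagg_available = find_spec("numbagg") is not None
    if numbagg_available:
        return "numbagg"
    if is_sorted(labels):
        return "flox"
    return "numpy"


# Force the "numbagg missing" branch to show the two remaining outcomes.
sorted_choice = choose_engine([0, 0, 1, 2], numbagg_available=False)
unsorted_choice = choose_engine([2, 0, 1], numbagg_available=False)
```

With numbagg unavailable, sorted labels select "flox" and unsorted labels select "numpy", which matches the order given in the list.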

Other Major Changes

Test and support Python 3.12 (note that numba does not support 3.12 yet).
Bump minimum numpy version to 1.22.
New aggregations: support quantile, median, mode with method="blockwise". by @dcherian in #269
Add multidimensional bins demo notebook by @dcherian in #203. This is useful for prediction/forecasting problems.

Minor Changes

Delete resample_reduce by @dcherian in #246
Fix test failure on i386 by @avalentino in #248
typing fixes by @Illviljan in #235, #253
replace the deprecated provision-with-micromamba with setup-micromamba by @keewis in #258
compatibility with numpy>=2.0 by @keewis in #257
convert datetime: micro-optimizations by @mathause in #261

New Contributors

@keewis made their first contribution in #258
@mathause made their first contribution in #261

Full Changelog: v0.7.2...v0.8.0

Contributors

avalentino, dcherian, and 4 other contributors

11 May 20:34

dcherian

v0.7.2

096c080

v0.7.2

New reductions: nanfirst, nanlast, nanargmax, nanargmin.

Please test before using in production

What's Changed

Support nanfirst, nanlast with simple combine algo by @dcherian in #240
Enable nanargmax, nanargmin by @dcherian in #171

Full Changelog: v0.7.1...v0.7.2

Contributors

dcherian

08 May 17:57

dcherian

v0.7.1

4164712

v0.7.1

What's Changed

Bugfix!

Check method only for dask reductions. by @dcherian in #241

Full Changelog: v0.7.0...v0.7.1

Contributors

dcherian

05 May 12:51

dcherian

v0.7.0

622ddb2

v0.7.0

What's Changed

Optimizations and performance improvements.

Always factorize early by @dcherian in #234
Handle min_count=0 by @dcherian in #238
Optimize broadcasting by @dcherian in #230

Full Changelog: v0.6.10...v0.7.0

Contributors

dcherian

26 Mar 20:14

dcherian

v0.6.10

24dc7fd

v0.6.10

Small performance improvement.

What's Changed

Always reindex=True for all numpy inputs by @dcherian in #228

Full Changelog: v0.6.9...v0.6.10

Contributors

dcherian
