Add WaveformBrowser compatibility to DataLoader #484

iguinn · 2023-05-01T21:00:56Z

Implemented DataLoader.browse, added DataLoader.load_iterator, and made necessary changes to LH5Iterator and WaveformBrowser to make these work well.

Details:

Improved ability of LH5Iterator to work across tiers and groups. It can now take a list of groups (enabling different groups from the same file). It can also "friend" together multiple LH5Iterators to read them in parallel, much like friending TChains in ROOT
lgdo.expand_path can now work with a base_path. This fixed some bugs associated with using LH5_Store with a base path
Added load_iterator to DataLoader. This constructs and returns an LH5Iterator across all tiers/groups/files. This can be used for iterating over files in chunks (although I see @gipert working on something similar). Right now this is only implemented for hits mode. This also doesn't solve the problem of needing to separately load the entry list first (some sort of lazy entry_list would be really nice here). Since the WaveformBrowser is built around LH5 iterator, this is useful to have.
WaveformBrowser can now accept an LH5Iterator as an input as an alternative to a file list and group list
Implemented DataLoader.browse to return a WaveformBrowser object. Note that this ignores the field_mask created by the DataLoader in favor of constructing it from the requested lines to draw.
Made LH5Iterator initialize faster by being "lazy" about mapping files. Before, a map of file lengths and entries in each file was constructed on start up; with large numbers of files, this caused a long hang from opening up all of the files and reading the size of each group. Now, the maps are constructed as you iterate through files; this gets rid of the hang on start up (although you still see a hang whenever it reaches a new file). This is a quality of life improvement for waveform browsing mainly.

… initialization

codecov · 2023-05-01T21:10:56Z

Codecov Report

Patch coverage: 70.05% and project coverage change: +0.23 🎉

Comparison is base (ecff792) 47.93% compared to head (ab95295) 48.17%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     legend-exp/pygama#484      +/-   ##
==========================================
+ Coverage   47.93%   48.17%   +0.23%     
==========================================
  Files         104      104              
  Lines       12314    12443     +129     
==========================================
+ Hits         5903     5994      +91     
- Misses       6411     6449      +38

Impacted Files	Coverage Δ
src/pygama/flow/data_loader.py	`75.93% <38.46%> (-3.58%)`	⬇️
src/pygama/vis/waveform_browser.py	`71.03% <64.28%> (+1.03%)`	⬆️
src/pygama/lgdo/lh5_store.py	`86.23% <80.53%> (+0.05%)`	⬆️
src/pygama/lgdo/compression/radware.py	`21.53% <100.00%> (ø)`
src/pygama/lgdo/compression/varlen.py	`49.60% <100.00%> (ø)`
src/pygama/lgdo/lgdo_utils.py	`96.36% <100.00%> (+0.36%)`	⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

gipert · 2023-05-02T08:15:13Z

Hi Ian, very nice!

Don't you think we should replace .next() with your .load_iterator()?

some sort of lazy entry_list would be really nice here

I totally agree.

gipert · 2023-05-02T19:10:20Z

Oh, seems like Numba throws a new warning in the latest release...

iguinn · 2023-05-03T01:31:41Z

The error that is currently causing the checks to fail is due to a new warning that appears in numba 0.57.0 (released yesterday). I've submitted a bug report for it: numba/numba#8936. For now, I've added this to the list of numba versions to avoid in setup.

iguinn · 2023-05-03T02:07:34Z

Ok, now that all checks are passing again:

next() and load_iterator() currently aren't exactly the same, since next() returns data formatted as requested by the user, while get_iterator() returns only an LH5Iterator object (which returns lgdo.Table, i_entry, n_entries). It seems like next() will be easier for users, while LH5Iterator might be more useful as an internal function (it would be very useful for skimming files). Another option we could consider is to make the LH5Iterator one of the options for the output of load(). @gracesong312 or @jasondet do you have any thoughts?

Either way, if there are no objections I would suggest merging this quickly, as I think @erin717 is planning on using browse() in her tutorial.

gipert · 2023-05-03T09:39:26Z

In my view, DataLoader should always return LGDOs, which can be later viewed in alternative formats by the user (there is already Table.to_dataframe(), but we should add more functionality in the future, see legend-exp/legend-pydataobj#4). Looks a cleaner interface to me.

In this view, I like very much the idea that .load() returns an LGDO and .next() returns an LH5Iterator chunking the same LGDO.

Anyways, we can always change things at some point in the future, if you like the idea. You can go ahead and merge from my point of view!

iguinn and others added 20 commits April 29, 2023 09:36

Added ability to friend LH5 files from different tiers with LH5Iterator

cef0a3e

Added test for LH5Iterator friending

d57aa9d

Fixed bug in test

fcbe330

Gave expand_path a base_path argument

af8de28

Fixed test

91fe30a

Another fix to test

280829d

Fixed another test typo

a817b32

Another fix to test

42b2e59

Merge branch 'main' of https://github.com/legend-exp/pygama

983698f

Fix to base_path support

70e0eea

Added load_iterator to DataLoader

4aedfd9

Added ability to feed waveform_browser an LH5Iterator directly

915ac2b

Implemented brouse in DataLoader

d64e027

Fixed bug when trying to draw multiple waveforms with WaveformBrowser

89e34f4

LH5Iterator implements typing.Iterator

b068baa

LH5Iterator is now lazy about reading file sizes to increase speed of…

16fa350

… initialization

Fixed bug

21a1aec

Fixed aux_vals

8714b9c

Merge branch 'main' of https://github.com/legend-exp/pygama

5d1ff82

style: pre-commit fixes

7fc5e33

iguinn and others added 4 commits May 1, 2023 14:32

Fixed some flake8 complaints

8e53d45

style: pre-commit fixes

1ab4a64

Fixed flake8 complaint

77a804d

style: pre-commit fixes

c171182

gipert added vis Data visualization flow High-level data management labels May 2, 2023

gipert added this to the v1.3 milestone May 2, 2023

Merge branch 'main' into main

9d6ab3b

Merge branch 'legend-exp:main' into main

f95b338

iguinn added 2 commits May 2, 2023 14:57

Added nopython=True to compression algorithms

a0ac52b

Remove numba 0.57 from allowed versions

c579b7e

iguinn and others added 2 commits May 2, 2023 18:43

Added test for DataLoader.browse()

fd8f20f

style: pre-commit fixes

ab95295

iguinn merged commit f22d9f6 into legend-exp:main May 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add WaveformBrowser compatibility to DataLoader #484

Add WaveformBrowser compatibility to DataLoader #484

iguinn commented May 1, 2023

codecov bot commented May 1, 2023 •

edited

Loading

gipert commented May 2, 2023

gipert commented May 2, 2023

iguinn commented May 3, 2023

iguinn commented May 3, 2023

gipert commented May 3, 2023

Add WaveformBrowser compatibility to DataLoader #484

Add WaveformBrowser compatibility to DataLoader #484

Conversation

iguinn commented May 1, 2023

codecov bot commented May 1, 2023 • edited Loading

Codecov Report

gipert commented May 2, 2023

gipert commented May 2, 2023

iguinn commented May 3, 2023

iguinn commented May 3, 2023

gipert commented May 3, 2023

codecov bot commented May 1, 2023 •

edited

Loading