Add LGDO format conversion utilities #4

gipert · 2022-09-22T16:32:54Z

We should implement a method for each LGDO to convert the underlying data to third-party formats such as NumPy, pandas and Awkward Array. I'm thinking about something like:

lgdo_obj.convert(fmt="pandas.DataFrame", copy=False)

Here, fmt could be one of pandas.DataFrame, numpy.ndarray or awkward.Array.

This way, the conversion code would live alongside the LGDO implementation, making it easier to switch between data representations (as in load_nda(), load_pd(), build_tcm(), the DataLoader, etc.).

Of course, we need to distinguish between copying and zero-copy conversions.
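A minimal sketch of the proposed interface, assuming a hypothetical `Array`-like class that wraps a NumPy array in an `nda` attribute (the class, the default format and the `name` column label are illustrative assumptions, not the actual LGDO code). With `copy=False` the NumPy branch hands out the underlying buffer itself, while `copy=True` always returns an independent copy. Whether pandas really shares the buffer for `copy=False` can depend on the pandas version, so that branch is only a request, not a guarantee.

```python
import numpy as np
import pandas as pd


class Array:
    """Hypothetical stand-in for an LGDO Array wrapping a NumPy array (nda)."""

    def __init__(self, nda, name: str = "values") -> None:
        self.nda = np.asarray(nda)
        self.name = name  # hypothetical: column label used for the pandas output

    def convert(self, fmt: str = "numpy.ndarray", copy: bool = False):
        """Return the data in another format; copy=False tries to avoid copying."""
        if fmt == "numpy.ndarray":
            # zero-copy: return the underlying buffer itself
            return self.nda.copy() if copy else self.nda
        if fmt == "pandas.DataFrame":
            # copy=False asks pandas not to duplicate the buffer
            return pd.DataFrame({self.name: self.nda}, copy=copy)
        if fmt == "awkward.Array":
            import awkward as ak  # imported lazily, only needed for this format

            # ak.from_numpy typically wraps the buffer without copying
            return ak.from_numpy(self.nda.copy() if copy else self.nda)
        raise ValueError(f"unsupported format {fmt!r}")
```

Importing awkward inside its branch keeps it an optional dependency for users who only need NumPy or pandas.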


gipert · 2022-10-03T11:12:27Z

I therefore propose deprecating load_nda() (and load_pd()) in favor of:

store.read_object("obj", "file.lh5").convert(fmt="numpy.ndarray")

which would return the same result.
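The deprecation could be done with a thin wrapper that warns and then delegates to the new code path. This is a sketch assuming a reader callable that returns the LGDO object directly, as in the snippet above; the function name `load_nda_deprecated` and its signature are hypothetical and do not mirror the real load_nda() arguments.

```python
import warnings
from typing import Any, Callable


def load_nda_deprecated(read_object: Callable[[str, str], Any], name: str, lh5_file: str):
    """Hypothetical shim: keep old-style calls working while pointing to convert()."""
    warnings.warn(
        "load_nda() is deprecated, use "
        'store.read_object(name, file).convert(fmt="numpy.ndarray") instead',
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this shim
    )
    # assumption: read_object returns the LGDO object itself
    return read_object(name, lh5_file).convert(fmt="numpy.ndarray")
```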

These new convert() functions should also handle units at some point. For numpy.ndarray, we could simply use Pint's NumPy support, which should work. For pandas.DataFrame, we could use pint-pandas, but I'm not sure whether the package is fully functional.

MoritzNeuberger · 2023-10-30T11:39:38Z

I am confused about how the return type annotation would work in this case. Can you have a single function with multiple types of output depending on the input parameters?

gipert · 2023-10-30T13:06:24Z

Yes, it would look like this:

def convert(...) -> pandas.DataFrame | numpy.typing.NDArray | ...:
    pass
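A union return type works at runtime, but a type checker then cannot tell which type a particular call returns. A common alternative (not something settled in this thread) is `typing.overload` combined with `Literal` format strings, so that each `fmt` value maps to its own return type. The free function below is a hypothetical stand-in for the method, and the `"values"` column label is a placeholder; an `awkward.Array` overload would follow the same pattern.

```python
from __future__ import annotations

from typing import Literal, overload

import numpy as np
import pandas as pd
from numpy.typing import NDArray


@overload
def convert(nda: NDArray, fmt: Literal["numpy.ndarray"], copy: bool = ...) -> NDArray: ...


@overload
def convert(nda: NDArray, fmt: Literal["pandas.DataFrame"], copy: bool = ...) -> pd.DataFrame: ...


def convert(nda: NDArray, fmt: str, copy: bool = False) -> NDArray | pd.DataFrame:
    """Runtime implementation; the overloads above only inform type checkers."""
    nda = np.asarray(nda)
    if fmt == "numpy.ndarray":
        return nda.copy() if copy else nda
    if fmt == "pandas.DataFrame":
        return pd.DataFrame({"values": nda}, copy=copy)  # placeholder column label
    raise ValueError(f"unsupported format {fmt!r}")
```

With this, `convert(x, "pandas.DataFrame")` is inferred as `pd.DataFrame` by tools such as mypy, while the implementation itself stays a single function.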

MoritzNeuberger · 2023-11-02T15:11:15Z

Over the last few days, I have been playing around with implementing this feature. For the most part, it is straightforward, although a few questions arose:

VectorOfVectors:

To convert it to a numpy.ndarray, I first convert it to an ArrayOfEqualSizedArrays (aoesa) with to_aoesa() and then call that object's convert(). However, to_aoesa() allocates the underlying array with np.empty, so when preserve_dtype is set to True, the padding entries (those beyond each vector's length) are left with arbitrary, uninitialized values. With preserve_dtype set to False they would be set to nan instead, but I assume that is not the preferred behavior.
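One way to avoid uninitialized padding is to allocate with `np.full` and an explicit fill value instead of `np.empty`. A sketch, assuming the VectorOfVectors layout of a flat data array plus cumulative end offsets (`flattened_data`, `cumulative_length`); the helper `vov_to_padded` and its `fill_value` parameter are hypothetical, not the actual to_aoesa() code. With `preserve_dtype=True` the caller must pick a fill value representable in the original dtype (NaN does not fit in an integer array).

```python
import numpy as np


def vov_to_padded(flattened_data, cumulative_length, fill_value=np.nan, preserve_dtype=False):
    """Hypothetical helper: pad a VectorOfVectors-like layout into a 2D array.

    Every slot not covered by a vector gets fill_value explicitly, so nothing
    is left uninitialized (unlike allocating with np.empty).
    """
    flattened_data = np.asarray(flattened_data)
    cumulative_length = np.asarray(cumulative_length)
    # length of each vector, recovered from the cumulative end offsets
    lengths = np.diff(cumulative_length, prepend=0)
    max_len = int(lengths.max()) if len(lengths) > 0 else 0
    if preserve_dtype:
        dtype = flattened_data.dtype  # fill_value must be representable in it
    else:
        # promote to a floating type (at least float64) so NaN can mark padding
        dtype = np.result_type(flattened_data.dtype, np.float64)
    out = np.full((len(lengths), max_len), fill_value, dtype=dtype)
    # True where column index < vector length; row-major order matches flattened_data
    mask = np.arange(max_len) < lengths[:, None]
    out[mask] = flattened_data
    return out
```

For example, `vov_to_padded([1, 2, 3], [1, 3], fill_value=-1, preserve_dtype=True)` keeps the integer dtype and marks the padding with -1.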

Struct/Table:

Converting to numpy.ndarray is not straightforward either. For now, I return a dict containing the keys and the values of the Struct/Table as two separate NumPy arrays. What would be a better way to do this?
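One possible alternative to a dict of keys and values (a suggestion, not something settled in this thread) is a NumPy structured array, where each Table column becomes a named field and each row one record. A sketch, assuming the columns are already available as equal-length NumPy arrays; `columns_to_structured` is a hypothetical helper. Note that packing columns into records always copies the data, so this could never be a zero-copy conversion.

```python
import numpy as np


def columns_to_structured(columns):
    """Hypothetical helper: pack equal-length columns into one structured ndarray.

    Multi-dimensional columns (e.g. one waveform per row) become subarray fields.
    """
    if not columns:
        raise ValueError("need at least one column")
    arrays = {name: np.asarray(col) for name, col in columns.items()}
    n_rows = {len(a) for a in arrays.values()}
    if len(n_rows) != 1:
        raise ValueError("all columns must have the same length")
    # one field per column; shape[1:] is () for scalar columns
    dtype = np.dtype([(name, a.dtype, a.shape[1:]) for name, a in arrays.items()])
    out = np.empty(n_rows.pop(), dtype=dtype)
    for name, a in arrays.items():
        out[name] = a  # every field is written, so np.empty leaves no garbage
    return out
```

Fields can then be accessed by name, e.g. `rec["energy"]`, much like a Table column.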

WaveformTable/encoded data:

Do these need a convert() method at all?

copy:

I have implemented this option wherever possible: always for pandas.DataFrame, and where necessary for numpy.ndarray. AFAIK, conversions to awkward arrays usually do not copy?

ToDos:

- Figure out how to implement units with Pint. Is this possible with awkward arrays?
- Write tests

MoritzNeuberger · 2023-11-02T15:15:11Z

I think this would be easier to discuss with the code at hand, so I will open a PR with the current state.

…utilities

The idea is to add a `convert` function to each LGDO datatype that converts the underlying data to a third-party datatype. The supported formats are `pandas.DataFrame`, `numpy.ndarray` and `awkward.Array`. Additionally, a `copy` option controls whether `convert` copies the data or not. At the moment, these issues are still open:

- [ ] How to use `to_aoesa` to convert VectorOfVectors to `numpy.ndarray`?
- [ ] How to implement the conversion of Struct/Table to `numpy.ndarray`?
- [ ] How to implement the `convert` function for WaveformTable and encoded data?
- [ ] Find out how to implement units with Pint. Is it possible for awkward arrays?
- [ ] Write many, many tests.

gipert added the enhancement New feature or request label Sep 22, 2022

gipert self-assigned this Jan 12, 2023

gipert mentioned this issue Jan 30, 2023

DataLoader: handling of non-scalar data legend-exp/pygama#448

Merged

gipert mentioned this issue May 3, 2023

Add WaveformBrowser compatibility to DataLoader legend-exp/pygama#484

Merged

gipert transferred this issue from legend-exp/pygama May 23, 2023

gipert added this to the v2 milestone Oct 25, 2023

gipert removed their assignment Oct 25, 2023

gipert linked a pull request Nov 3, 2023 that will close this issue

Add LGDO format conversion utilities #30

Merged

5 tasks

gipert closed this as completed in #30 Dec 30, 2023
