Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LGDO format conversion utilities #30

Merged
merged 54 commits into from
Dec 30, 2023

Conversation

MoritzNeuberger
Copy link
Contributor

@MoritzNeuberger MoritzNeuberger commented Nov 2, 2023

The idea is to add a convert function to each LGDO datatype that converts the underlying data to a third-party datatype. These are pandas.DataFrame, numpy.ndarray and awkward.Array. Additionally, you have the option to control whether convert copies data or not.

This version is not feature-complete. These issues are still open:

  • Use to_aoesa to convert VectorOfVectors to numpy.ndarray?
    --> in the future, use awkward arrays for pandas conversion,
  • How to implement the conversion of structures/tables to numpy.ndarray?
    --> solved it by not implementing it and throwing a TypeError
  • How to implement the convert function for WaveformTable and encoded data?
    --> just inherits from Tabel which has a generic function looping over columns, converting them and merging them.
  • Find out how to implement units with pint. Is it possible for awkward arrays?
    --> partially possible, not awkward implementations and implementations relying on awkward such as some pandas conversions
  • Write many, many tests.

MoritzNeuberger and others added 5 commits November 2, 2023 16:37
…utilities

The idea is to add a `convert` function to each LGDO datatype that converts the underlying data to a third-party datatype.
These are `pandas.DataFrame`, `numpy.ndarray` and `awkward.Array`.
Additionally, you have the option to control whether `convert` copies data or not.

At the moment, these issues are still open:

[ ] How to use `to_aoesa` to convert VectorOfVectors to `numpy.ndarray`?
[ ] How to implement the conversion of structures/tables to `numpy.ndarray`?
[ ] How to implement the `convert' function for WaveformTable and encoded data?
[ ] Find out how to implement units with pint. Is it possible for awkward arrays?
[ ] Write many, many tests.
…berger/legend-pydataobj into issue_4_lgdo_format_conversion
Copy link

codecov bot commented Nov 3, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (772fe17) 74.87% compared to head (f5a317c) 75.49%.
Report is 21 commits behind head on main.

Files Patch % Lines
src/lgdo/types/table.py 78.78% 7 Missing ⚠️
src/lgdo/types/encoded.py 87.09% 4 Missing ⚠️
src/lgdo/compression/radware.py 33.33% 2 Missing ⚠️
src/lgdo/compression/varlen.py 50.00% 1 Missing ⚠️
src/lgdo/types/array.py 95.83% 1 Missing ⚠️
src/lgdo/types/fixedsizearray.py 66.66% 1 Missing ⚠️
src/lgdo/types/lgdo.py 83.33% 1 Missing ⚠️
src/lgdo/types/scalar.py 50.00% 1 Missing ⚠️
src/lgdo/types/struct.py 88.88% 1 Missing ⚠️
src/lgdo/types/waveformtable.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #30      +/-   ##
==========================================
+ Coverage   74.87%   75.49%   +0.61%     
==========================================
  Files          27       28       +1     
  Lines        2209     2379     +170     
==========================================
+ Hits         1654     1796     +142     
- Misses        555      583      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gipert gipert added the types LGDO types label Nov 3, 2023
@gipert gipert added this to the v2 milestone Nov 3, 2023
@gipert gipert linked an issue Nov 3, 2023 that may be closed by this pull request
@gipert gipert changed the title Attempt to implement Issue #4 - Add LGDO format conversion utilities Add LGDO format conversion utilities Nov 3, 2023
@gipert gipert marked this pull request as draft November 6, 2023 15:50
@MoritzNeuberger
Copy link
Contributor Author

MoritzNeuberger commented Nov 7, 2023

Results of some discussion

VectorOfVectors: For now, we made a conversion to pandas.DataFrame, and numpy.ndarray raise a TypeError, saying that this conversion is not supported for VectorOfVectors. In the future, we can use the awkward.Array backend to convert to numpy.ndarray using to_numpy in allow_missing mode.

Struct: The conversion of Struct to numpy.ndarray raises a TypeError. The conversion to numpy.ndarray is not clear as you can not trivially contain the column names inside the ndarry. An idea would be to return a tuple with 1st entry, the ndarray containing the values, and 2nd a list of key names. The issue is that the columns are not sorted, and a clear order must be kept. The conversion to pandas.DataFrame and awkward.Array is implemented. However, it only works when the columns are of equal length, meaning it works only for a Table structure.

Table: Same as Struct, though here the conversion to pandas.DataFrame and awkward.Array should not cause issues.

WaveformTable: For pandas.DataFrame and awkward.Array, turn the lgdo objects of the class into numpy arrays, then enter them in a dict and hand this dict to the standard converter. For numpy.ndarray, define a specific output format in the form of a ndarray, where the columns of the ndarray correspond to the array of the internal data.

Encoded: Try to keep the return behavior of convert similar for ArrayOfEncodedEqualSizedArrays and VectorOfEncodedVectors. Similar approach as WaveformTable.

Units: So far, pint is only supported for numpy.ndarray and pandas.DataFrame. We will add an option to convert to keep the units or not called with_units. In the case of awkward.Array, this will raise an error such that people are aware that this combination is not possible.

Copy: We will remove the option for the user to control whether to copy or not. By default, as good as possible, the zero-copy option will be preferred, and it will be up to the user to copy the data if necessary. Of a zero-copy option is not available for a certain data type, it will be indicated in the docstring.

@gipert gipert requested review from jasondet and iguinn December 1, 2023 20:17
@gipert
Copy link
Member

gipert commented Dec 1, 2023

@jasondet @iguinn could you please have a look at this? I would like to merge it soon.

@gipert gipert marked this pull request as ready for review December 1, 2023 20:21
@gipert
Copy link
Member

gipert commented Dec 1, 2023

@MoritzNeuberger we forgot about units in Struct and derived...

@gipert
Copy link
Member

gipert commented Dec 2, 2023

@MoritzNeuberger we forgot about units in Struct and derived...

Nevermind, I did not look carefully...

@gipert
Copy link
Member

gipert commented Dec 5, 2023

Also missing: mention to view_as() in the Jupyter notebook.

setup.cfg Outdated Show resolved Hide resolved
@gipert
Copy link
Member

gipert commented Dec 30, 2023

I need to start using this around. I will merge and tag an alpha release

@gipert gipert merged commit 021f397 into legend-exp:main Dec 30, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
types LGDO types
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add LGDO format conversion utilities
3 participants