Update vov description for flattened_data other than array<1>
#8
@@ -73,16 +73,16 @@ Flat ``n``-dimensional arrays are stored as ``n``-dimensional HDF5 datasets.

A vector of vectors of unequal sizes is stored as an HDF5 group that contains two datasets:

* A 1-dimensional dataset `flattened_data` that stores the concatenation of all vectors into a single vector.
* A 1-dimensional dataset `cumulative_length` that stores the cumulative sum of the length of all vectors.
* An array-like dataset `flattened_data` that stores the concatenation of all vectors into a single vector. Can be `*array<n>{...}`, `table{...}`, etc.
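
For readers less familiar with the layout described in this hunk, here is a minimal pure-Python sketch of the `flattened_data` / `cumulative_length` pair. The function names `flatten_vov` and `unflatten_vov` are hypothetical and are not part of any LEGEND package; plain lists stand in for the HDF5 datasets.

```python
from itertools import accumulate


def flatten_vov(vectors):
    """Split a list of variable-length vectors into the two stored arrays."""
    # Concatenation of all vectors into a single 1-dimensional vector.
    flattened_data = [x for v in vectors for x in v]
    # Running total of the vector lengths; entry i is the end offset of vector i.
    cumulative_length = list(accumulate(len(v) for v in vectors))
    return flattened_data, cumulative_length


def unflatten_vov(flattened_data, cumulative_length):
    """Rebuild the list of vectors from the two stored arrays."""
    # Vector i starts where vector i-1 ended; the first one starts at 0.
    starts = [0] + list(cumulative_length[:-1])
    return [flattened_data[s:e] for s, e in zip(starts, cumulative_length)]
```

Because each entry of `cumulative_length` is an end offset, an empty vector simply repeats the previous value, so no separate marker for empty entries is needed.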
The flattened data should always be a one-dimensional vector, I think. If we want to support vectors of multi-dimensional arrays of non-equal size, we need an additional dataset that provides the size of dim 2 to n of each member array (in Julia, in `ArraysOfArrays.VectorOfArrays`, we use an additional vector `kernel_size` for this).
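
To illustrate the idea of an extra per-member size dataset, here is a rough pure-Python sketch restricted to 2-dimensional members stored row by row. The names `flatten_2d_arrays`, `unflatten_2d_arrays` and `row_length` are hypothetical. This is not the layout used by `ArraysOfArrays.VectorOfArrays` and not part of the LH5 specification; it only shows why a flat vector plus cumulative lengths is not enough on its own.

```python
from itertools import accumulate


def flatten_2d_arrays(arrays):
    """arrays: list of 2-D arrays, each given as a list of equal-length rows."""
    # The flattened data stays strictly one-dimensional.
    flattened_data = [x for a in arrays for row in a for x in row]
    # End offset of each member array in the flat vector.
    cumulative_length = list(
        accumulate(sum(len(row) for row in a) for a in arrays)
    )
    # Extra dataset: size of dimension 2 of each member (0 for an empty member).
    row_length = [len(a[0]) if a else 0 for a in arrays]
    return flattened_data, cumulative_length, row_length


def unflatten_2d_arrays(flattened_data, cumulative_length, row_length):
    """Rebuild the 2-D members; the row count follows from length / row_length."""
    arrays = []
    start = 0
    for end, ncols in zip(cumulative_length, row_length):
        chunk = flattened_data[start:end]
        if ncols:
            arrays.append([chunk[i:i + ncols] for i in range(0, len(chunk), ncols)])
        else:
            arrays.append([])
        start = end
    return arrays
```

Without `row_length`, a member holding six values could be 1x6, 2x3, 3x2 or 6x1, which is the ambiguity the additional dataset resolves.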
@jasondet what was the use case you had in mind again? I'm not sure your proposed changes include this, but I would like to have an additional dimension in
I also believe, as @oschulz says, that this should not be represented by a structure of nested LGDOs, but should rather be a single LGDO. It would also make it easier to read in as an Awkward Array in legend-pydataobj (we could then rewrite @oschulz do you have a proposal in mind?
For reference, this is how Awkward folks recommend writing arrays to disk: https://awkward-array.org/doc/main/reference/generated/ak.to_buffers.html#ak-to-buffers
So do you want vectors-of-vectors-of-vectors (this we can do already), or a vector of two-dimensional arrays of varying size (this we don't cover yet)?
vectors-of-vectors-of-vectors. Do we really support this already? How? I'm confused
Yes, we do - the Julia event-tier files use it (multiple hits in multiple LAr-channels in multiple events). It worked almost out of the box (had to do one small bugfix). The LH5 datatype is simply So our vectors-of-vectors are just "naturally" nestable.
To clarify this a bit more: A vector-of-vectors-of-vectors, in this scheme (and I think this is natural) is a vector-of-(vectors-of-vectors). And so it can simply be constructed using a vector-of-vectors as its flattened content. In Julia, we use
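
The nesting described above can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the functions `encode` and `decode` are hypothetical, and a dict with the keys `flattened_data` and `cumulative_length` stands in for the HDF5 group. It is not the legend-pydataobj or Julia implementation.

```python
from itertools import accumulate


def encode(nested, depth):
    """Encode `depth` levels of ragged nesting; depth=0 is a plain flat vector."""
    if depth == 0:
        return list(nested)
    # Offsets for the outermost level only.
    cumulative_length = list(accumulate(len(v) for v in nested))
    # Concatenate the outer level; the result has one nesting level less.
    concatenated = [item for v in nested for item in v]
    return {
        # The flattened content is itself a vector-of-vectors when depth > 1.
        "flattened_data": encode(concatenated, depth - 1),
        "cumulative_length": cumulative_length,
    }


def decode(node):
    """Inverse of encode: rebuild the nested lists from the group-like dicts."""
    if not isinstance(node, dict):
        return list(node)
    inner = decode(node["flattened_data"])
    ends = node["cumulative_length"]
    starts = [0] + ends[:-1]
    return [inner[s:e] for s, e in zip(starts, ends)]
```

For instance, with events containing channels containing hits, `encode(events, 2)` yields an outer `cumulative_length` that counts channels per event, and an inner vector-of-vectors whose `cumulative_length` counts hits per channel.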
Well, it's actually not documented: https://legend-exp.github.io/legend-data-format-specs/dev/hdf5/#Vector-of-vectors. Anyway, I will put this legend-pydataobj development on my to-do list. We need to switch to Awkward arrays for this.
True, the example there was only for vector-of-vector-of-reals. I've added how nesting works to the documentation now.
Nice thanks!
@jasondet should we close this for now, then?
yeah should be okay thx oli