Inefficiency when reading vector<vector<float>> #327

lwpiotr · 2021-04-07T15:23:11Z


Hello,

I'm new to uproot, and either I don't know how to use it optimally or I have found a significant inefficiency.
In my TTree there are many branches, but almost all of the data is concentrated in a vector<trace>, where trace is a class with 6 vector<float> members. ROOT's TTree essentially splits it into 6 branches of vector<vector<float>>. This is not what TTree was designed for, and unfortunately ROOT needs to read the whole vector at once (see https://root-forum.cern.ch/t/the-optimal-way-to-store-variable-tracks-count-in-a-ttree if interested), but with uproot I've stumbled upon a very significant slowdown.

The TTree has 1001 events. The vectors are variable-length, but in every entry their dimensions are roughly 180*1000. The TTree is 4.5 GB, and almost all of that is those vectors. Compression is off.

Reading just 1 of those 6 vector<vector<float>> branches in a single entry takes ROOT roughly 0.07 s. Uproot3 takes 3 s and Uproot4 takes 1 s.

The code is (where event=500):
root: entries = t.Draw("traces.SimSignal_X", "", "goff", 1, event-1)
uproot3: energy_root = t.array("traces.SimSignal_X")[event-1]
uproot4: energy_root = t["traces.SimSignal_X"].array(entry_start=event-1, entry_stop=event)

However, when I try to read 3 of the vector<vector<float>> branches and do some multiplication, it takes ROOT around 2 s and uproot3 9 s, and with uproot4 it takes so long that I gave up waiting. The code is:
root: entries = t.Draw("traces[100].SimSignal_X[150]*traces[100].SimSignal_Y[150]*traces[100].SimSignal_Z[150]*traces[10].SimSignal_X[150]*traces[50].SimSignal_Z[150]", "", "goff")
uproot3:

a = t.arrays(["traces.SimSignal_X", "traces.SimSignal_Y", "traces.SimSignal_Z"])
energy_root = np.array(a[b"traces.SimSignal_X"][:][150][100],copy=False)*np.array(a[b"traces.SimSignal_Y"][:][150][100],copy=False)*np.array(a[b"traces.SimSignal_Z"][:][150][100],copy=False)*np.array(a[b"traces.SimSignal_X"][:][150][10],copy=False)*np.array(a[b"traces.SimSignal_Z"][:][150][50], copy=False)

uproot4:
In the end I gave up even on a less demanding example, which also seemed to run endlessly:
energy_root1 = t["traces.SimSignal_X"].array()[:,0,150]

The TTree printout for the relevant part is:

*............................................................................*
*Br   95 :traces    : Int_t traces_                                          *
*Entries :     1001 : Total  Size=      97909 bytes  File Size  =      20020 *
*Baskets :      143 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   96 :traces.SimEfield_X : vector<float> SimEfield_X[traces_]            *
*Entries :     1001 : Total  Size=  726715213 bytes  File Size  =  726703274 *
*Baskets :      575 : Basket Size=    1783808 bytes  Compression=   1.00     *
*............................................................................*
*Br   97 :traces.SimEfield_Y : vector<float> SimEfield_Y[traces_]            *
*Entries :     1001 : Total  Size=  726715213 bytes  File Size  =  726703274 *
*Baskets :      575 : Basket Size=    1783808 bytes  Compression=   1.00     *
*............................................................................*
*Br   98 :traces.SimEfield_Z : vector<float> SimEfield_Z[traces_]            *
*Entries :     1001 : Total  Size=  726715213 bytes  File Size  =  726703274 *
*Baskets :      575 : Basket Size=    1783808 bytes  Compression=   1.00     *
*............................................................................*
*Br   99 :traces.SimSignal_X : vector<float> SimSignal_X[traces_]            *
*Entries :     1001 : Total  Size=  726714509 bytes  File Size  =  726702570 *
*Baskets :      575 : Basket Size=    1783808 bytes  Compression=   1.00     *
*............................................................................*
*Br  100 :traces.SimSignal_Y : vector<float> SimSignal_Y[traces_]            *
*Entries :     1001 : Total  Size=  726714509 bytes  File Size  =  726702570 *
*Baskets :      575 : Basket Size=    1783808 bytes  Compression=   1.00     *
*............................................................................*
*Br  101 :traces.SimSignal_Z : vector<float> SimSignal_Z[traces_]            *
*Entries :     1001 : Total  Size=  726714509 bytes  File Size  =  726702570 *
*Baskets :      575 : Basket Size=    1783808 bytes  Compression=   1.00     *
*............................................................................*
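As a rough cross-check of the sizes quoted above, the arithmetic can be sketched as follows. It assumes exactly 180 inner vectors of 1000 float32 values per entry (the post only says "roughly 180*1000") and compares the result with the Total Size shown in the printout for one branch. The variable names are illustrative only.

```python
# Assumed shape per entry (the post says "roughly 180*1000"); 4 bytes per float32.
n_traces = 180
n_samples = 1000
bytes_per_float = 4
n_entries = 1001      # from the post
n_branches = 6        # the six vector<vector<float>> branches in the printout

payload_per_entry = n_traces * n_samples * bytes_per_float
payload_per_branch = payload_per_entry * n_entries
payload_total = payload_per_branch * n_branches

reported_branch_size = 726715213  # "Total Size" of traces.SimEfield_X in the printout
overhead = reported_branch_size / payload_per_branch - 1.0

print(payload_per_entry)    # 720000 bytes, i.e. about 0.72 MB per entry and branch
print(payload_per_branch)   # 720720000 bytes
print(payload_total)        # 4324320000 bytes, in line with the quoted 4.5 GB
print(f"{overhead:.2%}")    # relative difference between printout and pure float payload
```

Under these assumptions the raw float payload accounts for all but about 1% of each branch, so reading one entry of one branch means touching roughly 0.72 MB of data.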

Am I doing something wrong, especially comparing uproot3 to uproot4, or did I stumble upon some uproot problem? I will be grateful for any advice.

jpivarski · 2021-04-07T15:52:57Z

Maintainer

The slowness of more-than-one nesting of variable-length lists is a known issue. The problem is that it's serialized in a different way (non-columnar), so the NumPy tricks that work for std::vector<float> don't work for std::vector<std::vector<float>>. The latter needs to fall back to pure Python code, which is a huge bottleneck.
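The difference between the two kinds of layout can be illustrated with a minimal sketch. The byte layout below is a toy invented for illustration, not ROOT's actual serialization format, and the function names are hypothetical. A bulk `struct.unpack` call stands in for the NumPy-style "interpret the whole buffer at once" trick, while the nested layout forces a sequential walk because each length header has to be read before the next vector can be located.

```python
import struct


def encode_flat(values):
    """Toy columnar layout: nothing but float32 values, back to back."""
    return struct.pack(f">{len(values)}f", *values)


def decode_flat(buf):
    """One bulk call interprets the whole buffer; no per-item Python loop."""
    return list(struct.unpack(f">{len(buf) // 4}f", buf))


def encode_nested(entry):
    """Toy non-columnar layout: a 4-byte count of inner vectors, then each
    inner vector as a 4-byte length followed by its float32 values."""
    parts = [struct.pack(">i", len(entry))]
    for inner in entry:
        parts.append(struct.pack(f">i{len(inner)}f", len(inner), *inner))
    return b"".join(parts)


def decode_nested(buf):
    """Each length header must be read before the next vector can be located,
    so the walk is sequential: one Python iteration per inner vector."""
    (n_outer,) = struct.unpack_from(">i", buf, 0)
    pos = 4
    result = []
    for _ in range(n_outer):
        (n_inner,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        result.append(list(struct.unpack_from(f">{n_inner}f", buf, pos)))
        pos += 4 * n_inner
    return result


flat = decode_flat(encode_flat([1.0, 2.0, 3.5]))
nested = decode_nested(encode_nested([[1.0, 2.0], [], [3.5]]))
print(flat)
print(nested)
```

With about 180 inner vectors per entry and 1001 entries, a loop of this kind runs on the order of 180,000 times per branch in interpreted Python, which is where the cost described above comes from.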

Here's where I talked about this problem at 2019 CHEP (Figure 3 is based on a hacked performance study): https://arxiv.org/abs/2001.06307 and this year: https://arxiv.org/abs/2102.13516 (Figure 1 is using AwkwardForth, a new formalism that is yet to be integrated into Uproot).

Uproot 3 could read these data faster because it delayed deserialization: it made Awkward 0 ObjectArrays, breaking the Awkward Array formalism so that you couldn't slice it like other arrays. Having a different public interface because of an internal difference in ROOT serialization was itself a problem: I ended up having to explain/apologize for this interface difference a lot. In Awkward 1, it was a goal to have all arrays behave the same way, regardless of where they came from. However, that means that Uproot 4 must deserialize the std::vector<std::vector<...>> up-front, not delayed until they're used in a calculation.

Since your code selects one element, it only deserializes that one in Uproot 3, whereas Uproot 4 has to deserialize everything to give you that one. That's why the speed is different.
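The delayed versus up-front behaviour can be sketched with a toy model. None of the classes below are Uproot or Awkward Array classes; they are hypothetical stand-ins that only count how many entries get decoded when a single entry is requested.

```python
import struct


def encode_entry(values):
    """Toy encoding of one entry: a 4-byte count, then that many float32 values."""
    return struct.pack(f">i{len(values)}f", len(values), *values)


class CountingDecoder:
    """Decodes toy entries and counts how many entries it had to decode."""

    def __init__(self):
        self.calls = 0

    def __call__(self, raw):
        self.calls += 1
        (n,) = struct.unpack_from(">i", raw, 0)
        return list(struct.unpack_from(f">{n}f", raw, 4))


class LazyEntries:
    """Keeps raw bytes and decodes an entry only when it is indexed
    (a stand-in for the delayed strategy described for Uproot 3)."""

    def __init__(self, raw_entries, decoder):
        self._raw = raw_entries
        self._decoder = decoder

    def __getitem__(self, i):
        return self._decoder(self._raw[i])


def eager_entries(raw_entries, decoder):
    """Decodes every entry up front
    (a stand-in for the strategy described for Uproot 4)."""
    return [decoder(raw) for raw in raw_entries]


raw = [encode_entry([float(i), float(i) + 0.5]) for i in range(1001)]

lazy_decoder = CountingDecoder()
lazy_value = LazyEntries(raw, lazy_decoder)[499]

eager_decoder = CountingDecoder()
eager_value = eager_entries(raw, eager_decoder)[499]

print(lazy_value == eager_value)                 # same result either way
print(lazy_decoder.calls, eager_decoder.calls)   # 1 decode versus 1001 decodes
```

The price of the lazy variant in this sketch is that `LazyEntries` supports only single-item indexing, which mirrors the interface difference mentioned above: the delayed container cannot be sliced like an ordinary array.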

I'm working on file-writing at the moment; integrating AwkwardForth is the step still to come, so that Uproot 4 will be able to deserialize these objects as fast as ROOT (Figure 1 of that new paper). AwkwardForth is a few times slower than compiled, optimized C++, but many times faster than pure Python and about as fast as the data transfer from RAM to CPU (so computation is not the primary bottleneck, especially if there's any decompression involved).

2 replies

lwpiotr Apr 7, 2021
Author

Thanks! I don't need a solution right now, especially since we decided to go with HDF5: in our case the performance gain from ROOT over HDF5 was very small. However, some time ago, when I had only tried uproot3, I came across the explanation you gave above, along with a suggestion that the looping would be done in C++ in uproot4. So I was surprised to see worse performance in uproot4 and decided to report it. But now I understand. Thanks again.

Btw. I understand that uproot does not do any magic that would allow it to read just a part of a vector<vector<float>> instead of the whole vector?

jpivarski Apr 7, 2021
Maintainer

> Btw. I understand that uproot does not do any magic that would allow it to read just a part of a vector<vector<float>> instead of the whole vector?

You can select only one entry, but you've already done that.

The thing I was talking about with someday compiling these loops is what has evolved into AwkwardForth, but still there's that last step of integrating AwkwardForth into Uproot. (Uproot has to generate Forth code for each type of data to deserialize, instead of generating Python code.)
