Issue with reading files through coffea-casa xcache for large scale analysis #420
Comments
cc: @sihyunjeon
@sihyunjeon Can you try running your test again? I tried your reproducer and it works for me.
Well... the problem is that once it's cached and works well, nobody else can reproduce it. One has to dump a large dataset in one go and observe. :( I'll see what else I can find.
OK, I got it :) How big should the dataset be?
Roughly the size of the JSON file I gave as an example. It was not working, as you can see in the ipynb file (but now it seems to work). Maybe a question I can ask is: is there a way to execute the ipynb lines in a safer way?
I picked some parameters rather arbitrarily, and I'm not sure whether there is a better setting to avoid this problem.
Many thanks for all your efforts!
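One generic way to make a flaky remote read "safer" is to retry it with a growing delay. The following is only a minimal sketch in plain Python, not something coffea-casa or XCache provides: `retry_on_oserror` is a hypothetical helper, and it assumes the failing step can be wrapped in a zero-argument callable that raises `OSError` (as the vector read error in this issue does).

```python
import time


def retry_on_oserror(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on OSError, doubling the delay each time.

    Hypothetical helper for illustration only. `sleep` is injectable so
    the behaviour can be checked without actually waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                # Out of attempts: let the original error propagate.
                raise
            # Delays are base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Retrying only hides transient timeouts. If the reads fail consistently, the cause still has to be found on the cache or storage side.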
@ikrommyd do you have a suggestion for the best parameter settings for preprocessing NanoAODs?
For an actual analysis, typically
@ikrommyd hey, thanks. For example: 100K events split into 10 chunks -> 10K events per chunk.
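The arithmetic in that example can be sketched in plain Python. This is only an illustration of splitting an event range into equal pieces, not how coffea's preprocessing is implemented, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(n_events, n_chunks):
    """Return (start, stop) pairs covering range(n_events) in n_chunks pieces.

    Hypothetical helper for illustration. If n_events is not divisible by
    n_chunks, the first few chunks each get one extra event.
    """
    base, extra = divmod(n_events, n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


# 100K events in 10 chunks gives 10K events per chunk.
chunks = split_into_chunks(100_000, 10)
```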
Yeah so
Hi @oshadura, I've put another round of tests at https://github.com/sihyunjeon/test_coffea-casa/blob/main/2025_01_10/Untitled1.ipynb. It gives me an error, and every time with a slightly different message (probably depending on which sample gets read first).
is one example, the other one in the notebook is
I checked those files with xrdcp and they do exist "somewhere", but something breaks when trying to read them.
A user reported an issue running their analysis on coffea.casa with a large number of data samples:
OSError: File did not vector_read properly: [ERROR] Operation expired
The reproducer is https://github.com/sihyunjeon/test_coffea-casa
What it does is:
When it runs on all files listed in the JSON (~500 files), it fails with the error message shown at the very end of the ipynb file (the vector read error).
If you uncomment "# break # FIXME" in In[4], it runs on only 3 files and the ipynb runs without issues.
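To look for the point where reads start failing, the number of files can be scaled up gradually instead of jumping from 3 to ~500. The sketch below is an assumption-laden illustration, not the notebook's code: `limit_files` is a hypothetical helper, the file paths are made up, and the fileset is assumed to be a dict of the form `{dataset: {"files": {path: treename}}}`, which may differ from the JSON in the reproducer.

```python
def limit_files(fileset, max_files):
    """Return a copy of fileset keeping at most max_files files per dataset.

    Hypothetical helper. Assumes each dataset entry holds a "files" dict
    mapping file path to tree name; other keys are copied unchanged.
    """
    limited = {}
    for dataset, info in fileset.items():
        # Keep the first max_files entries in their original order.
        files = dict(list(info["files"].items())[:max_files])
        limited[dataset] = {**info, "files": files}
    return limited


# Made-up example fileset with five files in one dataset.
example_fileset = {
    "sample_a": {
        "files": {f"root://example.org//store/file_{i}.root": "Events" for i in range(5)},
    },
}
small_fileset = limit_files(example_fileset, 3)
```

Running with `max_files` set to 3, then 30, then 300 would show whether the failure depends on the total volume read in one go.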