This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Fix read_bgen performance issues #9

Open
eric-czech opened this issue Aug 18, 2020 · 2 comments

Comments

@eric-czech
Collaborator

This wasn't happening before updating to bgen-reader 4.0.5, but now I can no longer read files without memory usage growing, seemingly without bound. Memory usage when running bgen-reader directly is very low, see limix/bgen-reader-py#30 (comment). I'm reading the same file in both cases, so I think something in @horta's changes must make what we're doing in this repo problematic now.

@eric-czech
Collaborator Author

Ahh, it looks like this is actually a mistake I made in chunking. I didn't know this, but if you rechunk an array defined with da.from_array, it still (apparently) tries to read the array chunks with the original shape first, which is a problem in this case because the reader pulls all the samples into memory before slicing them off. Memory usage is fine if the target chunks are passed directly to da.from_array instead.
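
For anyone who wants to check this on their own setup, here is a rough sketch of the two chunking patterns. `RecordingReader` is a hypothetical stand-in I made up for illustration (it is not the bgen-reader API); it just logs the shape of every block dask asks for. The array shape and chunk sizes are arbitrary assumptions too.

```python
import numpy as np
import dask.array as da


class RecordingReader:
    """Hypothetical stand-in for a lazy genotype reader (NOT the bgen-reader API).

    Array-like enough for da.from_array; logs the shape of every block read.
    """

    def __init__(self, shape, dtype=np.float32):
        self.shape = shape
        self.ndim = len(shape)
        self.dtype = np.dtype(dtype)
        self.requests = []

    def __getitem__(self, idx):
        # da.from_array indexes with a tuple of slices, one per dimension.
        block_shape = tuple(
            len(range(*s.indices(n))) for s, n in zip(idx, self.shape)
        )
        self.requests.append(block_shape)
        return np.zeros(block_shape, dtype=self.dtype)


def read_shapes(make_array, shape=(200, 300)):
    """Build a dask array over a fresh reader, compute it, and return
    (result shape, list of block shapes requested during the compute)."""
    reader = RecordingReader(shape)
    arr = make_array(reader)
    reader.requests.clear()  # drop any probing reads made while building the array
    result = arr.compute(scheduler="synchronous")
    return result.shape, list(reader.requests)


# Axes are (variants, samples); "wide" chunks span every sample.
wide = (100, 300)
narrow = (100, 100)

# Pattern that caused trouble: define with wide chunks, then rechunk.
_, rechunked_reads = read_shapes(
    lambda r: da.from_array(r, chunks=wide, name=False).rechunk(narrow)
)

# Fix: pass the target chunks straight to da.from_array.
_, direct_reads = read_shapes(
    lambda r: da.from_array(r, chunks=narrow, name=False)
)

print("rechunked:", rechunked_reads)
print("direct:   ", direct_reads)
```

With the direct version, no read should ever span more than 100 samples. If the rechunk behavior I described holds for your dask version, the first list will show blocks covering all 300 samples instead, though what dask actually requests there may depend on the version and its graph optimizations.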

Now I've just got to figure out why what takes ~20 mins with bgen-reader takes ~2 hrs with our wrapper around it.

@eric-czech eric-czech changed the title Fix read_bgen memory leak Fix read_bgen performance issues Aug 18, 2020
@eric-czech
Collaborator Author

Note: #12 improves memory usage, but I'll close this only once we can avoid some of the overhead from individual variant reads, as mentioned in limix/bgen-reader-py#30.
