I have been benchmarking the loading of SRA records on AWS, using the VDB API to stream sequence data only (no quality scores or other fields). Like the `fasterq-dump` strategy, I read each SRA record in parallel, but I use the Message Passing Interface (MPI) instead of threads alone. Each MPI rank opens the record and reads a non-overlapping slice of it.
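Below is a minimal sketch of how the rows could be split into non-overlapping slices. It assumes each rank already knows its rank and the communicator size (for example from `MPI_Comm_rank` and `MPI_Comm_size`), plus the table's first row id and row count (for example from the cursor's id range via `VCursorIdRange`). The `row_slice` type and `slice_for_rank` function are hypothetical names used only for illustration. They are not part of the VDB API.

```c
#include <stdint.h>

/* Hypothetical helper type: half-open row range [first, first + count)
   assigned to one MPI rank. */
typedef struct {
    int64_t first;   /* first row id owned by this rank */
    uint64_t count;  /* number of rows owned by this rank */
} row_slice;

/* Split `total_rows` rows starting at row id `first_row` into `nranks`
   contiguous, non-overlapping slices. The first (total_rows % nranks)
   ranks receive one extra row, so slice sizes differ by at most one. */
row_slice slice_for_rank(int64_t first_row, uint64_t total_rows,
                         int rank, int nranks)
{
    uint64_t base  = total_rows / (uint64_t)nranks;
    uint64_t extra = total_rows % (uint64_t)nranks;
    uint64_t r     = (uint64_t)rank;
    row_slice s;

    s.count = base + (r < extra ? 1 : 0);
    /* rows owned by lower ranks: r full slices plus one extra row for
       each lower rank that received one */
    s.first = first_row + (int64_t)(r * base + (r < extra ? r : extra));
    return s;
}
```

For example, 10 rows starting at row id 1 split over 3 ranks gives rank 0 rows 1 to 4, rank 1 rows 5 to 7, and rank 2 rows 8 to 10. Because the slices are contiguous, each rank can iterate its own range with a single cursor.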
When the number of parallel MPI ranks grows beyond about 32, `VDBManagerPathType()` starts returning `kptNotFound` for about 10% of the MPI processes. I have worked around this by waiting 5 seconds and retrying the call to `VDBManagerPathType()`. Is there a good way to read an SRA record in parallel, ideally with hundreds of independent but concurrent processes? I want to extract reads from an SRA file as fast as AWS allows.
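The fixed 5-second retry could be generalized into exponential backoff with random jitter. Then, when many ranks fail at once, they do not all retry at the same instant. This is a sketch under stated assumptions. `probe_fn`, `retry_with_backoff`, and `backoff_cap_ms` are hypothetical helpers. The probe is assumed to wrap the real `VDBManagerPathType()` call and return 0 when the result is `kptNotFound`. The delay values are illustrative and not tuned. The sketch only makes the retry less bursty. It does not explain why the lookup fails in the first place.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback: returns nonzero when the lookup succeeded.
   In the MPI reader this would wrap VDBManagerPathType() and treat a
   kptNotFound result as a failure to retry. */
typedef int (*probe_fn)(void *ctx);

/* Upper bound on the delay after failed attempt `attempt` (1-based):
   base_ms, 2*base_ms, 4*base_ms, ... capped at max_ms. */
unsigned backoff_cap_ms(int attempt, unsigned base_ms, unsigned max_ms)
{
    unsigned cap = base_ms < max_ms ? base_ms : max_ms;
    for (int i = 1; i < attempt && cap < max_ms; ++i)
        cap = (cap > max_ms / 2) ? max_ms : cap * 2;
    return cap;
}

/* Sleep for `ms` milliseconds using POSIX nanosleep. */
static void sleep_ms(unsigned ms)
{
    struct timespec ts;
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Call `probe` up to `max_attempts` times. After each failure, sleep a
   uniformly random delay in [0, cap] ms ("full jitter"), where cap comes
   from backoff_cap_ms. Returns the attempt number that succeeded, or 0
   if every attempt failed. */
int retry_with_backoff(probe_fn probe, void *ctx, int max_attempts,
                       unsigned base_ms, unsigned max_ms)
{
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (probe(ctx))
            return attempt;
        if (attempt < max_attempts) {
            unsigned cap = backoff_cap_ms(attempt, base_ms, max_ms);
            sleep_ms((unsigned)(rand() % ((long)cap + 1)));
        }
    }
    return 0;
}
```

Each rank should seed `rand()` differently, for example with `srand((unsigned)rank + 1)` after `MPI_Comm_rank`. Otherwise every rank draws the same sequence of delays and the retries line up again.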
I had assumed that the data is stored in an S3 bucket and that parallel access would be fine. However, I'm not sure where the data is actually stored, because the `srapath` command returns: https://locate.ncbi.nlm.nih.gov/sdlr/sdlr.fcgi?jwt=<long string of characters removed>.