SRA download seems to be broken now: some of the FTP links lead nowhere #145

sergpolly opened this issue Nov 13, 2019 · 11 comments

@sergpolly
Member

@Marlies1993 was running distiller with some SRAs as input, and the pipeline kept crashing at the SRA download step.
On closer inspection, it appears that some of the links of this form: https://github.com/mirnylab/distiller-nf/blob/01f6f7bbc4b1edfc3634c131f709b08a40164c74/distiller.nf#L176
are broken.

For example, take SRR027959 from the 2009 Hi-C paper:

```
venevs@vangogh ➜  ~ wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR027/SRR027959/SRR027959.sra
--2019-11-13 18:12:32--  ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR027/SRR027959/SRR027959.sra
           => ‘SRR027959.sra.1’
Resolving ftp-trace.ncbi.nlm.nih.gov (ftp-trace.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::11
Connecting to ftp-trace.ncbi.nlm.nih.gov (ftp-trace.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /sra/sra-instant/reads/ByRun/sra/SRR/SRR027/SRR027959 ...
No such directory ‘sra/sra-instant/reads/ByRun/sra/SRR/SRR027/SRR027959’.
```
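To make the failing path concrete, here is a minimal Python sketch that builds the legacy sra-instant URL for a run accession. The layout (first three characters, then first six characters, then the accession itself) is inferred only from the SRR027959 URL above; whether longer accessions or ERR/DRR runs follow exactly the same rule is an assumption, and `sra_instant_url` is a hypothetical helper, not distiller's code. It only builds the URL; as the wget output shows, the directory may no longer exist on the server.

```python
import re

# Legacy "sra-instant" tree, inferred from the single URL quoted in this issue.
# NCBI has been retiring this tree, so a URL built this way may lead nowhere.
SRA_INSTANT_ROOT = "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra"


def sra_instant_url(accession: str) -> str:
    """Build the legacy sra-instant FTP URL for a run accession such as SRR027959."""
    # Accept SRR/ERR/DRR followed by at least six digits (assumed accession shape).
    if not re.fullmatch(r"[SED]RR\d{6,}", accession):
        raise ValueError(f"not a run accession: {accession!r}")
    prefix3 = accession[:3]  # e.g. "SRR"
    prefix6 = accession[:6]  # e.g. "SRR027"
    return f"{SRA_INSTANT_ROOT}/{prefix3}/{prefix6}/{accession}/{accession}.sra"


print(sra_instant_url("SRR027959"))
# ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR027/SRR027959/SRR027959.sra
```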

I don't know enough about SRAs, or why we are downloading them using wget. Anyone?

@Marlies1993 can comment and provide other examples if needed.

@Phlya
Member

Phlya commented Nov 13, 2019

Had the same problem. Here is what the SRA team says about it:
[Screenshot_20191113-234245]

@meoomen

meoomen commented Nov 13, 2019

Thanks @Phlya! When I checked the SRA links one by one, I also found that some of them were missing from ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR. After I removed the missing SRAs from my project.yml, my distiller project runs fine, but of course this doesn't solve the problem...

@Phlya
Member

Phlya commented Nov 13, 2019

I just downloaded the missing ones manually... Downloading with wget is much faster than with the regular SRA tools, but maybe distiller should fall back to fastq-dump when this happens?
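The suggested fallback could look roughly like this minimal Python sketch: try the fast direct wget download first and, if it exits with a non-zero status (as wget does when the FTP directory is missing), retry with fastq-dump. The helper names and the specific flags chosen (`--split-files`, `--gzip`) are illustrative assumptions, not what distiller.nf actually runs. The command runner is passed in as a function so the fallback logic can be exercised without network access.

```python
import subprocess
from typing import Callable, Sequence


def download_commands(accession: str, url: str) -> list[list[str]]:
    """Candidate commands, tried in order: direct wget first, then fastq-dump."""
    return [
        # Note: with -O, a failed wget may leave an empty file behind;
        # a real implementation should delete it before falling back.
        ["wget", "-q", "-O", f"{accession}.sra", url],
        ["fastq-dump", "--split-files", "--gzip", accession],
    ]


def run_with_fallback(commands: Sequence[Sequence[str]],
                      run: Callable[[Sequence[str]], int]) -> Sequence[str]:
    """Run each command until one exits with status 0; return the one that worked."""
    for cmd in commands:
        if run(cmd) == 0:
            return cmd
    raise RuntimeError("all download strategies failed")


def run_subprocess(cmd: Sequence[str]) -> int:
    """Real runner: execute the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode
```

In a real run one would call `run_with_fallback(download_commands(acc, url), run_subprocess)`; the ordering keeps the fast path as the default and only pays the fastq-dump cost when the FTP link is dead.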

@meoomen

meoomen commented Nov 13, 2019

Yes, I figured I'll have to download them manually for now as well. Thanks!

@golobor
Member

golobor commented Nov 14, 2019 via email

@golobor
Member

golobor commented Nov 14, 2019

Btw, here is another trick to force distiller to use fastq-dump: in project.yml, specify the input as:

```yaml
library1:
  lane1:
    - sra:SRR0123456?start=1
```
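One way to read this trick, assuming a spot range can only be honoured by fastq-dump: adding `?start=1` (start from the first spot, i.e. keep the whole run) routes the input through fastq-dump without dropping any reads. The sketch below is a hypothetical Python illustration of that idea, not distiller's actual parser; the function names are made up, and it maps `start`/`end` onto fastq-dump's `-N` (first spot) and `-X` (last spot) options.

```python
from urllib.parse import parse_qs


def parse_sra_input(spec: str) -> tuple[str, dict[str, int]]:
    """Split an input like 'sra:SRR0123456?start=1' into (accession, {'start': 1})."""
    if not spec.startswith("sra:"):
        raise ValueError(f"not an sra: input: {spec!r}")
    accession, _, query = spec[len("sra:"):].partition("?")
    # parse_qs returns lists of values; keep the last one and convert to int.
    params = {key: int(values[-1]) for key, values in parse_qs(query).items()}
    return accession, params


def fastq_dump_command(spec: str) -> list[str]:
    """Translate an optional spot range into fastq-dump's -N / -X options."""
    accession, params = parse_sra_input(spec)
    cmd = ["fastq-dump", "--split-files"]
    if "start" in params:
        cmd += ["-N", str(params["start"])]
    if "end" in params:
        cmd += ["-X", str(params["end"])]
    return cmd + [accession]


print(fastq_dump_command("sra:SRR0123456?start=1"))
# ['fastq-dump', '--split-files', '-N', '1', 'SRR0123456']
```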

@sergpolly
Member Author

Thank you @golobor and @Phlya! That was uber quick!

> Does it look good?

We are about to try it; we'll let you know here.

> Any other fixes we could implement to the downloading process (e.g. try multiple URLs), while we're at it?

Hmm, I do it so rarely that I don't really know what to say... maybe @Phlya has suggestions? MirnyLab people?
If anything, I would like a reminder of why we aren't doing it the Nextflow way: https://www.nextflow.io/docs/edge/channel.html#fromsra https://www.nextflow.io/blog/2019/release-19.03.0-edge.html. Is it because we didn't have time to do it, or because there is something wrong with it?

@golobor
Member

golobor commented Nov 14, 2019 via email

@sergpolly
Member Author

Worked for me on the original 2009 Hi-C data. I guess @Marlies1993 will report here once she tries it as well:
[Screenshot from 2019-11-14 13-07-01]

Thank you, again!

> Maybe it's worth switching; is anyone interested in implementing it? :)

Sounds fun to me. It should simplify some of the distiller.nf code; not sure about the timeline requirements, though...

@meoomen

meoomen commented Nov 14, 2019

Thanks for fixing this so quickly! All my SRAs are downloading and mapping as well.

@golobor
Member

golobor commented Apr 30, 2020

a more reliable fix: 55b5e6e
