Timeout waiting for connection from pool for 1000 genomes vcf on AWS #1951
Comments
Sigh, I am seeing this too...
@fnothaft how are you running? Are you on EMR or through toil on standard AWS instances? Apparently EMR dropped support for s3a. However, I can still loadAlignments from s3a, but not VCFs. Fortunately, s3 works just fine for VCFs (but is slow).
When did that happen? And at a specific version of EMR?
Practically, Conductor is still a good solution for s3 → HDFS, and is faster than s3-dist-cp. Conductor can't upload directories of Parquet+Avro from HDFS → s3 though, so you'd need to fall back to s3-dist-cp for that.
I'm not sure when s3a was dropped. @delagoya may know more, as they were my informant.
Are you able to use s3n?
I am checking with the EMR team about which URL encodings are supported.
Was just passing through. Hopefully everyone has seen this page, but linking just in case: Interesting that s3:// on EMR is slower than s3a://, considering EMRFS (EMR's proprietary S3 implementation) is one of its selling points. You might be able to use s3a URLs consistently by setting the following parameters:
Link: This is all untested, but I might give this a whirl when I get a moment, see if I can get it working, and post results here.
@dstockstad Thanks for the note! Where do those properties need to be specified?
You're going to want to do it using the instructions here: The settings go into core-site. So something like this:
Keep in mind that I still have not actually verified this, so I can't say for sure whether it will work, and it might also need additional configuration.
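As an alternative to editing core-site, Hadoop properties can also be set on the running SparkContext. The following is a minimal sketch, assuming a spark-shell or adam-shell session where `sc` is already defined. The property `fs.s3a.connection.maximum` (the Hadoop S3A connection pool size) and the value `100` are assumptions chosen because they relate to the "Timeout waiting for connection from pool" message; they are not the parameters referred to above, and this has not been verified against this issue.

```scala
// Assumes an interactive shell where `sc` (the SparkContext) already exists.
// fs.s3a.connection.maximum is the Hadoop S3A connection pool size.
// The value 100 is an illustrative guess, not a setting confirmed in this thread.
sc.hadoopConfiguration.set("fs.s3a.connection.maximum", "100")

// Read the value back to confirm it was applied to this context's configuration.
println(sc.hadoopConfiguration.get("fs.s3a.connection.maximum"))
```

A setting applied this way is likely to affect only filesystem instances created afterwards, and it is unclear whether the jsr203-s3a code path reads this configuration at all, so treat it as something to try rather than a known fix.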
val x = sc.loadGenotypes("s3a://1000genomes/phase1/analysis_results/integrated_call_sets/ALL.chr17.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz")
generates error Unable to execute HTTP request: Timeout waiting for connection from pool
with net.fnothaft:jsr203-s3a:0.0.2.
This error was tested with Hadoop-BAM 7.9.2 and 7.9.1.