[EGGO-30] Generate partitioned data. #33
base: master
Changes from all commits
bdb7ad8
ce6e215
0366ec2
d5f2d98
1cad68c
1e23617
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -99,6 +99,12 @@ def _install_adam(): | |
run('mvn clean package -DskipTests') | ||
|
||
|
||
def _install_adam_partitioning(): | ||
    run('mkdir -p /root/adam-partitioning') | ||
    with cd('/root/adam-partitioning'): | ||
        run('wget https://github.com/tomwhite/adam-partitioning/raw/master/lib/adam-partitioning-0.0.1-SNAPSHOT-job.jar') | ||
Review comment: Is this a temporary thing until the patches merge in ADAM/kite? If this is to be permanent, I'd support putting this in either ADAM or here.
Reply: TBD, I need to look into the Spark failure more. Hopefully we can get the ADAM version working and then we won't need this.
||
|
||
|
||
def _install_eggo(fork='bigdatagenomics', branch='master'): | ||
    # check out eggo | ||
    with cd('~'): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
{ | ||
Review comment: can this file be combined with the other
Reply: Yes, I want to have an example that exercises the partitioning.
||
  "name": "test-1kg-genotypes-subset", | ||
  "title": "Test 1000 Genomes Project VCF data", | ||
  "dag": "VCF2ADAMTask", | ||
  "editions": ["basic", "flat", "locuspart", "flat_locuspart"], | ||
  "numPartitionsHint": 36, | ||
  "sources": [ | ||
    {"format": "vcf", "compression": true, "url": "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr22.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"} | ||
  ] | ||
} |
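For readers who want to see how a config like the one above might be consumed, here is a minimal sketch in plain Python. It is not eggo's actual loader: `load_dataset_config`, `partitioned_editions`, and `KNOWN_EDITIONS` are hypothetical names introduced for illustration, and the rule that an edition ending in `locuspart` is a partitioned one is an assumption drawn only from the edition names in this file.

```python
import json

# The dataset config from the diff above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "name": "test-1kg-genotypes-subset",
  "title": "Test 1000 Genomes Project VCF data",
  "dag": "VCF2ADAMTask",
  "editions": ["basic", "flat", "locuspart", "flat_locuspart"],
  "numPartitionsHint": 36,
  "sources": [
    {"format": "vcf", "compression": true, "url": "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr22.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"}
  ]
}
"""

# Hypothetical: the editions this sketch accepts, copied from the config above.
KNOWN_EDITIONS = {"basic", "flat", "locuspart", "flat_locuspart"}


def load_dataset_config(text):
    """Parse a dataset config and check the fields this sketch relies on."""
    config = json.loads(text)
    unknown = set(config["editions"]) - KNOWN_EDITIONS
    if unknown:
        raise ValueError("unknown editions: %s" % sorted(unknown))
    if config["numPartitionsHint"] <= 0:
        raise ValueError("numPartitionsHint must be positive")
    return config


def partitioned_editions(config):
    """Return editions whose name ends in 'locuspart' (an assumed naming rule)."""
    return [e for e in config["editions"] if e.endswith("locuspart")]


config = load_dataset_config(CONFIG_TEXT)
print(config["name"], partitioned_editions(config))
```

Validating the edition names up front means a typo in a config fails at load time rather than partway through a conversion job.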
Review comment: Fine for now, but perhaps there's a way to do this with inheritance/mixins instead.