
bcbio-vc docker image not able to find mounted biodata containing reference genome files #9

marrojwala opened this issue May 20, 2021 · 0 comments

Hi!

I am trying to incorporate the bcbio-vc docker image into my pipeline manager. To test it, I ran variant calling inside the container, but `bcbio_nextgen.py` is unable to find the genome builds in the bcbio installation.

This is how I ran it:

  1. Created the ~/bcbio/biodata/genomes and ~/bcbio/biodata/galaxy directories on the local system, to be mounted into the docker container, and created ~/bcbio-test as a scratch directory for the test.
  2. Started a container with `docker run -ti -v ~/bcbio/biodata:/mnt/biodata -v ~/bcbio-test:/data quay.io/bcbio/bcbio-vc`.
  3. Inside the container, ran `bcbio_nextgen.py upgrade -u skip --genomes hg38 --genomes mm10 --aligners bwa` to download the reference genomes. They were downloaded to /usr/local/share/bcbio-nextgen/genomes, and the corresponding galaxy directory was updated at /usr/local/share/bcbio-nextgen/galaxy. The tail of the stdout:
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}, {'dbkey': 'mm10', 'name': 'Mouse (mm10)', 'indexes': ['seq', 'twobit'], 'annotations': ['problem_regions', 'prioritize', 'dbsnp', 'vcfanno', 'transcripts', 'rmsk', 'mirbase']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full, Mouse (mm10)
bcbio-nextgen data upgrade complete.
Upgrade completed successfully.
  4. Ran this tutorial in the /data directory, and it failed with the following error:
root@edc1034c416f:/data/cancer-dream-syn3/work# bcbio_nextgen.py ../config/cancer-dream-syn3.yaml -n 8
Running bcbio version: 1.2.4
global config: /data/cancer-dream-syn3/work/bcbio_system.yaml
run info config: /data/cancer-dream-syn3/config/cancer-dream-syn3.yaml
[2021-05-20T00:31Z] System YAML configuration: /data/cancer-dream-syn3/work/bcbio_system-merged.yaml.
[2021-05-20T00:31Z] Locale set to C.UTF-8.
[2021-05-20T00:31Z] Resource requests: bwa, sambamba, samtools; memory: 4.00, 4.00, 4.00; cores: 16, 16, 16
[2021-05-20T00:31Z] Configuring 1 jobs to run, using 8 cores each with 32.1g of memory reserved for each job
[2021-05-20T00:31Z] Timing: organize samples
[2021-05-20T00:31Z] multiprocessing: organize_samples
[2021-05-20T00:31Z] Using input YAML configuration: /data/cancer-dream-syn3/config/cancer-dream-syn3.yaml
[2021-05-20T00:31Z] Checking sample YAML configuration: /data/cancer-dream-syn3/config/cancer-dream-syn3.yaml
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/usr/local/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 128, in variant2pipeline
    [x[0]["description"] for x in samples]]])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 459, in organize_samples
    return run_info.organize(*args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 81, in organize
    item = add_reference_resources(item, remote_retriever)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 177, in add_reference_resources
    data["dirs"]["galaxy"], data)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/genome.py", line 233, in get_refs
    galaxy_config, data)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/genome.py", line 180, in _get_ref_from_galaxy_loc
    (genome_build, os.path.normpath(loc_file)))
ValueError: Did not find genome build hg38 in bcbio installation: /data/cancer-dream-syn3/work/tool-data/sam_fa_indices.loc

I am not sure this is how the docker image is intended to be used, but the bcbio installation appears to be all-encompassing based on the Dockerfile. After some sleuthing, I think the issue is the way `bcbio_nextgen.py` derives the base installation directory from this function, which causes it to look for the .loc file at /data/cancer-dream-syn3/work/tool-data/sam_fa_indices.loc instead of /usr/local/share/bcbio-nextgen/galaxy/tool-data/sam_fa_indices.loc.
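To illustrate the suspected mix-up, here is a minimal sketch of the path resolution described above. This is not bcbio's actual code; `loc_file_path` and its argument are hypothetical, with the two base directories taken from the traceback:

```python
import os

def loc_file_path(galaxy_base, name="sam_fa_indices.loc"):
    # Hypothetical helper: resolve the .loc file relative to a galaxy
    # base directory, mirroring the normpath(join(...)) seen in the trace.
    return os.path.normpath(os.path.join(galaxy_base, "tool-data", name))

# Resolving from the run's work directory (what the traceback shows):
print(loc_file_path("/data/cancer-dream-syn3/work"))
# -> /data/cancer-dream-syn3/work/tool-data/sam_fa_indices.loc

# Resolving from the install's galaxy directory (what I expected):
print(loc_file_path("/usr/local/share/bcbio-nextgen/galaxy"))
# -> /usr/local/share/bcbio-nextgen/galaxy/tool-data/sam_fa_indices.loc
```

If the galaxy base defaults to the work directory rather than the install directory, that would produce exactly the missing path in the ValueError above.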

Let me know if you have any questions about this. Thanks in advance!
