After recompiling from source get error 'Phred: negative error [2024-07-15 12:55:12] <EROR> probability -0.000000' #282

Open
harrymatthews50 opened this issue Jul 16, 2024 · 2 comments


Describe the bug
After recompiling the binaries from source for AVX-512 cores, Octopus no longer runs normally and instead fails with the error:
'Phred: negative error
[2024-07-15 12:55:12] <EROR> probability -0.000000'
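
A note on the '-0.000000' in the message: a value printed with six decimal places can be a tiny negative number rather than an exact zero. The Python sketch below only illustrates that general floating-point effect. It is not Octopus's code, and `strict_check` and `clamp_small_negative` are hypothetical helpers invented for the illustration. Whether rounding differences of this kind are what the rebuilt binary runs into is an assumption, not something established in this report.

```python
# 0.1 + 0.2 rounds to slightly more than 0.3 in binary floating point,
# so this residual is a tiny negative number rather than exactly zero.
residual = 0.3 - (0.1 + 0.2)


def strict_check(error_probability: float) -> float:
    """Hypothetical strict validation: reject any negative value."""
    if error_probability < 0.0:
        raise ValueError(f"negative error probability {error_probability:.6f}")
    return error_probability


def clamp_small_negative(error_probability: float, tolerance: float = 1e-12) -> float:
    """Hypothetical lenient validation: treat tiny negatives as zero."""
    if -tolerance <= error_probability < 0.0:
        return 0.0
    return strict_check(error_probability)


if __name__ == "__main__":
    print(f"{residual:.6f}")  # six decimals hide the magnitude but keep the sign
    print(residual < 0.0)
    try:
        strict_check(residual)
    except ValueError as err:
        print(err)
    print(clamp_small_negative(residual))
```

The tolerance of `1e-12` is an arbitrary choice for the sketch; a real fix would need a threshold justified by the computation that produces the probability.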

Octopus version
0.7.4

Context
Octopus is the backbone of our pipelines for somatic variant calling from cell-free DNA from cerebrospinal fluid. It's a truly excellent tool, and we are therefore very interested in reducing the rather long runtimes previously reported by my colleague (#232).

I have already had some success by modifying the size of the read buffer: runtimes are now usually under one day. I was hoping that recompiling from source (so as to better utilize the AVX-512-capable cores on my machine):

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             28
On-line CPU(s) list:                0-27
Vendor ID:                          GenuineIntel
Model name:                         Intel Xeon Processor (Skylake, IBRS)
CPU family:                         6
Model:                              85
Thread(s) per core:                 1
Core(s) per socket:                 1
Socket(s):                          28
Stepping:                           4
BogoMIPS:                           3990.62
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke md_clear arch_capabilities
Virtualization:                     VT-x
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          896 KiB (28 instances)
L1i cache:                          896 KiB (28 instances)
L2 cache:                           112 MiB (28 instances)
L3 cache:                           448 MiB (28 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-27
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Syscall hardening, KVM SW loop
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT Host state unknown

would further improve runtime.
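
For reference, the AVX-512 feature flags can be picked out of a flag list like the one above with a few lines of Python. This is only a convenience sketch: `avx512_features` is a hypothetical helper name, and the string is an excerpt of the `Flags:` line from the lscpu output rather than the full list.

```python
def avx512_features(flags: str) -> list[str]:
    """Return the AVX-512 feature flags found in a space-separated flag list."""
    return sorted({flag for flag in flags.split() if flag.startswith("avx512")})


# Excerpt of the "Flags:" line from the lscpu output above.
LSCPU_FLAGS_EXCERPT = (
    "avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap "
    "clflushopt clwb avx512cd avx512bw avx512vl xsaveopt"
)

if __name__ == "__main__":
    print(avx512_features(LSCPU_FLAGS_EXCERPT))
```

On Linux the same function could be applied to the `flags` line of `/proc/cpuinfo`, which is useful for confirming that a container sees the same CPU features as the host.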

Building Docker image/apptainer
I did so by making the following modifications to the Dockerfile:

  1. Replaced the base image with Ubuntu focal (as impish is no longer available on the Ubuntu archives).
  2. Removed the --architecture argument.
  3. Copied the forests from a local directory.

The Octopus code was pulled from the master branch within the last few days, so it should be up to date.

ARG ARCH="amd64"
FROM ${ARCH}/ubuntu:focal

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/London

# Get dependencies
RUN apt-get -y update \
    && apt-get -y install \
        build-essential \
        libboost-all-dev \
        libgmp-dev \
        cmake \
        libhts-dev \
        python3-pip \
        git \
    && pip3 install distro

# Install Octopus
ARG THREADS=4
#ARG CPU=haswell
COPY octopus /opt/octopus/
COPY forests/* /opt/octopus/resources/forests/
RUN /opt/octopus/scripts/install.py \
    --threads $THREADS 
   # --forest
  #  --architecture $CPU

# Cleanup git - only needed during install for commit info
RUN apt-get purge -y git \
    && rm -r /opt/octopus/.git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/octopus/bin:${PATH}"

ENTRYPOINT ["octopus"]
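
Because the --architecture argument was removed, it may be worth checking whether the resulting binary actually contains 512-bit vector instructions. The sketch below is one possible check, not part of Octopus: it assumes binutils' `objdump` is available, and `count_zmm_instructions`, `disassemble`, and `SAMPLE` are hypothetical names. The sample disassembly is hand-written for illustration and is not output from the Octopus binary.

```python
import re
import subprocess

# zmm0..zmm31 are the 512-bit vector registers. The pattern matches both
# AT&T syntax (%zmm0) and Intel syntax (zmm0).
ZMM_REGISTER = re.compile(r"\bzmm(?:[12]?\d|3[01])\b")


def count_zmm_instructions(disassembly: str) -> int:
    """Count disassembly lines that reference a 512-bit zmm register."""
    return sum(1 for line in disassembly.splitlines() if ZMM_REGISTER.search(line))


def disassemble(binary_path: str) -> str:
    """Run `objdump -d` on a binary and return the disassembly as text."""
    result = subprocess.run(
        ["objdump", "-d", binary_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hand-written illustrative lines, not real output from the Octopus binary.
SAMPLE = """\
  vaddps %zmm1,%zmm0,%zmm0
  vaddps %ymm1,%ymm0,%ymm0
  retq
"""

if __name__ == "__main__":
    print(count_zmm_instructions(SAMPLE))
```

Inside the image the binary is presumably at `/opt/octopus/bin/octopus`, so `count_zmm_instructions(disassemble("/opt/octopus/bin/octopus"))` would give a rough count. A count of zero does not rule out AVX-512, since AVX-512VL instructions can operate on the narrower xmm and ymm registers.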

During the build, compilation completed without errors:

#9 [5/6] RUN /opt/octopus/scripts/install.py     --threads 4
#9 0.201 -- The C compiler identification is GNU 9.4.0
#9 0.273 -- The CXX compiler identification is GNU 9.4.0
#9 0.279 -- Check for working C compiler: /usr/bin/cc
#9 0.370 -- Check for working C compiler: /usr/bin/cc -- works
#9 0.372 -- Detecting C compiler ABI info
#9 0.463 -- Detecting C compiler ABI info - done
#9 0.476 -- Detecting C compile features
#9 0.477 -- Detecting C compile features - done
#9 0.480 -- Check for working CXX compiler: /usr/bin/c++
#9 0.578 -- Check for working CXX compiler: /usr/bin/c++ -- works
#9 0.580 -- Detecting CXX compiler ABI info
#9 0.685 -- Detecting CXX compiler ABI info - done
#9 0.698 -- Detecting CXX compile features
#9 0.699 -- Detecting CXX compile features - done
#9 0.706 -- Build type: Release
#9 0.706 -- Installation prefix: /opt/octopus/bin
#9 0.706 -- Target architecture: x86_64
#9 1.286 -- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found suitable version "1.71.0", minimum required is "1.65") found components: iostreams
#9 1.290 -- Linking against boost dynamic libraries
#9 1.303 -- Looking for pthread.h
#9 1.394 -- Looking for pthread.h - found
#9 1.394 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
#9 1.488 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
#9 1.488 -- Check if compiler accepts -pthread
#9 1.583 -- Check if compiler accepts -pthread - yes
#9 1.584 -- Found Threads: TRUE  
#9 1.606 -- Found Boost: /usr/include (found suitable version "1.71.0", minimum required is "1.65") found components: system filesystem program_options date_time log_setup log iostreams timer thread regex chrono atomic 
#9 1.609 -- Boost include dir: /usr/include
#9 1.609 -- Boost libraries: /usr/lib/x86_64-linux-gnu/libboost_system.so/usr/lib/x86_64-linux-gnu/libboost_filesystem.so/usr/lib/x86_64-linux-gnu/libboost_program_options.so/usr/lib/x86_64-linux-gnu/libboost_date_time.so/usr/lib/x86_64-linux-gnu/libboost_log_setup.so/usr/lib/x86_64-linux-gnu/libboost_log.sooptimized/usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.71.0debug/usr/lib/x86_64-linux-gnu/libboost_iostreams.so/usr/lib/x86_64-linux-gnu/libboost_timer.so/usr/lib/x86_64-linux-gnu/libboost_thread.so-pthread/usr/lib/x86_64-linux-gnu/libboost_regex.so/usr/lib/x86_64-linux-gnu/libboost_chrono.so/usr/lib/x86_64-linux-gnu/libboost_atomic.so
#9 1.609 -- GMP_INCLUDES=/usr/include/x86_64-linux-gnu
#9 1.612 -- Found GMP: /usr/include/x86_64-linux-gnu (Required is at least version "5.1.0") 
#9 1.612 -- GMP include dir: /usr/include/x86_64-linux-gnu
#9 1.612 -- GMP libraries: /usr/lib/x86_64-linux-gnu/libgmp.so
#9 1.615 -- Found HTSlib 
#9 1.615 --    HTSlib include dirs: /usr/include
#9 1.615 --    HTSlib libraries: /usr/lib/x86_64-linux-gnu/libhts.so
#9 2.184 -- IPO is supported!
#9 2.188 -- Configuring done
#9 2.258 -- Generating done
#9 2.261 -- Build files have been written to: /opt/octopus/build
#9 2.310 Scanning dependencies of target libdivsufsort
#9 2.313 Scanning dependencies of target date-tz
#9 2.315 Scanning dependencies of target ranger
#9 2.323 [  1%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/sssort.c.o
#9 2.323 [  1%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/divsufsort.c.o
#9 2.327 [  1%] Building CXX object lib/date/CMakeFiles/date-tz.dir/tz.cpp.o
#9 2.329 [  1%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/Data.cpp.o
#9 2.578 [  1%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/trsort.c.o
#9 2.934 [  2%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/utils.c.o
#9 3.033 [  2%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/Forest.cpp.o
#9 3.111 [  2%] Linking C static library liblibdivsufsort.a
#9 3.159 [  2%] Built target libdivsufsort
#9 3.171 [  3%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/ForestClassification.cpp.o
#9 4.019 [  3%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/ForestProbability.cpp.o
#9 5.554 [  3%] Linking CXX static library libdate-tz.a
#9 5.602 [  3%] Built target date-tz
#9 5.616 Scanning dependencies of target tandem
#9 5.633 [  3%] Building CXX object lib/tandem/CMakeFiles/tandem.dir/tandem.cpp.o
#9 5.894 [  4%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/ForestRegression.cpp.o
#9 6.258 [  5%] Linking CXX static library libtandem.a
#9 6.306 [  5%] Built target tandem
#9 6.318 [  5%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/ForestSurvival.cpp.o
#9 6.774 [  5%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/Tree.cpp.o
#9 8.008 [  6%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/TreeClassification.cpp.o
#9 8.050 [  6%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/TreeProbability.cpp.o
#9 8.273 [  7%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/TreeRegression.cpp.o
#9 9.096 [  7%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/TreeSurvival.cpp.o
#9 9.492 [  7%] Building CXX object lib/ranger/CMakeFiles/ranger.dir/utility.cpp.o
#9 11.20 [  8%] Linking CXX static library libranger.a
#9 11.26 [  8%] Built target ranger
#9 11.63 Scanning dependencies of target octopus
#9 11.67 [  8%] Building CXX object src/CMakeFiles/octopus.dir/config/common.cpp.o
#9 11.67 [  8%] Building CXX object src/CMakeFiles/octopus.dir/config/config.cpp.o
#9 11.67 [  9%] Building CXX object src/CMakeFiles/octopus.dir/config/option_parser.cpp.o
#9 11.67 [ 10%] Building CXX object src/CMakeFiles/octopus.dir/main.cpp.o
#9 12.71 [ 10%] Building CXX object src/CMakeFiles/octopus.dir/config/option_collation.cpp.o
#9 17.77 [ 11%] Building CXX object src/CMakeFiles/octopus.dir/config/octopus_vcf.cpp.o
#9 18.96 [ 11%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/error.cpp.o
#9 19.28 [ 11%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/missing_file_error.cpp.o
#9 20.36 [ 12%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/malformed_file_error.cpp.o
#9 21.57 [ 12%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/missing_index_error.cpp.o
#9 22.82 [ 13%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/unwritable_file_error.cpp.o
#9 23.82 [ 13%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/unimplemented_feature_error.cpp.o
#9 24.30 [ 13%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/file_open_error.cpp.o
#9 24.86 [ 14%] Building CXX object src/CMakeFiles/octopus.dir/basics/cigar_string.cpp.o
#9 25.31 [ 14%] Building CXX object src/CMakeFiles/octopus.dir/basics/aligned_read.cpp.o
#9 25.34 [ 15%] Building CXX object src/CMakeFiles/octopus.dir/basics/ploidy_map.cpp.o
#9 26.55 [ 15%] Building CXX object src/CMakeFiles/octopus.dir/basics/pedigree.cpp.o
#9 28.96 [ 15%] Building CXX object src/CMakeFiles/octopus.dir/basics/trio.cpp.o
#9 31.06 [ 16%] Building CXX object src/CMakeFiles/octopus.dir/basics/read_pileup.cpp.o
#9 32.22 [ 16%] Building CXX object src/CMakeFiles/octopus.dir/basics/tandem_repeat.cpp.o
#9 32.98 [ 17%] Building CXX object src/CMakeFiles/octopus.dir/basics/aligned_template.cpp.o
#9 34.01 [ 17%] Building CXX object src/CMakeFiles/octopus.dir/logging/logging.cpp.o
#9 34.40 [ 17%] Building CXX object src/CMakeFiles/octopus.dir/logging/progress_meter.cpp.o
#9 34.48 [ 18%] Building CXX object src/CMakeFiles/octopus.dir/logging/error_handler.cpp.o
#9 36.98 [ 18%] Building CXX object src/CMakeFiles/octopus.dir/logging/main_logging.cpp.o
#9 40.16 [ 18%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/caching_fasta.cpp.o
#9 41.01 [ 19%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/fasta.cpp.o
#9 42.17 [ 19%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/reference_genome.cpp.o
#9 42.81 [ 20%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/threadsafe_fasta.cpp.o
#9 43.20 [ 20%] Building CXX object src/CMakeFiles/octopus.dir/io/region/region_parser.cpp.o
#9 43.34 [ 20%] Building CXX object src/CMakeFiles/octopus.dir/io/pedigree/pedigree_reader.cpp.o
#9 44.05 [ 21%] Building CXX object src/CMakeFiles/octopus.dir/io/read/htslib_sam_facade.cpp.o
#9 44.10 [ 21%] Building CXX object src/CMakeFiles/octopus.dir/io/read/read_manager.cpp.o
#9 46.93 [ 22%] Building CXX object src/CMakeFiles/octopus.dir/io/read/read_reader.cpp.o
#9 48.55 [ 22%] Building CXX object src/CMakeFiles/octopus.dir/io/read/read_writer.cpp.o
#9 48.58 [ 22%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/htslib_bcf_facade.cpp.o
#9 49.13 [ 23%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_header.cpp.o
#9 50.17 [ 23%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_parser.cpp.o
#9 51.00 [ 24%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_reader.cpp.o
#9 51.08 [ 24%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_record.cpp.o
#9 53.39 [ 24%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_type.cpp.o
#9 54.38 [ 25%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_utils.cpp.o
#9 54.72 [ 25%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/vcf_writer.cpp.o
#9 54.91 [ 26%] Building CXX object src/CMakeFiles/octopus.dir/readpipe/read_pipe.cpp.o
#9 55.35 [ 26%] Building CXX object src/CMakeFiles/octopus.dir/readpipe/buffered_read_pipe.cpp.o
#9 58.26 [ 26%] Building CXX object src/CMakeFiles/octopus.dir/readpipe/downsampling/downsampler.cpp.o
#9 59.41 [ 27%] Building CXX object src/CMakeFiles/octopus.dir/readpipe/filtering/read_filter.cpp.o
#9 60.96 [ 27%] Building CXX object src/CMakeFiles/octopus.dir/readpipe/transformers/read_transform.cpp.o
#9 64.37 [ 28%] Building CXX object src/CMakeFiles/octopus.dir/readpipe/transformers/read_transformer.cpp.o
#9 64.50 [ 28%] Building CXX object src/CMakeFiles/octopus.dir/utils/compression.cpp.o
#9 64.75 [ 28%] Building CXX object src/CMakeFiles/octopus.dir/utils/path_utils.cpp.o
#9 65.83 [ 29%] Building CXX object src/CMakeFiles/octopus.dir/utils/read_stats.cpp.o
#9 65.84 [ 29%] Building CXX object src/CMakeFiles/octopus.dir/utils/string_utils.cpp.o
#9 66.23 [ 30%] Building CXX object src/CMakeFiles/octopus.dir/utils/input_reads_profiler.cpp.o
#9 66.44 [ 30%] Building CXX object src/CMakeFiles/octopus.dir/utils/kmer_mapper.cpp.o
#9 66.85 [ 30%] Building CXX object src/CMakeFiles/octopus.dir/utils/memory_footprint.cpp.o
#9 68.38 [ 31%] Building CXX object src/CMakeFiles/octopus.dir/utils/repeat_finder.cpp.o
#9 68.59 [ 31%] Building CXX object src/CMakeFiles/octopus.dir/utils/genotype_reader.cpp.o
#9 69.13 [ 31%] Building CXX object src/CMakeFiles/octopus.dir/utils/thread_pool.cpp.o
#9 69.76 [ 32%] Building CXX object src/CMakeFiles/octopus.dir/utils/system_utils.cpp.o
#9 69.81 [ 32%] Building CXX object src/CMakeFiles/octopus.dir/utils/read_duplicates.cpp.o
#9 70.71 [ 33%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/caller_builder.cpp.o
#9 71.18 [ 33%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/caller_factory.cpp.o
#9 77.11 [ 33%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/caller.cpp.o
#9 77.22 [ 34%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/cancer_caller.cpp.o
#9 81.54 [ 34%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/individual_caller.cpp.o
#9 83.25 [ 35%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/population_caller.cpp.o
#9 97.76 [ 35%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/trio_caller.cpp.o
#9 98.56 [ 35%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/polyclone_caller.cpp.o
#9 99.81 [ 36%] Building CXX object src/CMakeFiles/octopus.dir/core/callers/cell_caller.cpp.o
#9 114.7 [ 36%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/call.cpp.o
#9 117.5 [ 37%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/germline_variant_call.cpp.o
#9 121.8 [ 37%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/reference_call.cpp.o
#9 124.4 [ 37%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/somatic_call.cpp.o
#9 128.8 [ 38%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/variant_call.cpp.o
#9 132.8 [ 38%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/denovo_call.cpp.o
#9 136.1 [ 39%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/denovo_reference_reversion_call.cpp.o
#9 139.7 [ 39%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/cell_variant_call.cpp.o
#9 143.2 [ 39%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/polyclone_variant_call.cpp.o
#9 147.1 [ 40%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/cnv_call.cpp.o
#9 150.3 [ 40%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/call_wrapper.cpp.o
#9 154.2 [ 41%] Building CXX object src/CMakeFiles/octopus.dir/core/types/calls/call_utils.cpp.o
#9 157.3 [ 41%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vcf_header_factory.cpp.o
#9 161.4 [ 41%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vcf_record_factory.cpp.o
#9 165.2 [ 42%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/facet.cpp.o
#9 171.9 [ 42%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/samples.cpp.o
#9 172.8 [ 43%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/overlapping_reads.cpp.o
#9 178.5 [ 43%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/read_assignments.cpp.o
#9 179.7 [ 43%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/reference_context.cpp.o
#9 186.2 [ 44%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/genotypes.cpp.o
#9 188.9 [ 44%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/alleles.cpp.o
#9 192.8 [ 44%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/ploidies.cpp.o
#9 196.9 [ 45%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/pedigree.cpp.o
#9 199.6 [ 45%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/repeat_context.cpp.o
#9 203.6 [ 46%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/reads_summary.cpp.o
#9 206.9 [ 46%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/facets/facet_factory.cpp.o
#9 210.6 [ 46%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/variant_call_filter.cpp.o
#9 219.1 [ 47%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/single_pass_variant_call_filter.cpp.o
#9 219.7 [ 47%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/double_pass_variant_call_filter.cpp.o
#9 226.5 [ 48%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/threshold_filter.cpp.o
#9 228.5 [ 48%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/unsupervised_clustering_filter.cpp.o
#9 230.5 [ 48%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/variant_call_filter_factory.cpp.o
#9 231.1 [ 49%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/threshold_filter_factory.cpp.o
#9 237.7 [ 49%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/unsupervised_clustering_filter_factory.cpp.o
#9 240.7 [ 50%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/passing_filter.cpp.o
#9 241.3 [ 50%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/training_filter_factory.cpp.o
#9 244.9 [ 50%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/conditional_threshold_filter.cpp.o
#9 248.5 [ 51%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/somatic_threshold_filter.cpp.o
#9 251.7 [ 51%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/denovo_threshold_filter.cpp.o
#9 252.0 [ 52%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/random_forest_filter.cpp.o
#9 256.1 [ 52%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/random_forest_filter_factory.cpp.o
#9 259.8 [ 52%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/somatic_random_forest_filter.cpp.o
#9 263.4 [ 53%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/denovo_random_forest_filter.cpp.o
#9 264.4 [ 53%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/filters/variant_filter_utils.cpp.o
#9 267.3 [ 54%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/measure.cpp.o
#9 271.1 [ 54%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/quality.cpp.o
#9 274.6 [ 54%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/depth.cpp.o
#9 275.8 [ 55%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/quality_by_depth.cpp.o
#9 277.5 [ 55%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/genotype_quality.cpp.o
#9 279.6 [ 56%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/genotype_quality_by_depth.cpp.o
#9 283.7 [ 56%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/mapping_quality_zero_count.cpp.o
#9 284.6 [ 56%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/mean_mapping_quality.cpp.o
#9 286.1 [ 57%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/model_posterior.cpp.o
#9 288.8 [ 57%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/model_posterior_by_depth.cpp.o
#9 292.6 [ 57%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/allele_depth.cpp.o
#9 293.4 [ 58%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/allele_frequency.cpp.o
#9 294.8 [ 58%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/allele_frequency_bias.cpp.o
#9 297.5 [ 59%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/strand_bias.cpp.o
#9 302.1 [ 59%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/mapping_quality_divergence.cpp.o
#9 302.4 [ 59%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/gc_content.cpp.o
#9 304.1 [ 60%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/filtered_read_fraction.cpp.o
#9 307.6 [ 60%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/clipped_read_fraction.cpp.o
#9 311.2 [ 61%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/is_denovo.cpp.o
#9 312.0 [ 61%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/is_somatic.cpp.o
#9 313.0 [ 61%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/measure_factory.cpp.o
#9 316.2 [ 62%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/ambiguous_read_fraction.cpp.o
#9 319.9 [ 62%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/median_base_quality.cpp.o
#9 320.7 [ 63%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/mismatch_count.cpp.o
#9 323.4 [ 63%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/mismatch_fraction.cpp.o
#9 326.1 [ 63%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/is_refcall.cpp.o
#9 330.4 [ 64%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/normal_contamination.cpp.o
#9 330.4 [ 64%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/denovo_contamination.cpp.o
#9 331.8 [ 65%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/read_side_bias.cpp.o
#9 335.1 [ 65%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/alt_allele_count.cpp.o
#9 341.7 [ 65%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/str_length.cpp.o
#9 341.7 [ 66%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/str_period.cpp.o
#9 342.0 [ 66%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/posterior_probability.cpp.o
#9 343.6 [ 67%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/posterior_probability_by_depth.cpp.o
#9 350.3 [ 67%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/classification_confidence.cpp.o
#9 350.5 [ 67%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/somatic_haplotype_count.cpp.o
#9 350.6 [ 68%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/median_somatic_mapping_quality.cpp.o
#9 352.6 [ 68%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/strand_disequilibrium.cpp.o
#9 358.8 [ 68%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/supplementary_fraction.cpp.o
#9 359.1 [ 69%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/misaligned_read_count.cpp.o
#9 361.5 [ 69%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/read_tail_bias.cpp.o
#9 362.1 [ 70%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/read_end_bias.cpp.o
#9 368.6 [ 70%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/variant_length.cpp.o
#9 368.9 [ 70%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/base_mismatch_count.cpp.o
#9 372.0 [ 71%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/base_mismatch_fraction.cpp.o
#9 372.9 [ 71%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/base_mismatch_quality.cpp.o
#9 378.0 [ 72%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/assigned_depth.cpp.o
#9 378.7 [ 72%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/duplicate_concordance.cpp.o
#9 380.7 [ 72%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/duplicate_allele_depth.cpp.o
#9 382.8 [ 73%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/duplicate_allele_fraction.cpp.o
#9 388.6 [ 73%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/error_rate.cpp.o
#9 388.9 [ 74%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/error_rate_stdev.cpp.o
#9 390.6 [ 74%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/is_transversion.cpp.o
#9 391.3 [ 74%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/phase_length.cpp.o
#9 398.4 [ 75%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/max_read_length.cpp.o
#9 399.0 [ 75%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/mean_likelihood.cpp.o
#9 399.2 [ 76%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/allele_mapping_quality.cpp.o
#9 400.3 [ 76%] Building CXX object src/CMakeFiles/octopus.dir/core/csr/measures/phylogeny_posterior.cpp.o
#9 407.2 [ 76%] Building CXX object src/CMakeFiles/octopus.dir/core/models/haplotype_likelihood_array.cpp.o
#9 408.5 [ 77%] Building CXX object src/CMakeFiles/octopus.dir/core/models/haplotype_likelihood_model.cpp.o
#9 409.1 [ 77%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/subclone_model.cpp.o
#9 409.1 [ 78%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/constant_mixture_genotype_likelihood_model.cpp.o
#9 417.2 [ 78%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/individual_model.cpp.o
#9 417.3 [ 78%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/independent_population_model.cpp.o
#9 417.9 [ 79%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/population_model.cpp.o
#9 422.2 [ 79%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/trio_model.cpp.o
#9 425.7 [ 80%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/cancer_genotype_prior_model.cpp.o
#9 426.1 [ 80%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/coalescent_population_prior_model.cpp.o
#9 427.5 [ 80%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/hardy_weinberg_model.cpp.o
#9 427.9 [ 81%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/variable_mixture_genotype_likelihood_model.cpp.o
#9 430.1 [ 81%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/variational_bayes_mixture_mixture_model.cpp.o
#9 430.4 [ 81%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/single_cell_prior_model.cpp.o
#9 432.3 [ 82%] Building CXX object src/CMakeFiles/octopus.dir/core/models/genotype/single_cell_model.cpp.o
#9 434.6 [ 82%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/indel_error_model.cpp.o
#9 435.8 [ 83%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/repeat_based_indel_error_model.cpp.o
#9 436.0 [ 83%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/repeat_based_snv_error_model.cpp.o
#9 437.6 [ 83%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/snv_error_model.cpp.o
#9 437.8 [ 84%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/error_model_factory.cpp.o
#9 437.9 [ 84%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/basic_repeat_based_indel_error_model.cpp.o
#9 439.2 [ 85%] Building CXX object src/CMakeFiles/octopus.dir/core/models/error/custom_repeat_based_indel_error_model.cpp.o
#9 439.3 [ 85%] Building CXX object src/CMakeFiles/octopus.dir/core/models/mutation/somatic_mutation_model.cpp.o
#9 441.1 [ 85%] Building CXX object src/CMakeFiles/octopus.dir/core/models/mutation/coalescent_model.cpp.o
#9 441.3 [ 86%] Building CXX object src/CMakeFiles/octopus.dir/core/models/mutation/denovo_model.cpp.o
#9 443.2 [ 86%] Building CXX object src/CMakeFiles/octopus.dir/core/models/mutation/indel_mutation_model.cpp.o
#9 445.1 [ 87%] Building CXX object src/CMakeFiles/octopus.dir/core/models/reference/individual_reference_likelihood_model.cpp.o
#9 445.1 [ 87%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/haplotype_filter.cpp.o
#9 446.6 [ 87%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/read_assigner.cpp.o
#9 447.1 [ 88%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/read_realigner.cpp.o
#9 454.6 [ 88%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/bam_realigner.cpp.o
#9 456.1 [ 89%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/indel_profiler.cpp.o
#9 457.3 [ 89%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/bad_region_detector.cpp.o
#9 466.0 [ 89%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/hapgen/genome_walker.cpp.o
#9 466.6 [ 90%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/hapgen/haplotype_generator.cpp.o
#9 469.4 [ 90%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/hapgen/haplotype_tree.cpp.o
#9 472.7 [ 91%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/phaser/phaser.cpp.o
#9 476.9 [ 91%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/cigar_scanner.cpp.o
#9 478.3 [ 91%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/downloader.cpp.o
#9 485.3 [ 92%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/local_reassembler.cpp.o
#9 485.5 [ 92%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/randomiser.cpp.o
#9 486.7 [ 93%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/variant_generator.cpp.o
#9 492.9 [ 93%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/vcf_extractor.cpp.o
#9 495.0 [ 93%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/variant_generator_builder.cpp.o
#9 497.8 [ 94%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/active_region_generator.cpp.o
#9 501.1 [ 94%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/repeat_scanner.cpp.o
#9 503.7 [ 94%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/utils/assembler.cpp.o
#9 504.8 [ 95%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/utils/global_aligner.cpp.o
#9 505.5 [ 95%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/utils/assembler_active_region_generator.cpp.o
#9 509.6 [ 96%] Building CXX object src/CMakeFiles/octopus.dir/core/tools/vargen/utils/misaligned_reads_detector.cpp.o
#9 513.0 [ 96%] Building CXX object src/CMakeFiles/octopus.dir/core/types/allele.cpp.o
#9 514.5 [ 96%] Building CXX object src/CMakeFiles/octopus.dir/core/types/cancer_genotype.cpp.o
#9 514.8 [ 97%] Building CXX object src/CMakeFiles/octopus.dir/core/types/genotype.cpp.o
#9 515.7 [ 97%] Building CXX object src/CMakeFiles/octopus.dir/core/types/haplotype.cpp.o
#9 517.0 [ 98%] Building CXX object src/CMakeFiles/octopus.dir/core/types/variant.cpp.o
#9 517.9 [ 98%] Building CXX object src/CMakeFiles/octopus.dir/core/types/phylogeny.cpp.o
#9 518.1 [ 98%] Building CXX object src/CMakeFiles/octopus.dir/core/calling_components.cpp.o
#9 518.5 [ 99%] Building CXX object src/CMakeFiles/octopus.dir/core/octopus.cpp.o
#9 519.2 [ 99%] Building CXX object src/CMakeFiles/octopus.dir/timers.cpp.o
#9 558.0 [100%] Linking CXX executable octopus
#9 935.5 [100%] Built target octopus
#9 935.5 Install the project...
#9 935.6 -- Install configuration: "Release"
#9 935.6 -- Installing: /opt/octopus/bin/octopus
#9 935.6 No bin directory found, making one
#9 935.6 Installing Octopus 0.7.4 (develop 072a0c80)
#9 DONE 935.7s

Everything looked good after the build.
For use with Snakemake I converted the Docker image to a .sif:

apptainer build octopus_avx512.sif docker-daemon://library/octopus_avx512:latest

Open the .sif and check octopus --version:

apptainer exec octopus_avx512.sif octopus --version
octopus version 0.7.4 (develop 072a0c80)
Target: x86_64 Linux 5.15.0-113-generic
SIMD extension: AVX512
Compiler: GNU 9.4.0
Boost: 1_71
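
In a pipeline it can help to assert that the container really holds the AVX-512 build before starting a run that takes hours. A minimal Python sketch, assuming the `octopus --version` output keeps the `Key: value` layout shown above; `parse_octopus_version` is a hypothetical helper, not part of Octopus:

```python
def parse_octopus_version(text):
    """Parse `octopus --version` output into a dict (layout assumed from the output above)."""
    info = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("octopus version"):
            # First line: everything after the literal prefix is the version string.
            info["version"] = line[len("octopus version"):].strip()
        elif ":" in line:
            # Remaining lines: split on the first colon only.
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return info


sample = """octopus version 0.7.4 (develop 072a0c80)
Target: x86_64 Linux 5.15.0-113-generic
SIMD extension: AVX512
Compiler: GNU 9.4.0
Boost: 1_71
"""

info = parse_octopus_version(sample)
print(info["SIMD extension"])
```

In practice the text could come from `subprocess.run(["apptainer", "exec", "octopus_avx512.sif", "octopus", "--version"], capture_output=True, text=True).stdout`.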

Error

I run the following command inside the Apptainer container:

        debug_log=$(readlink -m logs/octopus_paired/347782.log)
        trace_log=$(readlink -m logs/octopus_paired/347782_trace.log)

        octopus \
        --threads 1 \
        --max-reference-cache-memory 2000MB \
         --target-read-buffer-memory 100MB \
         --reference ../resources/hg38/v0/Homo_sapiens_assembly38.fasta \
         --debug ${debug_log} \
         --trace ${trace_log} \
         --regions-file ../resources/bed/NPHD2022A_3383431_Covered_adaptedChrom_padded_2_hg38.bed \
        --bad-region-tolerance LOW \
        -I /mnt/local_data/csf-cfdna-snakemake/workflow/results/L1311/DEDUPED_BAM/347782.aligned.dedup.bam \
        -I /mnt/local_data/csf-cfdna-snakemake/workflow/results/L1311/DEDUPED_BAM/347779.aligned.dedup.bam \
        --normal-sample 347779 \
        --allow-octopus-duplicates \
         --disable-downsampling \
         --min-candidate-credible-vaf-probability 0.5 \
         --min-somatic-posterior 1.0 \
         --min-expected-somatic-frequency 0.001 \
         --min-credible-somatic-frequency 0.001 \
         --min-supporting-reads 2 \
         --normal-contamination-risk LOW \
         --output /mnt/local_data/csf-cfdna-snakemake/workflow/results/L1311/ASSESSMENT_2/paired/347782/variant_callers/octopus/Octopus.vcf \
         --sequence-error-model PCR.NOVASEQ \
         --bamout /mnt/local_data/csf-cfdna-snakemake/workflow/results/L1311/ASSESSMENT_2/paired/347782/variant_callers/octopus/347782_realigned_bam \
         --somatic-forest /opt/octopus/resources/forests/somatic.v0.7.4.forest \
         --keep-unfiltered-calls \
         -w ${PWD} \
         --max-haplotypes 100 \
         --somatics-only \
         --target-working-memory 1000MB \
         --temp-directory-prefix tmp_347782 \
         2>${debug_log}

It runs for several hours and then exits with an error:

[2024-07-15 12:55:12] <EROR> A program error has occurred:
[2024-07-15 12:55:12] <EROR> 
[2024-07-15 12:55:12] <EROR>     Encountered an exception during calling 'Phred: negative error
[2024-07-15 12:55:12] <EROR>     probability -0.000000'. This means there is a bug and your results
[2024-07-15 12:55:12] <EROR>     are untrustworthy.
[2024-07-15 12:55:12] <EROR> 
[2024-07-15 12:55:12] <EROR> To help resolve this error run in debug mode and send the log file to
[2024-07-15 12:55:12] <EROR> https://github.com/luntergroup/octopus/issues.
[2024-07-15 12:55:12] <INFO> ------------------------------------------------------------------------
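
When comparing several debug logs, pulling out just the `<EROR>` lines makes it easier to confirm that every run fails the same way. A minimal Python sketch, assuming every log line follows the `[timestamp] <LEVEL> message` layout shown above; `extract_errors` is a hypothetical helper:

```python
import re

# Matches lines such as "[2024-07-15 12:55:12] <EROR> some message".
LOG_LINE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+<(?P<level>[A-Z]+)>\s?(?P<message>.*)$")


def extract_errors(lines):
    """Return (timestamp, message) pairs for non-empty <EROR> lines."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match["level"] == "EROR" and match["message"].strip():
            errors.append((match["time"], match["message"].strip()))
    return errors


sample = [
    "[2024-07-15 12:55:12] <EROR> A program error has occurred:",
    "[2024-07-15 12:55:12] <EROR> ",
    "[2024-07-15 12:55:12] <EROR>     Encountered an exception during calling 'Phred: negative error",
    "[2024-07-15 12:55:12] <EROR>     probability -0.000000'. This means there is a bug and your results",
    "[2024-07-15 12:55:12] <INFO> ------------------------------------------------------------------------",
]

errors = extract_errors(sample)
# Join the wrapped message back into a single line for comparison across logs.
print(" ".join(message for _, message in errors))
```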

I have tried this with four tumour-normal pairs and hit the same error each time. All four pairs run fine with the default Docker image and the AVX2 binaries.
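
The `-0.000000` in the message is consistent with a value that is negative but smaller in magnitude than the six printed decimals, the kind of result floating-point rounding can produce when the order of arithmetic operations differs between builds. This is a guess from the message alone, not a diagnosis from the Octopus source. The Python sketch below only illustrates the effect; `phred_from_error_probability`, its `tolerance`, and its `max_score` cap are hypothetical and are not Octopus's implementation:

```python
import math


def phred_from_error_probability(p, tolerance=1e-9, max_score=1000.0):
    """Convert an error probability to a Phred score, -10 * log10(p).

    Hypothetical guard: values within `tolerance` below zero are treated as
    rounding noise and clamped to 0; anything more negative is rejected.
    """
    if p < -tolerance:
        raise ValueError(f"negative error probability {p:f}")
    p = max(p, 0.0)
    if p == 0.0:
        return max_score  # log10(0) is undefined, so cap the score
    return min(-10.0 * math.log10(p), max_score)


tiny_negative = -1e-17  # stand-in for a rounding artefact
print(f"{tiny_negative:f}")  # six-decimal formatting hides the magnitude
print(phred_from_error_probability(tiny_negative))
print(phred_from_error_probability(0.001))
```

A strict check that rejects any value below zero would fail on `tiny_negative` even though it is zero for practical purposes, which would match the symptom reported here.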

Here you can find the apptainer image and the four 'debug' and four 'trace' log files. I am unable to share the BAM files due to ethical constraints. I tried to test on the GIAB dataset; however, the link from the Supplementary Note of the Octopus paper is no longer current and it's not obvious to me how to find this data.

@harrymatthews50 harrymatthews50 changed the title After recompiling from source, files now give 'Phred: negative error [2024-07-15 12:55:12] <EROR> probability -0.000000' After recompiling from source get error 'Phred: negative error [2024-07-15 12:55:12] <EROR> probability -0.000000' Jul 16, 2024

jelber2 commented Jul 22, 2024

Unfortunately, I seem to be the only person (and I am just a user of Octopus) responding to some of the issues here. As far as I know, Octopus is largely abandonware and is not being maintained, unless development continues in a private repo. It is certainly a cool tool, and perhaps someone with more C++ compiling experience could chime in on your AVX-512 issue. Hoping for the best, Jean P. Elbers.

jelber2 commented Jul 22, 2024

To reduce run times, in my experience it is best to split the calling into as many parts as you like (even splitting by human chromosome is too slow). Running the following Snakemake script on a big node with about 128-256 cores would give a huge speed increase:

Try running Octopus with a SUP_herro-specific error model and random forest filtering

/msc/home/jelber43/clair3-training/hg002-new/hg002-4.smk

scattergather:
    split=config["splits"]

# final target rule to produce all sub targets
rule all:
    input: "octopus/octopus.ann.vcf"

# prepare the reference: uppercase the bases, drop header comments, then index it
rule reference_fasta:
    input: config["reference_fasta"]
    output: 
        reference = "auxData/reference-fasta.fa",
        index = "auxData/reference-fasta.fa.fai"
    shell:
        '''
        # see config.yaml on path to reference
        seqtk seq -UC {input} > {output.reference}
        samtools faidx {output.reference}
        '''

# build 6 Mb windows to scatter over (restricted to chr1 for this test)
rule make_test_bed:
    input:
        index = "auxData/reference-fasta.fa.fai"
    output:
        "octopus/test.bed"
    log:
        "octopus/test.bed.err"
    shell:
        '''
        cut -f 1-2 {input} |head -n 22| bedtools makewindows -g - -w 6000000 |grep -P "chr1\t" > {output}
        '''

rule make_bed:
    input: "octopus/test.bed"
    output: scatter.split("octopus/{scatteritem}.bed")
    run:
        shell("touch {output}")
        for i in range(1, config["splits"] + 1):
            shell("sed -n {i}p {input} > octopus/{i}-of-{config[splits]}.bed")

rule octopus:
    input:
        bed = "octopus/{scatteritem}.bed",
        bam = config["bam"],
        reference = "auxData/reference-fasta.fa",
        model = config["sequence_error_model"]
    output:
        vcf1 = "octopus/{scatteritem}.vcf.gz",
        index = "octopus/{scatteritem}.vcf.gz.tbi"
    log:
        err = "octopus/{scatteritem}.log"
    params:
        threads = config["threads"]
    shell:
        '''
        octopus \
            --max-read-length=500 \
            --split-long-reads \
            --read-linkage=LINKED \
            --variant-discovery-mode=PACBIO \
            --force-pileup-candidates \
            --max-assembly-region-size=1000 \
            --max-assembly-region-overlap=500 \
            --max-indel-errors=16 \
            --sequence-error-model {input.model} \
            --max-haplotypes=400 \
            --min-protected-haplotype-posterior=1e-5 \
            --disable-call-filtering \
            --regions-file {input.bed} \
            --temp-directory-prefix octopus/{wildcards.scatteritem}.temp \
            --reference {input.reference} \
            --reads {input.bam} \
            --threads {params.threads} \
            -o {output.vcf1} > {log.err} 2>&1
        tabix -f -p vcf {output.vcf1}
        '''

rule octopus2:
    input:
        bed = "octopus/{scatteritem}.bed",
        bam = config["bam"],
        reference = "auxData/reference-fasta.fa",
        vcf = "octopus/{scatteritem}.vcf.gz",
        index = "octopus/{scatteritem}.vcf.gz.tbi",
        model = config["sequence_error_model"]
    output:
        vcf1 = "octopus/{scatteritem}.ann.vcf.gz",
        index = "octopus/{scatteritem}.ann.vcf.gz.tbi"
    log:
        err = "octopus/{scatteritem}.log2"
    params:
        threads = config["threads"]
    shell:
        '''
        octopus \
            --filter-vcf {input.vcf} \
            --max-read-length=500 \
            --split-long-reads \
            --read-linkage=LINKED \
            --variant-discovery-mode=PACBIO \
            --force-pileup-candidates \
            --max-assembly-region-size=1000 \
            --max-assembly-region-overlap=500 \
            --max-indel-errors=16 \
            --sequence-error-model {input.model} \
            --max-haplotypes=400 \
            --min-protected-haplotype-posterior=1e-5 \
            --forest /mnt/home/jelber43/clair3-training/forest2/octopus.forest \
            --regions-file {input.bed} \
            --temp-directory-prefix octopus/{wildcards.scatteritem}.temp3 \
            --reference {input.reference} \
            --reads {input.bam} \
            --threads {params.threads} \
            -o {output.vcf1} > {log.err} 2>&1
        tabix -f -p vcf {output.vcf1}
        '''

rule gather_vcf:
    input:
        vcfs = gather.split("octopus/{scatteritem}.vcf.gz"),
        index = "auxData/reference-fasta.fa.fai"
    output: "octopus/octopus.vcf"
    params:
        threads = config["threads"]
    shell:
        '''
        bcftools concat --threads {params.threads} -a -d all {input.vcfs} |bedtools sort  -header -faidx {input.index} > {output}
        '''

rule gather_vcf2:
    input:
        vcfs = gather.split("octopus/{scatteritem}.ann.vcf.gz"),
        index = "auxData/reference-fasta.fa.fai",
        vcf = "octopus/octopus.vcf"
    output: "octopus/octopus.ann.vcf"
    params:
        threads = config["threads"]
    shell:
        '''
        bcftools concat --threads {params.threads} -a -d all {input.vcfs} |bedtools sort  -header -faidx {input.index} > {output}
        '''
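
For reference, the scatter step above relies on `bedtools makewindows -w 6000000` to cut each contig into fixed-size windows, with one window per BED file. The same windowing can be sketched in plain Python, assuming a standard `.fai` index whose first two tab-separated columns are contig name and length; `make_windows` is a hypothetical helper, not part of Snakemake or bedtools:

```python
def make_windows(fai_lines, window=6_000_000):
    """Yield (contig, start, end) windows in 0-based, half-open BED coordinates."""
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        contig, length = fields[0], int(fields[1])
        for start in range(0, length, window):
            # The last window of a contig is truncated at the contig end.
            yield contig, start, min(start + window, length)


# Toy index with made-up contig names and lengths, for illustration only.
toy_fai = ["chrA\t13000000\t6\t60\t61", "chrB\t4000000\t13216673\t60\t61"]
windows = list(make_windows(toy_fai))
for contig, start, end in windows:
    print(f"{contig}\t{start}\t{end}")
```

The number of windows produced this way is the value the run configuration passes as `splits`.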

Set up the Snakemake run configuration

SAMPLE=hg002-new
cd /mnt/home/jelber43/clair3-training/${SAMPLE}
kangaroo=`cut -f 1-2 /mnt/home/jelber43/clair3-training/Homo_sapiens_GRCh38_no_alt.fa.fai |head -n 22| bedtools makewindows -g - -w 6000000 |grep -P "chr1\t" |wc -l`
echo -e "\nthere are $kangaroo regions for $SAMPLE\n"
jobs=10
threads=25
echo "SAMPLE: ${SAMPLE}" > ${SAMPLE}.yaml
echo "threads: $threads" >> ${SAMPLE}.yaml
echo "splits: $kangaroo" >> ${SAMPLE}.yaml
echo "reference_fasta: /mnt/home/jelber43/clair3-training/Homo_sapiens_GRCh38_no_alt.fa" >> ${SAMPLE}.yaml
echo "sequence_error_model: /mnt/home/jelber43/clair3-training/hg002/herro/PCR-free.Nanopore_SUP_herro.octopus.error_model2.csv" >> ${SAMPLE}.yaml
echo "bam: herro/hg002.herro.Q30.sam1.3.SoftClip.bam" >> ${SAMPLE}.yaml

echo && cat ${SAMPLE}.yaml && echo

snakemake --configfile ${SAMPLE}.yaml -j ${jobs} --snakefile hg002-4.smk --printshellcmds --local-cores ${jobs} --cores ${jobs} all

The above was for Nanopore SUP reads that were error-corrected with Herro, given a fake quality value of 30, and called with a custom error model, but you could easily adjust the core part of the Snakefile to run Octopus on Illumina reads, for example (there is no forest file to use, though, unless you were lucky enough to grab it before it "went away"). EDIT: Oh, I see you have the somatic forest file. Great! On your machine you would probably use 8 jobs with 3 threads each, or something like that, and keep the BED window size the same as in the main Snakefile.
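
For a targeted panel such as the one in the original command, one way to apply this is to split the regions file into a fixed number of parts of roughly equal total size and run one Octopus job per part. The sketch below uses a simple greedy assignment (largest interval first, into the currently smallest part); this is an illustrative choice, not something Octopus or Snakemake provides, and `split_bed` is a hypothetical helper:

```python
import heapq


def split_bed(intervals, parts):
    """Greedily split (contig, start, end) intervals into `parts` groups of similar total length."""
    # Heap entries are (total_bases, part_index); the smallest part is popped first.
    heap = [(0, index) for index in range(parts)]
    heapq.heapify(heap)
    groups = [[] for _ in range(parts)]
    for interval in sorted(intervals, key=lambda iv: iv[2] - iv[1], reverse=True):
        total, index = heapq.heappop(heap)
        groups[index].append(interval)
        heapq.heappush(heap, (total + interval[2] - interval[1], index))
    # Sort each part by (contig, start); contig order is plain string order here.
    return [sorted(group) for group in groups]


# Made-up intervals, for illustration only.
toy = [("chr1", 0, 500), ("chr1", 1000, 1100), ("chr2", 0, 300), ("chr2", 400, 600)]
for number, group in enumerate(split_bed(toy, 2), start=1):
    print(f"part {number}: {group}")
```

Each part could then be written to its own BED file and passed to a separate job through `--regions-file`.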
