This directory contains the scripts needed to run the E. coli experiments with natural reads.
- Download the indexing input with e.g. `wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-index-input.tar.bz2`. Extract the contents to the `e-coli` directory with e.g. `pbzip2 -dc e-coli-index-input.tar.bz2 | tar x`.
- Download the reads used in the experiment with e.g. `wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-reads.tar`. Extract the contents with `tar xf e-coli-reads.tar`.
- Generate an index using the founder sequences with `snakemake --configfile config-common-index.yaml config-index/e-coli-msd5.yaml --snakefile ../panvc-sample-workflow/Snakefile.index --cores 16 --printshellcmds --use-conda --conda-prefix ../conda-env --resources mem_mb=16384`.
- Generate an index using the random selection of reference sequences with `snakemake --configfile config-common-index.yaml config-index/e-coli-subset20.yaml --snakefile ../panvc-sample-workflow/Snakefile.index --cores 16 --printshellcmds --use-conda --conda-prefix ../conda-env --resources mem_mb=16384`.
- The experiments may be run with e.g. `./founder-experiment-cmds.sh | parallel -jjobs`, where jobs is the number of Snakemake instances to run in parallel. The script allocates 16 GB of memory and 5 CPU cores to each instance of Snakemake. Alternatively, the commands may be piped to e.g. bash: `./founder-experiment-cmds.sh | bash -x -e`.
- The experiments with the random selection of reference sequences may be run similarly with either `./subset20-experiment-cmds.sh | parallel -jjobs` or `./subset20-experiment-cmds.sh | bash -x -e`.
- To gather statistics with Samtools, first run `mkdir samtools-stats` and then `./run-samtools-stats.sh`. The results will be placed in the `samtools-stats` directory.
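
The per-instance allocation stated above (16 GB of memory and 5 CPU cores) limits how many Snakemake instances fit on one machine. The following is a minimal sketch for picking the jobs value passed to parallel, assuming the machine totals are supplied by hand. The function name `max_jobs` and its two arguments are hypothetical and not part of the scripts in this directory.

```shell
# Hypothetical helper: prints the largest number of Snakemake instances
# that fits the machine, given 16 GB of memory and 5 CPU cores per instance.
# Arguments: total memory in GB, total number of CPU cores.
max_jobs() {
  total_mem_gb=$1
  total_cores=$2
  by_mem=$(( total_mem_gb / 16 ))   # instances that fit in memory
  by_cores=$(( total_cores / 5 ))   # instances that fit on the cores
  # The smaller of the two limits is the binding one.
  echo $(( by_mem < by_cores ? by_mem : by_cores ))
}

# Possible use (not executed here):
#   ./founder-experiment-cmds.sh | parallel -j"$(max_jobs 64 16)"
```

For a machine with 64 GB of memory and 16 cores, `max_jobs 64 16` prints 3, since the cores are the limiting resource. A result of 0 means the machine is smaller than a single instance's allocation.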
- Download the reads used in the experiment with e.g. `wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-reads-de-novo-experiment.tar`. Extract the contents to the `e-coli` directory with `tar xf e-coli-reads-de-novo-experiment.tar`.
- Download the contigs used in the experiment with e.g. `wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-de-novo-contigs.tar`. Extract the contents with `tar xf e-coli-de-novo-contigs.tar`.
- Download the E. coli K-12 reference without line breaks with e.g. `wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e.coli-k12-mg1655.no-linebreaks.fa.gz`. Extract with `gunzip e.coli-k12-mg1655.no-linebreaks.fa.gz`.
- The indices generated in the previous experiment may be used.
- Run the following commands, where jobs is the number of Snakemake instances:
  - `./founder-de-novo-experiment-cmds.sh | parallel -jjobs`
  - `./subset20-de-novo-experiment-cmds.sh | parallel -jjobs`
- Generate the predicted sequences with `snakemake --printshellcmds --use-conda --conda-prefix ../conda-env`. Here Snakemake is only used to activate the correct Conda environment; the snakefile essentially runs `predicted-sequences.sh`.
- Run QUAST with `mkdir -p quast && ./quast.sh`. The results will be placed in `quast`.
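
The K-12 reference above is distributed without line breaks, meaning each sequence sits on a single line after its header. As an optional sanity check after extraction, the following sketch verifies that layout with awk. The function name `fasta_has_no_linebreaks` is hypothetical and not part of the scripts in this directory; the check assumes a plain FASTA file with `>` header lines and no blank lines.

```shell
# Hypothetical helper: succeeds (exit status 0) if every record in the
# given FASTA file is a header line followed by exactly one sequence line.
fasta_has_no_linebreaks() {
  awk '
    BEGIN { bad = 0; records = 0; seq_lines = 0 }
    /^>/ {
      # A new header closes the previous record, which must have one line.
      if (records > 0 && seq_lines != 1) bad = 1
      records++
      seq_lines = 0
      next
    }
    {
      # Sequence data before any header is malformed.
      if (records == 0) bad = 1
      seq_lines++
    }
    END {
      # Check the last record, and reject files with no records at all.
      if (records == 0 || seq_lines != 1) bad = 1
      exit bad
    }
  ' "$1"
}

# Possible use (not executed here):
#   fasta_has_no_linebreaks e.coli-k12-mg1655.no-linebreaks.fa && echo "single-line FASTA"
```

The function only reports through its exit status, so it can be combined with `&&` or used in an `if` statement before starting the experiment.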