config-call-founders
config-call-subset20
config-index
README.md
Snakefile
accession-list-de-novo.txt
accession-list.txt
config-common-call-founders.yaml
config-common-call-subset20.yaml
config-common-index.yaml
convert-alignment-positions-primary.sh
convert-alignment-positions.sh
convert_alignment_positions.py
founder-de-novo-experiment-cmds.sh
founder-experiment-cmds.sh
list-alignment-positions-primary-only.sh
list-alignment-positions.sh
make_config_files.py
predicted-sequences.sh
quast.sh
read_lengths.py
run-samtools-stats.sh
store-aln-gap-positions.sh
store-ref-gap-positions.sh
store_gap_positions.py
subset20-de-novo-experiment-cmds.sh
subset20-experiment-cmds.sh

README.md

E. coli experiment with natural reads and known variants

This directory contains the scripts needed to run the E. coli experiments with natural reads.

Running the experiment

Download the indexing input with e.g. wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-index-input.tar.bz2. Extract the contents to the e-coli directory with e.g. pbzip2 -dc e-coli-index-input.tar.bz2 | tar xf -.
Download the reads used in the experiment with e.g. wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-reads.tar. Extract the contents with tar xf e-coli-reads.tar.
Generate an index using the founder sequences with snakemake --configfile config-common-index.yaml config-index/e-coli-msd5.yaml --snakefile ../panvc-sample-workflow/Snakefile.index --cores 16 --printshellcmds --use-conda --conda-prefix ../conda-env --resources mem_mb=16384.
Generate an index using the random selection of reference sequences with snakemake --configfile config-common-index.yaml config-index/e-coli-subset20.yaml --snakefile ../panvc-sample-workflow/Snakefile.index --cores 16 --printshellcmds --use-conda --conda-prefix ../conda-env --resources mem_mb=16384.
The experiments may be run with e.g. ./founder-experiment-cmds.sh | parallel -jjobs where jobs is the number of Snakemake instances to run concurrently. The script allocates 16 GB of memory and 5 CPU cores to each instance of Snakemake. Alternatively, the commands may be run one at a time by piping them to e.g. bash: ./founder-experiment-cmds.sh | bash -x -e.
The experiments with the random selection of reference sequences may be run similarly with either ./subset20-experiment-cmds.sh | parallel -jjobs or ./subset20-experiment-cmds.sh | bash -x -e.
To gather statistics with Samtools, first run mkdir samtools-stats and then ./run-samtools-stats.sh. The results will be placed in samtools-stats.
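
As a rough guide for choosing jobs: each Snakemake instance is allocated 5 CPU cores and 16 GB of memory, so the number of concurrent instances is bounded by both the core count and the memory of the machine. The sketch below is not part of this repository; the function name max_jobs and its two arguments (total cores, total memory in GB) are hypothetical, and both totals have to be supplied by hand.

```shell
#!/usr/bin/env bash
# Per-instance allocation stated in the README.
CORES_PER_JOB=5
MEM_GB_PER_JOB=16

# max_jobs CORES MEM_GB
# Print how many Snakemake instances fit into the given totals.
# (Hypothetical helper, not provided by the repository.)
max_jobs() {
    local cores="$1" mem_gb="$2"
    local by_cores=$(( cores / CORES_PER_JOB ))
    local by_mem=$(( mem_gb / MEM_GB_PER_JOB ))
    # The tighter of the two limits decides.
    local jobs=$(( by_cores < by_mem ? by_cores : by_mem ))
    # Never print 0: parallel -j0 means "as many jobs as possible".
    # On a machine below the per-instance allocation, one instance
    # is started anyway and may oversubscribe the machine.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}
```

On a machine with 32 cores and 64 GB of memory, max_jobs 32 64 prints 4 (memory is the tighter limit), so the experiments could be started with ./founder-experiment-cmds.sh | parallel -j"$(max_jobs 32 64)".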

Running the experiment with the comparison to de novo sequenced contigs

Download the reads used in the experiment with e.g. wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-reads-de-novo-experiment.tar. Extract the contents to the e-coli directory with tar xf e-coli-reads-de-novo-experiment.tar.
Download the contigs used in the experiment with e.g. wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e-coli-de-novo-contigs.tar. Extract the contents with tar xf e-coli-de-novo-contigs.tar.
Download the E. coli K-12 reference without line breaks with e.g. wget https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment/e.coli-k12-mg1655.no-linebreaks.fa.gz. Extract with gunzip e.coli-k12-mg1655.no-linebreaks.fa.gz.
The indices generated in the previous experiment may be used.
Run the following commands where jobs is the number of Snakemake instances.
- ./founder-de-novo-experiment-cmds.sh | parallel -jjobs
- ./subset20-de-novo-experiment-cmds.sh | parallel -jjobs
Generate the predicted sequences with snakemake --printshellcmds --use-conda --conda-prefix ../conda-env. Here Snakemake is only used to activate the correct Conda environment; the snakefile essentially runs predicted-sequences.sh.
Run QUAST with mkdir -p quast && ./quast.sh. The results will be placed in quast.
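
The download and extraction steps above all follow one pattern: fetch a file from the same base URL, then unpack it according to its extension. The sketch below prints those commands instead of running them, in the same style as the experiment scripts, so the output can be inspected first and then piped to bash. The function names fetch_cmds and de_novo_fetch_cmds are hypothetical and not part of this repository; the URL and file names are the ones given in this README. Plain tar archives are unpacked with tar xf, since tar needs the f option to read from a named archive file.

```shell
#!/usr/bin/env bash
# Base URL as given in the README.
BASE_URL="https://cs.helsinki.fi/group/gsa/panvc-founders/natural-e-coli-experiment"

# fetch_cmds FILE
# Print the download command and the matching extraction command.
# (Hypothetical helper, not provided by the repository.)
fetch_cmds() {
    local file="$1"
    printf 'wget %s/%s\n' "$BASE_URL" "$file"
    case "$file" in
        *.tar.bz2) printf 'pbzip2 -dc %s | tar xf -\n' "$file" ;;
        *.tar)     printf 'tar xf %s\n' "$file" ;;
        *.gz)      printf 'gunzip %s\n' "$file" ;;
        *)         echo "unsupported archive type: $file" >&2
                   return 1 ;;
    esac
}

# Print the commands for every input of the de novo comparison.
de_novo_fetch_cmds() {
    local f
    for f in e-coli-reads-de-novo-experiment.tar \
             e-coli-de-novo-contigs.tar \
             e.coli-k12-mg1655.no-linebreaks.fa.gz; do
        fetch_cmds "$f" || return 1
    done
}
```

Once the functions are defined, de_novo_fetch_cmds | bash -x -e would download and unpack the three inputs in order and stop at the first failure.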
