Home

Table of Contents

Introduction
Installation
Installation instructions
Setup
Setup for individual components
Pipeline setup instructions
Configuring config.json
webserver setup
genome directory
Running
Individual components
Running the pipeline
Manually generating the json file
Webform

Introduction

aLib is a set of software tools for basic analysis of data from Illumina sequencers. The different components can be used in conjunction or independently. We provide instructions both for users who wish to use aLib as a whole and for those who need only individual components.

Installation

Installation instructions

First, make sure you are running a Linux computer with the following:

C++ compiler
Python interpreter
R

Also, make sure you have installed the following dependencies:

fastqc (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
freeIbis (optional) (http://github.com/grenaud/freeIbis)
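
Before compiling, it can help to confirm that the prerequisites are reachable. The following is a minimal sketch, not part of aLib: the helper missing_tools is hypothetical, and the executable names in the usage note (g++, python, R, fastqc) are assumptions about how these tools are installed on your system. fastqc in particular may live in the directory given by fastqcdir rather than on the PATH.

```python
import shutil

def missing_tools(names):
    """Return the executable names from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]
```

For example, missing_tools(["g++", "python", "R", "fastqc"]) returns the subset of those names that could not be located, so an empty list means all of them were found.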

Also, make sure you have pulled the submodules:

biohazard (http://github.com/udo-stenzel/biohazard/)
network-aware-bwa (http://github.com/udo-stenzel/network-aware-bwa/)
libgab (https://github.com/grenaud/libgab)
bamtools (https://github.com/pezmaster31/bamtools)

Once this is done, do the following:

Compile bamtools (https://github.com/pezmaster31/bamtools)
In the main directory, just type make.

Setup

The first step is to configure the config.json file. This has to be done only once; afterwards, you can run aLib on any given sequencing run. Whether you want to use the individual components or use them in conjunction, the basic configuration is stored in the config.json file. For individual components, the default config.json can probably be used as is.

Setup for individual components

If "make" completed successfully (and you have edited the config.json file where you need to change some values, e.g. the barcode sequences for the demultiplexer), the various components should be ready to use.

Pipeline setup instructions

The workflow can be described as follows:

The read directory is where your sequencer(s) will write their sequencing data (basecalls and intensities in /Data/Intensities/)
The write directory is where aLib will produce the usable data

Make sure that the sequencing run folders are named in this format:

 YYMMDD_SEQUENCERID_RUN-NUMBER_COMMENTS
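
The naming format above can be checked with a short script. This is a minimal sketch, not part of aLib: the function parse_run_name is hypothetical, and it assumes that the first three fields contain no underscore while COMMENTS may contain any characters.

```python
import re
from datetime import datetime

# Pattern for YYMMDD_SEQUENCERID_RUN-NUMBER_COMMENTS.
# Assumption: the first three fields contain no underscore; COMMENTS may.
RUN_NAME = re.compile(
    r"^(?P<date>\d{6})_(?P<sequencer>[^_]+)_(?P<run>[^_]+)_(?P<comments>.+)$"
)

def parse_run_name(name):
    """Return the four fields of a run folder name, or None if it does not match."""
    match = RUN_NAME.match(name)
    if match is None:
        return None
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(match.group("date"), "%y%m%d")
    except ValueError:
        return None
    return match.groupdict()
```

A name that does not follow the format, or whose date is not a valid YYMMDD date, yields None; otherwise the result is a dict with the keys date, sequencer, run and comments.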

Configuring config.json

The main configuration file is config.json.

Field	Meaning
alibdir	The base directory where aLib is installed.
fastqcdir	Directory containing fastqc
illuminareaddir	The directory where the sequencer writes the sequencing data (basecalls and intensities)
illuminawritedir	This is the directory where aLib will write the processed data
sequencers	Enter the id and type of the sequencer for your sequencing center
runstodisplay	The number of runs to display
emailAddrToSend	Email of the administrator
genomedirectory	Directory that contains the BWA genomic databases. (see details about setup).
tempdirectory	Directory used by aLib to write temp files
freeibispath	Path to freeIbis
controlindex	7 bp index for reads used as a phiX control spike-in
phixref	Path to the phiX reference
chimeras	For various protocols, define the name of the protocol, the sequence of the adapters and putative chimeric sequences
Indices	At the top level, define the indexing scheme; within each scheme, define the mapping from index id to sequence for the indices used by the demultiplexer
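
After editing config.json, a quick check that no field from the table above was dropped can save a failed run. This is a minimal sketch, not part of aLib: the helpers missing_fields and load_config are hypothetical, and the sketch assumes that every field in the table is a top-level key of config.json, which may not match the actual file layout.

```python
import json

# Field names copied from the table above.
# Assumption: each one is a top-level key in config.json.
EXPECTED_FIELDS = [
    "alibdir", "fastqcdir", "illuminareaddir", "illuminawritedir",
    "sequencers", "runstodisplay", "emailAddrToSend", "genomedirectory",
    "tempdirectory", "freeibispath", "controlindex", "phixref",
    "chimeras", "Indices",
]

def missing_fields(config):
    """Return the expected field names absent from a parsed config dict."""
    return [field for field in EXPECTED_FIELDS if field not in config]

def load_config(path):
    """Parse a config.json file and return its contents as a dict."""
    with open(path) as handle:
        return json.load(handle)
```

Calling missing_fields(load_config("config.json")) returns an empty list when every field is present.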

webserver setup

Create a directory that is web accessible and copy the contents of webForm/ into it. In the rest of this document, the URL of this directory is assumed to be http://internal.webserver.com/aLib/

genome directory

The server where aLib is running should have access to the BWA genome indices. Each BWA index should be in a directory of its own, named after the build:

 hg19/

and the index files should use the prefix bwa-0.4.9, as such:

 hg19/bwa-0.4.9.amb
 hg19/bwa-0.4.9.ann
 ...

The genome directory should also contain a BWA index for the control genome (phiX; not crucial but nice to have). This directory should be named:

  phiX/

It should contain the directory control/:

 phiX/control/

The control/ directory should have the following files for the fasta genome and BWA index:

 phiX/control/whole_genome.fa
 phiX/control/bwa-0.4.9.{amb,ann,bwt,pac,rbwt,rpac,rsa,sa}
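
The phiX layout above can be verified with a short script. This is a minimal sketch, not part of aLib: the helper names are hypothetical, and reusing the same extension list for other builds such as hg19 is an assumption, since only the phiX extensions are listed in full above.

```python
import os

# Extensions listed above for the phiX control index.
INDEX_EXTENSIONS = ["amb", "ann", "bwt", "pac", "rbwt", "rpac", "rsa", "sa"]

def missing_index_files(build_dir, prefix="bwa-0.4.9", extensions=INDEX_EXTENSIONS):
    """Return the index file names expected in build_dir that do not exist."""
    expected = [prefix + "." + ext for ext in extensions]
    return [name for name in expected
            if not os.path.isfile(os.path.join(build_dir, name))]

def missing_phix_files(genome_dir):
    """Return the files missing from genome_dir/phiX/control/."""
    control = os.path.join(genome_dir, "phiX", "control")
    missing = missing_index_files(control)
    # The fasta genome is expected next to the index files.
    if not os.path.isfile(os.path.join(control, "whole_genome.fa")):
        missing.append("whole_genome.fa")
    return missing
```

Pass the path configured as genomedirectory to missing_phix_files; an empty list means the control genome is laid out as described.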

Running

Individual components

Here is a partial list of the different components:

bam2fastq/bam2fastq	Format converter from bam to fastq
BCL2BAM/bcl2bam	Format converter from BCL to bam
fastq2bam/fastSingle2bam	Converts single reads into bam
fastq2bam/fastq2bam	Converts paired reads into bam
pipeline/generate_report	Reads the RTA report and saves it as an HTML document for archiving purposes
pipeline/filterReads	Flags reads with high expectancy of mismatches
pipeline/assignRG	Demultiplexes reads (assigns to read groups) and computes likelihood of belonging to these read groups.
pipeline/errorRatePerCycle	Computes the sequencing error rate and type of error on a per cycle basis using an aligned bam file.
tileCount/tileCount.py	Counts # of clusters in a BAM file and an Illumina cluster coordinate file
qualScoreC++/qualScoresObsVsPred	Reads an aligned bam file and computes observed vs predicted quality scores.
biohazard/dist/build/bam-rmdup/bam-rmdup	Removes duplicates and calls consensus using those.

Running the pipeline

Once the installation and setup are completed, you can run aLib as a pipeline. aLib uses GNU make to resolve dependencies. To build the makefile, you need a json file detailing the different parameters, and you then run json2make.py on it. There are two ways to generate the json file: manually or using the web form.

Manually generating the json file

To generate Makefiles, you can manually write a json configuration file. An example of such a file (webForm/exampleRun.json) is distributed with aLib. The program webForm/json2make.py generates the makefiles from the json configuration file. The following is a description of the fields:

parambwa	Enter either "default" for default parameters or "ancient" for mapping ancient DNA
genomebwa	Enter the name of the genome (directory name) stored in your genomedirectory (see https://github.com/grenaud/aLib/wiki#configuring-configjson)
usebwa	true for use of mapping, false otherwise
indicesraw	This contains a json array of entries with two or three fields: "name", "p7" and possibly "p5" for double indices. It stores the name of the read group ("name") together with the corresponding numerical values for the indices ("p7" and "p5")
indicesseq	Like "indicesraw", but stores the actual sequences. To generate a json file with that field created automatically, given that you have configured config.json, use webForm/jsonIndices.py.
spikedin	true if control sequences were spiked in, false otherwise
bustard	true if we will use the default Bustard basecalls, false otherwise
freeibis	true if we will use the default freeIbis basecalls, false otherwise
lanes	json array of lanes to process
sequencer	Type of sequencer: either "ga", "hiseq" or "miseq"
email	Email of the person
TileCount
SwathCount
runid
expname
cyclesread1
cyclesread2
cyclesindx1
cyclesindx2
LaneCount
SurfaceCount
ctrlindex
lanesdedicated
adapter1
adapter2
chimeras
protocol
mergeoverlap
key1
key2
filterseqexp
seqNormExpcutoff
filterentropy
filterfrequency
entropycutoff
frequencycutoff
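
A hand-written json file can be sanity-checked against the field descriptions above before it is given to json2make.py. This is a minimal sketch, not part of aLib: the function check_run_parameters is hypothetical, it covers only the fields whose allowed values are described above, and it assumes that true/false are stored as json booleans. Compare with webForm/exampleRun.json, which may encode these values differently.

```python
def check_run_parameters(params):
    """Return a list of problems found in the documented fields of a run dict."""
    problems = []
    if params.get("parambwa") not in ("default", "ancient"):
        problems.append('parambwa should be "default" or "ancient"')
    if params.get("sequencer") not in ("ga", "hiseq", "miseq"):
        problems.append('sequencer should be "ga", "hiseq" or "miseq"')
    # Assumption: these flags are json booleans, parsed as Python bool.
    for flag in ("usebwa", "spikedin", "bustard", "freeibis"):
        if not isinstance(params.get(flag), bool):
            problems.append(flag + " should be true or false")
    if not isinstance(params.get("lanes"), list):
        problems.append("lanes should be a json array")
    return problems
```

Load the file with json.load and pass the resulting dict to check_run_parameters; an empty list means none of the checked fields has an unexpected value.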

Webform

Point users to http://internal.webserver.com/aLib/form.php and ask them to select their run and click launch.
