Commit

loading samtools earlier
k8hertweck authored Oct 31, 2019
1 parent 2bb01ca commit d90e540
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions lectures/lecture09/scripting/README.md
@@ -61,7 +61,14 @@ Now you should be able to run your script with `./script1.sh`.

Next we'll be playing around scripting with samtools using [this bam file](https://console.cloud.google.com/storage/browser/_details/gatk-test-data/wgs_bam/NA12878_20k_b37/NA12878.bam) that's available to you in the GitHub repository.

-It has an awkward name, so let's rename it `input.bam` with
+[samtools](https://www.htslib.org/doc/samtools.html) is a collection of tools for manipulating data in
+Sequence Alignment/Map (SAM) format.
+This software is installed on rhino,
+but needs to be made available for use using `module load`, or `ml` for short:
+
+ml samtools
+
+Our data has an awkward name, so let's rename it `input.bam` with

mv wgs_bam_NA12878_20k_b37_NA12878.bam input.bam

@@ -128,10 +135,9 @@ Of course, shell does have loops, but next I'll present an alternative, which is

Here we will provide an introduction to a command that provides a convenient and efficient alternative to loops: [GNU Parallel](https://www.gnu.org/software/parallel/).

-First, let's load the software we'll need for the rest of this exercise. These libraries are installed on rhino,
-but need to be made available for use using `module load`, or `ml` for short:
+First, let's load the software we'll need for parallel processing:

-ml samtools parallel
+ml parallel

To get our data set up, let's split our input BAM file into a set of smaller files.
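
Since this change moves `ml samtools` ahead of the first samtools step, a script written for the exercise could check that the command is actually on the `PATH` before using it. The following is a minimal sketch, not part of the README; the helper name `have_cmd` is hypothetical, and the check assumes a POSIX shell.

```shell
# Hypothetical helper (not from the README): succeeds if the named
# command can be found by the current shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd samtools; then
    # Print the first line of the version banner as a quick sanity check.
    samtools --version | head -n 1
else
    # On rhino the module has to be loaded first (see `ml samtools` above).
    echo "samtools not found; run 'ml samtools' first" >&2
fi
```

The same check could be repeated for `parallel` before the GNU Parallel section.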
