add themisto indexing into general pipeline instructions
tmaklin committed Jan 27, 2020
1 parent 18a61d4 commit 258d127
Showing 1 changed file with 9 additions and 2 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -105,7 +105,9 @@ You should see that roughly 90% of the reads are assigned to group "clust2".
# General pipeline
## Preprocessing
- Obtain a set of reference sequences.
- Index the reference sequences for pseudoalignment with:
- Index the reference sequences for pseudoalignment with Themisto:
> build_index --k 31 --input-file reference_sequences.fasta --auto-colors --index-dir themisto_index --temp-dir tmp
- ... or with kallisto
> kallisto pseudo -i reference_kmi reference_sequences.fasta
- Define some grouping for the reference and save the grouping in a text file where each line contains the identifier of the grouping the corresponding reference sequence belongs to. For example with four sequences and two groups:
```
@@ -115,7 +117,8 @@ cluster2
cluster1
```
The grouping identifiers must be in the same order as their
corresponding sequences appear in the reference file.
corresponding sequences appear in the reference file. Alternatively,
you can use the 'matchfasta' utility to reorder the indicators.
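
Because the grouping file is matched to the reference purely by position, a quick sanity check before running the pipeline can catch a missing or extra line. The following is a minimal Python sketch of such a check, not part of mSWEEP; the function names and the toy sequences and group labels are hypothetical, and it assumes a plain-text fasta where each record starts with a `>` header line.

```python
def count_fasta_records(fasta_lines):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def read_groups(indicator_lines):
    """Return one group identifier per non-empty line, in file order."""
    return [line.strip() for line in indicator_lines if line.strip()]


def check_grouping(fasta_lines, indicator_lines):
    """Raise ValueError unless there is exactly one group line per sequence."""
    n_seqs = count_fasta_records(fasta_lines)
    groups = read_groups(indicator_lines)
    if n_seqs != len(groups):
        raise ValueError(
            f"{n_seqs} reference sequences but {len(groups)} group identifiers"
        )
    return groups


# Hypothetical toy input: four sequences assigned to two groups.
fasta_lines = [">seq1", "ACGT", ">seq2", "TTGA", ">seq3", "CCGA", ">seq4", "GATC"]
indicator_lines = ["groupA", "groupB", "groupB", "groupA"]
groups = check_grouping(fasta_lines, indicator_lines)
print(f"{len(groups)} sequences in {len(set(groups))} groups")
```

Both functions accept any iterable of lines, so open file handles for the reference fasta and the grouping file can be passed directly. The check only compares counts; it cannot tell whether the lines are in the right order.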

### Reordering identifiers
If your grouping identifiers are not in the same order as in the fasta
@@ -179,6 +182,10 @@ estimates by bootstrapping the pseudoalignment counts and rerunning
the abundance estimation a number of times. This can be done
automatically by adding the '--iters' option to running mSWEEP:
```
> mSWEEP --themisto-1 alignment_1.txt --themisto-2 alignment_2.txt -i cluster_indicators.txt --iters 100 -o abundances.txt
```
or with kallisto
```
> mSWEEP -f kallisto_out_folder -i cluster_indicators.txt --iters 100 -o abundances.txt
```
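The general idea of bootstrapping counts can be illustrated with a minimal Python sketch. This is not mSWEEP's implementation: the function name and the toy counts are hypothetical, and the abundance estimation step that would be rerun on each replicate is left out.

```python
import random


def bootstrap_counts(counts, rng):
    """Resample the same total number of reads with replacement.

    Each read lands in category i with probability proportional to
    the observed count for that category.
    """
    total = sum(counts)
    draws = rng.choices(range(len(counts)), weights=counts, k=total)
    resampled = [0] * len(counts)
    for i in draws:
        resampled[i] += 1
    return resampled


# Hypothetical observed counts for two categories of reads.
observed = [90, 10]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One resampled count vector per bootstrap iteration.
replicates = [bootstrap_counts(observed, rng) for _ in range(100)]
print(replicates[:3])
```

Each replicate keeps the total read count fixed while letting the per-category counts vary, which is what gives a spread of estimates across iterations.
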
The bootstrapped abundance estimates will be appended to the output