Extract regions from 16s gene sequences

This tool extract variable regions from the 16S rRNA gene.

Summary

Installation

git clone https://github.com/AlessioMilanese/extract_regions_16s.git
cd extract_regions_16s

Note: in the following examples we assume that the python script extract_regions is in the system path.

Pre-requisites:

python >= 3.6
cmalign (Infernal)

Simple example

The expected input is a fasta file with one (or more) 16S sequences, example:

cat my_16S_seq.fasta
>seq1
CAGTATTAGCGGGGATCATCGATCGATTACGATCGAGCTAGC....
>seq2
CAGTATGATCGCGGATCATCGATCGATTACGATCCAGCTAGG....

Running:

extract_regions -i my_16S_seq.fasta

results is:

>seq1__V1
ACACAGCAUGCAUAACACGAGCUAUCGACGACUACGACGGCA
>seq1__V2
CAUCAGUACCGAUCAUCGGAAUCAGCGAGGCAGGCGAAGGCGAGAGAGCAUAC
...
>seq1__V8
ACUACGUGCACGAGUACGUAUACGGACAUCGAUUUACGAGCAGCGA
>seq1__V9
ACAGAUCGGAUUCGAUCGGCAUCGGACCCAUCAGGAGGAUCGUCAAUCAUG

Mode 1: extract variable regions

If you don't specify any primer (with -f and -r), then all the variable regions will be extracted. We extract the variable regions defined in Yarza et al. Nat. Rev. Microbiol. (2014):

Variable region	Start	End
V1	69	99
V2	137	242
V3	433	497
V4	576	682
V5	822	879
V6	986	1043
V7	1117	1173
V8	1243	1294
V9	1435	1465

These positions refers to the 16S rRNA gene of Escherichia coli str. K-12. The output is a fasta file with 9 times more sequences (one for each variable region) than the input fasta file.

The regions are indicated with __V[1-9] at the end of the fasta header (see also Simple example).

Mode 2: extract from primers

You can specify which primers to use with -f and -r.

For Primer	Start	End	Relative position	Rev Primer	Start	End	Relative position
8F	8	27	before V1	338R	338	355	after V2
27F	8	27	before V1	519R	519	536	after V3
68F	49	68	before V1	785R	785	805	after V4
341F	341	357	before V3	806R	787	806	after V4
515F	515	533	before V4	907R	907	926	after V5
967F	967	985	before V6	926R	907	926	after V5
1237F	1220	1237	before V8	1100R	1100	1115	after V6
				1391R	1391	1407	after V8
				1492R	1492	1510	after V9

If you run on the fasta file defined before (my_16S_seq.fasta with two sequences) using extract_regions -i my_16S_seq.fasta -f 8F -r 338R you will obtain:

>seq1
AGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGAAGAAGCUUGCUUCUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGUCUGGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGAAACGGUAGCUAAUACCGCAUAACGUCGCAAGACCAAAGAGGGGUACCUUCGGGCCUCUUGCCAUCGGAUGUGCCCAGAUGGGAUUAGCUAGUAGGUGGGGUAACGGCUCACCUAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGA
>seq2
CGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGAAGAAGCUUGCUUCUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGUCUGGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGAAACGGUAGCUAAUACCGCAUAACGUCGCAAGACCAAAGAGGGGUACCUUCGGGCCUCUUGCCAUCGGAUGUGCCCAGAUGGGAUUAGCUAGUAGGUGGGGUAACGGCUCACCUAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGA

Where the header is the same as the original fasta file.

Mode 3: extract based on positions

You can specify the exact position that you want to extract with -f and -r.

For example, if you want to extract all the sequences of primer 8F, you can run extract_regions -i my_16S_seq.fasta -f 8 -r 27, which will result in:

>seq1
AGAGUUUGAUCAUGGCUCAG
>seq2
CGAGUUUGAUCAUGGCUUCAG

Note that the positions are based on the E.coli K12 16S gene.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
DATA		DATA
bin		bin
.gitignore		.gitignore
README.md		README.md
extract_regions		extract_regions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Extract regions from 16s gene sequences

Summary

Installation

Simple example

Mode 1: extract variable regions

Mode 2: extract from primers

Mode 3: extract based on positions

Notes and advance usage

Details of implementation

Options list

About

Releases

Packages

Languages

AlessioMilanese/extract_regions_16s

Folders and files

Latest commit

History

Repository files navigation

Extract regions from 16s gene sequences

Summary

Installation

Simple example

Mode 1: extract variable regions

Mode 2: extract from primers

Mode 3: extract based on positions

Notes and advance usage

Details of implementation

Options list

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages