Prokaryotic 16S rRNA

Vyacheslav Brover edited this page Sep 28, 2021 · 12 revisions

Get sequence files

Create a directory seq/ containing prokaryotic 16S rRNA sequences in FASTA format.
The sequences should be filtered for quality.
Each sequence must be in a separate file.
Each file name must be identical to the sequence identifier in that file's FASTA header.

An example set of 120 sequences is provided:

cp $TT/phylogeny/data/16S.fa .
mkdir seq
$TT/genetics/splitFastaDna 16S.fa seq
rm 16S.fa
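
The layout rules above (one sequence per file, file name equal to the FASTA identifier) can be checked before going further. The following is a minimal sketch, not part of the toolkit: `check_seq_names` is a hypothetical helper, and it assumes the identifier is the text between `>` and the first whitespace of the header line.

```shell
# Hypothetical helper (not part of the toolkit): report files in a directory
# that break the naming rule or hold more than one sequence.
check_seq_names() {
  dir=$1
  bad=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Assumed identifier: first header line, without '>' and cut at the first whitespace.
    id=$(awk '/^>/ { sub(/^>/, "", $1); print $1; exit }' "$f")
    # Exactly one header line is expected per file.
    n=$(grep -c '^>' "$f" || true)
    if [ "$id" != "$(basename "$f")" ] || [ "$n" -ne 1 ]; then
      echo "mismatch: $f (id='$id', headers=$n)"
      bad=$((bad + 1))
    fi
  done
  echo "$bad problem file(s)"
  [ "$bad" -eq 0 ]
}
```

Running `check_seq_names seq` after the split should end with a count of zero if the directory follows the rules.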

Put sequence names into the table Locus

Populate the table ListC:

ls seq > seq.list
$TT/database/bulk.sh $SERVER $BULK_LOCAL $BULK_REMOTE seq.list $DATABASE..ListC
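
The commands on this page depend on the shell variables `TT`, `SERVER`, `DATABASE`, `BULK_LOCAL` and `BULK_REMOTE`, whose values are site-specific and not defined here. A small guard such as the sketch below can catch an unset variable before a script is started; `require_vars` is a hypothetical helper, not part of the toolkit.

```shell
# Hypothetical helper: list every named variable that is unset or empty.
require_vars() {
  missing=0
  for v in "$@"; do
    # Indirect lookup of the variable named in $v (POSIX-compatible).
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "unset or empty: $v"
      missing=$((missing + 1))
    fi
  done
  # Succeeds only if every variable has a value.
  [ "$missing" -eq 0 ]
}
```

For example, `require_vars TT SERVER DATABASE BULK_LOCAL BULK_REMOTE` prints nothing and succeeds when all five are set.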

Populate the table Locus with the following SQL command:

insert into Locus (id, taxroot, gene)
  select id, 2, '16S'
    from ListC;

Create an incremental distance tree directory inc/

$TT/phylogeny/distTree_inc_init_stnd.sh inc $TT/phylogeny/inc/rRNA/bacteria \
   $SERVER $DATABASE $BULK_LOCAL $BULK_REMOTE

If the Univa Grid Engine is not available, the example sequences can still be processed by disabling grid use with this command:

echo "10000" > inc/grid_min

Build an initial tree

Create a list of objects start.list for the initial tree:

ls seq | sort -R | head -100 | sort > start.list

Build an initial tree for 100 sequences:

$TT/phylogeny/distTree_inc_complete.sh inc start.list

Build a tree incrementally from the initial tree

Create a list of objects new.list to add to the tree incrementally:

ls seq > seq.list
$TT/setMinus seq.list start.list > new.list
rm seq.list
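
The split into start.list and new.list can be verified with the standard `comm` utility. This sketch assumes that `setMinus` writes the lines of its first file that are absent from the second, which is inferred from its name and its use here rather than from its documentation; `check_partition` is a hypothetical helper.

```shell
# Hypothetical check: the full list must equal start plus new, with no overlap.
# Arguments: full list, start list, new list.
check_partition() {
  tmp=$(mktemp -d)
  LC_ALL=C sort -u "$1" > "$tmp/all"
  LC_ALL=C sort -u "$2" > "$tmp/start"
  LC_ALL=C sort -u "$3" > "$tmp/new"
  # comm -12 prints lines present in both inputs: objects listed twice.
  overlap=$(LC_ALL=C comm -12 "$tmp/start" "$tmp/new" | wc -l | tr -d ' ')
  # comm -23 prints lines only in the first input: objects in neither list.
  missing=$(LC_ALL=C sort -u "$tmp/start" "$tmp/new" \
            | LC_ALL=C comm -23 "$tmp/all" - | wc -l | tr -d ' ')
  rm -r "$tmp"
  echo "overlap=$overlap missing=$missing"
  [ "$overlap" -eq 0 ] && [ "$missing" -eq 0 ]
}
```

Because seq.list is removed above, it has to be regenerated with `ls seq > seq.list` before calling `check_partition seq.list start.list new.list`. With the 120-sequence example and a 100-sequence initial tree, new.list should then hold 20 names.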

Transfer the objects in new.list to inc/new/:

$TT/trav new.list "touch inc/new/%f"
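
The command above creates one empty file in inc/new/ per object. For readers who want to see the step without `trav`, a plain-shell sketch follows. It assumes that `trav` runs its command once per line of the list with `%f` replaced by that line, which is inferred from the usage here; `mark_new_objects` is a hypothetical helper.

```shell
# Assumed plain-shell equivalent of the trav call:
# create one empty marker file per object named in the list.
mark_new_objects() {
  list=$1
  dir=$2
  while IFS= read -r name; do
    # Skip blank lines so no file with an empty name is attempted.
    [ -n "$name" ] || continue
    touch "$dir/$name"
  done < "$list"
}
```

Under that assumption, `mark_new_objects new.list inc/new` has the same effect as the `trav` call.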

Run the incremental build on a computer with a large amount of memory:

$TT/phylogeny/distTree_inc.sh inc 1