Prokaryotic 16S rRNA
Vyacheslav Brover edited this page Sep 28, 2021 · 12 revisions
Create a directory seq/ containing prokaryotic 16S rRNA sequences in FASTA format.
The sequences should be filtered for quality.
Each sequence must be in a separate file.
The name of a file must be the same as the identifier of the sequence in a FASTA header.
An example of 120 sequences is provided:
cp $TT/phylogeny/data/16S.fa .
mkdir seq
$TT/genetics/splitFastaDna 16S.fa seq
rm 16S.fa
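The naming rule above (one sequence per file, file name equal to the sequence identifier) can be checked before loading anything into the database. The following is a minimal sketch, not part of the toolkit: check_fasta_names is a hypothetical helper, and it assumes that the identifier is the text between ">" and the first whitespace on the first line of each file.

```shell
# Hypothetical helper (not part of $TT): report files in a directory whose
# name differs from the identifier in their first FASTA header line.
# Assumption: the identifier is the text after '>' up to the first whitespace.
check_fasta_names() {
  dir=$1
  bad=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Strip the leading '>' and everything from the first whitespace onward
    id=$(head -n 1 "$f" | sed -e 's/^>//' -e 's/[[:space:]].*$//')
    if [ "$id" != "$(basename "$f")" ]; then
      echo "mismatch: $f (header id: $id)"
      bad=1
    fi
  done
  # Exit status 0 if all names match, 1 otherwise
  return $bad
}
```

Calling check_fasta_names seq prints one line per mismatching file and returns a non-zero status if any mismatch is found.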
Populate the table ListC:
ls seq > seq.list
$TT/database/bulk.sh $SERVER $BULK_LOCAL $BULK_REMOTE seq.list $DATABASE..ListC
Populate the table Locus with the SQL command:
insert into Locus (id, taxroot, gene)
select id, 2, '16S'
from ListC;
Initialize the incremental tree directory inc/:
$TT/phylogeny/distTree_inc_init_stnd.sh inc $TT/phylogeny/inc/rRNA/bacteria \
$SERVER $DATABASE $BULK_LOCAL $BULK_REMOTE
If the Univa Grid Engine is not available, the example sequences can still be processed by disabling the use of the grid engine with this command:
echo "10000" > inc/grid_min
Create a list of objects start.list for the initial tree:
ls seq | sort -R | head -100 | sort > start.list
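The command above relies on sort -R, which is a GNU extension. Where GNU coreutils is installed, shuf -n draws the same kind of random sample directly. The sketch below is only an illustrative alternative: sample_names is a hypothetical helper, not part of the toolkit.

```shell
# Hypothetical helper (not part of $TT): print N randomly chosen file names
# from directory DIR, sorted alphabetically.
# Assumption: GNU coreutils (shuf) is installed.
sample_names() {
  ls "$1" | shuf -n "$2" | sort
}
```

Under that assumption, sample_names seq 100 > start.list would produce a list of the same form as the command above.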
Build an initial tree for 100 sequences:
$TT/phylogeny/distTree_inc_complete.sh inc start.list
Create a list of objects new.list to add to the tree incrementally:
ls seq > seq.list
$TT/setMinus seq.list start.list > new.list
rm seq.list
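For readers without the toolkit at hand, the intended result of this step can be illustrated with standard tools. The sketch below assumes that setMinus prints the lines of its first file that do not occur in its second file, which is inferred from its name and usage here. list_minus is a hypothetical stand-in, not the toolkit program.

```shell
# Hypothetical stand-in (not $TT/setMinus): print the lines of file $1
# that are absent from file $2.
list_minus() {
  a=$(mktemp)
  b=$(mktemp)
  # comm requires both inputs to be sorted the same way
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # -23 suppresses lines unique to the second file and lines common to both
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

For the 120-sequence example with a start.list of 100 objects, new.list should contain 20 lines, which can be confirmed with wc -l new.list.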
Transfer the objects in new.list to inc/new/:
$TT/trav new.list "touch inc/new/%f"
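The effect of this step is one empty file in inc/new/ per object named in new.list. A plain shell loop that would achieve the same result is sketched below, assuming that trav runs the quoted command once per line of the list with %f replaced by that line. touch_each is a hypothetical helper, not part of the toolkit.

```shell
# Hypothetical helper (not $TT/trav): create an empty file in directory $2
# for every non-empty line of list file $1.
touch_each() {
  list=$1
  dir=$2
  while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    touch "$dir/$name"
  done < "$list"
}
```

Under that assumption, touch_each new.list inc/new would have the same effect as the trav command above.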
Run the incremental tree building on a computer with large memory:
$TT/phylogeny/distTree_inc.sh inc 1