You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I ran PhylotaR according to the tutorial, modifying it as is shown in the code below.
To create fasta files for each cluster I ran a loop in R. But when checking the sequences in the resulting fasta-files I discovered that they contain different loci. The first cluster (and corresponding fasta) contained ITS and trnL. This issue is present in other clusters as well even in the pre-filtered phylota-object.
Is this the result of a mistake in my code or is this a PhylotaR issue? And how can I possibly fix it?
Regards,
Bart
The code:
library(phylotaR)
wd <- '[FILEPATH TO WORKING DIRECTORY]'
ncbi_dr <- '[FILEPATH TO COMPILED BLAST+ TOOLS]'
txid <- 4210 # Asteracea ID
setup(wd = wd, txid = txid, ncbi_dr = ncbi_dr)
run(wd=wd)
phylota=read_phylota(wd)
#Create a new Phylota-object that has one sequence per taxon.
phylota_sp <- drop_by_rank(phylota = phylota, rnk = 'species', n = 1)
cids=phylota_sp@cids
#Create a new Phylota-object, but with excluding clusters that only contain a few taxa.
ntaxa=get_ntaxa(phylota = phylota_sp, cid = cids)
keep=cids[ntaxa>50]#new set of cluster IDs with more than 50 taxa
selected=drop_clstrs(phylota = phylota_sp, cid = keep)
#And now the loop.
cids_sel=selected@cids
#A for loop that goes over the cluster IDs
for (i in cids_sel){
txids <- get_txids(phylota = selected, cid = i, rnk = 'species')#makes a list of taxon-IDs for the specific cluster
# look up name for Taxon IDs per cluster and create a list out of it.
scientific_names <- get_tx_slot(phylota = selected, txid = txids, slt_nm = 'scnm')
#formatting the names
scientific_names <- gsub('\\.', '', scientific_names)
scientific_names <- gsub('\\s+', '_', scientific_names)
#specify the sequence IDs per cluster
sids <- reduced@clstrs[[i]]@sids
# write out (one file per cluster)
pad=[GENERAL PATH TO LOCATION]
outfile= str_c(pad, i)#add the cluster ID to the filename
write_sqs(phylota = selected, sid = sids, sq_nm = scientific_names,
outfile = outfile)}
The text was updated successfully, but these errors were encountered:
I've managed to re-run your analysis for Asteracea and I'm looking over the clusters. So that I can repeat your process, how are you determining that there is a mix of ITS and trnL in the first cluster?
The top thing I've noticed is ...
#specify the sequence IDs per clustersids<-reduced@clstrs[[i]]@sids
Should be:
#specify the sequence IDs per clustersids<-selected@clstrs[[i]]@sids
Where does the "reduced" phylota object originate? Mixing sids across different phylota objects could be causing the mixing of sequences in a cluster.
Hello,
I ran PhylotaR according to the tutorial, modifying it as is shown in the code below.
To create fasta files for each cluster I ran a loop in R. But when checking the sequences in the resulting fasta-files I discovered that they contain different loci. The first cluster (and corresponding fasta) contained ITS and trnL. This issue is present in other clusters as well even in the pre-filtered phylota-object.
Is this the result of a mistake in my code or is this a PhylotaR issue? And how can I possibly fix it?
Regards,
Bart
The code:
The text was updated successfully, but these errors were encountered: