Multiple markers/loci per cluster issue #53

bpjburger · 2021-02-03T13:20:28Z

Hello,

I ran PhylotaR according to the tutorial, modifying it as is shown in the code below.
To create fasta files for each cluster I ran a loop in R. But when checking the sequences in the resulting fasta-files I discovered that they contain different loci. The first cluster (and corresponding fasta) contained ITS and trnL. This issue is present in other clusters as well even in the pre-filtered phylota-object.

Is this the result of a mistake in my code or is this a PhylotaR issue? And how can I possibly fix it?

Regards,

Bart

The code:

library(phylotaR)
wd <- '[FILEPATH TO WORKING DIRECTORY]'
ncbi_dr <- '[FILEPATH TO COMPILED BLAST+ TOOLS]'
txid <- 4210  # Asteracea ID
setup(wd = wd, txid = txid, ncbi_dr = ncbi_dr)
run(wd=wd)
phylota=read_phylota(wd)

#Create a new Phylota-object that has one sequence per taxon.

phylota_sp <- drop_by_rank(phylota = phylota, rnk = 'species', n = 1)
cids=phylota_sp@cids

#Create a new Phylota-object, but with excluding clusters that only contain a few taxa. 

ntaxa=get_ntaxa(phylota = phylota_sp, cid = cids)
keep=cids[ntaxa>50]#new set of cluster IDs with more than 50 taxa
selected=drop_clstrs(phylota = phylota_sp, cid = keep)

#And now the loop.

cids_sel=selected@cids

#A for loop that goes over the cluster IDs
for (i in cids_sel){
  txids <- get_txids(phylota = selected, cid = i, rnk = 'species')#makes a list of taxon-IDs for the specific cluster
  
# look up name for Taxon IDs per cluster and create a list out of it.
scientific_names <- get_tx_slot(phylota = selected, txid = txids, slt_nm = 'scnm')

#formatting the names
scientific_names <- gsub('\\.', '', scientific_names)
scientific_names <- gsub('\\s+', '_', scientific_names)

#specify the sequence IDs per cluster
  sids <- reduced@clstrs[[i]]@sids
  
  # write out (one file per cluster)
  pad=[GENERAL PATH TO LOCATION]
  outfile= str_c(pad, i)#add the cluster ID to the filename
  write_sqs(phylota = selected, sid = sids, sq_nm = scientific_names,
            outfile = outfile)}

The text was updated successfully, but these errors were encountered:

DomBennett · 2021-02-21T18:57:48Z

Hi,

I've managed to re-run your analysis for Asteracea and I'm looking over the clusters. So that I can repeat your process, how are you determining that there is a mix of ITS and trnL in the first cluster?

The top thing I've noticed is ...

#specify the sequence IDs per cluster
sids <- reduced@clstrs[[i]]@sids

Should be:

#specify the sequence IDs per cluster
sids <- selected@clstrs[[i]]@sids

Where does the "reduced" phylota object originate? Mixing sids across different phylota objects could be causing the mixing of sequences in a cluster.

Bunholi · 2021-05-20T14:26:01Z

Hi @bpjburger did you solve the problem following @DomBennett correction?
I am planning to do the same around here

bpjburger · 2021-05-21T08:20:14Z

Hi @Bunholi,

Unfortunately, the corrections did no solve the problem.
I think that it is a software issue.

maelle · 2022-05-09T10:14:02Z

For info, we're looking for a new maintainer / a new maintainer team for this package, see #57 and feel free to volunteer, we'd be happy to help.

ShixiangWang added the mid label Jun 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple markers/loci per cluster issue #53

Multiple markers/loci per cluster issue #53

bpjburger commented Feb 3, 2021

DomBennett commented Feb 21, 2021

Bunholi commented May 20, 2021

bpjburger commented May 21, 2021

maelle commented May 9, 2022

Multiple markers/loci per cluster issue #53

Multiple markers/loci per cluster issue #53

Comments

bpjburger commented Feb 3, 2021

DomBennett commented Feb 21, 2021

Bunholi commented May 20, 2021

bpjburger commented May 21, 2021

maelle commented May 9, 2022