Conditionally & persistently rare taxa #83
This article categorizes microorganisms based on threshold values: always AT (AAT) with a relative abundance of ≥1% in the dataset; conditionally AT (CAT) with ≥1% relative abundance in some samples, but never <0.01%; always RT (ART) with <0.01% relative abundance in all samples; conditionally RT (CRT) with <1% relative abundance in all initial samples, but <0.01% in some samples; moderate taxa (MT) with between 0.01% and 1% relative abundances in all data; and conditionally abundant and rare taxa (CRAT) whose relative abundances ranged from rare values of <0.01% to abundant values of ≥1%. As also previously described, CRAT, CAT, and AAT were collectively referred to as abundant taxa (AT), while ART and CRT comprised rare taxa (RT). (https://www.sciencedirect.com/science/article/pii/S0013935123028347#sec2)
However, I am currently facing the same issue and cannot find complete R code to classify them, nor do I know how to calculate their diversity indices after classification. If you have already solved this problem, please contact me (email: [email protected]). Below is the only R code I have found for classifying them.
`otu <- read.table("C:/Users/heng/Desktop/micro/BF.txt", header = TRUE, row.names = 1, sep = "\t")
# (i) Rare taxa (RT): OTUs with abundance ≤0.1% in all samples
# (vi) Conditionally rare or abundant taxa (CRAT): OTUs whose abundance spans from rare (minimum abundance ≤0.1%) to abundant (maximum abundance ≥1%)
# Note: these 6 groups do not overlap; together they account for every OTU in the table, and their relative abundances sum to 100%
library(openxlsx)
for (i in 1:(ncol(otu)-1)) otu[[i]] <- ifelse(as.character(otu[[i]]) == '0', NA, otu[[ncol(otu)]])
library(ggplot2)`
This might use the addPrevalence functions from mia, with an added argument to define the conditional abundance.
This should be a feasible measure to implement for each taxon. Could you give it a try, @Daenarys8?
We could have a wrapper function or method that classifies taxa into six distinct categories and returns a list: always abundant taxa (AAT), conditionally abundant taxa (CAT), always rare taxa (ART), conditionally rare taxa (CRT), moderate taxa (MT), and conditionally rare and abundant taxa (CRAT).
For example:
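A wrapper of the kind described above could be sketched in base R as follows. This is not the code from the original comment and not an existing mia function: the name `classify_taxa`, its arguments, and the toy data are hypothetical, and the default thresholds (0.01% and 1%, written as proportions) are taken from the definitions quoted earlier in this thread. It assumes a matrix of relative abundances with taxa in rows and samples in columns.

```r
# Hypothetical helper (not part of mia or any other package).
# relab: numeric matrix of relative abundances as proportions,
#        taxa in rows, samples in columns.
# rare / abundant: thresholds as proportions (1e-4 = 0.01%, 1e-2 = 1%).
classify_taxa <- function(relab, rare = 1e-4, abundant = 1e-2) {
  stopifnot(is.matrix(relab), is.numeric(relab), !anyNA(relab), rare < abundant)
  taxa <- rownames(relab)
  if (is.null(taxa)) taxa <- as.character(seq_len(nrow(relab)))

  mn <- apply(relab, 1, min)  # lowest abundance of each taxon across samples
  mx <- apply(relab, 1, max)  # highest abundance of each taxon across samples

  # The six conditions are mutually exclusive and cover every taxon.
  category <- character(nrow(relab))
  category[mn >= abundant] <- "AAT"
  category[mn >= rare & mn < abundant & mx >= abundant] <- "CAT"
  category[mx < rare] <- "ART"
  category[mn < rare & mx >= rare & mx < abundant] <- "CRT"
  category[mn >= rare & mx < abundant] <- "MT"
  category[mn < rare & mx >= abundant] <- "CRAT"

  lev <- c("AAT", "CAT", "ART", "CRT", "MT", "CRAT")
  split(taxa, factor(category, levels = lev))  # empty categories are kept
}

# Toy data (made up for illustration; columns need not sum to 1 here).
relab <- rbind(
  t1 = c(0.50,    0.40,    0.30),   # always >= 1%
  t2 = c(0.02,    0.005,   0.001),  # >= 1% in some samples, never < 0.01%
  t3 = c(0.00001, 0.00002, 0),      # always < 0.01%
  t4 = c(0.005,   0.00005, 0.001),  # always < 1%, < 0.01% in some samples
  t5 = c(0.001,   0.002,   0.003),  # always between 0.01% and 1%
  t6 = c(0.05,    0,       0.002)   # spans < 0.01% to >= 1%
)
res <- classify_taxa(relab)
```

`lengths(res)` then gives the number of taxa per category, and the abundant (AAT, CAT, CRAT) and rare (ART, CRT) groups can be formed by concatenating the corresponding list elements.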
Do we have all these definitions in the indicated (or other) literature, for instance MT and CRAT, or did you come up with these definitions on your own? They might be useful but good to know. Some suggestions:
@ginkgozh cited this article, which mentions the additional types with their definitions: link to literature
Can this be handled by calculating prevalence for each feature at each time point, i.e., by looping over all the time points? We would then have an X×N matrix, where X is the number of features and N the number of time points. Based on this matrix, it should be easy to classify features into these categories. The only time-consuming part is looping over the time points (usually not a problem, since the number of time points is small); everything else should be efficient.
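The per-time-point prevalence matrix suggested here could be sketched in base R as below. The function name `prevalence_by_timepoint`, the `detection` argument, and the toy data are hypothetical and only illustrate the idea; prevalence is taken to mean the fraction of samples at a time point in which the feature exceeds the detection threshold.

```r
# Hypothetical helper (not an existing mia function).
# relab: numeric matrix, features in rows, samples in columns.
# timepoint: vector giving the time point of each sample (one per column).
# detection: abundance above which a feature counts as present.
prevalence_by_timepoint <- function(relab, timepoint, detection = 0) {
  stopifnot(is.matrix(relab), length(timepoint) == ncol(relab))
  tps <- sort(unique(timepoint))
  out <- matrix(NA_real_, nrow(relab), length(tps),
                dimnames = list(rownames(relab), as.character(tps)))
  # Loop over time points; each step is a vectorised row mean.
  for (j in seq_along(tps)) {
    present <- relab[, timepoint == tps[j], drop = FALSE] > detection
    out[, j] <- rowMeans(present)
  }
  out  # X x N matrix: X features, N time points
}

# Toy data: 2 features, 4 samples, 2 samples per time point.
relab <- matrix(c(0.2, 0, 0.10, 0.3,
                  0,   0, 0.05, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("taxA", "taxB"), paste0("s", 1:4)))
timepoint <- c(1, 1, 2, 2)
prev <- prevalence_by_timepoint(relab, timepoint)
```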
In fact, this could also be done for data other than time series ("Conditionally rare taxa are defined for instance as taxa with a maximum relative abundance at least N times higher than their minimum value."), although time series are an interesting application. Essentially, the user needs to decide how to split the data, and we could provide a standard function to calculate CRT (or other such measures) for the given data. If this is done for time series, the user may want to calculate it per time series (i.e., per subject) for each feature. In that case one should loop over time series rather than over time points. It could be one function with different options (as in the diversity calculation). Note that prevalence calculation is closely related and could possibly even be included among the options, but I am not sure that is helpful, since prevalence is so common that it may be more useful as a dedicated function of its own. Could we create a minimal function to calculate this for a given data vector and then for a matrix (unless it already exists in the R ecosystem?) and then see whether it could or should be converted into a full TreeSE wrapper?
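A minimal version of this, first for a vector and then for a matrix with an optional split per subject, might look like the sketch below. All names (`abundance_ratio`, `abundance_ratio_matrix`, `is_conditionally_rare`) and the default `n = 100` are hypothetical placeholders, not an existing API. One assumption to note: a taxon that is absent in some sample but present in another gets a ratio of `Inf`, which this sketch treats as exceeding any threshold; a real implementation might prefer a pseudocount or a detection limit.

```r
# Max-to-min abundance ratio for one taxon (numeric vector across samples).
# Returns NA if the taxon is never observed, Inf if its minimum is zero.
abundance_ratio <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0 || max(x) == 0) return(NA_real_)
  max(x) / min(x)
}

# Same measure for a matrix (taxa in rows, samples in columns).
# group: optional vector assigning each sample to a time series / subject;
#        the ratio is then computed separately within each group.
abundance_ratio_matrix <- function(relab, group = NULL) {
  stopifnot(is.matrix(relab))
  if (is.null(group)) group <- rep("all", ncol(relab))
  stopifnot(length(group) == ncol(relab))
  groups <- unique(group)
  out <- matrix(NA_real_, nrow(relab), length(groups),
                dimnames = list(rownames(relab), as.character(groups)))
  # Loop over time series (groups), not over time points.
  for (g in seq_along(groups)) {
    sub <- relab[, group == groups[g], drop = FALSE]
    out[, g] <- apply(sub, 1, abundance_ratio)
  }
  out
}

# Conditionally rare if max is at least n times the min (n is user-chosen).
is_conditionally_rare <- function(ratio, n = 100) !is.na(ratio) & ratio >= n

# Toy data: 2 taxa, 2 subjects with 2 samples each.
relab <- matrix(c(0.001, 0.2, 0.05, 0.06,
                  0.1,   0.1, 0,    0.3),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("taxA", "taxB"), paste0("s", 1:4)))
subject <- c("A", "A", "B", "B")
ratios <- abundance_ratio_matrix(relab, group = subject)
crt <- is_conditionally_rare(ratios, n = 100)
```

With the same ratio matrix, persistently rare taxa in the sense used in this issue would be those whose ratio stays below the chosen X, typically combined with an upper abundance cutoff.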
We can continue this discussion in PR #98.
In microbiome time series analyses we have the definitions of abundant taxa, conditionally rare taxa, persistently rare taxa and other rare taxa (see e.g. the two refs below).
Conditionally rare taxa are defined for instance as taxa with a maximum relative abundance at least N times higher than their minimum value.
Persistently rare taxa are taxa whose maximum relative abundance never exceeds X times the minimum.
https://doi.org/10.1093/femsec/fix126
https://doi.org/10.1128/mbio.01371-14
It could be helpful to have a function or example showing how to fetch these taxa sets from microbiome time series.