mStat_generate_report_long() : Error in update_data_obj_count() #62

Open
wode0000ai opened this issue Aug 1, 2024 · 4 comments
Labels
Bug Fixed This bug has been addressed and resolved in the latest update. bug Something isn't working

Comments

@wode0000ai

Dear Chen,

I hope this message finds you well.

I have encountered an issue while using the mStat_generate_report_long() function from the MicrobiomeStat package. Below is the script I am using, which is based on the tutorial script provided. Despite following the tutorial closely, I am receiving the following error message:

```
Error in update_data_obj_count():
! The provided count table must have row and column names.
Backtrace:
MicrobiomeStat::generate_taxa_areaplot_long(...)
MicrobiomeStat::mStat_normalize_data(data.obj, method = "TSS")
MicrobiomeStat:::update_data_obj_count(...)
Quitting from lines 195-215 [taxa-areaplot-longitudinal-generation]
```

Here is the script I am using:

```r
setwd("/Users/tongfiles/Downloads/FLX micro")

library(MicrobiomeStat)
library(ape)

# Import the data
feature.tab <- read.csv("/Users/tongfiles/Downloads/FLX micro/otu_table.csv", header = TRUE, row.names = 1)
feature.tab <- feature.tab[1:10000,]
feature.tab[] <- lapply(feature.tab, function(x) as.numeric(as.character(x)))

meta.dat <- read.csv("/Users/tongfiles/Downloads/FLX micro/groupmap_FLX.csv", header = TRUE, row.names = 1)
feature.ann <- read.csv("/Users/tongfiles/Downloads/FLX micro/otu_taxa_table2.csv", header = TRUE, row.names = 1)
feature.ann <- feature.ann[1:10000,]

tree <- read.tree("/Users/tongfiles/Downloads/FLX micro/phylogeny.tre")

# Create data object
data.obj <- list(
  feature.tab = as.matrix(feature.tab),
  meta.dat = meta.dat,
  feature.ann = as.matrix(feature.ann),
  tree = tree
)

# Specify variable names
group.var = "Group"
subject.var = "Subject"
time.var = "Timepoint"
strata.var = NULL

# Specify diversity indices
alpha.name = c("shannon", "observed_species")
dist.name = c("BC", "Jaccard")

# Specify feature levels for visualization and testing
vis.feature.level = c("Genus")
test.feature.level = c("Genus")

# Specify other parameters
feature.dat.type = "count"
theme.choice = "bw"
base.size = 20
feature.mt.method = "none"
feature.sig.level = 0.3
feature.box.axis.transform = "sqrt"

# Specify output file
output.file = "Omics Analysis Report.pdf"

# Specify parameters for feature retention
bar.area.feature.no = 10
heatmap.feature.no = 40

# Specify optional parameters
dist.obj = NULL
alpha.obj = NULL
feature.change.func = "relative change"

# Run the function
mStat_generate_report_long(
  data.obj = data.obj,
  group.var = group.var,
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  strata.var = strata.var,
  subject.var = subject.var,
  time.var = time.var,
  t0.level = NULL,
  ts.levels = NULL,
  alpha.obj = alpha.obj,
  alpha.name = alpha.name,
  dist.obj = dist.obj,
  dist.name = dist.name,
  feature.change.func = feature.change.func,
  vis.feature.level = vis.feature.level,
  test.feature.level = test.feature.level,
  bar.area.feature.no = bar.area.feature.no,
  heatmap.feature.no = heatmap.feature.no,
  feature.dat.type = feature.dat.type,
  feature.mt.method = feature.mt.method,
  feature.sig.level = feature.sig.level,
  feature.box.axis.transform = feature.box.axis.transform,
  theme.choice = theme.choice,
  base.size = base.size,
  output.file = output.file
)
```

Here is the structure of my object:

```r
str(data.obj)
List of 4
 $ feature.tab: num [1:10000, 1:37] 24 0 14 44 38 51 134 4 50 9 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10000] "OTU1" "OTU2" "OTU3" "OTU4" ...
  .. ..$ : chr [1:37] "pre_10" "pre_1" "pre_2" "pre_3" ...
 $ meta.dat   :'data.frame':	37 obs. of  3 variables:
  ..$ Subject  : chr [1:37] "m10" "m1" "m2" "m3" ...
  ..$ Group    : chr [1:37] "FLX" "FLX" "FLX" "control" ...
  ..$ Timepoint: chr [1:37] "baseline" "baseline" "baseline" "baseline" ...
 $ feature.ann: chr [1:10000, 1:7] "d__Bacteria" "d__Bacteria" "d__Bacteria" "d__Bacteria" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10000] "OTU1" "OTU2" "OTU3" "OTU4" ...
  .. ..$ : chr [1:7] "Kingdom" "Phylum" "Class" "Order" ...
 $ tree       :List of 5
  ..$ edge       : int [1:77485, 1:2] 38754 38755 38755 38754 38756 38756 38754 38757 38758 38758 ...
  ..$ edge.length: num [1:77485] 0.00011 0.00171 0.00085 0.00014 0.00428 0.00428 0.00014 0.00014 0.00343 0.00086 ...
  ..$ Nnode      : int 38733
  ..$ node.label : chr [1:38733] "" "0.902" "0.978" "0.929" ...
  ..$ tip.label  : chr [1:38753] "OTU11301" "OTU7973" "OTU5811" "OTU2336" ...
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "order")= chr "cladewise"
```

I have verified that my count table (feature.tab) and other data frames (meta.dat and feature.ann) have both row and column names. Here are the checks I performed to ensure that:

```r
# Check for NA values
if (any(is.na(feature.tab))) {
  stop("feature.tab contains NA values.")
}

# Check that row and column names exist and are character strings
if (is.null(rownames(feature.tab)) || is.null(colnames(feature.tab)) ||
    !is.character(rownames(feature.tab)) || !is.character(colnames(feature.tab))) {
  stop("feature.tab must have character row and column names.")
}

# Print the first few rows and the dimension names to visually inspect the data
print(head(feature.tab))
print(colnames(feature.tab))
print(rownames(feature.tab))
```

All checks passed successfully, yet the issue persists. I would greatly appreciate any guidance or suggestions you may have to resolve this problem.

Thank you very much for your time and assistance.
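The checks above run on the feature.tab data frame before it is converted with as.matrix(), and they do not look at how the tables relate to each other. A minimal sketch of a broader check on the list that is actually passed as data.obj might look like the following. check_data_obj() is a hypothetical helper, not part of MicrobiomeStat, and the toy inputs only mimic the shape shown by the str(data.obj) output above.

```r
# Hypothetical helper (not part of MicrobiomeStat): collects consistency
# problems in the list that is passed to mStat_generate_report_long().
check_data_obj <- function(data.obj) {
  ft <- data.obj$feature.tab
  md <- data.obj$meta.dat
  problems <- character(0)

  # The count table should be a numeric matrix with row and column names.
  if (!is.matrix(ft) || !is.numeric(ft)) {
    problems <- c(problems, "feature.tab is not a numeric matrix")
  }
  if (is.null(rownames(ft)) || is.null(colnames(ft))) {
    problems <- c(problems, "feature.tab is missing row or column names")
  }
  if (anyNA(ft)) {
    problems <- c(problems, "feature.tab contains NA values")
  }
  # Sample names should match the metadata row names, in the same order.
  if (!identical(colnames(ft), rownames(md))) {
    problems <- c(problems, "colnames(feature.tab) and rownames(meta.dat) differ")
  }
  # Every feature should have a row in the annotation table, if one is given.
  if (!is.null(data.obj$feature.ann) &&
      !all(rownames(ft) %in% rownames(data.obj$feature.ann))) {
    problems <- c(problems, "some features have no row in feature.ann")
  }
  problems
}

# Toy inputs (values are made up for illustration).
ft <- matrix(c(24, 0, 14, 44), nrow = 2,
             dimnames = list(c("OTU1", "OTU2"), c("pre_1", "pre_2")))
md <- data.frame(Subject = c("m1", "m2"), Timepoint = c("baseline", "baseline"),
                 row.names = c("pre_1", "pre_2"))
ann <- matrix(c("d__Bacteria", "d__Bacteria"), ncol = 1,
              dimnames = list(c("OTU1", "OTU2"), "Kingdom"))

check_data_obj(list(feature.tab = ft, meta.dat = md, feature.ann = ann))
# character(0) means no problems were found
```

Returning a character vector of problems, rather than stopping at the first one, makes it possible to see every inconsistency in a single run.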

@wode0000ai wode0000ai added the bug Something isn't working label Aug 1, 2024
@cafferychen777
Owner

Dear @wode0000ai,

Thank you for reaching out and providing such a detailed description of the issue you're facing with the mStat_generate_report_long() function from the MicrobiomeStat package. I appreciate the thorough information you've shared, including your R script, the error message, and the structure of your data object.

To help diagnose the problem more efficiently, I was wondering if you would be comfortable sharing your data files with me directly. If so, could you please send them to my email address: [email protected]

Having access to the actual data would allow me to replicate the issue on my end and potentially identify the source of the error more quickly. Of course, I understand if you have any concerns about data privacy or confidentiality. If you prefer not to share the data, please let me know, and we can continue troubleshooting based on the information you've already provided.

Thank you for your patience as we work to resolve this issue. I look forward to hearing back from you.

Best regards,
Chen YANG

@cafferychen777
Owner

Dear @wode0000ai,

Thank you for sharing the detailed information about your issue with the mStat_generate_report_long() function. After carefully reviewing the structure of your data object, I believe I've identified a potential cause of the error you're encountering.

The issue likely stems from the row names in your meta.dat data frame. It appears that these row names may not be set to match the sample names (column names) in your feature.tab matrix. This mismatch can cause the error you're seeing: "The provided count table must have row and column names."

To resolve this, you can try the following:

  1. Check whether the row names of your meta.dat match the column names of your feature.tab:

     ```r
     all(rownames(data.obj$meta.dat) == colnames(data.obj$feature.tab))
     ```

  2. If this returns FALSE, you can fix it by setting the row names of meta.dat to match the column names of feature.tab:

     ```r
     rownames(data.obj$meta.dat) <- colnames(data.obj$feature.tab)
     ```

  3. After making this change, try running your mStat_generate_report_long() function again.

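This adjustment aligns your metadata with your feature table, which is crucial for many microbiome analysis functions. Note that it only relabels the rows of meta.dat without reordering them, so it is correct only if those rows are already in the same sample order as the columns of feature.tab.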
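Assigning new row names relabels the metadata rows but does not move them. When meta.dat already carries the sample names, possibly in a different order, a safer alternative is to reorder it by name. A minimal sketch on toy data (the sample names and groups are made up):

```r
# Toy inputs standing in for data.obj$feature.tab and data.obj$meta.dat.
feature.tab <- matrix(1:6, nrow = 2,
                      dimnames = list(c("OTU1", "OTU2"), c("pre_1", "pre_2", "pre_3")))
meta.dat <- data.frame(Group = c("control", "FLX", "FLX"),
                       row.names = c("pre_3", "pre_1", "pre_2"))

samples <- colnames(feature.tab)

# Stop early if a sample has no metadata row; relabeling would hide this.
missing <- setdiff(samples, rownames(meta.dat))
if (length(missing) > 0) {
  stop("No metadata for sample(s): ", paste(missing, collapse = ", "))
}

# Reorder metadata rows by name so that row i describes column i of feature.tab.
meta.dat <- meta.dat[samples, , drop = FALSE]
stopifnot(identical(rownames(meta.dat), samples))
```

Indexing by name keeps each sample attached to its own metadata, whereas overwriting rownames() would silently mislabel rows whenever the original order differs.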

If you make this change and still encounter issues, please let me know, and we can investigate further. Don't hesitate to reach out if you need any clarification or assistance in implementing this fix.

Best regards,
Chen YANG

@wode0000ai
Author

Dear @cafferychen777,
Thank you for your reply. I've checked the match, and here is the result:

```r
> all(rownames(data.obj$meta.dat) == colnames(data.obj$feature.tab))
[1] TRUE
```

I have sent you the data as requested, and I look forward to any further guidance or suggestions you might have. Please let me know if there is any problem with the email. Your support has been incredibly valuable to me, and I am grateful for your expertise and time.

@cafferychen777
Owner

cafferychen777 commented Aug 4, 2024

Dear @wode0000ai,

Thank you for your detailed explanation of the problem you're experiencing with the MicrobiomeStat package. After reviewing your code and data structure, I believe I've identified the root cause of the error you're encountering.

The issue stems from the format of your time variable. Currently, your time.var is stored as a character type, whereas the function expects it to be either a factor or numeric type. Additionally, you haven't specified the t0.level and ts.levels parameters, which are crucial for longitudinal analysis.

To resolve this, I suggest the following steps:

  1. Convert your time variable to a factor:

     ```r
     data.obj$meta.dat$Timepoint <- factor(data.obj$meta.dat$Timepoint, levels = c("baseline", "4w", "8w"))
     ```

  2. Specify t0.level and ts.levels in your function call:

     ```r
     t0.level = "baseline"
     ts.levels = c("4w", "8w")
     ```

Include these parameters in your mStat_generate_report_long() function call.
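Because factor() silently turns any value that is not listed in levels into NA, it can be worth checking the Timepoint values before converting. A minimal sketch on toy metadata, assuming the level names "baseline", "4w" and "8w" from the reply above match the values in the data; it also derives t0.level and ts.levels from the factor, on the assumption that the first level is the baseline:

```r
# Toy metadata; the level names are assumed to match the real Timepoint values.
meta.dat <- data.frame(
  Subject   = c("m1", "m1", "m1", "m2", "m2", "m2"),
  Timepoint = c("baseline", "4w", "8w", "baseline", "4w", "8w")
)
time.levels <- c("baseline", "4w", "8w")

# Any value missing from time.levels would become NA, so fail loudly instead.
unexpected <- setdiff(unique(meta.dat$Timepoint), time.levels)
if (length(unexpected) > 0) {
  stop("Timepoint values not in time.levels: ", paste(unexpected, collapse = ", "))
}

meta.dat$Timepoint <- factor(meta.dat$Timepoint, levels = time.levels)

# First level = baseline (t0.level); remaining levels = follow-up time points.
t0.level  <- levels(meta.dat$Timepoint)[1]
ts.levels <- levels(meta.dat$Timepoint)[-1]
```

Deriving t0.level and ts.levels from the factor keeps them consistent with the level order used for Timepoint.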

While this should resolve the immediate error, I've noticed another potential issue. Your dataset appears to have very few time points, which may prevent some of the longitudinal tests from running successfully. Given this limitation, I would recommend using individual functions from the package rather than generating a full report.

If you need any further assistance or clarification on using these individual functions, please don't hesitate to ask. I'm here to help you get the most out of your analysis with MicrobiomeStat.

Best regards,
Chen YANG

@cafferychen777 cafferychen777 added the Bug Fixed This bug has been addressed and resolved in the latest update. label Sep 23, 2024