Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate_taxa_test_pair() issue with time.var = NULL in random effect #33

Open
16svale opened this issue Feb 18, 2024 · 5 comments
Open
Labels
By Design This behavior is intentional and is part of the package's design. It's not considered a bug or an is Not a Bug - Solution Provided This behavior is intentional as part of our design. However, we've provided a solution to accommodat

Comments

@16svale
Copy link

16svale commented Feb 18, 2024

Dear developer,
I am encountering an issue when using the generate_taxa_test_pair() function to test for random effect.

When I use the basic linda function as below it works:

linda.obj.all <- linda(asv_table[, ind], meta[ind, ], formula = '~SampleType1 + (1|SSF)',
feature.dat.type = 'count',
prev.filter = 0.1,
mean.abund.filter = 0.004 ,
is.winsor = TRUE, outlier.pct = 0.03,
p.adj.method = "BH", alpha = 0.05)

However, the generate_taxa_test_pair() that should work the same but used as below gives me an error:
#create test.list
test.list <- generate_taxa_test_pair(
data.obj = data.obj,
group.var = "sampletype1", adj.vars = NULL,
subject.var = "SSF", time.var = NULL,
feature.dat.type = "count",
feature.level = "Genus",
prev.filter = 0.1,
abund.filter = 0.004)

Output:
Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat is a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The column names of feature.tab match the row names of meta.dat.
Rule 5 passed: feature.tab is a matrix.
Rule 6 passed: feature.ann is a matrix.
Please note: The data components should follow base R data.frame and matrix structures, not phyloseq's formal class.
Validation passed.
Note: Passing validation does not guarantee the absence of all data issues. Further data exploration may be needed.
Your data is in raw format ('Raw'). Normalization is crucial for further analyses. Now, 'mStat_normalize_data' function is automatically applying 'TSS' transformation.
Data has been successfully normalized using TSS method.
0 features are filtered!

The filtered data has 98 samples and 49 features that will be tested!

Fit linear mixed effects models ...
Completed.
Error in grepl(pattern = time.var, x = df_name) :
invalid 'pattern' argument

The generate_taxa_test_pair() function seems designed to optionally handle longitudinal data, given it accepts a time.var parameter, which I set to NULL. This design choice suggests it should be capable of running analyses without a time component, but the implementation does not gracefully handle a NULL time.var.

I write here a piece of the core code of the generate_taxa_test_pair() function:

Case where time.var is NULL

  if (is.null(time.var)) {
    if (is.null(group.var)) {
      fixed_effects <- adj.vars_str
      if (is.null(fixed_effects)) {
        fixed_effects <- "1"  # Intercept-only model
      }
    } else {
      if (!is.null(adj.vars_str)) {
        fixed_effects <- paste(adj.vars_str, "+", group.var)
      } else {
        fixed_effects <- group.var
      }
    }
    random_effects <- paste("(1 |", subject.var, ")")

Do you have any solution for this issue?

Thank you in advance I hope I can improve to implement the use of this package.
Best Regards,
Valentina

@16svale 16svale added the bug Something isn't working label Feb 18, 2024
@16svale 16svale changed the title Issue with random effect in the Issue with random effect in the generate_taxa_test_paired() Feb 18, 2024
@16svale 16svale changed the title Issue with random effect in the generate_taxa_test_paired() Issue with random effect in the generate_taxa_test_pair() Feb 18, 2024
@16svale 16svale changed the title Issue with random effect in the generate_taxa_test_pair() generate_taxa_test_pair() issue with time.var = NULL in random effect Feb 18, 2024
@16svale
Copy link
Author

16svale commented Feb 18, 2024

Is it possible that you should modify the part of the generate_taxa_test_pair function that handles time.var to ensure it only attempts to use time.var in grepl if time.var is not NULL?

A suggestion foe the core code but not sure if it works:

Before using time.var in grepl, check it's not NULL

if (!is.null(time.var) && !is.null(df_name)) {
if (grepl(pattern = time.var, x = df_name)) {
#rest of the code
}
}

@cafferychen777 cafferychen777 added Not a Bug - Solution Provided This behavior is intentional as part of our design. However, we've provided a solution to accommodat and removed bug Something isn't working labels Feb 19, 2024
@cafferychen777
Copy link
Owner

Dear Valentina,

Thank you for your detailed description of the issue you encountered while using the generate_taxa_test_pair() function.

Indeed, the generate_taxa_test_pair() function is designed primarily for paired samples, which inherently involves a time component, such as "pre" and "post" measurements. Therefore, the parameter time.var cannot be set to NULL as the function expects this information to appropriately handle the paired nature of the data.

While I understand your suggestion regarding modifying the function to handle NULL time.var gracefully, it contradicts the fundamental design principle of the function, which requires a time component for paired sample analysis.

Thank you for your understanding, and I hope this clarifies the situation. If you have any further questions or need assistance with alternative analyses, please don't hesitate to ask.

Best Regards,

Chen YANG

@cafferychen777 cafferychen777 added the By Design This behavior is intentional and is part of the package's design. It's not considered a bug or an is label Feb 19, 2024
@16svale
Copy link
Author

16svale commented Feb 19, 2024

Dear @cafferychen777 ,
I understand your answer.
I am wondering if there is another function within the package that you developed that can be used for this purpose.
I mean to get the same results as if I use the linda function and take into consideration the random effect of only one variable:

linda.obj.all <- linda(asv_table[, ind], meta[ind, ], formula = '~SampleType1 + (1|SSF)',
feature.dat.type = 'count',
prev.filter = 0.1,
mean.abund.filter = 0.004 ,
is.winsor = TRUE, outlier.pct = 0.03,
p.adj.method = "BH", alpha = 0.05)

or as the LinDa documentation example report:
linda.obj <- linda(otu.tab, meta, formula = '~Smoke + (1|SubjectID)', alpha = 0.1,
prev.cut = 0.1, lib.cut = 1000, winsor.quan = 0.97)

I believe it must be possible to implement a random effect variable even if is not a timepoint.

Best,
Valentina

@cafferychen777
Copy link
Owner

Dear Valentina,

Thank you for your follow-up question.

I understand your desire to replicate the functionality of the linda function while considering a random effect that is not explicitly a timepoint. However, as the linda function is maintained by other members of our team, any modifications to its functionality would require coordination and discussion among team members.

I apologize for any inconvenience this may cause. I will communicate your request to the relevant team members and discuss the possibility of implementing such functionality in future updates of the package.

Thank you for your understanding and patience in this matter. If you have any further questions or concerns, please feel free to reach out.

Best Regards,

Chen YANG

@16svale
Copy link
Author

16svale commented Feb 19, 2024

Dear Chen Yang,

I understand. However, the linda function already allows the inclusion of a random effect variable that is not explicitly a timepoint (as shown in the documentation of linda() function). Is it correct?

Best Regards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
By Design This behavior is intentional and is part of the package's design. It's not considered a bug or an is Not a Bug - Solution Provided This behavior is intentional as part of our design. However, we've provided a solution to accommodat
Projects
None yet
Development

No branches or pull requests

2 participants