
Error: number of observations (=44) <= number of random effects (=44) for term (1 + time | subject) #26

Open
AlexanderUm opened this issue Jan 22, 2024 · 8 comments
Labels
By Design: This behavior is intentional and is part of the package's design. It's not considered a bug or an is…
Not a Bug - Solution Provided: This behavior is intentional as part of our design. However, we've provided a solution to accommodat…

Comments

@AlexanderUm

First of all, thank you for a very useful package!
It would be great if you could clarify the following error:
Error in linda: Error in fun(i): task 1 failed - "number of observations (=44) <= number of random effects (=44) for term (1 + time | subject); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable"
This error occurs frequently when the generate_taxa_test_pair() function is used, including with the dataset provided with the MicrobiomeStat package (peerj32.obj).
Despite the error, the function produces results, but I am not sure whether these results are reliable.
Can you please clarify this issue?

@cafferychen777
Owner

Dear @AlexanderUm,

First and foremost, thank you for your kind words and for using our package!

Regarding the error message you encountered ("Error in linda: Error in fun(i): task 1 failed - 'number of observations (=44) <= number of random effects (=44) for term (1 + time | subject); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable'"): this error arises when a mixed model requests at least as many random effects as there are observations, that is, when the model is more complex than the available data can support.

In the MicrobiomeStat package, particularly in the generate_taxa_test_pair() function, we have indeed defaulted to a more complex model by design. The rationale behind this is that complex models can potentially offer more insightful results, given sufficient data.

However, when the dataset is relatively small, as with the 44 observations in your dataset, such models often cannot be fitted. The term (1 + time | subject) asks for two random effects per subject (an intercept and a time slope), which here amounts to 44 random effects (two for each of 22 subjects) for 44 observations. With no more observations than random effects, the random-effects parameters and the residual variance cannot be estimated separately.
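The counting behind this check can be written out explicitly. Let n be the number of observations, m the number of subjects, and q the number of random coefficients per subject; lme4 refuses to fit when n <= q m (its default `check.nobs.vs.nRE = "stop"` setting in `lmerControl`). Assuming `time` enters as a single numeric or two-level term, `(1 + time | subject)` has q = 2, and the subject count m = 22 below is inferred from the error message rather than stated in this thread:

$$
\begin{aligned}
n_{\mathrm{RE}} &= q\,m \qquad \text{(fitting stops when } n \le n_{\mathrm{RE}}\text{)}\\
(1 + \mathrm{time} \mid \mathrm{subject}):\quad q &= 2,\; n_{\mathrm{RE}} = 44 \;\Rightarrow\; m = 22,\quad n = 44 \le 44 \;\text{(stop)}\\
(1 \mid \mathrm{subject}):\quad q &= 1,\; n_{\mathrm{RE}} = 22,\quad n = 44 > 22 \;\text{(fits)}
\end{aligned}
$$

More generally, when every subject contributes exactly two samples, n = 2m = n_RE whenever q = 2, so a random-slope model can never pass this check in a paired design, while a random-intercept model always can.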

To address this, we have implemented a fallback mechanism in the code. When the primary complex model fails to fit due to limited data, the function automatically switches to a simpler model that is more suitable for smaller datasets. This ensures that the function still produces results, even when the ideal conditions for the complex model are not met.

Therefore, the results you are seeing, despite the initial error, are indeed reliable. They are generated from the simpler model that is designed to work effectively with the amount of data you have.

We appreciate your engagement and hope this clarifies the issue. Please do not hesitate to reach out if you have any more questions or need further assistance.

Best regards,
Chen YANG

@cafferychen777 added the By Design and Not a Bug - Solution Provided labels on Jan 22, 2024
@adahal123

adahal123 commented Feb 10, 2024

Hello, this is somewhat related to the question above. I have been using just linda for my differential abundance testing, and it has been easy to plug in a formula with fixed and random effects. However, I am starting to explore the rest of MicrobiomeStat, beginning with generate_taxa_test_single, and I am unclear where I should put random effects.
I placed all my fixed effects in adj.vars but can't figure out how to get the function to take a few random effects. I am also unclear how to set the reference level of the grouping variable.

I am sorry if these questions are very basic, but I had gotten used to phyloseq and manipulating data there, so this is a bit of a learning curve for me.

```r
test.list <- generate_taxa_test_single(
  data.obj = data.obj,
  group.var = "site",
  adj.vars = c("sex", "age"),
  feature.dat.type = "count",
  feature.level = "Genus",
  prev.filter = 0,
  abund.filter = 0,
  feature.sig.level = 0.05
)
```

@cafferychen777
Owner

Hello @adahal123,

Regarding your question about incorporating random effects when using generate_taxa_test_single in MicrobiomeStat: it is understandable that moving from phyloseq to a new tool comes with a learning curve.

In the context of cross-sectional study designs, the inclusion or exclusion of random effects in your model may not significantly impact the results. This is because cross-sectional designs typically assess the relationships or associations at a single point in time, and the variability attributed to random effects might not be as critical as in longitudinal or hierarchical data structures where the data points are nested or repeated measures are involved.

The generate_taxa_test_single function you used primarily handles covariates as fixed effects through the adj.vars argument, as in your code snippet. If you want to include random effects in your microbiome analysis, the linda() function supports them directly through its formula argument, as you have already been doing.
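A minimal base-R sketch of the two things raised in this exchange: choosing the reference level of a grouping variable with `relevel()`, and writing a linda-style formula string that combines fixed effects with a random intercept per subject. The metadata is hypothetical, and the sketch assumes that linda() and the generate_* functions build their design matrices with R's default treatment contrasts, so that the first factor level serves as the reference.

```r
# Hypothetical sample metadata; 'subject' identifies repeated samples from one person
meta.dat <- data.frame(
  site    = c("gut", "oral", "skin", "gut", "oral", "skin"),
  sex     = c("F", "F", "M", "M", "F", "F"),
  age     = c(34, 34, 41, 41, 29, 29),
  subject = c("P1", "P1", "P2", "P2", "P3", "P3")
)

# Make "oral" the reference level; the other site levels are compared against it
meta.dat$site <- relevel(factor(meta.dat$site), ref = "oral")
levels(meta.dat$site)                            # "oral" "gut" "skin"
colnames(model.matrix(~ site, data = meta.dat))  # "(Intercept)" "sitegut" "siteskin"

# Fixed effects plus a random intercept per subject, written as a formula string
fixed  <- c("site", "sex", "age")
random <- "(1 | subject)"
linda_formula <- paste("~", paste(c(fixed, random), collapse = " + "))
linda_formula                                    # "~ site + sex + age + (1 | subject)"

# With MicrobiomeStat installed, the string would be passed to linda(), e.g.:
# linda(feature.dat, meta.dat, formula = linda_formula, feature.dat.type = "count")
```

Setting the level order on the metadata itself, rather than inside the formula, keeps the same reference level for every function that reads that metadata.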

I hope this helps clarify things a bit. It's completely okay to have questions as you adapt to a new analysis framework. The transition from a familiar tool like phyloseq to something like MicrobiomeStat involves getting used to different functions and ways of specifying models, including how to handle fixed and random effects. If you have further questions or need more detailed guidance, please feel free to ask.

Best regards,
Chen YANG

@adahal123

Thanks for your quick response. My data is partially longitudinal (i.e., not every subject has data at every visit; some visits are missing because of the nature of clinical data).

I have used linda() for my analysis, but I was hoping to use the full suite of MicrobiomeStat to generate graphs and plots. I see that the paired and longitudinal data functions have a subject variable (random effect) and a time variable. Is there a way to include other variables (e.g., sequencing batch) in the paired/longitudinal functions?

@aherms12

@cafferychen777 I'm encountering this same issue with the generate_beta_trend_test_long function. Does it also have a simpler fallback model?

I also have a few data points missing because of the nature of clinical data, so when I run it I get this error: number of observations (=60) <= number of random effects (=62) for term (1 + time.num | ID). However, if I subset the data to include only subjects with complete data, I still get the error because the two numbers are equal (=58 each), and this function does not return any results after the error.
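The counts in this report can be reproduced with a small base-R sketch. The layout below is hypothetical: 31 IDs with two visits each (62 random effects for `(1 + time.num | ID)` implies 31 IDs) and the second visit missing for two of them, which is one arrangement consistent with the reported 60 and 58 observations. The helper `count_check()` is an illustrative name, not a MicrobiomeStat function.

```r
# Hypothetical layout consistent with the reported counts: 31 IDs, two visits each
# (time.num = 0, 1), with the second visit missing for two IDs
dat <- data.frame(
  ID       = rep(sprintf("S%02d", 1:31), each = 2),
  time.num = rep(c(0, 1), times = 31)
)
dat <- dat[-c(2, 4), ]  # drop the second visit of S01 and S02 -> 60 rows

# Observations vs. random effects for a per-ID term with n_terms coefficients
count_check <- function(d, n_terms) {
  c(n_obs = nrow(d), n_re = length(unique(d$ID)) * n_terms)
}

count_check(dat, n_terms = 2)           # (1 + time.num | ID): 60 obs vs 62 RE

# Complete-case subset: keep only IDs observed at both visits
visits <- table(dat$ID)
dat_complete <- dat[dat$ID %in% names(visits)[visits == 2], ]

count_check(dat_complete, n_terms = 2)  # 58 obs vs 58 RE: still fails, since 58 <= 58
count_check(dat_complete, n_terms = 1)  # (1 | ID): 58 obs vs 29 RE: passes
```

This shows why dropping incomplete subjects does not help: with exactly two visits per subject, a random intercept plus random slope always requests as many random effects as there are observations, whereas a random-intercept-only term requests half as many.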

@cafferychen777
Copy link
Owner

cafferychen777 commented Mar 20, 2024

Hello @adahal123,

I apologize for the delayed response, as I have been preparing for my PhD interviews.

Regarding your question about incorporating batch effects into your analysis with MicrobiomeStat, you can indeed try including the batch variable as a covariate in the models. Since our functions allow for the inclusion of multiple covariates, you can experiment with adding the batch variable alongside other covariates to see how it affects your results.
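A base-R sketch of what adding batch as a covariate means for the fixed-effects part of a paired model, together with a quick check that batch is estimable alongside time. The design and the `adj.vars` vector here are hypothetical; the subject random effect that the paired and longitudinal functions add is not shown, and whether a particular function accepts extra covariates should be confirmed in its documentation.

```r
# Hypothetical paired design: 4 subjects x 2 visits, sequenced in two batches
meta <- data.frame(
  subject = rep(c("S1", "S2", "S3", "S4"), each = 2),
  time    = factor(rep(c("pre", "post"), times = 4), levels = c("pre", "post")),
  batch   = c("b1", "b2", "b1", "b2", "b2", "b1", "b2", "b1")
)

# Batch enters alongside time as an additional fixed-effect covariate
adj.vars <- c("batch")
fixed_formula <- reformulate(c("time", adj.vars))
X <- model.matrix(fixed_formula, data = meta)
colnames(X)             # "(Intercept)" "timepost" "batchb2"

# Batch is only estimable if it is not aliased with time: the design must be full rank
qr(X)$rank == ncol(X)   # TRUE
table(meta$time, meta$batch)
```

If every pre sample had been sequenced in one batch and every post sample in the other, the rank check would fail, and no covariate adjustment could separate the batch effect from the time effect.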

However, I would like to recommend an approach specifically designed to address batch effects in microbiome data. Our collaborators have developed a method called Conditional Quantile Regression (ConQuR) that is tailored for microbiome data. ConQuR uses a two-part quantile regression model to remove batch effects while accommodating the complex distributions of microbial read counts. This approach generates batch-removed zero-inflated read counts that can be used in subsequent analyses, preserving the signals of interest.

The abstract of the ConQuR paper summarizes the approach as follows:
"Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Strategies designed for genomic data to mitigate batch effects usually fail to address the zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, failing to allow other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) approach to remove microbiome batch effects using a two-part quantile regression model. ConQuR is a comprehensive method that accommodates the complex distributions of microbial read counts by non-parametric modeling, and it generates batch-removed zero-inflated read counts that can be used in and benefit usual subsequent analyses. We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest."

I hope this information is helpful. Please feel free to reach out if you have any further questions or need assistance with your analysis.

Best regards,
Chen YANG

@cafferychen777
Owner

Hello @aherms12,

I apologize for the late reply; I have been busy with PhD interviews in the past few days. Regarding your question about the generate_beta_trend_test_long function and the fallback model, you are correct that there should be a simpler fallback model in place for situations where the data is insufficient for the more complex model.

I appreciate you bringing this to my attention. I realize now that I did not encounter this issue during my testing with sample data, and as a result, I did not implement the fallback model for this function. However, I plan to address this in the next one or two days by adding a simpler fallback model to the generate_beta_trend_test_long function.

I apologize for any inconvenience this may have caused and thank you for your patience. Please feel free to reach out if you have any further questions or concerns.

Best regards,
Chen YANG

@cafferychen777
Owner

Dear all,

Thank you for your continued feedback on the generate_beta_trend_test_long function. I’m pleased to inform you that the function has now been updated to handle the overparameterization issue some of you have encountered. Specifically, the new version introduces an adaptive model fitting approach that:

  1. First attempts to fit a model with both random intercepts and slopes.
  2. If overparameterization is detected, the model automatically simplifies to use only random intercepts.
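The two steps above can be sketched directly with lme4. This is not MicrobiomeStat's source code; it assumes lme4 is installed and uses simulated data shaped like a paired design (22 subjects at two time points), where lme4's default `check.nobs.vs.nRE = "stop"` rejects the random-slope model with the same error reported in this issue. The helper name `fit_adaptive()` is hypothetical.

```r
library(lme4)

# Simulated paired data: 22 subjects x 2 time points = 44 observations
set.seed(42)
n_subj <- 22
dat <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = 2)),
  time    = rep(c(0, 1), times = n_subj)
)
subj_int <- rnorm(n_subj, sd = 1)  # subject-level random intercepts
dat$y <- subj_int[as.integer(dat$subject)] + 0.5 * dat$time +
  rnorm(nrow(dat), sd = 0.5)

# Step 1: try random intercepts and slopes; step 2: on error, keep intercepts only
fit_adaptive <- function(data) {
  tryCatch(
    lmer(y ~ time + (1 + time | subject), data = data),
    error = function(e) {
      message("Random-slope model failed: ", conditionMessage(e))
      message("Falling back to random intercepts only.")
      lmer(y ~ time + (1 | subject), data = data)
    }
  )
}

fit <- fit_adaptive(dat)
formula(fit)  # y ~ time + (1 | subject)
```

Catching the error, rather than counting observations beforehand, lets lme4's own checks decide when the richer model is unidentifiable, so the same wrapper works for unbalanced data with missing visits.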

To update the MicrobiomeStat package, please use the following commands:

```r
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("cafferychen777/MicrobiomeStat")
```

After updating, the function should handle datasets with limited observations more effectively. As always, feel free to report any further issues or provide additional feedback. Thank you for your patience and contributions to improving MicrobiomeStat!

Best regards,
Chen Yang

4 participants