Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short term change #94

Open
wants to merge 13 commits into
base: devel
Choose a base branch
from
Open

Short term change #94

wants to merge 13 commits into from

Conversation

Daenarys8
Copy link
Contributor

Function to return short term change in a dataframe

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
@Daenarys8
Copy link
Contributor Author

@antagomir @TuomasBorman

@TuomasBorman
Copy link
Collaborator

In holiday currently. I will check this early next week.

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Copy link
Collaborator

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is probably also not the most urgent ones

The calcultion part / core functionality is rather simple. The function should:

  1. Get data with mia::meltSE (transformations sould be applied in prior to this function)
  2. Calculate short term change based on group and time.col
  3. Return the data.frame

After this, user could plot the results if wanted

R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
R/shortTermChange.R Outdated Show resolved Hide resolved
@antagomir
Copy link
Member

Good to finalize this as proposed.

@antagomir
Copy link
Member

antagomir commented Nov 3, 2024

Example shows the following:

# Load time series data
#' data(minimalgut)
#' tse <- minimalgut
#' 
#' short_time_labels <- c("74.5h", "173h", "438h", "434h", "390h")
#' 
#' # Subset samples by Time_lable and StudyIdentifier
#' tse <- tse[, !(colData(tse)$Time_label %in% short_time_labels)]
#' tse <- tse[, (colData(tse)$StudyIdentifier == "Bioreactor A")]
#' 
#' # Plot short term change in abundance
#' shortTermChange(tse, rarefy = TRUE, plot = TRUE) + ggtitle("Bioreactor A")

Is this mean to be an analysis or visualization function?

The example could explain a bit more what this aims to show. I am not sure if I understand the motivation of this function from the manpage description.

It would be good to have a separate function, like getShortTermChange / addShortTermChange and then a separate function for visualization (unless it is easy to do with existing functions, like miaViz::plotSeries).

Is this linked to some miaTime issue? Issue #7 ?

@Daenarys8
Copy link
Contributor Author

#'
#' # Plot short term change in abundance
#' shortTermChange(tse, rarefy = TRUE, plot = TRUE) + ggtitle("Bioreactor A")


Is this mean to be an analysis or visualization function?

original reference had plotting as part of the function. It is now removed and the function returns a dataframe with short term changes

Is this linked to some miaTime issue? Issue #7 ?

yes, syncomR consumed phyloseq object.

Signed-off-by: Daena Rys <[email protected]>
@Daenarys8
Copy link
Contributor Author

Moreover given the nature of the data frame results, I am not sure plotSeries is the right visualization method for it. Suppose we somehow addShortTermChange using one of the tse slots(say maybe reducedDims), it would still require it's own ggplot wrapper .
I just checked but to be sure @TuomasBorman do we have miaViz function that takes dataframes?
For reference, this is how the plots from syncomR could look like linear or polarized:

Rplotstc
Rplot01stc

and of course we can also discuss how we want to represent our own plots.

Copy link
Collaborator

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice

R/getShortTermChange.R Outdated Show resolved Hide resolved
R/getShortTermChange.R Outdated Show resolved Hide resolved
R/getShortTermChange.R Outdated Show resolved Hide resolved
R/getShortTermChange.R Outdated Show resolved Hide resolved
R/getShortTermChange.R Outdated Show resolved Hide resolved
@TuomasBorman
Copy link
Collaborator

Moreover given the nature of the data frame results, I am not sure plotSeries is the right visualization method for it. Suppose we somehow addShortTermChange using one of the tse slots(say maybe reducedDims), it would still require it's own ggplot wrapper . I just checked but to be sure @TuomasBorman do we have miaViz function that takes dataframes? For reference, this is how the plots from syncomR could look like linear or polarized:

Rplotstc Rplot01stc

and of course we can also discuss how we want to represent our own plots.

There is no that kind of restriction that miaViz cannot have functions that take df as input. In fact, loadings plotting function can take df as input,

This outputs currently data.frame and I think it makes sense. reducedDims is not correct because this does not reduce dimensionality. This could be stored to metadata and then we could have a plotting function for TreeSE and df.

@antagomir
Copy link
Member

Or can it be part of colData? Could be simpler. This is a measure that is available on a per-sample basis. Except that some samples may have NAs if they are missing the comparison.

@TuomasBorman
Copy link
Collaborator

Or can it be part of colData? Could be simpler. This is a measure that is available on a per-sample basis. Except that some samples may have NAs if they are missing the comparison.

That could be possible but then columns would denote features. It might be confusing and there might be lots of features to add in the columns

@antagomir
Copy link
Member

Or can it be part of colData? Could be simpler. This is a measure that is available on a per-sample basis. Except that some samples may have NAs if they are missing the comparison.

That could be possible but then columns would denote features. It might be confusing and there might be lots of features to add in the columns

Yep, not so optimal.

@antagomir
Copy link
Member

"This function essentially calculates the change in abundance over time for a microbe by abundancet2/ abundancet1."

We already have getStepwiseDivergence?

getStepwiseDivergence calculates x(t + d) - x(t) for abundance x at time points t+d vs. t any user defined interval d along the time series.

getShortTermChange calculates this per taxa. One can generalize and use it for a more general interval x(t+d)/x(t), analogous to getStepwiseDivergence.

getShortTermChange suggests to do x(t+d)/x(t) instead of x(t+d)-x(t). But both are available and the choice may also depend on whether the data is logarithmic. log(a/b) = log(a)-log(b) although zeroes may cause trouble. It could be enough to implement the difference "x-y" and the rest ("x/y") can be always dealt (by the user?) with log/exp operations the data if necessary.

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Copy link
Collaborator

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some suggestions

# Reshape data and calculate growth metrics
assay_data <- meltSE(x, assay.type = assay.type,
add.col = time.col, row.name = "Feature_ID")
assay_data <- assay_data %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use |> operators. They are used in elsewhere

Comment on lines +79 to +83
#' @export
setMethod("addShortTermChange", signature = c(x = "SummarizedExperiment"),
function(x, assay.type = "counts", name = "short_term_change", ...){
# Calculate short term change
res <- getShortTermChange(x, ...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check BiocHCeck::BiocHeck() at least indentation is off

Comment on lines +21 to +26
#' \url{https://github.com/nwisnoski/ul-seedbank}. Their approach is based on
#' the following calculation log(present abundance/past abundance).
#' Also a compositional version using relative abundance similar to
#' Brian WJi, Sheth R et al
#' \url{https://www.nature.com/articles/s41564-020-0685-1} can be used.
#' This approach is useful for identifying short term growth behaviors of taxa.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this indentation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also from above

#'
#' short_time_labels <- c("74.5h", "173h", "438h", "434h", "390h")
#'
#' # Subset samples by Time_label and StudyIdentifier
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this comment, say what it is the goal: get specified time points from certain bioreactor

#' \code{\link[SummarizedExperiment:SummarizedExperiment-class]{SummarizedExperiment}}
#' object with these results in its \code{metadata}.
#'
#' @details This approach is used by Wisnoski NI and colleagues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use math notation, it would be easier to read. Also the calculated values include other measurements than this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could describe them also

Comment on lines +11 to +24
# Should still return a dataframe
short_time_labels <- c("74.5h", "173h", "438h", "434h", "390h")
# Subset samples by Time_label and StudyIdentifier
tse_filtered <- tse[, !(tse$Time_label %in% short_time_labels)]
tse_filtered <- tse_filtered[, (tse_filtered$StudyIdentifier == "Bioreactor A")]

expect_true(all(!(tse_filtered$Time_label %in% short_time_labels)))

result <- getShortTermChange(tse_filtered, time.col = "Time.hr")
# Expected output is a dataframe
expect_true(is.data.frame(result))
expect_true("growth_diff" %in% colnames(result))
# Test some expected properties (e.g., that growth_diff isn't all NAs)
expect_false(all(is.na(result$growth_diff)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values should be also checked

Comment on lines +60 to +61
setMethod("getShortTermChange", signature = c(x = "SummarizedExperiment"),
function(x, assay.type = "counts", ...){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function should also take into account the grouping, i.e., from which patient/bioreactor the data is coming

Comment on lines +99 to +107
arrange( !!sym(time.col) ) %>%
group_by(Feature_ID) %>%
mutate(
time_lag = !!sym(time.col) - lag( !!sym(time.col) ),
growth_diff =!!sym(assay.type) - lag(!!sym(assay.type)),
growth_rate = (!!sym(assay.type) - lag(!!sym(assay.type))) / lag(!!sym(assay.type)),
var_abund = (!!sym(assay.type) - lag(!!sym(assay.type))) / time_lag
)
return(assay_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should take into account duplicated values. I tink you could give warning first and then try something like this

df <- meltSE(tse, add.col = TRUE)
df |>
    group_by(StudyIdentifier, FeatureID, Time.hr) |>
    mutate(mean_abundance_for_timepoint = mean(counts, na.rm = TRUE)) |>
    distinct(StudyIdentifier, FeatureID, Time.hr, .keep_all = TRUE) |>
    ungroup() |>
    arrange(StudyIdentifier, FeatureID, Time.hr) |>
    group_by(StudyIdentifier, FeatureID) |>
    mutate(growth_diff = mean_abundance_for_timepoint - lag(mean_abundance_for_timepoint)) |>
    ungroup()

Comment on lines +120 to +122
grs.all$Feature_IDabb <- toupper(abbreviate(grs.all$Feature_ID,
minlength = 3,
method = "both.sides"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These feature names come directly from rownames()? Then I think they should be kept unchanged

########################### Growth Metrics ############################
grwt <- .calculate_growth_metrics(x, assay.type = assay.type, ...)
# Clean and format growth metrics
grs.all <- .clean_growth_metrics(grwt, ...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if these are needed, as these are quite basic metric that can be easily calculated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants