Release v.1.2.0
SurvSHAP(t) calculation using {treeshap} and fixes for SurvLIME
mikolajsp authored Oct 24, 2023
2 parents d6ee933 + 06bf634 commit 371fda2
Showing 29 changed files with 286 additions and 114 deletions.
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -1,12 +1,13 @@
Package: survex
Title: Explainable Machine Learning in Survival Analysis
Version: 1.1.3.9000
Version: 1.2.0
Authors@R:
c(
person("Mikołaj", "Spytek", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-7111-2286")),
person("Mateusz", "Krzyziński", role = c("aut"), comment = c(ORCID = "0000-0001-6143-488X")),
person("Sophie", "Langbein", role = c("aut")),
person("Hubert", "Baniecki", role = c("aut"), comment = c(ORCID = "0000-0001-6661-5364")),
person("Lorenz A.", "Kapsner", role = c("ctb"), comment = c(ORCID = "0000-0003-1866-860X")),
person("Przemyslaw", "Biecek", role = c("aut"), comment = c(ORCID = "0000-0001-8423-1823"))
)
Description: Survival analysis models are commonly used in medicine and other areas. Many of them
@@ -46,6 +47,7 @@ Suggests:
rmarkdown,
rms,
testthat (>= 3.0.0),
treeshap (>= 0.3.0),
withr,
xgboost
Config/testthat/edition: 3
6 changes: 5 additions & 1 deletion NEWS.md
@@ -1,4 +1,8 @@
# survex (development version)
# survex 1.2.0
* added new `calculation_method` for `surv_shap()` called `"treeshap"` that uses the `treeshap` package ([#75](https://github.com/ModelOriented/survex/issues/75))
* enabled calculating SurvSHAP(t) explanations based on a subsample of the explainer's data
* changed default kernel width in SurvLIME from sqrt(p * 0.75) to sqrt(p) * 0.75
* fixed error in SurvLIME when non-factor `categorical_variables` were provided
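
The kernel width entry above moves the factor 0.75 outside the square root. A minimal sketch of what that means numerically, assuming `p` is the number of explanatory variables (the values of `p` are made up for illustration and the snippet does not call SurvLIME itself):

```r
# Hypothetical numbers of variables, chosen only for illustration
p <- c(1, 4, 9, 16)

old_width <- sqrt(p * 0.75)  # previous default kernel width
new_width <- sqrt(p) * 0.75  # default kernel width as of 1.2.0

# The ratio new/old is the same for every p: 0.75 / sqrt(0.75) = sqrt(0.75)
comparison <- data.frame(
  p = p,
  old_width = old_width,
  new_width = new_width,
  ratio = new_width / old_width
)
print(comparison)
```

Under these formulas the new default is narrower than the old one by a constant factor of `sqrt(0.75)`, so neighbouring observations are weighted somewhat more locally for any number of variables.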

# survex 1.1.3

36 changes: 18 additions & 18 deletions R/metrics.R
@@ -12,7 +12,7 @@ utils::globalVariables(c("PredictionSurv"))
#' @return a function that can be used to calculate metrics (with parameters `y_true`, `risk`, `surv`, and `times`)
#'
#' @section References:
#' - \[1\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819990915/30%2918%3A17/18%3C2529%3A%3AAID-SIM274%3E3.0.CO%3B2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#' - \[1\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#'
#' @export
loss_integrate <- function(loss_function, ..., normalization = NULL, max_quantile = 1) {
@@ -57,7 +57,7 @@ loss_integrate <- function(loss_function, ..., normalization = NULL, max_quantil
#' @return numeric from 0 to 1, higher values indicate better performance
#'
#' @section References:
#' - \[1\] Harrell, F.E., Jr., et al. ["Regression modelling strategies for improved prognostic prediction."](https://onlinelibrary.wiley.com/doi/10.1002/sim.4780030207) Statistics in Medicine 3.2 (1984): 143-152.
#' - \[1\] Harrell, F.E., Jr., et al. "Regression modelling strategies for improved prognostic prediction." Statistics in Medicine 3.2 (1984): 143-152.
#'
#' @rdname c_index
#' @seealso [loss_one_minus_c_index()]
@@ -109,7 +109,7 @@ attr(c_index, "loss_type") <- "risk-based"
#' @return numeric from 0 to 1, lower values indicate better performance
#'
#' @section References:
#' - \[1\] Harrell, F.E., Jr., et al. ["Regression modelling strategies for improved prognostic prediction."](https://onlinelibrary.wiley.com/doi/10.1002/sim.4780030207) Statistics in Medicine 3.2 (1984): 143-152.
#' - \[1\] Harrell, F.E., Jr., et al. "Regression modelling strategies for improved prognostic prediction." Statistics in Medicine 3.2 (1984): 143-152.
#'
#' @rdname loss_one_minus_c_index
#' @seealso [c_index()]
@@ -152,8 +152,8 @@ attr(loss_one_minus_c_index, "loss_type") <- "risk-based"
#' @return numeric from 0 to 1, lower scores are better (Brier score of 0.25 represents a model which always returns 0.5 as the predicted survival function)
#'
#' @section References:
#' - \[1\] Brier, Glenn W. ["Verification of forecasts expressed in terms of probability."](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml) Monthly Weather Review 78.1 (1950): 1-3.
#' - \[2\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19990915/30)18:17/18%3C2529::AID-SIM274%3E3.0.CO;2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#' - \[1\] Brier, Glenn W. "Verification of forecasts expressed in terms of probability." Monthly Weather Review 78.1 (1950): 1-3.
#' - \[2\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#'
#' @rdname brier_score
#' @seealso [cd_auc()]
@@ -217,8 +217,8 @@ attr(loss_brier_score, "loss_type") <- "time-dependent"
#' Calculate Cumulative/Dynamic AUC
#'
#' This function calculates the Cumulative/Dynamic AUC metric for a survival model. It is done using the
#' estimator proposed by Uno et al. \[[1](https://www.jstor.org/stable/27639883)\],
#' and Hung and Chiang \[[2](https://www.jstor.org/stable/41000414)\].
#' estimator proposed by Uno et al. \[1\],
#' and Hung and Chiang \[2\].
#'
#' C/D AUC is an extension of the AUC metric known from classification models.
#' Its values represent the model's performance at specific time points.
@@ -232,8 +232,8 @@ attr(loss_brier_score, "loss_type") <- "time-dependent"
#' @return a numeric vector of length equal to the length of the times vector, each value (from the range from 0 to 1) represents the AUC metric at a specific time point, with higher values indicating better performance.
#'
#' @section References:
#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#'
#' @rdname cd_auc
#' @seealso [loss_one_minus_cd_auc()] [integrated_cd_auc()] [brier_score()]
@@ -297,8 +297,8 @@ attr(cd_auc, "loss_type") <- "time-dependent"
#' @return a numeric vector of length equal to the length of the times vector, each value (from the range from 0 to 1) represents 1 - AUC metric at a specific time point, with lower values indicating better performance.
#'
#' @section References:
#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#'
#' @rdname loss_one_minus_cd_auc
#' @seealso [cd_auc()]
@@ -337,8 +337,8 @@ attr(loss_one_minus_cd_auc, "loss_type") <- "time-dependent"
#' @return numeric from 0 to 1, higher values indicate better performance
#'
#' @section References:
#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#'
#' @rdname integrated_cd_auc
#' @seealso [cd_auc()] [loss_one_minus_cd_auc()]
@@ -373,8 +373,8 @@ attr(integrated_cd_auc, "loss_type") <- "integrated"
#' @return numeric from 0 to 1, lower values indicate better performance
#'
#' @section References:
#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
#'
#' @rdname loss_one_minus_integrated_cd_auc
#' @seealso [integrated_cd_auc()] [cd_auc()] [loss_one_minus_cd_auc()]
@@ -417,8 +417,8 @@ attr(loss_one_minus_integrated_cd_auc, "loss_type") <- "integrated"
#' @return numeric from 0 to 1, lower values indicate better performance
#'
#' @section References:
#' - \[1\] Brier, Glenn W. ["Verification of forecasts expressed in terms of probability."](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml) Monthly Weather Review 78.1 (1950): 1-3.
#' - \[2\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19990915/30)18:17/18%3C2529::AID-SIM274%3E3.0.CO;2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#' - \[1\] Brier, Glenn W. "Verification of forecasts expressed in terms of probability." Monthly Weather Review 78.1 (1950): 1-3.
#' - \[2\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#'
#' @rdname integrated_brier_score
#' @seealso [brier_score()] [integrated_cd_auc()] [loss_one_minus_integrated_cd_auc()]
@@ -458,6 +458,7 @@ attr(loss_integrated_brier_score, "loss_type") <- "integrated"
#'
#' @return a function with standardized parameters (`y_true`, `risk`, `surv`, `times`) that can be used to calculate loss
#'
#' @examples
#' if(FALSE){
#' measure <- msr("surv.calib_beta")
#' mlr_measure <- loss_adapt_mlr3proba(measure)
@@ -483,7 +484,6 @@ loss_adapt_mlr3proba <- function(measure, reverse = FALSE, ...) {

return(output)
}

if (reverse) {
attr(loss_function, "loss_name") <- paste("one minus", measure$id)
} else {
14 changes: 7 additions & 7 deletions R/model_performance.R
@@ -9,17 +9,17 @@
#' @param times a numeric vector of times. If `type == "metrics"` then the survival function is evaluated at these times, if `type == "roc"` then the ROC curves are calculated at these times.
#'
#' @return An object of class `"model_performance_survival"`. It's a list of metric values calculated for the model. It contains:
#' - Harrell's concordance index \[[1](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780030207)\]
#' - Brier score \[[2](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml), [3](https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819990915/30%2918%3A17/18%3C2529%3A%3AAID-SIM274%3E3.0.CO%3B2-5)\]
#' - C/D AUC using the estimator proposed by Uno et al. \[[4](https://www.jstor.org/stable/27639883#metadata_info_tab_contents)\]
#' - Harrell's concordance index \[1\]
#' - Brier score \[2, 3\]
#' - C/D AUC using the estimator proposed by Uno et al. \[4\]
#' - integral of the Brier score
#' - integral of the C/D AUC
#'
#' @section References:
#' - \[1\] Harrell, F.E., Jr., et al. ["Regression modelling strategies for improved prognostic prediction."](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780030207) Statistics in Medicine 3.2 (1984): 143-152.
#' - \[2\] Brier, Glenn W. ["Verification of forecasts expressed in terms of probability."](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml) Monthly Weather Review 78.1 (1950): 1-3.
#' - \[3\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819990915/30%2918%3A17/18%3C2529%3A%3AAID-SIM274%3E3.0.CO%3B2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#' - \[4\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883#metadata_info_tab_contents) Journal of the American Statistical Association 102.478 (2007): 527-537.
#' - \[1\] Harrell, F.E., Jr., et al. "Regression modelling strategies for improved prognostic prediction." Statistics in Medicine 3.2 (1984): 143-152.
#' - \[2\] Brier, Glenn W. "Verification of forecasts expressed in terms of probability." Monthly Weather Review 78.1 (1950): 1-3.
#' - \[3\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
#' - \[4\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
#'
#' @examples
#' \donttest{
5 changes: 4 additions & 1 deletion R/model_survshap.R
@@ -59,6 +59,7 @@ model_survshap <- function(explainer, ...) {
model_survshap.surv_explainer <- function(explainer,
new_observation = NULL,
y_true = NULL,
N = NULL,
calculation_method = "kernelshap",
aggregation_method = "integral",
output_type = "survival",
@@ -98,9 +99,11 @@ model_survshap.surv_explainer <- function(explainer,
explainer = explainer,
new_observation = observations,
output_type = output_type,
N = N,
y_true = y_true,
calculation_method = calculation_method,
aggregation_method = aggregation_method
aggregation_method = aggregation_method,
...
)

attr(shap_values, "label") <- explainer$label
2 changes: 1 addition & 1 deletion R/plot_model_profile_survival.R
@@ -230,7 +230,7 @@ plot2_mp <- function(x,
if (!is.null(subtitle) && subtitle == "default") {
subtitle <- paste0("created for the ", unique(variable), " variable")
if (single_timepoint && !marginalize_over_time) {
subtitle <- paste0(subtitle, " and time =", times)
subtitle <- paste0(subtitle, " and time = ", times)
}
}

2 changes: 1 addition & 1 deletion R/plot_predict_profile_survival.R
@@ -192,7 +192,7 @@ plot2_cp <- function(x,
if (!is.null(subtitle) && subtitle == "default") {
subtitle <- paste0("created for the ", unique(variable), " variable")
if (single_timepoint && !marginalize_over_time) {
subtitle <- paste0(subtitle, " and time =", times)
subtitle <- paste0(subtitle, " and time = ", times)
}
}

7 changes: 4 additions & 3 deletions R/plot_surv_shap.R
@@ -121,11 +121,11 @@ plot.surv_shap <- function(x,
#' * `color_variable` - variable used to denote the color, by default equal to `variable`
#'
#'
#'#' ## `plot.aggregated_surv_shap(geom = "curves")`
#' ## `plot.aggregated_surv_shap(geom = "curves")`
#'
#' * `variable` - variable for which SurvSHAP(t) curves are to be plotted, by default first from result data
#' * `boxplot` - whether to plot functional boxplot with marked outliers or all curves colored by variable value
#'
#' * `coef` - length of the functional boxplot's whiskers as multiple of IQR, by default 1.5
#'
#' @examples
#' \donttest{
@@ -293,7 +293,7 @@ plot_shap_global_beeswarm <- function(x,
max_vars = 7,
colors = NULL) {
df <- as.data.frame(do.call(rbind, x$aggregate))
cols <- names(sort(colMeans(abs(df))))[1:min(max_vars, length(df))]
cols <- names(sort(colMeans(abs(df)), decreasing = TRUE))[1:min(max_vars, length(df))]
df <- df[, cols]
df <- stack(df)
colnames(df) <- c("shap_value", "variable")
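
The effect of adding `decreasing = TRUE` in the hunk above can be seen on a toy data frame. This is a small sketch with made-up SHAP values and variable names (`age`, `sex`, `bmi` are hypothetical); it only reproduces the column-selection expression, not the rest of the plotting function:

```r
# Hypothetical aggregated SHAP values: rows are observations, columns are variables
df <- data.frame(
  age = c(0.10, -0.20, 0.15),
  sex = c(0.01, -0.02, 0.01),
  bmi = c(-0.50, 0.40, 0.60)
)
max_vars <- 2

# Mean absolute SHAP value per variable, used as an importance score
importance <- colMeans(abs(df))

# Ascending sort (expression before the change): the least important variables come first
old_cols <- names(sort(importance))[1:min(max_vars, length(df))]

# Descending sort (expression after the change): the most important variables come first
new_cols <- names(sort(importance, decreasing = TRUE))[1:min(max_vars, length(df))]

print(old_cols)
print(new_cols)
```

With the ascending sort, truncating to `max_vars` keeps the variables with the smallest mean absolute values, which is the opposite of what a "top variables" plot usually intends.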
@@ -325,6 +325,7 @@ plot_shap_global_beeswarm <- function(x,
ggplot(data = df, aes(x = shap_value, y = variable, color = var_value)) +
geom_vline(xintercept = 0, color = "#ceced9", linetype = "solid") +
geom_jitter(width = 0, height = 0.15) +
scale_y_discrete(limits=rev) +
scale_color_gradient2(
name = "Variable value",
low = colors[1],