Commit

Survival learner doc updates + miscellaneous stuff (#385)
* remove repeated section

* compact refs

* disable parallelization for aorsf (default behavior => use 1 core)

* add section prediction types

* crank first for surv_coxboost learners + times refactoring

* fix example bug

* fix misspell

* remove example (template is used)

* add prediction types section in flexsurv + minor doc refactoring

* doc refinements

* update doc for surv.glmnet learners

* refine Breslow doc

* update gamboost doc

* add min working versions for surv gbm, glmnet and mboost

* minor update on surv.gamboost doc + small refactoring

* refine surv.glmboost doc

* update RWeka doc links

* surv doc updates

* update doc on the rest of mboost surv learners

* update doc on ctree and cforest surv learners

* add minimum working version for partykit

* correct link

* update docs

* better doc for surv.penalized

* set trace = FALSE for surv.penalized

* change crank <=> distr order, fix small bug

* add selected_features() for surv.penalized

* update doc for surv.penalized

* refactor + add a new test for surv.penalized

* update doc for surv.penalized

* add doc about family and type.measure params

* correct prediction types, add breslow distr to surv.prioritylasso

* remove predict-tagged params that are not actually used by predict method

* add comment on cv.glmnet() params

* update doc on prioritylasso learners

* add encapsulate method in all learner docs

* correct doc for selected_features() method

* add example to surv.prioritylasso learner

* remove unused parameter: no handling of missing data

* explicitly use coef method from penalized

* fix cv.glmnet() link

* minimum version for surv svm

* add predict type doc for ssvm

* refactor: ssvm

* split gamma.mu to two parameters in surv svm

* update ssvm tests

* update docs (mlr3 => CRAN version 0.20.2)

* improve doc for surv.cforest - set cores to 1 explicitly

* aorsf: use unique event times for S(t) during predict to be in line with the rest of RSFs

* add min working version for penalized

* set ntree to 500 as is the default of the package

* add min working version for ranger and rfsrc learners

* rfsrc survival learner updates

  * change ntime and ntree defaults
  * add doc

* change doc format

* doc update to mlr3 v0.21.0

* small fix in example template

* change order of doc sections

* add prediction types doc for surv.ranger

* correct prediction types doc

* refactor akritas: use unique train times in predict + ntime parameter + doc

* refactor parametric

  * use unique train times in predict + ntime parameter
  * refine doc
  * change default value for discrete to TRUE

* update tests for akritas and parametric

* change file name as this learner is now called from survivalmodels

* fix styling issues

* rename test file

* add bagging_by_query param to lightgbm learners

* increase lightgbm min version

* fix example (importance required a parameter to be set in the learner)

* refine doc for NA survival estimator

* refine styling of doc

* fix examples with learner importance

* add prediction types doc for DL survival models

* small fix in the example

* add min working version for xgboost

* rename doc section to prediction types

* correct another example

* fix yet another example with importance

* Update R/learner_aorsf_surv_aorsf.R

Co-authored-by: Sebastian Fischer <[email protected]>

* use @examplesIf where necessary

* refactor => gridify_times()

* refine doc: custom mlr3 => initial values across learners

* update news

---------

Co-authored-by: Sebastian Fischer <[email protected]>
bblodfon and sebffischer authored Oct 18, 2024
1 parent a5d43c7 commit a622524
Showing 184 changed files with 1,632 additions and 686 deletions.
20 changes: 10 additions & 10 deletions DESCRIPTION
@@ -54,20 +54,20 @@ Suggests:
FNN,
formattable,
future,
gbm,
glmnet,
gbm (>= 2.2.2),
glmnet (>= 4.1-6),
gss,
jsonlite,
keras (>= 2.3.0),
kernlab,
knitr,
ks,
LiblineaR,
lightgbm (>= 4.4.0),
lightgbm (>= 4.5.0),
lme4,
locfit,
logspline,
mboost,
mboost (>= 2.9-10),
mda,
mgcv,
mlr3cluster,
@@ -78,17 +78,17 @@ Suggests:
nnet,
np,
param6,
partykit,
penalized,
partykit (>= 1.2-21),
penalized (>= 0.9-52),
pendensity,
plugdensity,
pracma,
prioritylasso (>= 0.3.1),
pseudo,
randomForest,
randomPlantedForest,
randomForestSRC,
ranger,
randomForestSRC (>= 3.3.0),
ranger (>= 0.16.0),
remotes,
reticulate (>= 1.16),
rpart,
@@ -101,10 +101,10 @@ Suggests:
stats,
survival,
survivalmodels (>= 0.1.19),
survivalsvm,
survivalsvm (>= 0.0.5),
tensorflow (>= 2.0.0),
testthat,
xgboost
xgboost (>= 1.7.8.1)
Remotes:
binderh/CoxBoost,
catboost/catboost/catboost/R-package,
15 changes: 15 additions & 0 deletions NEWS.md
@@ -1,5 +1,20 @@
# dev

* Add "Prediction types" doc section for all 30 survival learners + make sure it is consistent #347
* All survival learners have `crank` as main prediction type (and it is always returned) #331
* Added minimum working version for all survival learners in `DESCRIPTION` file
* Harmonized the use of time points for prediction as much as possible across survival learners #387
  * added `gridify_times()` function to coarsen time points
  * fixed `surv.parametric` and `surv.akritas` use of `ntime` argument
  * `surv.parametric` is now used by default with `discrete = TRUE` (no survival learner now returns a `distr6` vectorized distribution by default)
* Doc update for `mlr3` (version `0.21.0`)
* Fixed custom and initial values across all learners documentation pages
* Fixed doc examples that used `learner$importance()`
* Set `n_thread = 1` for `surv.aorsf` and use unique event time points for predicted `S(t)`
* Add `selected_features()` for `surv.penalized`
* Fix `surv.prioritylasso` learner + add `distr` predictions via Breslow #344
* Survival SVM `gamma.mu` parameter was split to `gamma` and `mu` to enable easier tuning (`surv.svm` learner)

# mlr3extralearners 0.9.0

* Added response (i.e., survival time) prediction to `aorsf` learner
12 changes: 12 additions & 0 deletions R/helpers.R
@@ -119,3 +119,15 @@ rename = function(x, old, new) {
}
x
}

# coerce the given time points to an `ntime` grid if `ntime` is not NULL,
# otherwise just return the sorted unique time points
gridify_times = function(times, ntime) {
  times = sort(unique(times))
  if (!is.null(ntime)) {
    indx = unique(round(seq.int(1, length(times), length.out = ntime)))
    times = times[indx]
  }

  times
}
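
To illustrate the `gridify_times()` helper added in this hunk, here is a minimal usage sketch in R. The function body is copied from the diff; the input vector `train_times` and the result names are made up for illustration and do not appear in the package.

```r
# copied from R/helpers.R as added in this commit
gridify_times = function(times, ntime) {
  times = sort(unique(times))
  if (!is.null(ntime)) {
    indx = unique(round(seq.int(1, length(times), length.out = ntime)))
    times = times[indx]
  }

  times
}

# hypothetical training times with a duplicate, in arbitrary order
train_times = c(5, 1, 3, 3, 9, 7, 2, 8, 4, 6)

# ntime = NULL: all sorted unique time points are kept
all_times = gridify_times(train_times, ntime = NULL)

# ntime = 4: roughly equally spaced positions in the sorted unique times
coarse = gridify_times(train_times, ntime = 4)

# ntime larger than the number of unique times: nothing is dropped
capped = gridify_times(train_times, ntime = 20)
```

With the nine unique time points above, `ntime = 4` picks the 1st, 4th, 6th and 9th sorted values. Because the index vector is passed through `unique()`, asking for more grid points than there are unique times simply returns all of them.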
7 changes: 4 additions & 3 deletions R/learner_BART_surv_bart.R
@@ -6,11 +6,12 @@
#' Fits a Bayesian Additive Regression Trees (BART) learner to right-censored
#' survival data. Calls [BART::mc.surv.bart()] from \CRANpkg{BART}.
#'
#' @details
#' Two types of prediction are returned for this learner:
#' @section Prediction types:
#' This learner returns two prediction types:
#' 1. `distr`: a 3d survival array with observations as 1st dimension, time
#' points as 2nd and the posterior draws as 3rd dimension.
#' 2. `crank`: the expected mortality using [mlr3proba::.surv_return]. The parameter
#' Calculated using the internal `predict.survbart()` function.
#' 2. `crank`: the expected mortality using [mlr3proba::.surv_return()]. The parameter
#' `which.curve` decides which posterior draw (3rd dimension) will be used for the
#' calculation of the expected mortality. Note that the median posterior is
#' by default used for the calculation of survival measures that require a `distr`
28 changes: 14 additions & 14 deletions R/learner_CoxBoost_surv_coxboost.R
@@ -6,6 +6,15 @@
#' Fit a Survival Cox model with a likelihood based boosting algorithm.
#' Calls [CoxBoost::CoxBoost()] from package 'CoxBoost'.
#'
#' @section Prediction types:
#' This learner returns three prediction types, using the internal `predict.CoxBoost()` function:
#' 1. `lp`: a vector containing the linear predictors (relative risk scores),
#' where each score corresponds to a specific test observation.
#' 2. `crank`: same as `lp`.
#' 3. `distr`: a 2d survival matrix, with observations as rows and time points
#' as columns. The internal transformation uses the Breslow estimator to compute
#' the baseline hazard and compose the survival distributions from the `lp` predictions.
#'
#' @template learner
#' @templateVar id surv.coxboost
#'
@@ -18,15 +27,6 @@
#' multiple hyperparameters, \CRANpkg{mlr3tuning} and [LearnerSurvCoxboost] will likely give better
#' results.
#'
#' Three prediction types are returned for this learner, using the internal
#' `predict.CoxBoost()` function:
#' 1. `lp`: a vector of linear predictors (relative risk scores), one per
#' observation.
#' 2. `crank`: same as `lp`.
#' 3. `distr`: a 2d survival matrix, with observations as rows and time points
#' as columns. The internal transformation uses the Breslow estimator to compose
#' the survival distributions from the `lp` predictions.
#'
#' @references
#' `r format_bib("binder2009boosting")`
#'
@@ -60,7 +60,7 @@ LearnerSurvCoxboost = R6Class("LearnerSurvCoxboost",
id = "surv.coxboost",
packages = c("mlr3extralearners", "CoxBoost", "pracma"),
feature_types = c("integer", "numeric"),
predict_types = c("distr", "crank", "lp"),
predict_types = c("crank", "lp", "distr"),
param_set = ps,
properties = c("weights", "selected_features"),
man = "mlr3extralearners::mlr_learners_surv.coxboost",
@@ -126,16 +126,16 @@ LearnerSurvCoxboost = R6Class("LearnerSurvCoxboost",
.args = pars,
type = "lp"))

# all the unique training time points
times = sort(unique(self$model$time))
surv = invoke(predict,
self$model,
newdata = newdata,
.args = pars,
type = "risk",
times = sort(unique(self$model$time)))
times = times)

mlr3proba::.surv_return(times = sort(unique(self$model$time)),
surv = surv,
lp = lp)
mlr3proba::.surv_return(times = times, surv = surv, lp = lp)
}
)
)
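
The "Prediction types" section above says that `distr` is composed from the `lp` predictions via the Breslow estimator. The following base-R sketch shows that idea under stated assumptions: it is not the code used by `CoxBoost` or `mlr3proba`, the function name `breslow_surv` and the toy data are hypothetical, and the survival curves are evaluated on all unique training times (the baseline hazard only jumps at event times).

```r
# Sketch of a Breslow-type transformation from linear predictors to S(t | x).
# times, status: training outcome (status == 1 means event)
# lp_train, lp_test: linear predictors for train and test observations
breslow_surv = function(times, status, lp_train, lp_test) {
  utimes = sort(unique(times))
  # baseline hazard increment at each unique time:
  # number of events at t divided by the summed risk of those still at risk
  h0 = vapply(utimes, function(t) {
    d = sum(times == t & status == 1)
    d / sum(exp(lp_train[times >= t]))
  }, numeric(1))
  H0 = cumsum(h0) # cumulative baseline hazard
  # S(t | x) = exp(-H0(t) * exp(lp)), observations in rows, times in columns
  surv = exp(-outer(exp(lp_test), H0))
  colnames(surv) = utimes
  surv
}

# toy data: three training observations, the second one censored
surv = breslow_surv(
  times = c(1, 2, 3), status = c(1, 0, 1),
  lp_train = c(0, 0, 0), lp_test = 0
)
```

The returned matrix has the same layout as the `distr` prediction described in the doc section: one row per test observation and one column per time point.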
22 changes: 7 additions & 15 deletions R/learner_CoxBoost_surv_cv_coxboost.R
@@ -3,10 +3,11 @@
#' @name mlr_learners_surv.cv_coxboost
#'
#' @description
#' Fits a survival Cox model using likelihood based boosting and interal cross-validation for the
#' Fits a survival Cox model using likelihood based boosting and internal cross-validation for the
#' number of steps.
#' Calls [CoxBoost::CoxBoost()] or [CoxBoost::cv.CoxBoost()] from package 'CoxBoost'.
#'
#' @inheritSection mlr_learners_surv.coxboost Prediction types
#' @template learner
#' @templateVar id surv.cv_coxboost
#'
@@ -22,15 +23,6 @@
#' If `penalty == "optimCoxBoostPenalty"` then [CoxBoost::optimCoxBoostPenalty] is used to determine
#' the penalty value to be used in [CoxBoost::cv.CoxBoost].
#'
#' Three prediction types are returned for this learner, using the internal
#' `predict.CoxBoost()` function:
#' 1. `lp`: a vector of linear predictors (relative risk scores), one per
#' observation.
#' 2. `crank`: same as `lp`.
#' 2. `distr`: a 2d survival matrix, with observations as rows and time points
#' as columns. The internal transformation uses the Breslow estimator to compose
#' the survival distributions from the `lp` predictions.
#'
#' @references
#' `r format_bib("binder2009boosting")`
#'
@@ -77,7 +69,7 @@ LearnerSurvCVCoxboost = R6Class("LearnerSurvCVCoxboost",
id = "surv.cv_coxboost",
packages = c("mlr3extralearners", "CoxBoost", "pracma"),
feature_types = c("integer", "numeric"),
predict_types = c("distr", "crank", "lp"),
predict_types = c("crank", "lp", "distr"),
param_set = ps,
properties = c("weights", "selected_features"),
man = "mlr3extralearners::mlr_learners_surv.cv_coxboost",
@@ -189,16 +181,16 @@ LearnerSurvCVCoxboost = R6Class("LearnerSurvCVCoxboost",
.args = pars,
type = "lp"))

# all the unique training time points
times = sort(unique(self$model$time))
surv = invoke(predict,
self$model,
newdata = newdata,
.args = pars,
type = "risk",
times = sort(unique(self$model$time)))
times = times)

mlr3proba::.surv_return(times = sort(unique(self$model$time)),
surv = surv,
lp = lp)
mlr3proba::.surv_return(times = times, surv = surv, lp = lp)
}
)
)
2 changes: 1 addition & 1 deletion R/learner_RWeka_classif_LMT.R
@@ -9,7 +9,7 @@
#' @template learner
#' @templateVar id classif.LMT
#'
#' @section CUstom mlr3 parameters:
#' @section Custom mlr3 parameters:
#' - `output_debug_info`:
#' - original id: output-debug-info
#'
7 changes: 2 additions & 5 deletions R/learner_abess_classif_abess.R
@@ -10,13 +10,10 @@
#' @template learner
#'
#' @section Initial parameter values:
#' * `num.threads`: This parameter is initialized to 1 (default is 0) to avoid conflicts with the mlr3 parallelization.
#'
#' @section Custom mlr3 parameters:
#' * `family` - Depending on the task type, if the parameter `family` is `NULL`, it is set to `"binomial"` for binary
#' - `num.threads`: This parameter is initialized to 1 (default is 0) to avoid conflicts with the mlr3 parallelization.
#' - `family`: Depends on the task type, if the parameter `family` is `NULL`, it is set to `"binomial"` for binary
#' classification tasks and to `"multinomial"` for multiclass classification problems.
#'
#'
#' @template seealso_learner
#' @template example
#' @export
20 changes: 12 additions & 8 deletions R/learner_aorsf_surv_aorsf.R
@@ -9,13 +9,17 @@
#' principle deal with missing values, the behaviour has to be configured using
#' the parameter `na_action`.
#'
#' @details
#' @section Initial parameter values:
#' * `n_thread`: This parameter is initialized to 1 (default is 0) to avoid conflicts with the mlr3 parallelization.
#'
#' @section Prediction types:
#' This learner returns three prediction types:
#' 1. `distr`: a survival matrix in two dimensions, where observations are
#' represented in rows and (unique event) time points in columns.
#' Calculated using the internal `predict.ObliqueForest()` function.
#' 2. `response`: the restricted mean survival time of each test observation,
#' derived from the survival matrix prediction (`distr`).
#' 3. `crank`: the expected mortality using [mlr3proba::.surv_return].
#' 3. `crank`: the expected mortality using [mlr3proba::.surv_return()].
#'
#' @template learner
#' @templateVar id surv.aorsf
@@ -27,10 +31,7 @@
#' Note that `mtry` and `mtry_ratio` are mutually exclusive.
#'
#' @references
#' `r format_bib("jaeger_2019")`
#'
#' `r format_bib("jaeger_2022")`
#'
#' `r format_bib("jaeger_2019", "jaeger_2022")`
#'
#' @template seealso_learner
#' @template example
@@ -45,7 +46,7 @@ LearnerSurvAorsf = R6Class("LearnerSurvAorsf",
n_tree = p_int(default = 500L, lower = 1L, tags = "train"),
n_split = p_int(default = 5L, lower = 1L, tags = "train"),
n_retry = p_int(default = 3L, lower = 0L, tags = "train"),
n_thread = p_int(default = 0, lower = 0, tags = c("train", "predict")),
n_thread = p_int(default = 0, lower = 0, tags = c("train", "predict", "threads")),
pred_aggregate = p_lgl(default = TRUE, tags = "predict"),
pred_simplify = p_lgl(default = FALSE, tags = "predict"),
oobag = p_lgl(default = FALSE, tags = 'predict'),
@@ -81,6 +82,8 @@ LearnerSurvAorsf = R6Class("LearnerSurvAorsf",
verbose_progress = p_lgl(default = FALSE, tags = "train"),
na_action = p_fct(levels = c("fail", "omit", "impute_meanmode"), default = "fail", tags = "train"))

ps$values = list(n_thread = 1)

super$initialize(
id = "surv.aorsf",
packages = c("mlr3extralearners", "aorsf", "pracma"),
@@ -177,7 +180,8 @@ LearnerSurvAorsf = R6Class("LearnerSurvAorsf",
},
.predict = function(task) {
pv = self$param_set$get_values(tags = "predict")
utime = task$unique_event_times() # increasing by default
# estimate S(t) on the unique event times from the train set
utime = self$model$event_times
surv = mlr3misc::invoke(predict,
self$model,
new_data = ordered_features(task, self),
4 changes: 2 additions & 2 deletions R/learner_dbarts_regr_bart.R
@@ -10,11 +10,11 @@
#' @templateVar id regr.bart
#'
#' @section Custom mlr3 parameters:
#' * Parameter: offset
#' * Parameter: `offset`
#' * The parameter is removed, because only `dbarts::bart2` allows an offset during training,
#' and therefore the offset parameter in `dbarts:::predict.bart` is irrelevant for
#' `dbarts::dbart`.
#' * Parameter: nchain, combineChains, combinechains
#' * Parameter: `nchain`, `combineChains`, `combinechains`
#' * The parameters are removed as parallelization of multiple models is handled by future.
#'
#' @section Initial parameter values:
18 changes: 9 additions & 9 deletions R/learner_flexsurv_surv_flexible.R
@@ -9,14 +9,14 @@
#' @template learner
#' @templateVar id surv.flexible
#'
#' @details
#' This learner returns two prediction types:
#' 1. `lp`: a vector of linear predictors (relative risk scores), for each test
#' observation.
#' @section Prediction types:
#' This learner returns three prediction types:
#' 1. `lp`: a vector containing the linear predictors (relative risk scores),
#' where each score corresponds to a specific test observation.
#' Calculated using [flexsurv::flexsurvspline()] and the estimated coefficients.
#' For fitted coefficients, \eqn{\beta = (\beta_0,...,\beta_P)},
#' and covariates \eqn{X^T = (X_0,...,X_P)^T}, where \eqn{X_0}{X0}
#' is a column of \eqn{1}s, the linear predictor (`lp`) is \eqn{lp = \beta X}.
#' For fitted coefficients, \eqn{\hat{\beta} = (\hat{\beta_0},...,\hat{\beta_P})},
#' and the test data covariates \eqn{X^T = (X_0,...,X_P)^T}, where \eqn{X_0}{X0}
#' is a column of \eqn{1}s, the linear predictor vector is \eqn{lp = \hat{\beta} X^T}.
#' 2. `distr`: a survival matrix in two dimensions, where observations are
#' represented in rows and time points in columns.
#' Calculated using `predict.flexsurvreg()`.
@@ -111,8 +111,8 @@ predict_flexsurvreg = function(object, task, learner, ...) {
}

X = stats::model.matrix(formulate(rhs = task$feature_names),
data = newdata,
xlev = task$levels())
data = newdata,
xlev = task$levels())

# collect the auxiliary arguments for the fitted object
args = object$aux
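
The `lp` definition in the doc section of this file (coefficients times a design matrix whose first column is all ones) can be made concrete with a small base-R sketch. The coefficient values, the column names and the test data below are invented for illustration; they do not come from a `flexsurv::flexsurvspline()` fit, and this is not the learner's actual predict code.

```r
# hypothetical fitted coefficients, named like the design-matrix columns
beta_hat = c("(Intercept)" = 0.5, age = 0.02, sexm = -0.3)

# hypothetical test data with one numeric and one factor feature
newdata = data.frame(
  age = c(50, 60),
  sex = factor(c("f", "m"), levels = c("f", "m"))
)

# design matrix: the intercept column X_0 consists of 1s
X = stats::model.matrix(~ age + sex, data = newdata)

# one linear predictor per test observation
lp = as.numeric(X %*% beta_hat[colnames(X)])
```

Indexing `beta_hat` by `colnames(X)` keeps the coefficients aligned with the design-matrix columns even if their order differs.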