diff --git a/SDeMo/previews/PR341/.documenter-siteinfo.json b/SDeMo/previews/PR341/.documenter-siteinfo.json index b3287a5f6..3472e01af 100644 --- a/SDeMo/previews/PR341/.documenter-siteinfo.json +++ b/SDeMo/previews/PR341/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-13T16:09:14","documenter_version":"1.8.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-13T16:19:15","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/SDeMo/previews/PR341/explanations/index.html b/SDeMo/previews/PR341/explanations/index.html index c61e63e15..07411c694 100644 --- a/SDeMo/previews/PR341/explanations/index.html +++ b/SDeMo/previews/PR341/explanations/index.html @@ -1,2 +1,2 @@ -Explanations · SDeMo

Explanations

Shapley values

SDeMo.explainFunction
explain(model::AbstractSDM, j; observation = nothing, instances = nothing, samples = 100, kwargs..., )

Uses the MCMC approximation of Shapley values to provide explanations to specific predictions. The second argument j is the variable for which the explanation should be provided.

The observation keyword is the row in the instances dataset for which explanations must be provided. If instances is nothing, the explanations are given on the training data.

All other keyword arguments are passed to predict.

source

Counterfactuals

SDeMo.counterfactualFunction
counterfactual(model::AbstractSDM, x::Vector{T}, yhat, λ; maxiter=100, minvar=5e-5, kwargs...) where {T <: Number}

Generates one counterfactual explanation for the input vector x, given a target rule, yhat, that the prediction must reach. The learning rate is λ. The maximum number of iterations used in the Nelder-Mead algorithm is maxiter, and minvar is the variance improvement below which the optimization stops. Other keywords are passed to predict.

source

Partial responses

SDeMo.partialresponseFunction
partialresponse(model::T, i::Integer, args...; inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i are set to their mean value. When the inflated keyword is set to true, these other variables are instead set to a random value within the range of the observations.

The different arguments that can follow the variable position are

  • nothing, where the unique values for the i-th variable are used (sorted)
  • a number, in which case that many evenly spaced points within the range of the variable are used
  • an array, in which case each value of this array is evaluated

All keyword arguments are passed to predict.

source
partialresponse(model::T, i::Integer, j::Integer, s::Tuple=(50, 50); inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i and j are set to their mean value.

This function will return a grid corresponding to evenly spaced values of i and j, the size of which is given by the last argument s (defaults to 50 × 50).

All keyword arguments are passed to predict.

source
+Explanations · SDeMo

Explanations

Shapley values

SDeMo.explainFunction
explain(model::AbstractSDM, j; observation = nothing, instances = nothing, samples = 100, kwargs..., )

Uses the MCMC approximation of Shapley values to provide explanations to specific predictions. The second argument j is the variable for which the explanation should be provided.

The observation keyword is the row in the instances dataset for which explanations must be provided. If instances is nothing, the explanations are given on the training data.

All other keyword arguments are passed to predict.

source

Counterfactuals

SDeMo.counterfactualFunction
counterfactual(model::AbstractSDM, x::Vector{T}, yhat, λ; maxiter=100, minvar=5e-5, kwargs...) where {T <: Number}

Generates one counterfactual explanation for the input vector x, given a target rule, yhat, that the prediction must reach. The learning rate is λ. The maximum number of iterations used in the Nelder-Mead algorithm is maxiter, and minvar is the variance improvement below which the optimization stops. Other keywords are passed to predict.

source

Partial responses

SDeMo.partialresponseFunction
partialresponse(model::T, i::Integer, args...; inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i are set to their mean value. When the inflated keyword is set to true, these other variables are instead set to a random value within the range of the observations.

The different arguments that can follow the variable position are

  • nothing, where the unique values for the i-th variable are used (sorted)
  • a number, in which case that many evenly spaced points within the range of the variable are used
  • an array, in which case each value of this array is evaluated

All keyword arguments are passed to predict.

source
partialresponse(model::T, i::Integer, j::Integer, s::Tuple=(50, 50); inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i and j are set to their mean value.

This function will return a grid corresponding to evenly spaced values of i and j, the size of which is given by the last argument s (defaults to 50 × 50).

All keyword arguments are passed to predict.

source
diff --git a/SDeMo/previews/PR341/index.html b/SDeMo/previews/PR341/index.html index 9d1251191..48e7031cc 100644 --- a/SDeMo/previews/PR341/index.html +++ b/SDeMo/previews/PR341/index.html @@ -1,2 +1,2 @@ -SDeMo · SDeMo

SDeMo

SDeMo.accuracyFunction
accuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of accuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.accuracyMethod
accuracy(M::ConfusionMatrix)

Accuracy

$\frac{TP + TN}{TP + TN + FP + FN}$

source
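To make the accuracy formula concrete, the sketch below tallies the four counts from binary predictions and ground truths and applies the formula. The `Counts` struct and the `counts` and `accuracy_sketch` functions are hypothetical stand-ins, not SDeMo's `ConfusionMatrix` type.

```julia
# Hypothetical stand-in for a confusion matrix (not SDeMo's ConfusionMatrix)
struct Counts
    tp::Int
    tn::Int
    fp::Int
    fn::Int
end

# Tally binary predictions against the ground truth
function counts(pred::Vector{Bool}, truth::Vector{Bool})
    tp = count(pred .& truth)       # predicted presence, true presence
    tn = count(.!pred .& .!truth)   # predicted absence, true absence
    fp = count(pred .& .!truth)     # predicted presence, true absence
    fn = count(.!pred .& truth)     # predicted absence, true presence
    return Counts(tp, tn, fp, fn)
end

# (TP + TN) / (TP + TN + FP + FN)
accuracy_sketch(M::Counts) = (M.tp + M.tn) / (M.tp + M.tn + M.fp + M.fn)

pred  = [true, true, false, false, true]
truth = [true, false, false, true, true]
M = counts(pred, truth)
accuracy_sketch(M)   # 3 correct predictions out of 5
```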
SDeMo.backwardselection!Method
backwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)

Removes variables one at a time until the optimality measure stops increasing. Variables included in pool are not removed.

All keyword arguments are passed to crossvalidate and train!.

source
SDeMo.backwardselection!Method
backwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)

Removes variables one at a time until the optimality measure stops increasing.

All keyword arguments are passed to crossvalidate and train!.

source
SDeMo.balancedaccuracyFunction
balancedaccuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of balancedaccuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.ciMethod
ci(C::Vector{ConfusionMatrix}, f)

Applies f to all confusion matrices in the vector, and returns the 95% CI.

source
SDeMo.ciMethod
ci(C::Vector{ConfusionMatrix})

Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.

source
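The text does not state how the 95% CI is computed. The sketch below assumes a normal approximation (1.96 times the standard error of the mean across confusion matrices, e.g. one per cross-validation fold) and mirrors the documented pattern of returning the mean alone, or a (mean, CI) tuple when `full` is true. `ci_sketch` and `summarize` are hypothetical names.

```julia
using Statistics

# Assumption: the CI half-width is 1.96 × the standard error of the mean
ci_sketch(vals) = 1.96 * std(vals) / sqrt(length(vals))

# Mean only, or (mean, CI) when full = true
summarize(vals, full::Bool = false) = full ? (mean(vals), ci_sketch(vals)) : mean(vals)

# Hypothetical per-fold values of a metric such as MCC
scores = [0.62, 0.70, 0.66, 0.58, 0.64]
summarize(scores)         # mean only
summarize(scores, true)   # (mean, CI half-width)
```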
SDeMo.classifierMethod
classifier(model::Bagging)

Returns the classifier of the model that is used as a template for the bagged model.

source
SDeMo.coinflipMethod
coinflip(sdm::SDM)

Version of coinflip using the training labels for an SDM.

source
SDeMo.coinflipMethod
coinflip(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.

source
SDeMo.constantnegativeMethod
constantnegative(labels::Vector{Bool})

Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.

source
SDeMo.constantpositiveMethod
constantpositive(labels::Vector{Bool})

Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.

source
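The no-skill baselines differ only in the probability q of predicting a presence, so their expected confusion matrices follow from the prevalence p alone. This is a sketch with a hypothetical `baseline` helper that returns expected proportions. SDeMo may compute these baselines differently, for instance by simulating random predictions.

```julia
# Expected confusion-matrix proportions for a baseline classifier that predicts
# "presence" with probability q. Hypothetical helper, not SDeMo's implementation.
function baseline(labels::Vector{Bool}, q::Float64)
    p = count(labels) / length(labels)   # prevalence
    return (tp = p * q, tn = (1 - p) * (1 - q), fp = (1 - p) * q, fn = p * (1 - q))
end

labels = [true, false, false, false]    # prevalence of 0.25
p = count(labels) / length(labels)
coinflip_cm = baseline(labels, 0.5)     # each class with probability one half
noskill_cm  = baseline(labels, p)       # each class by its proportion in the data
positive_cm = baseline(labels, 1.0)     # always predicts a presence
negative_cm = baseline(labels, 0.0)     # always predicts an absence
```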
SDeMo.counterfactualMethod
counterfactual(model::AbstractSDM, x::Vector{T}, yhat, λ; maxiter=100, minvar=5e-5, kwargs...) where {T <: Number}

Generates one counterfactual explanation for the input vector x, given a target rule, yhat, that the prediction must reach. The learning rate is λ. The maximum number of iterations used in the Nelder-Mead algorithm is maxiter, and minvar is the variance improvement below which the optimization stops. Other keywords are passed to predict.

source
SDeMo.crossvalidateMethod
crossvalidate(sdm, folds; thr = nothing, kwargs...)

Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keywords are passed to the train! method.

This method returns two vectors of ConfusionMatrix, with the confusion matrix for each set of validation data first, and the confusion matrix for the training data second.

source
SDeMo.dorFunction
dor(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of dor using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.dorMethod
dor(M::ConfusionMatrix)

Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.

source
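A worked example of the likelihood ratios and the diagnostic odds ratio, computed from raw counts using the formulas given in this reference (plr = TPR/FPR, nlr = FNR/TNR, dor = plr/nlr). The helper functions are hypothetical re-implementations, not SDeMo's methods. Algebraically, dor reduces to (TP × TN) / (FP × FN).

```julia
# Rates from raw counts (hypothetical helpers)
tpr(tp, fn) = tp / (tp + fn)
tnr(tn, fp) = tn / (tn + fp)
fpr(tn, fp) = fp / (fp + tn)
fnr(tp, fn) = fn / (fn + tp)

plr(tp, tn, fp, fn) = tpr(tp, fn) / fpr(tn, fp)   # TPR / FPR
nlr(tp, tn, fp, fn) = fnr(tp, fn) / tnr(tn, fp)   # FNR / TNR
dor(tp, tn, fp, fn) = plr(tp, tn, fp, fn) / nlr(tp, tn, fp, fn)

# TP = 40, TN = 45, FP = 5, FN = 10: plr = 0.8 / 0.1, dor = 1800 / 50
dor(40, 45, 5, 10)
```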
SDeMo.explainMethod
explain(model::AbstractSDM, j; observation = nothing, instances = nothing, samples = 100, kwargs..., )

Uses the MCMC approximation of Shapley values to provide explanations to specific predictions. The second argument j is the variable for which the explanation should be provided.

The observation keyword is the row in the instances dataset for which explanations must be provided. If instances is nothing, the explanations are given on the training data.

All other keyword arguments are passed to predict.

source
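The text does not give the sampling procedure, so the sketch below uses a common permutation-sampling estimator of Shapley values as an assumed stand-in. For each sample it draws a background point z and a random ordering of the variables. It then compares the score when variable j comes from x with the score when variable j comes from z, with the variables that precede j taken from x. `shapley_mc` is a hypothetical name. The sketch also assumes that instances stores variables in rows and observations in columns.

```julia
using Random

# Permutation-sampling Shapley estimate for variable j at observation x.
# Generic sketch (assumed approach), not SDeMo's implementation.
function shapley_mc(f, x::Vector{Float64}, instances::Matrix{Float64}, j::Int;
                    samples::Int = 100, rng = Random.default_rng())
    nvar, nobs = size(instances)
    φ = 0.0
    for _ in 1:samples
        z = instances[:, rand(rng, 1:nobs)]   # random background observation
        order = randperm(rng, nvar)           # random ordering of the variables
        before = order[1:(findfirst(==(j), order) - 1)]
        xplus = copy(z)
        xplus[before] .= x[before]            # variables before j come from x
        xplus[j] = x[j]                       # ...and so does j
        xminus = copy(xplus)
        xminus[j] = z[j]                      # identical, except j comes from z
        φ += f(xplus) - f(xminus)
    end
    return φ / samples
end

# For a linear score, every sample contributes w[j] * (x[j] - z[j])
w = [2.0, -1.0, 0.5]
f(v) = sum(w .* v)
instances = repeat([1.0, 1.0, 1.0], 1, 10)   # identical background points
x = [3.0, 0.0, 1.0]
shapley_mc(f, x, instances, 1; rng = Xoshiro(1))
```

The linear model is a convenient check. Because xplus and xminus differ only in variable j, each sample's contribution can be verified by hand.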
SDeMo.f1Function
f1(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of f1 using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.f1Method
f1(M::ConfusionMatrix)

F₁ score, defined as the harmonic mean between precision and recall:

$2\times\frac{PPV\times TPR}{PPV + TPR}$

This uses the more general fscore internally.

source
SDeMo.fdirFunction
fdir(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fdir using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fdirMethod
fdir(M::ConfusionMatrix)

False discovery rate, 1 - ppv

source
SDeMo.featuresMethod
features(sdm::SDM, n)

Returns the n-th feature stored in the field X of the SDM.

source
SDeMo.featuresMethod
features(sdm::SDM)

Returns the features stored in the field X of the SDM. Note that the features are an array, and this does not return a copy of it – any change made to the output of this function will change the content of the SDM features.

source
SDeMo.fnrFunction
fnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fnrMethod
fnr(M::ConfusionMatrix)

False-negative rate

$\frac{FN}{FN+TP}$

source
SDeMo.fomrFunction
fomr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fomr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.forwardselection!Method
forwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)

Adds variables one at a time until the optimality measure stops increasing. The variables in pool are added at the start.

All keyword arguments are passed to crossvalidate and train!.

source
SDeMo.forwardselection!Method
forwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)

Adds variables one at a time until the optimality measure stops increasing.

All keyword arguments are passed to crossvalidate and train!.

source
SDeMo.fprFunction
fpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fprMethod
fpr(M::ConfusionMatrix)

False-positive rate

$\frac{FP}{FP+TN}$

source
SDeMo.fscoreFunction
fscore(M::ConfusionMatrix, β=1.0)

Fᵦ score, defined as the harmonic mean between precision and recall, using a positive factor β indicating the relative importance of recall over precision:

$(1 + \beta^2)\times\frac{PPV\times TPR}{(\beta^2 \times PPV) + TPR}$

source
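A short sketch of the Fᵦ formula above, written in terms of precision (PPV) and recall (TPR). With β = 1 it reduces to the harmonic mean used by f1, and larger β pulls the score toward recall. `fbeta` is a hypothetical name.

```julia
# Fβ from precision (ppv) and recall (tpr); β > 1 weights recall more heavily
fbeta(ppv, tpr; β = 1.0) = (1 + β^2) * (ppv * tpr) / ((β^2 * ppv) + tpr)

p, r = 0.8, 0.5
fbeta(p, r)            # F₁ = 2 × 0.4 / 1.3
fbeta(p, r; β = 2.0)   # F₂, closer to the recall of 0.5
```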
SDeMo.fscoreFunction
fscore(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fscore using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.holdoutMethod
holdout(y, X; proportion = 0.2, permute = true)

Sets aside a proportion (given by the proportion keyword, defaults to 0.2) of observations to use for validation, and the rest for training. An additional argument permute (defaults to true) can be used to shuffle the order of observations before they are split.

This method returns a single tuple with the training data first and the validation data second. Because crossvalidate expects a vector of splits, the result must be wrapped in a vector, as in [holdout(y, X)].

source
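A sketch of a holdout split, under the assumption that the split is expressed as observation indices. It returns the training part first and the validation part second, as documented. `holdout_sketch` is a hypothetical function that takes the number of observations instead of (y, X).

```julia
using Random

# Hypothetical holdout split on observation indices (not SDeMo's code)
function holdout_sketch(n::Int; proportion = 0.2, permute = true, rng = Random.default_rng())
    idx = permute ? randperm(rng, n) : collect(1:n)   # optionally shuffle first
    nval = round(Int, proportion * n)                 # size of the validation set
    return (idx[(nval + 1):end], idx[1:nval])         # (training, validation)
end

train_idx, val_idx = holdout_sketch(50; rng = Xoshiro(42))
splits = [holdout_sketch(50)]   # wrapped in a vector, as a cross-validation routine would expect
```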
SDeMo.holdoutMethod
holdout(sdm::Bagging)

Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.holdoutMethod
holdout(sdm::SDM)

Version of holdout using the instances and labels of an SDM.

source
SDeMo.instanceMethod
instance(sdm::SDM, n; strict=true)

Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true, only the variables used for prediction are returned.

source
SDeMo.iqrFunction
iqr(x, m=0.25, M=0.75)

Returns the inter-quantile range, by default between 25% and 75% of observations.

source
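The inter-quantile range is the difference between two quantiles of the data. A minimal sketch using the Statistics standard library, where `iqr_sketch` is a hypothetical name:

```julia
using Statistics

# Distance between the M-th and m-th quantiles (defaults: the interquartile range)
iqr_sketch(x, m = 0.25, M = 0.75) = quantile(x, M) - quantile(x, m)

iqr_sketch([1.0, 2.0, 3.0, 4.0, 5.0])   # 4.0 - 2.0
```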
SDeMo.kfoldMethod
kfold(y, X; k = 10, permute = true)

Returns splits of the data in which 1 group is used for validation, and k-1 groups are used for training. All k groups have approximately the same size, and each instance is used exactly once for validation (and k-1 times for training). The groups are stratified, so that they have the same prevalence.

This method returns a vector of tuples, with each entry having the training data first, and the validation data second.

source
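One way to obtain stratified folds is to deal presences and absences out to the k groups in turn. Each group then keeps roughly the same prevalence, and every index lands in exactly one validation set. This is a hypothetical sketch on index vectors, not SDeMo's implementation.

```julia
using Random

# Hypothetical stratified k-fold split (not SDeMo's code).
# Returns a vector of (training indices, validation indices).
function kfold_sketch(y::Vector{Bool}; k = 10, permute = true, rng = Random.default_rng())
    pos, neg = findall(y), findall(.!y)
    if permute
        shuffle!(rng, pos)
        shuffle!(rng, neg)
    end
    groups = [Int[] for _ in 1:k]
    # Deal presences first, then absences, round-robin across the k groups
    for (i, idx) in enumerate(vcat(pos, neg))
        push!(groups[mod1(i, k)], idx)
    end
    return [(sort(vcat(groups[setdiff(1:k, g)]...)), sort(groups[g])) for g in 1:k]
end

y = vcat(fill(true, 20), fill(false, 80))
folds = kfold_sketch(y; k = 5, rng = Xoshiro(1))
```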
SDeMo.kfoldMethod
kfold(sdm::Bagging)

Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.kfoldMethod
kfold(sdm::SDM)

Version of kfold using the instances and labels of an SDM.

source
SDeMo.labelsMethod
labels(sdm::SDM)

Returns the labels stored in the field y of the SDM – note that this is not a copy of the labels, but the object itself.

source
SDeMo.leaveoneoutMethod
leaveoneout(y, X)

Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.

This method returns a vector of tuples, with each entry having the training data first, and the validation data second.

source
SDeMo.leaveoneoutMethod
leaveoneout(sdm::Bagging)

Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.leaveoneoutMethod
leaveoneout(sdm::SDM)

Version of leaveoneout using the instances and labels of an SDM.

source
SDeMo.loadsdmMethod
loadsdm(file::String; kwargs...)

Loads a model from a JSON file. The keyword arguments are passed to train!. The model is trained in full upon loading.

source
SDeMo.markednessFunction
markedness(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of markedness using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.markednessMethod
markedness(M::ConfusionMatrix)

Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions

$PPV + NPV -1$

source
SDeMo.mccFunction
mcc(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of mcc using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.mccMethod
mcc(M::ConfusionMatrix)

Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.

source
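This entry does not spell out the formula, so here is the standard definition of the Matthews correlation coefficient computed from counts. It ranges from -1 (total disagreement) through 0 (chance) to 1 (perfect prediction). Returning 0 when a marginal total is empty is an assumption of this hypothetical `mcc_sketch`, not documented SDeMo behaviour.

```julia
# MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
function mcc_sketch(tp, tn, fp, fn)
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return den == 0 ? 0.0 : num / den   # assumption: 0 when a marginal is empty
end

mcc_sketch(40, 45, 5, 10)   # a fairly good classifier
mcc_sketch(25, 25, 25, 25)  # no better than chance
```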
SDeMo.montecarloMethod
montecarlo(y, X; n = 100, kwargs...)

Returns n (defaults to 100) samples of holdout. Other keyword arguments are passed to holdout.

This method returns a vector of tuples, with each entry having the training data first, and the validation data second.

source
SDeMo.montecarloMethod
montecarlo(sdm::Bagging)

Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.montecarloMethod
montecarlo(sdm::SDM)

Version of montecarlo using the instances and labels of an SDM.

source
SDeMo.nlrFunction
nlr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of nlr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.nlrMethod
nlr(M::ConfusionMatrix)

Negative likelihood ratio

$\frac{FNR}{TNR}$

source
SDeMo.noselection!Method
noselection!(model, folds; verbose::Bool = false, kwargs...)

Returns the model to the state where all variables are used.

All keyword arguments are passed to train!.

source
SDeMo.noskillMethod
noskill(sdm::SDM)

Version of noskill using the training labels for an SDM.

source
SDeMo.noskillMethod
noskill(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected by its proportion in the training data.

source
SDeMo.npvFunction
npv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of npv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.npvMethod
npv(M::ConfusionMatrix)

Negative predictive value

$\frac{TN}{TN+FN}$

source
SDeMo.outofbagMethod
outofbag(ensemble::Bagging; kwargs...)

This method returns the confusion matrix associated with the out-of-bag error, wherein the success in predicting instance i is calculated on the basis of all models that have not been trained on i. The consensus of the different models is a simple majority rule.

The additional keyword arguments are passed to predict.

source
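The out-of-bag rule can be sketched as follows. Each instance is scored only by the models whose bootstrap bag does not contain it, and the votes are combined by majority. The data layout (a models × instances vote matrix and one index vector per bag) and the tie-breaking rule (ties count as an absence) are assumptions of this hypothetical `oob_predictions` function.

```julia
# votes[m, i]: binary prediction of model m for instance i
# bags[m]: indices of the instances model m was trained on
function oob_predictions(votes::Matrix{Bool}, bags::Vector{Vector{Int}})
    nmodels, n = size(votes)
    pred = Vector{Union{Bool, Nothing}}(nothing, n)   # nothing: no out-of-bag model
    for i in 1:n
        voters = [m for m in 1:nmodels if !(i in bags[m])]
        isempty(voters) && continue
        # Strict majority; a tie counts as an absence (assumption)
        pred[i] = count(votes[voters, i]) > length(voters) / 2
    end
    return pred
end

votes = Bool[1 1 0 0; 1 0 0 1; 0 1 1 0]
bags = [[1, 2], [2, 3], [3, 4]]
oob_predictions(votes, bags)
```

The resulting out-of-bag predictions can then be compared to the labels to build the confusion matrix.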
SDeMo.partialresponseMethod
partialresponse(model::T, i::Integer, j::Integer, s::Tuple=(50, 50); inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i and j are set to their mean value.

This function will return a grid corresponding to evenly spaced values of i and j, the size of which is given by the last argument s (defaults to 50 × 50).

All keyword arguments are passed to predict.

source
SDeMo.partialresponseMethod
partialresponse(model::T, i::Integer, args...; inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i are set to their mean value. When the inflated keyword is set to true, these other variables are instead set to a random value within the range of the observations.

The different arguments that can follow the variable position are

  • nothing, where the unique values for the i-th variable are used (sorted)
  • a number, in which case that many evenly spaced points within the range of the variable are used
  • an array, in which case each value of this array is evaluated

All keyword arguments are passed to predict.

source
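A sketch of the non-inflated, one-variable partial response when a number of points is given. All variables except i are held at their mean, and variable i takes evenly spaced values over its observed range. The function name, the plain scoring function f, and the layout of X (variables in rows, observations in columns) are assumptions for illustration.

```julia
using Statistics

# Hypothetical 1-D partial response with n evenly spaced values of variable i
function partial_response_sketch(f, X::Matrix{Float64}, i::Int, n::Int)
    xs = range(minimum(X[i, :]), maximum(X[i, :]); length = n)
    base = vec(mean(X; dims = 2))   # mean of every variable
    ys = map(xs) do v
        point = copy(base)
        point[i] = v                # only variable i varies
        f(point)
    end
    return collect(xs), ys
end

X = [0.0 1.0 2.0 3.0; 10.0 20.0 30.0 40.0]
f(v) = 2 * v[1] + 0.1 * v[2]
xs, ys = partial_response_sketch(f, X, 1, 4)
```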
SDeMo.plrFunction
plr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of plr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.plrMethod
plr(M::ConfusionMatrix)

Positive likelihood ratio

$\frac{TPR}{FPR}$

source
SDeMo.ppvFunction
ppv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of ppv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.ppvMethod
ppv(M::ConfusionMatrix)

Positive predictive value

$\frac{TP}{TP+FP}$

source
SDeMo.precisionFunction
precision(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of precision using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.precisionMethod
precision(M::ConfusionMatrix)

Alias for ppv, the positive predictive value

source
SDeMo.recallFunction
recall(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of recall using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.recallMethod
recall(M::ConfusionMatrix)

Alias for tpr, the true positive rate

source
SDeMo.reset!Function
reset!(sdm::SDM, thr=0.5)

Resets a model, optionally with a specified value of the threshold (thr, defaults to 0.5). This amounts to re-using all the variables and discarding the tuned threshold.

source
SDeMo.sensitivityFunction
sensitivity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of sensitivity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.specificityFunction
specificity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of specificity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.stepwisevif!Function
stepwisevif!(model::SDM, limit, tr=:;kwargs...)

Drops the variables with the largest variance inflation from the model, one at a time, until all VIFs are under limit. The last positional argument (defaults to :) gives the indices of the training instances used for the VIF calculation. All keyword arguments are passed to train!.

source
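The stepwise procedure can be sketched on a plain matrix. The VIFs are the diagonal of the inverse correlation matrix, as described for vif(::Matrix). The variable with the largest VIF is dropped until every remaining VIF is below the limit. The function names and the layout of X (variables in rows, observations in columns) are assumptions, and this sketch does not retrain any model.

```julia
using LinearAlgebra, Random, Statistics

# VIF of each variable: diagonal of the inverse of the correlation matrix
vif_sketch(X::AbstractMatrix) = diag(inv(cor(X; dims = 2)))

# Hypothetical stepwise VIF selection; returns the indices of the kept variables
function stepwise_vif_sketch(X::Matrix{Float64}, limit::Real)
    kept = collect(1:size(X, 1))
    while length(kept) > 1
        v = vif_sketch(X[kept, :])
        maximum(v) < limit && break
        deleteat!(kept, argmax(v))   # drop the most inflated variable
    end
    return kept
end

rng = Xoshiro(7)
a, b = randn(rng, 200), randn(rng, 200)
X = permutedims(hcat(a, b, a .+ 0.01 .* randn(rng, 200)))   # third row nearly duplicates the first
kept = stepwise_vif_sketch(X, 10.0)
```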
SDeMo.thresholdMethod
threshold(sdm::SDM)

This returns the value above which the score returned by the SDM is considered to be a presence.

source
SDeMo.tnrFunction
tnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.tnrMethod
tnr(M::ConfusionMatrix)

True-negative rate

$\frac{TN}{TN+FP}$

source
SDeMo.tprFunction
tpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.tprMethod
tpr(M::ConfusionMatrix)

True-positive rate

$\frac{TP}{TP+FN}$

source
SDeMo.train!Method
train!(ensemble::Bagging; kwargs...)

Trains all the models in an ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, which includes the transformers.

source
SDeMo.train!Method
train!(ensemble::Ensemble; kwargs...)

Trains all the models in a heterogeneous ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, which includes the transformers.

The keyword arguments are passed to train! and can include the training indices.

source
SDeMo.train!Method
train!(sdm::SDM; threshold=true, training=:, optimality=mcc)

This is the main training function to train a SDM.

The keyword arguments are:

  • training: defaults to :, and is the range (or alternatively the indices) of the data that are used to train the model
  • threshold: defaults to true, and performs threshold moving by evaluating 200 possible values between the minimum and maximum output of the model, and returning the one that is optimal
  • optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best
  • absences: defaults to false, and indicates whether the (pseudo) absences are used to train the transformer; when using actual absences, this should be set to true

Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.

source
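The threshold-moving step can be sketched independently of any model. It evaluates 200 candidate thresholds between the minimum and maximum score and keeps the one that maximizes the optimality function (MCC here). Treating a score greater than or equal to τ as a presence is an assumption, and `best_threshold` and `mcc_from` are hypothetical names.

```julia
# MCC from binary predictions and ground truth
function mcc_from(pred::AbstractVector{Bool}, truth::AbstractVector{Bool})
    tp = count(pred .& truth); tn = count(.!pred .& .!truth)
    fp = count(pred .& .!truth); fn = count(.!pred .& truth)
    den = sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return den == 0 ? 0.0 : (tp * tn - fp * fn) / den
end

# Hypothetical threshold moving: keep the candidate with the best optimality score
function best_threshold(scores::Vector{Float64}, truth::Vector{Bool}; n = 200, optimality = mcc_from)
    candidates = range(minimum(scores), maximum(scores); length = n)
    perf = [optimality(scores .>= τ, truth) for τ in candidates]   # assumption: score ≥ τ is a presence
    return candidates[argmax(perf)]
end

scores = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
truth  = [false, false, false, true, true, true]
τ = best_threshold(scores, truth)
```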
SDeMo.transformerMethod
transformer(model::Bagging)

Returns the transformer of the model that is used as a template for the bagged model.

source
SDeMo.trueskillFunction
trueskill(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of trueskill using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.trueskillMethod
trueskill(M::ConfusionMatrix)

True skill statistic (a.k.a Youden's J, or informedness)

$TPR + TNR - 1$

source
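Informedness (the true skill statistic) combines the rates conditioned on the true class, while markedness combines the rates conditioned on the predicted class. A worked example from counts, using the formulas in this reference, with a hypothetical helper name:

```julia
# TSS = TPR + TNR - 1 and markedness = PPV + NPV - 1, from raw counts
function tss_and_markedness(tp, tn, fp, fn)
    tpr = tp / (tp + fn); tnr = tn / (tn + fp)
    ppv = tp / (tp + fp); npv = tn / (tn + fn)
    return (trueskill = tpr + tnr - 1, markedness = ppv + npv - 1)
end

r = tss_and_markedness(40, 45, 5, 10)
```

When both values are positive, their geometric mean equals the Matthews correlation coefficient, which links these two measures to mcc.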
SDeMo.variableimportanceMethod
variableimportance(model, folds; kwargs...)

Returns the importance of all variables in the model. The keywords are passed to variableimportance.

source
SDeMo.variableimportanceMethod
variableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)

Returns the importance of one variable in the model. The reps keyword fixes the number of bootstrap replicates to run (defaults to 10, which is not enough!).

The keywords are passed to ConfusionMatrix.

source
SDeMo.variablesMethod
variables(sdm::SDM)

Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.

source
SDeMo.vifMethod
vif(::Matrix)

Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.

source
SDeMo.vifMethod
vif(::AbstractSDM, tr=:)

Returns the VIF for the variables used in a SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.

source
SDeMo.writesdmMethod
writesdm(file::String, model::SDM)

Writes a model to a JSON file. This method is very bare-bones, and only saves the structure of the model, as well as the data.

source
SDeMo.κFunction
κ(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of κ using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple whose second element is the CI.

source
StatsAPI.predictMethod
StatsAPI.predict(ensemble::Bagging; kwargs...)

Predicts the ensemble model for all training data.

source
StatsAPI.predictMethod
StatsAPI.predict(ensemble::Ensemble; kwargs...)

Predicts the heterogeneous ensemble model for all training data.

source
StatsAPI.predictMethod
StatsAPI.predict(sdm::SDM; kwargs...)

This method performs the prediction on the entire set of training data available for the training of an SDM.

source
StatsAPI.predictMethod
StatsAPI.predict(ensemble::Bagging, X; consensus = median, kwargs...)

Returns the prediction of the ensemble of models on a dataset X. The function used to aggregate the outputs from different models is consensus (defaults to median). All other keyword arguments are passed to predict.

To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.

source
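The consensus step reduces each site's model outputs to a single value. The median gives a central prediction, and a spread measure such as the inter-quantile range estimates variability. The sketch below assumes a matrix of scores with one row per model and one column per site, and `consensus_sketch` is a hypothetical name.

```julia
using Statistics

# Rows: (hypothetical) scores from each bagged model; columns: sites
scores = [0.2 0.8 0.5;
          0.3 0.9 0.1;
          0.1 0.7 0.9]

# Aggregate each column with the consensus function (median by default)
consensus_sketch(S; consensus = median) = [consensus(col) for col in eachcol(S)]

consensus_sketch(scores)                                                           # central value per site
consensus_sketch(scores; consensus = x -> quantile(x, 0.75) - quantile(x, 0.25))   # spread per site
```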
StatsAPI.predictMethod
StatsAPI.predict(ensemble::Ensemble, X; consensus = median, kwargs...)

Returns the prediction of the heterogeneous ensemble of models on a dataset X. The function used to aggregate the outputs from different models is consensus (defaults to median). All other keyword arguments are passed to predict.

To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.

source
StatsAPI.predictMethod
StatsAPI.predict(sdm::SDM, X; threshold = true)

This is the main prediction function, and it takes as input an SDM and a matrix of features. The only keyword argument is threshold, which determines whether the prediction is returned raw or as a binary value (default is true).

source
SDeMo.AbstractEnsembleSDMType
AbstractEnsembleSDM

This abstract type covers models that combine different SDMs to make a prediction, which currently covers Bagging and Ensemble.

source
SDeMo.ClassifierType
Classifier

This abstract type covers all algorithms to convert transformed data into prediction.

source
SDeMo.ConfusionMatrixType
ConfusionMatrix{T <: Number}

A structure to store the true positives, true negatives, false positives, and false negatives counts (or proportion) during model evaluation. Empty confusion matrices can be created using the zero method.

source
SDeMo.ConfusionMatrixMethod
ConfusionMatrix(ensemble::Bagging; kwargs...)

Performs the predictions for a bagged model, and compares them to the labels used for training. The keyword arguments are passed to the predict method.

source
SDeMo.ConfusionMatrixMethod
ConfusionMatrix(sdm::SDM; kwargs...)

Performs the predictions for an SDM, and compares them to the labels used for training. The keyword arguments are passed to the predict method.

source
SDeMo.ConfusionMatrixMethod
ConfusionMatrix(pred::Vector{Bool}, truth::Vector{Bool})

Given a vector of binary predictions and a vector of ground truths, returns the confusion matrix.

source
SDeMo.ConfusionMatrixMethod
ConfusionMatrix(pred::Vector{T}, truth::Vector{Bool}, τ::T) where {T <: Number}

Given a vector of scores and a vector of ground truths, as well as a threshold, transforms the score into binary predictions and returns the confusion matrix.

source
SDeMo.ConfusionMatrixMethod
ConfusionMatrix(pred::Vector{T}, truth::Vector{Bool}) where {T <: Number}

Given a vector of scores and a vector of ground truths, returns the confusion matrix under the assumption that the scores are probabilities and that the threshold is one half.

source
SDeMo.DecisionTreeType
DecisionTree

The depth and number of nodes can be adjusted with maxnodes! and maxdepth!.

source
SDeMo.EnsembleType
Ensemble

A heterogeneous ensemble model is defined as a vector of SDMs.

source
SDeMo.MultivariateTransformType
MultivariateTransform{T} <: Transformer

T is a multivariate transformation, likely offered through the MultivariateStats package. The transformations currently supported are PCA, PPCA, KernelPCA, and Whitening, and they are documented through their type aliases (e.g. PCATransform).

source
SDeMo.NaiveBayesType
NaiveBayes

Naive Bayes Classifier

By default, upon training, the prior probability will be set to the prevalence of the training data.

source
SDeMo.PCATransformType
PCATransform

The PCA transform will project the model features, which also serves as a way to decrease the dimensionality of the problem. Note that this method will only use the training instances, and unless the absences=true keyword is used, only the presences. This ensures that there is no data leakage (neither validation data nor the data from the raster are used).

This is an alias for MultivariateTransform{PCA}.

source
SDeMo.RawDataType
RawData

A transformer that does nothing to the data. This is passing the raw data to the classifier, and can be a good first step for models that assume that the features are independent, or are not sensitive to the scale of the features.

SDeMo.SDM (Type)
SDM

This type specifies a full model, which is composed of a transformer (which applies a transformation on the data), a classifier (which returns a quantitative score), and a threshold (above which the score corresponds to the prediction of a presence).

In addition, the SDM carries with it the training features and labels, as well as a vector of indices indicating which variables are actually used by the model.

SDeMo.Transformer (Type)
Transformer

This abstract type covers all transformations that are applied to the data before fitting the classifier.

SDeMo.WhiteningTransform (Type)
WhiteningTransform

The whitening transformation is a linear transformation of the input variables, after which the new variables have unit variance and no correlation. The input is transformed into white noise.

Because this transform will usually keep the first variable "as is", and then apply increasingly important perturbations on the subsequent variables, it is sensitive to the order in which variables are presented, and is less useful when applying tools for interpretation.

This is an alias for MultivariateTransform{Whitening}.
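The order sensitivity can be illustrated with a Cholesky-based whitening. This is a minimal sketch that assumes this construction; the exact transformation used through MultivariateStats is not reproduced here. Because the whitening matrix is triangular, the first whitened variable is only a rescaled copy of the first input, while later ones mix in the variables that precede them. The function `cholesky_whiten` is hypothetical, and rows are observations in this example.

```julia
using LinearAlgebra, Random, Statistics

# Sketch of Cholesky-based whitening (an illustration, not SDeMo's or MultivariateStats' code).
# Rows of X are observations and columns are variables in this example.
function cholesky_whiten(X::Matrix{Float64})
    Xc = X .- mean(X; dims = 1)            # center each variable
    U = cholesky(Symmetric(cov(Xc))).U     # cov(Xc) == U'U, with U upper triangular
    return Xc / U                          # Z = Xc * inv(U), so cov(Z) is the identity
end

Random.seed!(42)
X = randn(500, 3) * [2.0 0.5 0.1; 0.0 1.0 0.3; 0.0 0.0 0.5]   # correlated variables
Z = cholesky_whiten(X)
```

Since inv(U) is upper triangular, column 1 of Z depends only on column 1 of X; reordering the columns of X changes which variable is kept "as is".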

SDeMo.ZScore (Type)
ZScore

A transformer that scales and centers the data, using only the data that are available to the model at training time.

For all variables in the SDM features (regardless of whether they are used), this transformer will store the observed mean and standard deviation. There is no correction on the sample size, because there is no reason to expect that the sample size will be the same for the training and prediction situation.
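A minimal sketch of the idea in plain Julia, not SDeMo's `ZScore`: the statistics are computed on the training instances only and then reused on any other data. `ZScoreSketch`, `fit_zscore`, and `apply_zscore` are hypothetical names; the example assumes variables in rows and instances in columns, and uses the default (Bessel-corrected) `std` from the Statistics standard library, which may differ from SDeMo's choice.

```julia
using Statistics

# Hypothetical sketch, not SDeMo's ZScore: the mean and standard deviation of every
# variable are learned from the training instances only, then applied to any data.
# Assumption for this example: rows are variables and columns are instances.
struct ZScoreSketch
    μ::Vector{Float64}
    σ::Vector{Float64}
end

function fit_zscore(X::Matrix{Float64}, training)
    Xt = X[:, training]
    return ZScoreSketch(vec(mean(Xt; dims = 2)), vec(std(Xt; dims = 2)))
end

apply_zscore(z::ZScoreSketch, X::Matrix{Float64}) = (X .- z.μ) ./ z.σ

X = [1.0 2.0 3.0 10.0; 4.0 6.0 8.0 -5.0]   # 2 variables × 4 instances
z = fit_zscore(X, 1:3)                      # the 4th instance plays no role in fitting
apply_zscore(z, X[:, 1:3])                  # [-1.0 0.0 1.0; -1.0 0.0 1.0]
```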

SDeMo

SDeMo.accuracy (Function)
accuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of accuracy using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.accuracy (Method)
accuracy(M::ConfusionMatrix)

Accuracy

$\frac{TP + TN}{TP + TN + FP + FN}$

SDeMo.backwardselection! (Method)
backwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)

Removes variables one at a time until the optimality measure stops increasing. Variables included in pool are not removed.

All keyword arguments are passed to crossvalidate and train!.

SDeMo.backwardselection! (Method)
backwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)

Removes variables one at a time until the optimality measure stops increasing.

All keyword arguments are passed to crossvalidate and train!.

SDeMo.balancedaccuracy (Function)
balancedaccuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of balancedaccuracy using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.ci (Method)
ci(C::Vector{ConfusionMatrix}, f)

Applies f to all confusion matrices in the vector, and returns the 95% CI.

SDeMo.ci (Method)
ci(C::Vector{ConfusionMatrix})

Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.

SDeMo.classifier (Method)
classifier(model::Bagging)

Returns the classifier of the model that serves as a template for the bagged model.

SDeMo.coinflip (Method)
coinflip(sdm::SDM)

Version of coinflip using the training labels for an SDM.

SDeMo.coinflip (Method)
coinflip(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.

SDeMo.constantnegative (Method)
constantnegative(labels::Vector{Bool})

Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.

SDeMo.constantpositive (Method)
constantpositive(labels::Vector{Bool})

Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.

SDeMo.counterfactual (Method)
counterfactual(model::AbstractSDM, x::Vector{T}, yhat, λ; maxiter=100, minvar=5e-5, kwargs...) where {T <: Number}

Generates one counterfactual explanation given an input vector x, and a target rule to reach yhat. The learning rate is λ. The maximum number of iterations used in the Nelder-Mead algorithm is maxiter, and the variance improvement under which the model will stop is minvar. Other keywords are passed to predict.

SDeMo.crossvalidate (Method)
crossvalidate(sdm, folds; thr = nothing, kwargs...)

Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keywords are passed to the train! method.

This method returns two vectors of ConfusionMatrix, with the confusion matrix for each set of validation data first, and the confusion matrix for the training data second.

SDeMo.dor (Function)
dor(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of dor using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.dor (Method)
dor(M::ConfusionMatrix)

Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
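Since FPR = 1 - TNR and FNR = 1 - TPR, the ratio plr/nlr simplifies to (TP·TN)/(FP·FN). The standalone helpers below are hypothetical, written from the definitions in this reference rather than taken from SDeMo, and check that identity on one set of counts.

```julia
# Hypothetical helpers written from the definitions in this reference (not SDeMo's code).
tpr(tp, fn) = tp / (tp + fn)                                  # true-positive rate
tnr(tn, fp) = tn / (tn + fp)                                  # true-negative rate
plr(tp, tn, fp, fn) = tpr(tp, fn) / (1 - tnr(tn, fp))         # TPR / FPR
nlr(tp, tn, fp, fn) = (1 - tpr(tp, fn)) / tnr(tn, fp)         # FNR / TNR
dor(tp, tn, fp, fn) = plr(tp, tn, fp, fn) / nlr(tp, tn, fp, fn)

tp, tn, fp, fn = 40, 45, 5, 10
dor(tp, tn, fp, fn)        # ≈ 36.0
(tp * tn) / (fp * fn)      # 36.0, the same value from the raw counts
```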

SDeMo.explain (Method)
explain(model::AbstractSDM, j; observation = nothing, instances = nothing, samples = 100, kwargs..., )

Uses the MCMC approximation of Shapley values to provide explanations to specific predictions. The second argument j is the variable for which the explanation should be provided.

The observation keyword is the row of the instances dataset for which explanations must be provided. If instances is nothing, the explanations are given on the training data.

All other keyword arguments are passed to predict.

SDeMo.f1 (Function)
f1(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of f1 using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.f1 (Method)
f1(M::ConfusionMatrix)

F₁ score, defined as the harmonic mean between precision and recall:

$2\times\frac{PPV\times TPR}{PPV + TPR}$

This uses the more general fscore internally.

SDeMo.fdir (Function)
fdir(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fdir using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.fdir (Method)
fdir(M::ConfusionMatrix)

False discovery rate, 1 - ppv

SDeMo.features (Method)
features(sdm::SDM, n)

Returns the n-th feature stored in the field X of the SDM.

SDeMo.features (Method)
features(sdm::SDM)

Returns the features stored in the field X of the SDM. Note that the features are an array, and this does not return a copy of it – any change made to the output of this function will change the content of the SDM features.

SDeMo.fnr (Function)
fnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fnr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.fnr (Method)
fnr(M::ConfusionMatrix)

False-negative rate

$\frac{FN}{FN+TP}$

SDeMo.fomr (Function)
fomr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fomr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.forwardselection! (Method)
forwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)

Adds variables one at a time until the optimality measure stops increasing. The variables in pool are added at the start.

All keyword arguments are passed to crossvalidate and train!.

SDeMo.forwardselection! (Method)
forwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)

Adds variables one at a time until the optimality measure stops increasing.

All keyword arguments are passed to crossvalidate and train!.
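The greedy loop can be sketched as follows. This is an assumption about the general procedure, not SDeMo's code: `score` is a hypothetical stand-in for cross-validating the model with a given set of variables and applying the optimality measure, the `pool` variables are added first, and a tie counts as "not increasing".

```julia
# Greedy forward selection sketch. `score(vars)` is a hypothetical stand-in for
# "cross-validate the model using these variables and return the optimality measure".
function forward_select(score, nvars::Int; pool::Vector{Int} = Int[])
    selected = copy(pool)                     # pool variables are included from the start
    best = isempty(selected) ? -Inf : score(selected)
    while true
        candidates = setdiff(1:nvars, selected)
        isempty(candidates) && break
        scores = [score(vcat(selected, c)) for c in candidates]
        i = argmax(scores)
        scores[i] > best || break             # stop when the measure no longer increases
        best = scores[i]
        push!(selected, candidates[i])
    end
    return selected
end

# Toy score: variables 2 and 4 help, and every extra variable costs a little.
toyscore(vars) = 0.4 * (2 in vars) + 0.3 * (4 in vars) - 0.05 * length(vars)
forward_select(toyscore, 5)                   # [2, 4]
```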

SDeMo.fpr (Function)
fpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fpr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.fpr (Method)
fpr(M::ConfusionMatrix)

False-positive rate

$\frac{FP}{FP+TN}$

SDeMo.fscore (Function)
fscore(M::ConfusionMatrix, β=1.0)

Fᵦ score, defined as the weighted harmonic mean of precision and recall, using a positive factor β indicating the relative importance of recall over precision:

$(1 + \beta^2)\times\frac{PPV\times TPR}{(\beta^2 \times PPV) + TPR}$
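A standalone sketch of the Fᵦ formula, using a hypothetical `fbeta` helper (not SDeMo's `fscore`) that works directly on counts. With β = 1 it reduces to the harmonic mean of precision and recall; when recall is lower than precision, raising β pulls the score towards recall.

```julia
# Hypothetical helper, not SDeMo's fscore: Fβ from precision (PPV) and recall (TPR).
function fbeta(tp, fp, fn; β = 1.0)
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return (1 + β^2) * ppv * tpr / (β^2 * ppv + tpr)
end

tp, fp, fn = 30, 10, 20            # PPV = 0.75, TPR = 0.6
f1 = fbeta(tp, fp, fn)             # harmonic mean of 0.75 and 0.6, ≈ 0.667
f2 = fbeta(tp, fp, fn; β = 2.0)    # ≈ 0.625, closer to the (lower) recall
```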

SDeMo.fscore (Function)
fscore(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fscore using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.holdout (Method)
holdout(y, X; proportion = 0.2, permute = true)

Sets aside a proportion (given by the proportion keyword, defaults to 0.2) of observations to use for validation, and the rest for training. An additional argument permute (defaults to true) can be used to shuffle the order of observations before they are split.

This method returns a single tuple with the training data first and the validation data second. To use this with crossvalidate, it must be wrapped in a vector, e.g. [holdout(y, X)].
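A minimal sketch of a holdout split, assuming the split is expressed as instance indices (SDeMo may return the data differently). `holdout_indices` is a hypothetical helper; the last line shows the one-element vector needed by a function that expects a vector of splits.

```julia
using Random

# Hypothetical sketch, not SDeMo's holdout: a proportion of the instance indices is set
# aside for validation, and the rest is used for training.
function holdout_indices(n::Int; proportion = 0.2, permute = true)
    idx = permute ? randperm(n) : collect(1:n)
    nval = round(Int, proportion * n)
    return (idx[(nval + 1):end], idx[1:nval])   # (training, validation)
end

Random.seed!(1)
trn, val = holdout_indices(50)
folds = [holdout_indices(50)]   # a one-element vector of splits
```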

SDeMo.holdout (Method)
holdout(sdm::Bagging)

Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

SDeMo.holdout (Method)
holdout(sdm::SDM)

Version of holdout using the instances and labels of an SDM.

SDeMo.instance (Method)
instance(sdm::SDM, n; strict=true)

Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true, only the variables used for prediction are returned.

SDeMo.iqr (Function)
iqr(x, m=0.25, M=0.75)

Returns the inter-quantile range, by default between 25% and 75% of observations.
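The computation amounts to the difference between two quantiles. The sketch below uses `quantile` from the Statistics standard library with its default interpolation; `iqr_sketch` is a hypothetical name, not SDeMo's function.

```julia
using Statistics

# Sketch of an inter-quantile range (hypothetical name, not SDeMo's iqr); with the
# default arguments this is the interquartile range.
iqr_sketch(x, m = 0.25, M = 0.75) = quantile(x, M) - quantile(x, m)

x = collect(1.0:9.0)
iqr_sketch(x)             # 7.0 - 3.0 = 4.0
iqr_sketch(x, 0.1, 0.9)   # ≈ 6.4
```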

SDeMo.kfold (Method)
kfold(y, X; k = 10, permute = true)

Returns splits of the data in which 1 group is used for validation, and k-1 groups are used for training. All k groups have approximately the same size, and each instance is used only once for validation (and k-1 times for training). The groups are stratified (so that they have the same prevalence).

This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
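One way to build stratified folds, sketched under assumptions (this is not SDeMo's `kfold`): presences and absences are shuffled separately and dealt round-robin into k groups, which keeps the group sizes within one of each other and the prevalence close to that of the full data. `stratified_kfold` is hypothetical and works on indices.

```julia
using Random

# Hypothetical sketch, not SDeMo's kfold: presences and absences are shuffled separately
# and dealt round-robin into k groups; each group is used once for validation.
function stratified_kfold(y::Vector{Bool}, k::Int = 10; permute = true)
    groups = [Int[] for _ in 1:k]
    counter = 0
    for class in (true, false)
        idx = findall(==(class), y)
        permute && shuffle!(idx)
        for i in idx
            counter += 1
            push!(groups[mod1(counter, k)], i)
        end
    end
    # Each entry is (training indices, validation indices)
    return [(setdiff(1:length(y), g), g) for g in groups]
end

Random.seed!(2)
y = vcat(fill(true, 20), fill(false, 80))
folds = stratified_kfold(y, 5)   # 5 folds of 20 instances, 4 presences each
```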

SDeMo.kfold (Method)
kfold(sdm::Bagging)

Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

SDeMo.kfold (Method)
kfold(sdm::SDM)

Version of kfold using the instances and labels of an SDM.

SDeMo.labels (Method)
labels(sdm::SDM)

Returns the labels stored in the field y of the SDM – note that this is not a copy of the labels, but the object itself.

SDeMo.leaveoneout (Method)
leaveoneout(y, X)

Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.

This method returns a vector of tuples, with each entry having the training data first, and the validation data second.

SDeMo.leaveoneout (Method)
leaveoneout(sdm::Bagging)

Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

SDeMo.leaveoneout (Method)
leaveoneout(sdm::SDM)

Version of leaveoneout using the instances and labels of an SDM.

SDeMo.loadsdm (Method)
loadsdm(file::String; kwargs...)

Loads a model from a JSON file. The keyword arguments are passed to train!. The model is trained in full upon loading.

SDeMo.markedness (Function)
markedness(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of markedness using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.markedness (Method)
markedness(M::ConfusionMatrix)

Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions

$PPV + NPV -1$

SDeMo.mcc (Function)
mcc(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of mcc using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.mcc (Method)
mcc(M::ConfusionMatrix)

Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
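The MCC has a closed form in terms of the four cells, (TP·TN - FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and its square equals the product of informedness (TSS) and markedness. The helpers below are hypothetical standalone sketches, not SDeMo's functions, and check that identity on one confusion matrix.

```julia
# Hypothetical helpers, not SDeMo's functions, computed from the four confusion-matrix cells.
function mcc_sketch(tp, tn, fp, fn)
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den
end
informedness(tp, tn, fp, fn) = tp / (tp + fn) + tn / (tn + fp) - 1       # TPR + TNR - 1
markedness_sketch(tp, tn, fp, fn) = tp / (tp + fp) + tn / (tn + fn) - 1  # PPV + NPV - 1

cm = (40, 45, 5, 10)   # (TP, TN, FP, FN)
m = mcc_sketch(cm...)  # ≈ 0.70
```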

SDeMo.montecarlo (Method)
montecarlo(y, X; n = 100, kwargs...)

Returns n (defaults to 100) samples of holdout. Other keyword arguments are passed to holdout.

This method returns a vector of tuples, with each entry having the training data first, and the validation data second.

SDeMo.montecarlo (Method)
montecarlo(sdm::Bagging)

Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

SDeMo.montecarlo (Method)
montecarlo(sdm::SDM)

Version of montecarlo using the instances and labels of an SDM.

SDeMo.nlr (Function)
nlr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of nlr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.nlr (Method)
nlr(M::ConfusionMatrix)

Negative likelihood ratio

$\frac{FNR}{TNR}$

SDeMo.noselection! (Method)
noselection!(model, folds; verbose::Bool = false, kwargs...)

Returns the model to the state where all variables are used.

All keyword arguments are passed to train!.

SDeMo.noskill (Method)
noskill(sdm::SDM)

Version of noskill using the training labels for an SDM.

SDeMo.noskill (Method)
noskill(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected by its proportion in the training data.
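The expected confusion matrices of these baselines follow from the prevalence: a classifier that predicts a presence with probability p, independently of the truth, gets TP = p × prevalence, FN = (1 - p) × prevalence, and so on. A minimal sketch, assuming the matrices are expressed as proportions of the instances (SDeMo may report counts); `random_classifier` is a hypothetical helper.

```julia
# Hypothetical sketch, not SDeMo's code: expected confusion matrix, as proportions of the
# instances, for a random classifier that predicts a presence with probability p.
function random_classifier(labels::Vector{Bool}, p::Real)
    prev = count(labels) / length(labels)   # prevalence of presences
    return (tp = p * prev, fn = (1 - p) * prev, fp = p * (1 - prev), tn = (1 - p) * (1 - prev))
end

labels = vcat(fill(true, 25), fill(false, 75))
noskill_cm = random_classifier(labels, count(labels) / length(labels))  # p = prevalence
coinflip_cm = random_classifier(labels, 0.5)                           # p = 1/2
```

With p = 1 or p = 0 the same helper describes the constant positive and constant negative classifiers; for every p, the informedness (TPR + TNR - 1) of these baselines is zero.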

SDeMo.npv (Function)
npv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of npv using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.npv (Method)
npv(M::ConfusionMatrix)

Negative predictive value

$\frac{TN}{TN+FN}$

SDeMo.outofbag (Method)
outofbag(ensemble::Bagging; kwargs...)

This method returns the confusion matrix associated with the out-of-bag error, wherein the success in predicting instance i is calculated on the basis of all models that have not been trained on i. The consensus of the different models is a simple majority rule.

The additional keyword arguments are passed to predict.
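The majority rule can be sketched on a precomputed matrix of binary predictions. In this standalone illustration (not SDeMo's `outofbag`), `preds`, `bags`, and `outofbag_predictions` are hypothetical; comparing the result with the labels would give the out-of-bag confusion matrix. An instance that every model saw during training gets no voters and is reported as a predicted absence here, which real code would rather skip.

```julia
# Hypothetical sketch, not SDeMo's outofbag: preds[m, i] is the binary prediction of model m
# for instance i, and bags[m] lists the instances model m was trained on.
function outofbag_predictions(preds::Matrix{Bool}, bags::Vector{Vector{Int}})
    nmodels, ninstances = size(preds)
    out = Vector{Bool}(undef, ninstances)
    for i in 1:ninstances
        # Only the models that never saw instance i during training may vote on it
        voters = [m for m in 1:nmodels if !(i in bags[m])]
        # Simple majority rule (strictly more than half of the votes)
        out[i] = count(preds[voters, i]) > length(voters) / 2
    end
    return out
end

preds = Bool[1 1 0 1; 1 0 0 1; 1 1 1 0]   # 3 models × 4 instances
bags = [[1], [2], [3]]                    # instance 4 is out of bag for every model
outofbag_predictions(preds, bags)         # [true, true, false, true]
```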

SDeMo.partialresponse (Method)
partialresponse(model::T, i::Integer, j::Integer, s::Tuple=(50, 50); inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i and j are set to their mean value.

This function will return a grid corresponding to evenly spaced values of i and j, the size of which is given by the last argument s (defaults to 50 × 50).

All keyword arguments are passed to predict.

SDeMo.partialresponse (Method)
partialresponse(model::T, i::Integer, args...; inflated::Bool, kwargs...)

This method returns the partial response of applying the trained model to a simulated dataset where all variables except i are set to their mean value. The inflated keyword, when set to true, will instead pick a random value within the range of the observations.

The different arguments that can follow the variable position are:

  • nothing, in which case the unique values of the i-th variable are used (sorted)
  • a number, in which case that many evenly spaced points within the range of the variable are used
  • an array, in which case each value of this array is evaluated

All keyword arguments are passed to predict.
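The one-variable case with a number of grid points can be sketched as follows, under assumptions: `partial_response` is hypothetical (not SDeMo's `partialresponse`), the model is any function from a feature vector to a score, variables are stored in rows, and the inflated variant is not implemented.

```julia
using Statistics

# Hypothetical sketch, not SDeMo's partialresponse: all variables except `i` are fixed at
# their mean, and variable `i` takes `n` evenly spaced values over its observed range.
# Assumption for this example: rows of X are variables and columns are instances.
function partial_response(model, X::Matrix{Float64}, i::Int, n::Int)
    grid = range(minimum(X[i, :]), maximum(X[i, :]); length = n)
    simulated = repeat(mean(X; dims = 2), 1, n)   # one column per evaluation point
    simulated[i, :] .= grid
    return collect(grid), [model(simulated[:, j]) for j in 1:n]
end

X = [0.0 1.0 2.0 3.0 4.0; 10.0 20.0 30.0 40.0 50.0]
toymodel(x) = 2x[1] + 0.1x[2]       # stand-in for a trained model's score
xs, ys = partial_response(toymodel, X, 1, 5)   # ys ≈ [3, 5, 7, 9, 11]
```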

SDeMo.plr (Function)
plr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of plr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.plr (Method)
plr(M::ConfusionMatrix)

Positive likelihood ratio

$\frac{TPR}{FPR}$

SDeMo.ppv (Function)
ppv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of ppv using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.ppv (Method)
ppv(M::ConfusionMatrix)

Positive predictive value

$\frac{TP}{TP+FP}$

SDeMo.precision (Function)
precision(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of precision using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.precision (Method)
precision(M::ConfusionMatrix)

Alias for ppv, the positive predictive value

SDeMo.recall (Function)
recall(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of recall using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.recall (Method)
recall(M::ConfusionMatrix)

Alias for tpr, the true positive rate

SDeMo.reset! (Function)
reset!(sdm::SDM, thr=0.5)

Resets a model, optionally setting the threshold to thr (defaults to 0.5). This amounts to re-using all the variables and discarding the tuned threshold.

SDeMo.sensitivity (Function)
sensitivity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of sensitivity using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.specificity (Function)
specificity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of specificity using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.stepwisevif! (Function)
stepwisevif!(model::SDM, limit, tr=:;kwargs...)

Drops the variable with the largest variance inflation from the model, one at a time, until all VIFs are under limit. The last positional argument tr (defaults to :) gives the indices of the instances used for the VIF calculation. All keyword arguments are passed to train!.
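A sketch of the stepwise procedure, not SDeMo's `stepwisevif!`: VIFs are taken as the diagonal of the inverse correlation matrix (as described for `vif`), and the variable with the largest VIF is removed until every VIF is below the limit. `vif_sketch` and `stepwise_vif` are hypothetical, and columns are variables in this example.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical sketch, not SDeMo's stepwisevif!: columns of X are variables here.
# VIFs are the diagonal of the inverse of the correlation matrix between predictors.
vif_sketch(X::Matrix{Float64}) = diag(inv(cor(X)))

# Drop the variable with the largest VIF, one at a time, until all VIFs are below `limit`.
function stepwise_vif(X::Matrix{Float64}, limit::Real)
    kept = collect(1:size(X, 2))
    while length(kept) > 1
        v = vif_sketch(X[:, kept])
        maximum(v) < limit && break
        deleteat!(kept, argmax(v))
    end
    return kept
end

Random.seed!(3)
a, b, c = randn(200), randn(200), randn(200)
X = hcat(a, b, c, a .+ b .+ 0.01 .* randn(200))   # column 4 is almost a + b
kept = stepwise_vif(X, 10.0)                       # one of columns 1, 2, 4 is dropped
```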

SDeMo.threshold (Method)
threshold(sdm::SDM)

This returns the value above which the score returned by the SDM is considered to be a presence.

SDeMo.tnr (Function)
tnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tnr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.tnr (Method)
tnr(M::ConfusionMatrix)

True-negative rate

$\frac{TN}{TN+FP}$

SDeMo.tpr (Function)
tpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tpr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.tpr (Method)
tpr(M::ConfusionMatrix)

True-positive rate

$\frac{TP}{TP+FN}$

SDeMo.train! (Method)
train!(ensemble::Bagging; kwargs...)

Trains all the models in an ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, including the transformers.

SDeMo.train! (Method)
train!(ensemble::Ensemble; kwargs...)

Trains all the models in a heterogeneous ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, including the transformers.

The keyword arguments are passed to train! and can include the training indices.

SDeMo.train! (Method)
train!(sdm::SDM; threshold=true, training=:, optimality=mcc)

This is the main training function to train a SDM.

The keyword arguments are:

  • training: defaults to :, and is the range (or alternatively the indices) of the data that are used to train the model
  • threshold: defaults to true, and performs a moving-threshold search by evaluating 200 possible values between the minimum and maximum output of the model, and keeping the one that is optimal
  • optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best
  • absences: defaults to false, and indicates whether the (pseudo) absences are used to train the transformer; when using actual absences, this should be set to true

Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.
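The moving-threshold step can be sketched as a grid search over 200 candidate values, assuming that a score strictly above a candidate threshold is a predicted presence. `mcc_counts` and `best_threshold` are hypothetical helpers, not SDeMo's `train!`, and the MCC is written by hand so the block has no dependencies.

```julia
# Hypothetical sketch, not SDeMo's train!: MCC written from the four cells
# (returns 0.0 when a marginal is empty, e.g. when every prediction is the same class).
function mcc_counts(pred::AbstractVector{Bool}, truth::AbstractVector{Bool})
    tp = count(pred .& truth); tn = count(.!pred .& .!truth)
    fp = count(pred .& .!truth); fn = count(.!pred .& truth)
    den = sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return den == 0 ? 0.0 : (tp * tn - fp * fn) / den
end

# Evaluate `n` evenly spaced thresholds between the smallest and largest score,
# and keep the first one that maximizes `optimality`.
function best_threshold(scores, truth; optimality = mcc_counts, n = 200)
    candidates = range(minimum(scores), maximum(scores); length = n)
    perf = [optimality(scores .> τ, truth) for τ in candidates]
    return candidates[argmax(perf)]
end

scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
truth = [false, false, false, false, true, true, true, true]
τ = best_threshold(scores, truth)   # falls between 0.35 and 0.6
```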

SDeMo.transformer (Method)
transformer(model::Bagging)

Returns the transformer of the model that serves as a template for the bagged model.

SDeMo.trueskill (Function)
trueskill(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of trueskill using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

SDeMo.trueskill (Method)
trueskill(M::ConfusionMatrix)

True skill statistic (a.k.a. Youden's J, or informedness)

$TPR + TNR - 1$

SDeMo.variableimportance (Method)
variableimportance(model, folds; kwargs...)

Returns the importance of all variables in the model. The keywords are passed to variableimportance.

SDeMo.variableimportance (Method)
variableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)

Returns the importance of one variable in the model. The reps keyword fixes the number of bootstraps to run (defaults to 10, which is not enough!).

The keywords are passed to ConfusionMatrix.

SDeMo.variables (Method)
variables(sdm::SDM)

Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.

SDeMo.vif (Method)
vif(::Matrix)

Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
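The diagonal of the inverse correlation matrix is a shortcut for the textbook definition VIFⱼ = 1/(1 - R²ⱼ), where R²ⱼ comes from regressing variable j on all the others with an intercept. The standalone sketch below checks this numerically; `vif_by_regression` is a hypothetical helper (not SDeMo's `vif`), and columns are variables here.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper: VIF of column j from an ordinary least-squares regression of
# column j on all the other columns plus an intercept. Columns of X are variables.
function vif_by_regression(X::Matrix{Float64}, j::Int)
    y = X[:, j]
    A = hcat(ones(size(X, 1)), X[:, setdiff(1:size(X, 2), j)])
    residuals = y - A * (A \ y)                 # least-squares fit
    r2 = 1 - sum(abs2, residuals) / sum(abs2, y .- mean(y))
    return 1 / (1 - r2)
end

Random.seed!(7)
Z = randn(300, 3)
X = hcat(Z[:, 1], Z[:, 1] .+ 0.5 .* Z[:, 2], Z[:, 3])   # columns 1 and 2 are correlated
vif_from_cor = diag(inv(cor(X)))
vif_from_r2 = [vif_by_regression(X, j) for j in 1:3]     # same values, up to rounding
```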

SDeMo.vif (Method)
vif(::AbstractSDM, tr=:)

Returns the VIF for the variables used in a SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.

SDeMo.writesdm (Method)
writesdm(file::String, model::SDM)

Writes a model to a JSON file. This method is very bare-bones, and only saves the structure of the model, as well as the data.

SDeMo.κ (Function)
κ(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of κ using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

StatsAPI.predict (Method)
StatsAPI.predict(ensemble::Bagging; kwargs...)

Predicts the ensemble model for all training data.

StatsAPI.predict (Method)
StatsAPI.predict(ensemble::Ensemble; kwargs...)

Predicts the heterogeneous ensemble model for all training data.

StatsAPI.predict (Method)
StatsAPI.predict(sdm::SDM; kwargs...)

This method performs the prediction on the entire set of training data available for the training of an SDM.

StatsAPI.predict (Method)
StatsAPI.predict(ensemble::Bagging, X; consensus = median, kwargs...)

Returns the prediction of the ensemble of models for a dataset X. The function used to aggregate the outputs from different models is consensus (defaults to median). All other keyword arguments are passed to predict.

To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.
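The aggregation can be sketched on a matrix of scores with one row per model and one column per site (an assumed layout); `consensus_prediction` is a hypothetical helper, not SDeMo's `predict`. Swapping the consensus function for a spread measure gives the variability instead of a central value.

```julia
using Statistics

# Hypothetical sketch, not SDeMo's predict: each row of P holds one model's scores for
# every site; the consensus function is applied across models, site by site.
consensus_prediction(P::Matrix{Float64}; consensus = median) =
    [consensus(P[:, s]) for s in 1:size(P, 2)]

P = [0.2 0.8 0.5;
     0.4 0.9 0.1;
     0.3 0.7 0.9]   # 3 models × 3 sites
consensus_prediction(P)                                                          # [0.3, 0.8, 0.5]
consensus_prediction(P; consensus = x -> quantile(x, 0.75) - quantile(x, 0.25))  # ≈ [0.1, 0.1, 0.4]
```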

StatsAPI.predict (Method)
StatsAPI.predict(ensemble::Ensemble, X; consensus = median, kwargs...)

Returns the prediction of the heterogeneous ensemble of models for a dataset X. The function used to aggregate the outputs from different models is consensus (defaults to median). All other keyword arguments are passed to predict.

To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.

StatsAPI.predict (Method)
StatsAPI.predict(sdm::SDM, X; threshold = true)

This is the main prediction function, and it takes as input an SDM and a matrix of features. The only keyword argument is threshold, which determines whether the prediction is returned raw or as a binary value (default is true).

SDeMo.AbstractEnsembleSDM (Type)
AbstractEnsembleSDM

This abstract type covers models that combine different SDMs to make a prediction, which currently includes Bagging and Ensemble.

SDeMo.Classifier (Type)
Classifier

This abstract type covers all algorithms to convert transformed data into prediction.

SDeMo.ConfusionMatrix (Type)
ConfusionMatrix{T <: Number}

A structure to store the true positives, true negatives, false positives, and false negatives counts (or proportion) during model evaluation. Empty confusion matrices can be created using the zero method.

SDeMo.ConfusionMatrix (Method)
ConfusionMatrix(ensemble::Bagging; kwargs...)

Performs the predictions for a bagged model, and compares them to the labels used for training. The keyword arguments are passed to the predict method.

SDeMo.ConfusionMatrix (Method)
ConfusionMatrix(sdm::SDM; kwargs...)

Performs the predictions for an SDM, and compares them to the labels used for training. The keyword arguments are passed to the predict method.
