FunQuant: An R package to perform quantization in the context of rare events and time-consuming simulations
FunQuant is an R package specifically developed for carrying out quantization in the context of rare events. While several packages offer straightforward implementations of Lloyd's algorithm, they do not allow any probabilistic weighting to be specified, treating all data points equally. Conversely, FunQuant assigns probabilistic weights based on the Importance Sampling formulation to handle rare events. More precisely, FunQuant provides various approaches for implementing these estimators, depending on the sampling density
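As a minimal base-R sketch of the underlying idea (illustrative only, not the package API; the toy densities below are assumptions for the example), points are drawn from an instrumental density g and the weights w_i = f(x_i)/g(x_i) correct the estimates back to the target density f:

```r
# Self-normalized importance-sampling estimate of E_f[X^2], with
# f a normal density truncated to [-1, 1] and g uniform on [-1, 1].
set.seed(1)
n <- 1e4
x <- runif(n, -1, 1)                          # sampled from g, not from f
f_dens <- function(t) dnorm(t, sd = 0.25) /
  (pnorm(1, sd = 0.25) - pnorm(-1, sd = 0.25))  # truncated-normal density
g_dens <- function(t) rep(0.5, length(t))       # uniform density on [-1, 1]
w <- f_dens(x) / g_dens(x)                      # importance weights
est_var <- sum(w * x^2) / sum(w)                # estimate under f, close to 0.25^2
```

The same reweighting principle is what allows the centroids of the Voronoi cells to be estimated under the target density while sampling from a density that visits the rare regions more often.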
In addition, FunQuant is designed to mitigate the computational burden associated with the evaluation of costly data. While users have the flexibility to plug in their own metamodels to generate additional data, FunQuant offers several functions tailored specifically for spatial outputs such as maps. The provided metamodel relies on Functional Principal Component Analysis and Gaussian Processes, adapted with the rlibkriging R package. FunQuant also assists users in fine-tuning its hyperparameters for a quantization task by providing a set of relevant performance metrics.
The very latest version of FunQuant can be installed from GitHub:
# If not already installed, install package `remotes` with `install.packages("remotes")`
remotes::install_github("charliesire/FunQuant")
We consider
where
The density function of
The computer code
with
The density
We want to quantize
If the classical Lloyd's algorithm is run with a budget of
The FunQuant package allows the sampling to be adapted by introducing a random variable
A possible choice for the instrumental density g, here uniform on the square, is implemented below together with the target density fX:
# fX: target density, a product of truncated normals on [-1, 1]
# (requires the truncnorm package; sd1 and sd2 are defined beforehand)
fX = function(x){
  return(dtruncnorm(x = x[1], mean = 0, sd = sd1, a = -1, b = 1) *
           dtruncnorm(x = x[2], mean = 0, sd = sd2, a = -1, b = 1))
}

# g: instrumental density, uniform on the square [-1, 1]^2 (area 4)
g = function(x){
  if(sum((x > -1) * (x < 1)) == 2){return(1/4)}
  else{return(0)}
}

# sample_g: draws n points from g
sample_g = function(n){cbind(runif(n, -1, 1), runif(n, -1, 1))}
inputs = sample_g(1000)           # sample from the instrumental density g
outputs = t(apply(inputs, 1, Y))  # evaluate the computer code Y on each input
density_ratio = compute_density_ratio(f = fX,
                                      g = g,
                                      inputs = inputs)
res_proto = find_prototypes(data = t(outputs),
                            nb_cells = 5,
                            multistart = 3,
                            density_ratio = density_ratio)
The figure below shows the sampled points
FunQuant allows estimating the standard deviations of the two coordinates of the centroid estimators for each Voronoi cell, highlighting the variance reduction obtained with the adapted sampling for the cells that do not contain
large_inputs = sample_fX(10^5)             # large sample from the target density fX
large_outputs = apply(large_inputs, 1, Y)
std_centroid_kmeans = std_centroid(
  data = large_outputs,
  prototypes_list = list(protos_kmeans),
  cells = 1:5,
  nv = 1000)
std_centroid_kmeans # the cells are ordered by the increasing x coordinate
# of their centroid
# std_centroid returns a list of lists: for each tested set of prototypes
# (here only one set is tested), a list of the estimated standard deviations
# is provided; each element of this list is associated with a Voronoi cell
## [[1]]
## [[1]][[1]]
## [1] 0.0001193543 0.0001012730
##
## [[1]][[2]]
## [1] 0.04884616 0.07905258
##
## [[1]][[3]]
## [1] 0.03006552 0.02934998
##
## [[1]][[4]]
## [1] 0.03214239 0.02801202
##
## [[1]][[5]]
## [1] 0.06158175 0.12912278
large_inputs_is = sample_g(10^5)               # large sample from the instrumental density g
large_outputs_is = apply(large_inputs_is, 1, Y)
std_centroid_FunQuant = std_centroid(
  data = large_outputs_is,
  prototypes_list = list(protos_FunQuant),
  cells = 1:5,
  nv = 1000)
std_centroid_FunQuant # the cells are ordered by the increasing x coordinate
# of their centroid
## [[1]]
## [[1]][[1]]
## [1] 0.0002358303 0.0002390596
##
## [[1]][[2]]
## [1] 0.00901367 0.01033904
##
## [[1]][[3]]
## [1] 0.012857642 0.006439004
##
## [[1]][[4]]
## [1] 0.00726317 0.01139948
##
## [[1]][[5]]
## [1] 0.009168924 0.009620646
This example remains basic; advanced computations of the centroids with tailored density functions are also possible. FunQuant was built to tackle industrial problems with large amounts of data, and comes with additional features such as the possibility to split the computations into different batches.
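The batching idea can be sketched in a few lines of base R (illustrative only; see the package documentation for FunQuant's actual batch arguments and the toy densities here are assumptions): weighted sums are accumulated batch by batch, so the full sample never has to sit in memory at once.

```r
# Accumulate the weighted sums needed for a centroid estimate over batches.
set.seed(3)
n_batch <- 10; batch_size <- 1000
num <- 0; den <- 0
for (b in seq_len(n_batch)) {
  x <- runif(batch_size, -1, 1)    # one batch sampled from g (uniform)
  w <- dnorm(x, sd = 0.3) / 0.5    # importance weights f/g (toy densities)
  num <- num + sum(w * x)          # running weighted sum of the points
  den <- den + sum(w)              # running sum of the weights
}
centroid_estimate <- num / den     # identical to using one big sample
```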