CircleCI Python License: LGPL v3

otHDRPlot

What is it?

This project implements the functional highest density region boxplot technique (Hyndman and Shang, 2009).

When you have functional data (i.e. a set of curves), you will want to answer some questions such as:

  • What is the mode curve?
  • Can I draw a confidence interval?
  • Are there any outlier curves?

This module allows you to do this:

import othdrplot

# processSample, reducedComponents and reducedDistribution must be defined beforehand.
algo = othdrplot.ProcessHighDensityRegionAlgorithm(
    processSample, reducedComponents, reducedDistribution, [0.8, 0.5]
)
algo.run()
algo.drawOutlierTrajectories()
algo.draw()

The output is the following figure:

[Figure: npfda-elnino]

When a multivariate sample is given, the HighDensityRegionAlgorithm class plots the highest density regions, i.e. the regions that contain a given fraction of the population.

import openturns as ot
import othdrplot

# Estimate the distribution of the sample (sample must be defined beforehand)
myks = ot.KernelSmoothing()
distribution = myks.build(sample)
# Create the HDR algorithm
algo = othdrplot.HighDensityRegionAlgorithm(sample, distribution)
algo.run()
algo.draw()

The output is the following figure:

[Figure: gauss-mixture-OutlierPlot]

How to install?

Requirements

The dependencies are:

Installation

Using the latest Python version is preferred.

To install from pip:

pip install othdrplot

To install from GitHub:

git clone git@github.com:mbaudin47/othdrplot.git
cd othdrplot
python setup.py install

Documentation

A short introduction to the algorithm is provided in the Introduction to high density region plots.

Examples

Several examples are available in the doc directory.

References

  • Rob J Hyndman and Han Lin Shang. Rainbow plots, bagplots and boxplots for functional data. Journal of Computational and Graphical Statistics, 19:29-45, 2009.

Algorithms

Three classes are provided:

  • HighDensityRegionAlgorithm : An algorithm to draw the density of a multivariate sample.
  • ProcessHighDensityRegionAlgorithm : An algorithm to compute and draw the density of a multivariate process sample.
  • KarhunenLoeveDimensionReductionAlgorithm : Simplifies the dimension reduction with Karhunen-Loève decomposition.

The HighDensityRegionAlgorithm class

This is an algorithm to draw the density of a multivariate sample.

  • Computes the minimum level set associated with the sample.
  • Plots the required minimum level sets and the outliers.
  • Computes and draws the inliers and the outliers, based on the MatrixPlot.
  • The main ingredient is the distribution of the sample, which is required.
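
For reference, the minimum level set can be written with the usual definition of a highest density region. The notation is introduced here for illustration and is not taken from the project: f is the density of the sample, X is a random vector with density f, and p is the requested fraction of the population (e.g. 0.8 or 0.5).

```latex
R_p = \{ x : f(x) \geq f_p \},
\qquad
f_p = \sup \{ c \geq 0 : \mathbb{P}( f(X) \geq c ) \geq p \}
```

Points that fall outside R_p for a large fraction p lie in low-density areas, which is why they are natural outlier candidates.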

The basic method to estimate this distribution is kernel smoothing, but any other method can be used, for example a Gaussian mixture.
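
As a minimal sketch of this idea, the density threshold for a given fraction of the population can be approximated by an empirical quantile of the density evaluated at the sample points. This is not the othdrplot implementation: the sample is one-dimensional, the kernel estimator is hand-written, and the names gaussian_kde and hdr_level, the bandwidth value and the 0.8 outlier fraction are assumptions made for the illustration.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a 1D Gaussian kernel density estimate as a function."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return pdf


def hdr_level(densities, alpha):
    """Density threshold such that at least a fraction alpha of the
    given density values lie at or above it."""
    ordered = sorted(densities)
    # Index of the empirical (1 - alpha) quantile of the density values.
    index = int(math.floor((1.0 - alpha) * len(ordered)))
    index = min(max(index, 0), len(ordered) - 1)
    return ordered[index]


# Illustrative data: a standard normal sample.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]

# Evaluate the estimated density at every sample point.
pdf = gaussian_kde(sample, bandwidth=0.4)
densities = [pdf(x) for x in sample]

# Thresholds of the 50% and 80% regions.
level_50 = hdr_level(densities, 0.5)
level_80 = hdr_level(densities, 0.8)

# Points below the 80% threshold are flagged as outlier candidates.
outliers = [x for x, d in zip(sample, densities) if d < level_80]
```

The 50% region is nested inside the 80% region, so its density threshold is the higher of the two. The same quantile rule applies in any dimension, because it only uses the density values at the sample points.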

The ProcessHighDensityRegionAlgorithm class

This is an algorithm to draw the density of a process sample.

  • Plots the trajectories in the physical space.
  • Plots the projection of the trajectories in the reduced space, based on the HighDensityRegionAlgorithm.
  • The main ingredients are the dimension reduction method and the method to estimate the density in the reduced space.

In the current implementation, the dimension reduction is based on the Karhunen-Loève decomposition (but other methods can be used). The density in the reduced space can be estimated by kernel smoothing or by any other density estimation method (e.g. a Gaussian mixture).
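
The two ingredients can be illustrated with a small, self-contained sketch. It uses neither othdrplot nor OpenTURNS: a single principal component computed by power iteration stands in for the Karhunen-Loève decomposition, and a hand-written Gaussian kernel estimator stands in for the density estimator. The synthetic data, the function names (center, leading_component, kde) and the bandwidth rule are all assumptions made for the illustration.

```python
import math
import random


def center(curves):
    """Subtract the pointwise mean curve from every curve."""
    n, d = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(d)]
    return [[c[j] - mean[j] for j in range(d)] for c in curves]


def leading_component(centered, iterations=100):
    """Leading eigenvector of the empirical covariance, by power iteration."""
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix.
        proj = [sum(r * x for r, x in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def kde(points, bandwidth):
    """One-dimensional Gaussian kernel density estimate."""
    c = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
    )


# Synthetic trajectories (illustrative data): random multiples of one shape,
# plus a last curve with an unusually large amplitude.
random.seed(1)
grid = [j / 19.0 for j in range(20)]
amplitudes = [random.gauss(0.0, 1.0) for _ in range(30)] + [6.0]
curves = [
    [a * math.sin(math.pi * t) + random.gauss(0.0, 0.05) for t in grid]
    for a in amplitudes
]

# Dimension reduction: project each centered curve on the leading component.
centered = center(curves)
component = leading_component(centered)
scores = [sum(r * x for r, x in zip(row, component)) for row in centered]

# Density estimation in the reduced (here one-dimensional) space,
# with a rule-of-thumb bandwidth.
mean_score = sum(scores) / len(scores)
std = math.sqrt(sum((s - mean_score) ** 2 for s in scores) / (len(scores) - 1))
pdf = kde(scores, 1.06 * std * len(scores) ** (-0.2))
densities = [pdf(s) for s in scores]

# The mode curve has the highest density, the most outlying curve the lowest.
mode_curve_index = densities.index(max(densities))
outlier_index = densities.index(min(densities))
```

Working in the reduced space keeps the density estimation low-dimensional: each trajectory is summarized by a few coefficients, and the ranking of these coefficients by density carries over to the trajectories themselves.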