diff --git a/dev/benchmarks/index.html b/dev/benchmarks/index.html index 3ce1196..c01d447 100644 --- a/dev/benchmarks/index.html +++ b/dev/benchmarks/index.html @@ -1,2 +1,2 @@ -Benchmarks · ExpectationMaximization.jl

Benchmarks

I was inspired by this benchmark. I am not too sure how to do 100% fair comparisons across languages[1]. There is a small overhead for using PythonCall.jl and RCall.jl. I checked that it was small in my experimentation (~ few milliseconds?). Here is the Jupyter notebook of the benchmark.

I test only the Gaussian Mixture case, since it is the most common type of mixture (remember that this package allows plenty of other mixtures).

In the code, I did not use (too much) fancy programming tricks, the speed only comes mostly from Julia usual performance tips:

  • E-step: Pre-allocating memory, using @views, type-stable code (could be improved here) + the package LogExpFunctions.jl for logsumexp! function.
  • M-step: fit_mle for each distribution's coming from Distributions.jl package. In principle, this should be quite fast. For example, look at the Multivariate Normal code.

Univariate Gaussian mixture with 2 components

I compare with Sklearn.py[2], mixtool.R, mixem.py[3]. I wanted to try mclust, but I did not manage to specify initial conditions

Overall, mixtool.R and mixem.py were constructed in a similar spirit as this package, making them easy to use for me. Sklearn.py is built to match the Sklearn format (all in one). GaussianMixturesModel.jl is built with a similar vibe.

If you have comments to improve these benchmarks, they are welcome.

You can find the benchmark code here.

timing_K_2 or the ratio view timing_K_2_ratio Conclusion: for Gaussian mixtures, ExpectationMaximization.jl is about 4 times faster than Python Sklearn and 7 times faster than R mixtools implementations and slightly slower than the specialized Julia package GaussianMixturesModel.jl.

I did compare with R microbenchmark and Python timeit and they produced very similar timing but in my experience BenchmarkTools.jl is smarter and simpler to use, i.e. it will figure out alone the number of repetition to do in function of the run.

Last, the step-by-step likelihood of Sklearn is not the same as outputted by ExpectationMaximization.jl and mixtool.R (both agree), so I am a bit suspicious.

  • 1Note that @btime with RCall.jl and PythonCall.jl might produce a small-time overhead compared to the pure R/Python time; see here for example.
  • 2There is a suspect trigger warning regarding K-means which I do not want to use here. I asked a question here. It led to this issue and that PR. It turns out that even if initial conditions were provided, the K-mean was still computed. However, to this day (23-11-29) with scikit-learn 1.3.2 it still gets the warning. Maybe it will be in the next release? I also noted this recent PR.
  • 3It overflows very quickly for $n>500$ or so. I think it is because of the implementation of logsumexp. So I eventually did not include the result in the benchmark.
+Benchmarks · ExpectationMaximization.jl

Benchmarks

I was inspired by this benchmark. I am not too sure how to do 100% fair comparisons across languages[1]. There is a small overhead for using PythonCall.jl and RCall.jl. I checked that it was small in my experimentation (~ few milliseconds?). Here is the Jupyter notebook of the benchmark.

I test only the Gaussian Mixture case, since it is the most common type of mixture (remember that this package allows plenty of other mixtures).

In the code, I did not use (too much) fancy programming tricks, the speed only comes mostly from Julia usual performance tips:

  • E-step: Pre-allocating memory, using @views, type-stable code (could be improved here) + the package LogExpFunctions.jl for logsumexp! function.
  • M-step: fit_mle for each distribution's coming from Distributions.jl package. In principle, this should be quite fast. For example, look at the Multivariate Normal code.

Univariate Gaussian mixture with 2 components

I compare with Sklearn.py[2], mixtool.R, mixem.py[3]. I wanted to try mclust, but I did not manage to specify initial conditions

Overall, mixtool.R and mixem.py were constructed in a similar spirit as this package, making them easy to use for me. Sklearn.py is built to match the Sklearn format (all in one). GaussianMixturesModel.jl is built with a similar vibe.

If you have comments to improve these benchmarks, they are welcome.

You can find the benchmark code here.

timing_K_2 or the ratio view timing_K_2_ratio Conclusion: for Gaussian mixtures, ExpectationMaximization.jl is about 4 times faster than Python Sklearn and 7 times faster than R mixtools implementations and slightly slower than the specialized Julia package GaussianMixturesModel.jl.

I did compare with R microbenchmark and Python timeit and they produced very similar timing but in my experience BenchmarkTools.jl is smarter and simpler to use, i.e. it will figure out alone the number of repetition to do in function of the run.

Last, the step-by-step likelihood of Sklearn is not the same as outputted by ExpectationMaximization.jl and mixtool.R (both agree), so I am a bit suspicious.

  • 1Note that @btime with RCall.jl and PythonCall.jl might produce a small-time overhead compared to the pure R/Python time; see here for example.
  • 2There is a suspect trigger warning regarding K-means which I do not want to use here. I asked a question here. It led to this issue and that PR. It turns out that even if initial conditions were provided, the K-mean was still computed. However, to this day (23-11-29) with scikit-learn 1.3.2 it still gets the warning. Maybe it will be in the next release? I also noted this recent PR.
  • 3It overflows very quickly for $n>500$ or so. I think it is because of the implementation of logsumexp. So I eventually did not include the result in the benchmark.
diff --git a/dev/biblio/index.html b/dev/biblio/index.html index 95f4fbb..c81a5f6 100644 --- a/dev/biblio/index.html +++ b/dev/biblio/index.html @@ -1,2 +1,2 @@ -Bibliography · ExpectationMaximization.jl

Bibliography

Theory

The EM algorithm was introduced by A. P. Dempster, N. M. Laird and D. B. Rubin in 1977 in the reference paper Maximum Likelihood from Incomplete Data Via the EM Algorithm. This is a very generic algorithm, working for almost any distributions. I also added the stochastic version introduced by G. Celeux, and J. Diebolt. in 1985 in The SEM Algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Other versions can be added PR are welcomed.

Implementations

Despite being generic, to my knowledge, almost all coding implementations are specific to some mixtures class (mostly Gaussian mixtures, sometime double exponential or Bernoulli mixtures).

In this package, thanks to Julia generic code spirit, one can just code the algorithm, and it works for all distributions.

I know of the Python mixem package doing also using a generic algorithm implementation. However, the available distribution choice is very limited as the authors have to define each distribution (Top-Down approach). This package does not define distribution[1], it simply uses the Distribution type and what is in Distributions.jl.

In Julia, there is the GaussianMixtures.jl package that also does EM. It seems a little faster than my implementation when used with Gaussian mixtures (I'd like to understand what is creating this difference, though, maybe the in-place allocation while fit_mle creates copy). However, I am not sure if this is maintained anymore.

Have a look at the benchmark section for some comparisons.

I was inspired by Florian Oswald page and Maxime Mouchet HMMBase.jl package.

  • 1I added fit_mle methods for Product distributions, weighted Laplace and Dirac. I am doing PR to merge that directly into the Distributions.jl package.
+Bibliography · ExpectationMaximization.jl

Bibliography

Theory

The EM algorithm was introduced by A. P. Dempster, N. M. Laird and D. B. Rubin in 1977 in the reference paper Maximum Likelihood from Incomplete Data Via the EM Algorithm. This is a very generic algorithm, working for almost any distributions. I also added the stochastic version introduced by G. Celeux, and J. Diebolt. in 1985 in The SEM Algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Other versions can be added PR are welcomed.

Implementations

Despite being generic, to my knowledge, almost all coding implementations are specific to some mixtures class (mostly Gaussian mixtures, sometime double exponential or Bernoulli mixtures).

In this package, thanks to Julia generic code spirit, one can just code the algorithm, and it works for all distributions.

I know of the Python mixem package doing also using a generic algorithm implementation. However, the available distribution choice is very limited as the authors have to define each distribution (Top-Down approach). This package does not define distribution[1], it simply uses the Distribution type and what is in Distributions.jl.

In Julia, there is the GaussianMixtures.jl package that also does EM. It seems a little faster than my implementation when used with Gaussian mixtures (I'd like to understand what is creating this difference, though, maybe the in-place allocation while fit_mle creates copy). However, I am not sure if this is maintained anymore.

Have a look at the benchmark section for some comparisons.

I was inspired by Florian Oswald page and Maxime Mouchet HMMBase.jl package.

  • 1I added fit_mle methods for Product distributions, weighted Laplace and Dirac. I am doing PR to merge that directly into the Distributions.jl package.
diff --git a/dev/examples/index.html b/dev/examples/index.html index dc8d8f6..34eec10 100644 --- a/dev/examples/index.html +++ b/dev/examples/index.html @@ -93,47 +93,47 @@ ylabel!("Log PDF", yaxis = :log10) - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + +
# Sampling
 y = rand(mix_true, N)
 
@@ -158,87 +158,87 @@
 plot(pmix, ploss)
- + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + +

Mixture of Mixture and univariate

To showcase the generality of the package, we show how training a MixtureModel composed of a univariate distribution + a MixtureModel works. Note that, in practice, this can cause some identifiability issue (if everything is Normal for example).

θ₁ = -5
 θ₂ = 2
 σ₁ = 1
@@ -269,51 +269,51 @@ 

- + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  • 1I am not sure if this was historically one of the first way to approach this problem. Anyway, this is more like an academic application rather than a good method to solve the MNIST problem.
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
  • 1I am not sure if this was historically one of the first way to approach this problem. Anyway, this is more like an academic application rather than a good method to solve the MNIST problem.
diff --git a/dev/fit_mle/index.html b/dev/fit_mle/index.html index c80c9b1..7e7c1a2 100644 --- a/dev/fit_mle/index.html +++ b/dev/fit_mle/index.html @@ -1,2 +1,2 @@ -Instance vs Type version · ExpectationMaximization.jl

Instance vs Type version

This package relies heavily on the types and methods defined in Distributions.jl e.g., fit_mle and logpdf. However, it differs slightly by one point: it defines and uses an “instance version” of fit_mle(dist::Distribution, x, args...). For the package to work and be generic, the input dist must be an “instance” i.e., an actual distribution like Normal(0,1) not just the type Normal. For MixtureModels (and ProductDistributions) this is critical to extract the various distributions inside, while the Type version does not (currently) contain such information. Plus, for MixtureModels the dist serves as the initial point of the EM algorithm. For classic distributions, it changes nothing, i.e., fit_mle(Normal(0,1), y) gives the same result as fit_mle(Normal, y).

I opened a PR#1670 in Distributions.jl to include this instance version, but it might not be accepted.

Compatibility

The new fit_mle methods defined in this package are fully compatible with Distributions.jl (it does not break any regular Distributions.fit_mle).

I believe that some packages dealing with complex distributions like Copulas.jl or HMMBase.jl could also use this formulation.

+Instance vs Type version · ExpectationMaximization.jl

Instance vs Type version

This package relies heavily on the types and methods defined in Distributions.jl e.g., fit_mle and logpdf. However, it differs slightly by one point: it defines and uses an “instance version” of fit_mle(dist::Distribution, x, args...). For the package to work and be generic, the input dist must be an “instance” i.e., an actual distribution like Normal(0,1) not just the type Normal. For MixtureModels (and ProductDistributions) this is critical to extract the various distributions inside, while the Type version does not (currently) contain such information. Plus, for MixtureModels the dist serves as the initial point of the EM algorithm. For classic distributions, it changes nothing, i.e., fit_mle(Normal(0,1), y) gives the same result as fit_mle(Normal, y).

I opened a PR#1670 in Distributions.jl to include this instance version, but it might not be accepted.

Compatibility

The new fit_mle methods defined in this package are fully compatible with Distributions.jl (it does not break any regular Distributions.fit_mle).

I believe that some packages dealing with complex distributions like Copulas.jl or HMMBase.jl could also use this formulation.

diff --git a/dev/index.html b/dev/index.html index 2600a4a..6ecb335 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,5 +1,5 @@ -Home · ExpectationMaximization.jl

ExpectationMaximization.jl

This package provides a simple implementation of the Expectation Maximization (EM) algorithm used to fit mixture models. Due to Julia amazing dispatch systems, generic and reusable code spirit, and the Distributions.jl package, the code while being very generic is both very expressive and fast[1]!

What type of mixtures?

In particular, it works on a lot of mixtures:

  • Mixture of Univariate continuous distributions
  • Mixture of Univariate discrete distributions
  • Mixture of Multivariate distributions (continuous or discrete)
  • Mixture of mixtures (univariate or multivariate and continuous or discrete)
  • More?

How?

Just define a mix::MixtureModel and do fit_mle(mix, y) with your data y and that's it! Have a look at the examples section.

To work, the only requirements are that the components of the mixture dist ∈ dists = components(mix) considered (custom or coming from an existing package)

  1. Are a subtype of Distribution i.e. dist<:Distribution.
  2. The logpdf(dist, y) is defined (it is used in the E-step)
  3. The fit_mle(dist, y, weigths) returns the distribution with parameters equals to MLE. This is used in the M-step of the ClassicalEM algorithm. For the StocasticEM version, only fit_mle(dist, y) is needed. Type or instance version of fit_mle for your dist are accepted thanks to this conversion line.

In general, 2. is easy, while 3. is only known explicitly for a few common distributions. In case 2. is not explicit known, you can always implement a numerical scheme, if it exists, for fit_mle(dist, y) see Gamma distribution example. Or, when possible, represent your “difficult” distribution as a mixture of simple terms. (I had this in mind, but it is not directly a mixture model.)

Note

Distributions.jl currently does not allow MixtureModel to both have discrete and continuous components[2].

Algorithms

ExpectationMaximization.StochasticEMType
Base.@kwdef struct StochasticEM<:AbstractEM
+Home · ExpectationMaximization.jl

ExpectationMaximization.jl

This package provides a simple implementation of the Expectation Maximization (EM) algorithm used to fit mixture models. Due to Julia amazing dispatch systems, generic and reusable code spirit, and the Distributions.jl package, the code while being very generic is both very expressive and fast[1]!

What type of mixtures?

In particular, it works on a lot of mixtures:

  • Mixture of Univariate continuous distributions
  • Mixture of Univariate discrete distributions
  • Mixture of Multivariate distributions (continuous or discrete)
  • Mixture of mixtures (univariate or multivariate and continuous or discrete)
  • More?

How?

Just define a mix::MixtureModel and do fit_mle(mix, y) with your data y and that's it! Have a look at the examples section.

To work, the only requirements are that the components of the mixture dist ∈ dists = components(mix) considered (custom or coming from an existing package)

  1. Are a subtype of Distribution i.e. dist<:Distribution.
  2. The logpdf(dist, y) is defined (it is used in the E-step)
  3. The fit_mle(dist, y, weigths) returns the distribution with parameters equals to MLE. This is used in the M-step of the ClassicalEM algorithm. For the StocasticEM version, only fit_mle(dist, y) is needed. Type or instance version of fit_mle for your dist are accepted thanks to this conversion line.

In general, 2. is easy, while 3. is only known explicitly for a few common distributions. In case 2. is not explicit known, you can always implement a numerical scheme, if it exists, for fit_mle(dist, y) see Gamma distribution example. Or, when possible, represent your “difficult” distribution as a mixture of simple terms. (I had this in mind, but it is not directly a mixture model.)

Note

Distributions.jl currently does not allow MixtureModel to both have discrete and continuous components[2].

Algorithms

Main function

Warning

To fit the mixture, use the “instance” version of fit_mle(mix::MixtureModel, ...) as described below and NOT the “Type” version, i.e., fit_mle(Type{MixtureModel}, ...). The provided mix is used as the starting point of the EM algorithm. See Instance vs Type version section for more context.

Distributions.fit_mleMethod
fit_mle(mix::MixtureModel, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Use the an Expectation Maximization (EM) algorithm to maximize the Loglikelihood (fit) the mixture with an i.i.d sample y. The mix input is a mixture that is used to initilize the EM algorithm.

  • weights when provided, it will compute a weighted version of the EM. (Useful for fitting mixture of mixtures)
  • method determines the algorithm used.
  • infos = true returns a Dict with informations on the algorithm (converged, iteration number, loglikelihood).
  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
  • atol criteria determining the convergence of the algorithm. If the Loglikelihood difference between two iteration i and i+1 is smaller than atol i.e. |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.
  • rtol relative tolerance for convergence, |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2 (does not check if rtol is nothing)
  • display value can be :none, :iter, :final to display Loglikelihood evolution at each iterations :iter or just the final one :final
source
Distributions.fit_mleMethod
fit_mle(mix::AbstractArray{<:MixtureModel}, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Do the same as fit_mle for each (initial) mixtures in the mix array. Then it selects the one with the largest loglikelihood. Warning: It uses try and catch to avoid errors messages in case EM converges toward a singular solution (probably using robust should be enough in most case to avoid errors).

source

Utilities

ExpectationMaximization.predictFunction
predict(mix::MixtureModel, y::AbstractVector; robust=false)

Evaluate the most likely category for each observations given a MixtureModel.

  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
source
ExpectationMaximization.predict_probaFunction
predict_proba(mix::MixtureModel, y::AbstractVecOrMat; robust=false)

Evaluate the probability for each observations to belong to a category given a MixtureModel..

  • robust = true will prevent the (log)likelihood to under(overflow)flow to -∞ (or ).
source

fit_mle methods that should be in Distribution.jl

I opened two PRs, PR#1670 and PR#1676 to add these methods.

Distributions.fit_mleMethod
fit_mle(g::Product, x::AbstractMatrix)
-fit_mle(g::Product, x::AbstractMatrix, γ::AbstractVector)

The fit_mle for multivariate Product distributions g is the product_distribution of fit_mle of each components of g. Product is meant to be depreacated in next versions of Distribution.jl. Use the analog VectorOfUnivariateDistribution type instead.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Dirac}, x::AbstractArray{<:Real}[, w::AbstractArray{<:Real}])

fit_mle for Dirac distribution (weighted or not) data sets.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})

fit_mle for Laplace distribution weighted data sets.

source

Index

  • 1Have a look at the Benchmark section.
  • 2Rain is a good example of a mixture having both a discrete (Delta distribution in 0) and continuous (Exponential, Gamma, ...) component.
+end

The Stochastic EM algorithm was introduced by G. Celeux, and J. Diebolt. in 1985 in The SEM Algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem.

The default random seed is Random.GLOBAL_RNG but it can be changed via StochasticEM(seed).

source

Main function

Warning

To fit the mixture, use the “instance” version of fit_mle(mix::MixtureModel, ...) as described below and NOT the “Type” version, i.e., fit_mle(Type{MixtureModel}, ...). The provided mix is used as the starting point of the EM algorithm. See Instance vs Type version section for more context.

Distributions.fit_mleMethod
fit_mle(mix::MixtureModel, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Use the an Expectation Maximization (EM) algorithm to maximize the Loglikelihood (fit) the mixture with an i.i.d sample y. The mix input is a mixture that is used to initilize the EM algorithm.

  • weights when provided, it will compute a weighted version of the EM. (Useful for fitting mixture of mixtures)
  • method determines the algorithm used.
  • infos = true returns a Dict with informations on the algorithm (converged, iteration number, loglikelihood).
  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
  • atol criteria determining the convergence of the algorithm. If the Loglikelihood difference between two iteration i and i+1 is smaller than atol i.e. |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<atol, the algorithm stops.
  • rtol relative tolerance for convergence, |ℓ⁽ⁱ⁺¹⁾ - ℓ⁽ⁱ⁾|<rtol*(|ℓ⁽ⁱ⁺¹⁾| + |ℓ⁽ⁱ⁾|)/2 (does not check if rtol is nothing)
  • display value can be :none, :iter, :final to display Loglikelihood evolution at each iterations :iter or just the final one :final
source
Distributions.fit_mleMethod
fit_mle(mix::AbstractArray{<:MixtureModel}, y::AbstractVecOrMat, weights...; method = ClassicEM(), display=:none, maxiter=1000, atol=1e-3, rtol=nothing, robust=false, infos=false)

Do the same as fit_mle for each (initial) mixtures in the mix array. Then it selects the one with the largest loglikelihood. Warning: It uses try and catch to avoid errors messages in case EM converges toward a singular solution (probably using robust should be enough in most case to avoid errors).

source

Utilities

ExpectationMaximization.predictFunction
predict(mix::MixtureModel, y::AbstractVector; robust=false)

Evaluate the most likely category for each observations given a MixtureModel.

  • robust = true will prevent the (log)likelihood to overflow to -∞ or .
source
ExpectationMaximization.predict_probaFunction
predict_proba(mix::MixtureModel, y::AbstractVecOrMat; robust=false)

Evaluate the probability for each observations to belong to a category given a MixtureModel..

  • robust = true will prevent the (log)likelihood to under(overflow)flow to -∞ (or ).
source

fit_mle methods that should be in Distribution.jl

I opened two PRs, PR#1670 and PR#1676 to add these methods.

Distributions.fit_mleMethod
fit_mle(g::Product, x::AbstractMatrix)
+fit_mle(g::Product, x::AbstractMatrix, γ::AbstractVector)

The fit_mle for multivariate Product distributions g is the product_distribution of fit_mle of each components of g. Product is meant to be depreacated in next versions of Distribution.jl. Use the analog VectorOfUnivariateDistribution type instead.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Dirac}, x::AbstractArray{<:Real}[, w::AbstractArray{<:Real}])

fit_mle for Dirac distribution (weighted or not) data sets.

source
Distributions.fit_mleMethod
fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real})

fit_mle for Laplace distribution weighted data sets.

source

Index

  • 1Have a look at the Benchmark section.
  • 2Rain is a good example of a mixture having both a discrete (Delta distribution in 0) and continuous (Exponential, Gamma, ...) component.
diff --git a/dev/search/index.html b/dev/search/index.html index 9eb1711..1482e25 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · ExpectationMaximization.jl

Loading search...

    +Search · ExpectationMaximization.jl

    Loading search...