diff --git a/Project.toml b/Project.toml
index 72cf559..90d1330 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "ExpectationMaximization"
 uuid = "e1fe09cc-5134-44c2-a941-50f4cd97986a"
 authors = ["David Métivier <46794064+dmetivie@users.noreply.github.com> and contributors"]
-version = "0.1.5"
+version = "0.1.6"
 
 [deps]
 ArgCheck = "dce04be8-c92d-5529-be00-80e4d2c0e197"
diff --git a/README.md b/README.md
index 7a944a1..0019598 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,29 @@
 # ExpectationMaximization
 
 This package provides a simple implementation of the Expectation Maximization algorithm used to fit mixture models.
-Due to [Julia](https://julialang.org/) amazing [multiple dispatch](https://www.youtube.com/watch?v=kc9HwsxE1OY) systems and the [Distributions](https://juliastats.org/Distributions.jl/stable/) package, the code is very generic i.e., mixture of all common distributions should be supported.
+Thanks to [Julia](https://julialang.org/)'s amazing [multiple dispatch](https://www.youtube.com/watch?v=kc9HwsxE1OY) system and the [Distributions](https://juliastats.org/Distributions.jl/stable/) package, the code is very generic, i.e., many kinds of mixtures should work:
 
-I plan to add different methods for E-step and M-steps like stochastic EM and others.
+- Univariate continuous distributions
+- Univariate discrete distributions
+- Multivariate distributions (continuous or discrete)
+- Mixtures of mixtures (univariate or multivariate, continuous or discrete). Note that [Distributions](https://juliastats.org/Distributions.jl/stable/) currently does not allow a `MixtureModel` to have both discrete and continuous components (but who does that? Rain).
 
-I should add examples with MNIST dataset and Bernoulli mixtures. Look at the tests to see examples with multivariate distributions.
+**Have a look at the test section to see examples.**
+
+The only requirement is that the considered `dist<:Distribution` has implemented
+
+1. `logpdf(dist, y)` (used in the E-step)
+2. `fit_mle(dist, y, weights)` (used in the M-step)
+
+In general, 1. is easy, while 2. is known explicitly only for a few common distributions.
+When 2. is not known explicitly, you can always implement a numerical scheme, if one exists, for `fit_mle(dist, y)`; see the [`Gamma` distribution example](https://github.com/JuliaStats/Distributions.jl/blob/34a05d8a1671052624e7fa246b58484acc32cfe5/src/univariate/continuous/gamma.jl#L171).
+Or, when possible, represent your “difficult” distribution as a mixture of simpler terms.
+(I had [this](https://stats.stackexchange.com/questions/63647/estimating-parameters-of-students-t-distribution) in mind, but it is not directly a mixture model.)
+
+## TODO (feel free to contribute)
+
+- Add different variants for the E-step and M-step, like stochastic EM and others.
+- Add examples with the MNIST dataset and Bernoulli mixtures.
 
 ## Example
 
@@ -49,4 +67,5 @@ isapprox(θ₁, p[1]...; rtol = rtol)
 isapprox(α, p[2][1]; rtol = rtol)
 isapprox(θ₂, p[2][2]; rtol = rtol)
 ```
+
 ![EM_mixture_example.svg](img/EM_mixture_example.svg)
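A note on the two requirements the README patch lists (`logpdf` for the E-step, weighted `fit_mle` for the M-step): the sketch below illustrates why they are sufficient. It is a hypothetical usage example, not part of this patch; it assumes the package exports a `fit_mle(mix::MixtureModel, y)` method, consistent with the README's Example section, and uses `Normal` components, for which Distributions.jl provides both `logpdf` and a weighted `fit_mle`.

```julia
# Hypothetical sketch: fitting a two-component Normal mixture with EM.
# Assumes ExpectationMaximization exports `fit_mle` for `MixtureModel`s.
using Distributions
using ExpectationMaximization

# Simulate data from a known mixture.
mix_true = MixtureModel([Normal(-1.0, 0.5), Normal(2.0, 1.0)], [0.4, 0.6])
y = rand(mix_true, 50_000)

# Each component implements `logpdf(dist, y)` (E-step responsibilities)
# and `fit_mle(Normal, y, weights)` (weighted M-step update),
# so EM can refine an initial guess toward `mix_true`.
mix_guess = MixtureModel([Normal(0.0, 1.0), Normal(1.0, 1.0)], [0.5, 0.5])
mix_fit = fit_mle(mix_guess, y)
```

Any other component type with these two methods (e.g. the `Exponential`/`Gamma` pair from the README example) would plug into the same call.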