Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fit!(o, y, n) to fit multiple observations of the same value #40

Merged
merged 4 commits into from
Apr 19, 2024

Conversation

adknudson
Copy link
Contributor

This closes #284 that I opened. It's a minimum implementation where simple stats such as Sum, Mean, and CountMap (and a few others) can efficiently update their sufficient stats given a value and the number of times it was observed.

Methods without a specialized update will fall back to

for _ in 1:n
    fit!(o, y)
end

If you prefer to use fitn!(o, y, n), then I can make the change and update the pull request.

@adknudson
Copy link
Contributor Author

Other things to check:

  1. I added a default implementation for _fit!(o, y) that throws an error if it is not implemented for an OnlineStat.
  2. The specialized method for Sum(Integer) needs review. _fit(Sum(Integer), 3.5, 10) will first multiply 3.5 by 10, then round to the nearest integer, which will give a different result than fitting 3.5 ten times in a row.

src/stats.jl Outdated
Comment on lines 471 to 474
function _fit!(o::Mean{T}, y, n) where {T}
o.n += n
o.μ = smooth(o.μ, y, n / o.n)
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be correct for Mean{T,W} when W is something other than EqualWeight

Copy link
Contributor Author

@adknudson adknudson Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply there is also a problem with _merge!(o::Mean, o2::Mean)? My thinking is along the lines that fitting y k times is equivalent to merging two means where the second one has k values and a mean of y. What is the appropriate way to deal with weights? Like this?

function _fit!(o::Mean{T}, y, n) where {T}
    o.n += n
    o.μ = smooth(o.μ, y, o.weight(o.n / n))
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, in testing that doesn't work. I can define a special version for Mean with equal weight, and then use the generic fallback otherwise

@joshday
Copy link
Owner

joshday commented Apr 19, 2024

I'll merge when CI goes green!

@joshday joshday merged commit a217f23 into joshday:master Apr 19, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow counts argument in fit!
2 participants