Fix 291 #294

Open. 3 commits to be merged into base: master

Conversation

hv10 commented Nov 22, 2024

This is an attempt to fix #291

After adding a test for my specific issue, the solution I came up with is to avoid the division by zero by adding OnlineStats.$\epsilon$ to the offending lines.

The only downside: the basic test has to be modified to accept a solution that is slightly off (by $\epsilon$) instead of an exact match.

Aside from this, all other tests came up green.
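
To make the trade-off concrete, here is a minimal sketch of an $\epsilon$-guarded normalization. It is written in plain Python for illustration and is not the code changed in this PR: the helper `normalize` and the constant `EPS` (standing in for OnlineStats.$\epsilon$, with an assumed value) are hypothetical.

```python
import math

EPS = 1e-7  # assumed small constant, standing in for OnlineStats.ϵ


def normalize(v, eps=0.0):
    """Return v / (||v|| + eps); eps > 0 keeps the denominator away from zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


# A regular vector: the result is very close to, but not exactly, [0.6, 0.8].
unit = normalize([3.0, 4.0], eps=EPS)

# A zero vector: the guarded division is defined and yields zeros.
zero = normalize([0.0, 0.0], eps=EPS)
```

Without the guard, `normalize([0.0, 0.0])` raises `ZeroDivisionError` in Python, whereas Julia evaluates `0.0 / 0.0` to `NaN`. The small offset in `unit` is the reason an exact-match test has to become an approximate one.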

joshday (Owner) commented Jan 14, 2025

This is probably fine, but why do NaNs occur when fitting the same vector? I might want this behavior to be opt-in if fitting the same vector is breaking assumptions.

hv10 (Author) commented Jan 14, 2025

When fitting the same vector twice, the variance between the two observations is $0.0$, so all variance can be explained by repeating the original vector as the principal component (when starting from a freshly initialized CCIPCA).
This should not be an issue for datasets where we expect at least some measurement noise (leading to a natural spread in the variables, and therefore variance), even if we measure the same underlying value twice.
In my case, though, one of my time series has no measurement noise and holds the same value for a number of timesteps, which leads to this issue.
Because the CCIPCA method has a parameter describing its forgetfulness, this can even happen after a number of observations have already been fitted, if we observe the same value for long enough.
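
A rough illustration of the degenerate case, as a sketch only: it assumes the usual CCIPCA scheme in which each further component is estimated from the residual left after removing the previous component's direction. The helpers `deflate` and `norm` are hypothetical and are not taken from OnlineStats.

```python
import math


def deflate(x, v):
    """Remove from x its projection onto the direction of v."""
    scale = sum(a * b for a, b in zip(x, v)) / sum(b * b for b in v)
    return [a - scale * b for a, b in zip(x, v)]


def norm(v):
    """Euclidean norm of v."""
    return math.sqrt(sum(a * a for a in v))


x = [3.0, 4.0]
v1 = list(x)               # fresh model: first component starts as the first observation
residual = deflate(x, v1)  # fitting the same x again: nothing is left after removing v1
v2 = residual              # the second component would be initialised from this residual
print(norm(v2))            # 0.0
```

Any later step that divides by `norm(v2)` is then a `0/0` division, which is where the NaNs come from. For simplicity the sketch keeps `v1` equal to `x`; the point is only that `v1` stays parallel to `x`, so the residual vanishes.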

I think the current implementation makes no assumption about the underlying data that would disallow fitting repeated observations.

Also: I have not checked whether the issue also occurs when only one variable has variance $0$.

Alternatively, one could think about only "applying" an update to the CCIPCA when it does not lead to an eigenvalue of $0.0$.
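
That alternative could look roughly like the following sketch. It assumes the eigenvalue estimate of a component is the norm of its unnormalized vector, and the names `apply_if_nondegenerate` and `tol` are made up for illustration; nothing here is existing OnlineStats API.

```python
import math


def norm(v):
    """Euclidean norm of v."""
    return math.sqrt(sum(a * a for a in v))


def apply_if_nondegenerate(current, proposed, tol=1e-12):
    """Keep the current components if any proposed one has a (near-)zero norm."""
    if any(norm(v) <= tol for v in proposed):
        return current   # reject the update, the state stays untouched
    return proposed      # accept the update


current = [[3.0, 4.0], [0.5, -0.2]]
degenerate = [[9.0, 12.0], [0.0, 0.0]]   # second component collapsed to zero
regular = [[3.1, 3.9], [0.4, -0.3]]

state_after_bad = apply_if_nondegenerate(current, degenerate)
state_after_good = apply_if_nondegenerate(current, regular)
```

Compared with adding $\epsilon$, this keeps exact results for well-behaved data, at the price of silently dropping observations that would collapse a component.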

Successfully merging this pull request may close these issues.

Encountering NaN when fitting Vectors with CCIPCA
2 participants