Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ShrunkCovariance #309

Merged
merged 12 commits into from
Nov 19, 2024
Merged

Added ShrunkCovariance #309

merged 12 commits into from
Nov 19, 2024

Conversation

norm4nn
Copy link
Contributor

@norm4nn norm4nn commented Nov 8, 2024

What has been added:

  • ShrunkCovariance module (scikit-learn context)
  • Covariance.Utilites module with common functions
  • tests and docs for ShrunkCovariance

Following up on #304: I added the ? sign to the assume_centered option, but I'm unsure about replacing empirical_covariance/1 with Nx.covariance/3 because tests don’t pass when I make that change. I believe the tests should stay as they are, since I’ve already verified the results against scikit-learn.

@josevalim josevalim requested a review from polvalente November 11, 2024 11:31
Copy link
Member

@krstopro krstopro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nitpicks.

lib/scholar/covariance/utils.ex Outdated Show resolved Hide resolved
{x - location, location}
end

defn empirical_covariance(x) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use Nx.covariance/2 instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, look at the PR description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot to ask: @norm4nn did you try setting ddof: 0 when calling Nx.covariance/2?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, ddof: 0 is default. It's strange that the tests are failing, because Nx.covariance/2 does exactly what is implemented here. Could it be the case that the data in your test is not centered and you are using Nx.covariance/2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right! This is the case, I will fix this on Thursday.

@@ -182,9 +149,9 @@ defmodule Scholar.Covariance.LedoitWolf do

defnp ledoit_wolf_shrinkage_complex(x) do
{num_samples, num_features} = Nx.shape(x)
emp_cov = empirical_covariance(x)
emp_cov = Nx.covariance(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is x centered here? If yes, maybe reverting it to empirical_covariance might be better. 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think empirical_covariance/1 was centering x, if that's what you're asking about. However, I'm still a bit confused as to why, when I used empirical_covariance/1 instead of Nx.covariance/2, the Covariance.*.fit/2 functions produced the same output as the scikit-learn versions when I passed a non-centered x and set assume_centered? to true. While I feel this case is invalid and can't think of any practical use case for passing such arguments, it still feels odd that the output in this edge case differs when using Nx.covariance/2 compared to the scikit-learn version. It seems empirical_covariance might not operate exactly the same way as Nx.covariance/2 , so I also like the idea of reverting it to empirical_covariance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemme have a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need

Suggested change
emp_cov = Nx.covariance(x)
emp_cov = Nx.covariance(x, Nx.reshape(0, {1}))

for the code to be totally equivalent.

empirical_covariance was blindly assuming the mean is 0

Copy link
Member

@krstopro krstopro Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empirical_covariance was blindly assuming the mean is 0

Yes, because the data should be centered before. That is why using empirical_covariance might be better idea. Like this the data won't be centered twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want me to change it back to empirical_covariance version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response! I am still checking why empirical_covariance and Nx.covariance are giving completely different results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@norm4nn let's revert to empirical_covariance for now so we can merge this and get unstuck. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

@josevalim josevalim merged commit f84177f into elixir-nx:main Nov 19, 2024
2 checks passed
@josevalim
Copy link
Contributor

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants