Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make nn algorithm configurable #281

Merged
merged 5 commits into from
Jun 14, 2024

Conversation

msluszniak
Copy link
Contributor

@msluszniak msluszniak commented Jun 10, 2024

Closes #280

Copy link
Member

@krstopro krstopro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment regarding allowed algorithms.

@@ -113,6 +113,16 @@ defmodule Scholar.Manifold.Trimap do
doc: ~S"""
Metric used to compute the distances.
"""
],
algorithm: [
type: {:in, [:nndescent, :large_vis]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add at least :brute. Maybe custom k-NN graph construction algorithm to be passed as a module as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any benefit in using brute other than nndescent or large_vis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's the best one to use for smaller datasets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps making the selection automatic depending on dataset size (sample size and number of features) would be ideal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will force use brute for n < 100 and otherwise one of these two approximated algorithms, is it ok or do we want to add :brute anyway?

Copy link
Member

@krstopro krstopro Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use brute-force for $N \times D^2 \leq T$ for some constant $T$ (e.g. $10^5$ or $10^6$ or so).
I would add :brute anyway; it might be useful to see how much the quality of embeddings differ when approximate k-NN search algorithms are used.

Looking at it now, we might wanna change predict to transform in these algorithms as well.

Copy link
Contributor Author

@msluszniak msluszniak Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for $N = 20000$ and $D = 2$, we rather won't use brute, generally, it's more effective to have this condition only on N I guess, I can increase this to 500-1000

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@krstopro
Copy link
Member

Function that computes the Trimap is called embed (not fit or transform). Does this deviate from the rest?

@msluszniak
Copy link
Contributor Author

Function that computes the Trimap is called embed (not fit or transform). Does this deviate from the rest?

Yeah, I will change the name to transform.

@msluszniak msluszniak requested a review from krstopro June 12, 2024 18:24
Copy link
Member

@krstopro krstopro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments.

Perhaps renaming algorithm to knn_algorithm would make it more understandable? Up to you.

lib/scholar/manifold/trimap.ex Show resolved Hide resolved
lib/scholar/manifold/trimap.ex Outdated Show resolved Hide resolved
@msluszniak msluszniak requested a review from krstopro June 13, 2024 22:00
@krstopro
Copy link
Member

Looks good to me! If you want, I can have a more detailed look tomorrow.

Otherwise, feel free to merge it. I hope I am not annoying with these comments. ^_^

@msluszniak
Copy link
Contributor Author

Looks good to me! If you want, I can have a more detailed look tomorrow.

Otherwise, feel free to merge it. I hope I am not annoying with these comments. ^_^

Your comments is a valueable feedback for me, so don't stop 😀

@msluszniak msluszniak merged commit 433041f into elixir-nx:main Jun 14, 2024
2 checks passed
@msluszniak msluszniak deleted the custom_nn_trimap branch June 14, 2024 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make kNN configurable in Trimap
3 participants