Add CLARA, FastCLARA, FasterCLARA #5

Open
kno10 opened this issue Dec 11, 2023 · 0 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers), needs funding (Issues that would need funding to be completed)

Comments

Owner

kno10 commented Dec 11, 2023

CLARA roughly does:

  • subsample the data
  • run PAM (FastCLARA: FastPAM, FasterCLARA: FasterPAM) on the sample
  • compute the total deviation on the entire data set for these medoids
  • return the best result found with multiple subsamples
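The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: `clara`, `naive_pam`, `total_deviation`, and all parameter names are hypothetical, and `naive_pam` is a deliberately simple swap search standing in for PAM, FastPAM, or FasterPAM.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two coordinate tuples."""
    return math.dist(a, b)


def total_deviation(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid (given as indices)."""
    return sum(min(dist(x, points[m]) for m in medoids) for x in points)


def naive_pam(points, k, dist, rng):
    """Hypothetical stand-in for PAM/FastPAM/FasterPAM: accept any improving swap."""
    n = len(points)
    medoids = rng.sample(range(n), k)  # assumes k <= len(points)
    best = total_deviation(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                loss = total_deviation(points, trial, dist)
                if loss < best:
                    medoids, best, improved = trial, loss, True
    return medoids


def clara(data, k, sample_size, n_samples=5, dist=euclidean, seed=0):
    """CLARA sketch: cluster subsamples, score on all data, keep the best."""
    rng = random.Random(seed)
    best_loss, best_medoids = math.inf, None
    for _ in range(n_samples):
        # 1. subsample the data
        idx = rng.sample(range(len(data)), min(sample_size, len(data)))
        sample = [data[i] for i in idx]
        # 2. run the k-medoids solver on the sample only
        local = naive_pam(sample, k, dist, rng)
        medoids = [idx[m] for m in local]  # map back to indices into data
        # 3. total deviation of these medoids on the entire data set
        loss = total_deviation(data, medoids, dist)
        # 4. keep the best result over all subsamples
        if loss < best_loss:
            best_loss, best_medoids = loss, medoids
    return best_medoids, best_loss


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
    print(clara(pts, k=2, sample_size=4))
```

Note that the sketch takes raw data points and a distance function rather than a precomputed distance matrix, because step 3 needs distances from every point to the chosen medoids.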

This may seem like a trivial addition at first (and it would indeed only be a few lines in the Python wrapper) BUT:

  • this package currently does not include any distance functions, but operates on precomputed distance matrices only
  • if you already have the distance matrix, just use FasterPAM and you will be fine
  • a meaningful implementation of these computes the distance matrix only on the subsample, which requires a data matrix as input as well as distance functions
  • for many users it will still be more convenient to handle the subset/sample within their own application
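To illustrate the point about computing distances only on the subsample, here is a minimal sketch, assuming the data is a list of coordinate tuples. The function `sample_distance_matrix` and its parameters are hypothetical and not part of this package.

```python
import math
import random


def sample_distance_matrix(data, sample_size, dist=math.dist, seed=0):
    """Return (indices, matrix) with matrix[i][j] = dist(data[idx[i]], data[idx[j]])."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), min(sample_size, len(data)))
    s = len(idx)
    matrix = [[0.0] * s for _ in range(s)]
    for i in range(s):
        for j in range(i + 1, s):
            # each pair is computed once and mirrored (assumes a symmetric distance)
            d = dist(data[idx[i]], data[idx[j]])
            matrix[i][j] = matrix[j][i] = d
    return idx, matrix


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(1000)]
    idx, matrix = sample_distance_matrix(data, sample_size=50)
    # 50 x 50 entries instead of the 1000 x 1000 a full matrix would need
    print(len(matrix), len(matrix[0]))
```

The resulting matrix is the kind of precomputed input the package already accepts, and `idx` maps the medoids found on the sample back to rows of the full data set.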

Hence a rough implementation plan would be:

  • design an API for computing distances that suits typical users (Python wrapper users, native Rust users)
  • implement a decent choice of distance functions
  • implement CLARA
  • tests
  • update the Python wrapper

Adding distance functions will also be necessary for CLARANS #6, BanditPAM #2, or coreset approaches #4.

kno10 added the enhancement and help wanted labels Dec 11, 2023
kno10 added the needs funding label Dec 11, 2023
kno10 added the good first issue label and removed the help wanted label Dec 11, 2023