Comparison of string pairs #19

Open
rieck opened this issue May 4, 2015 · 2 comments

Comments

@rieck
Owner

rieck commented May 4, 2015

There are analysis tasks where only the similarity between given pairs of strings needs to be computed. In this setting, computing a similarity matrix over all strings is clearly overkill, and it would be great if Harry could support this case, e.g. using a special command-line option.
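
To illustrate the difference, here is a minimal Python sketch of pairwise comparison. It is not Harry code and does not use Harry's interface: `similarity` is a stand-in built on the standard library's `difflib.SequenceMatcher`, and `compare_pairs` is a hypothetical helper name. The point is only that the work grows with the number of requested pairs rather than with the square of the number of strings.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in measure (difflib ratio in [0, 1]); an assumption, not one of Harry's measures."""
    return SequenceMatcher(None, a, b).ratio()


def compare_pairs(pairs):
    """Compute exactly one score per requested pair, without building a matrix."""
    return [(a, b, similarity(a, b)) for a, b in pairs]


if __name__ == "__main__":
    # Hypothetical input: each entry is one pair of strings to compare.
    pairs = [("kitten", "sitting"), ("harry", "harry")]
    for a, b, score in compare_pairs(pairs):
        print(f"{a}\t{b}\t{score:.3f}")
```

With n strings, a full matrix needs on the order of n² comparisons, while this needs one comparison per listed pair.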

@gsever

gsever commented Nov 15, 2015

Hello, it would also be good to output only the similarity scores that satisfy a threshold, rather than all results.

@rieck
Owner Author

rieck commented Nov 16, 2015

That's a very good idea. However, we would need to introduce a new representation and output format. Currently, Harry stores computed similarity values in a matrix. The benefit of a threshold would be that many of the matrix entries could be omitted, and we would end up with a sparse representation. I'll put this on my TODO list.
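
A rough Python sketch of what such a sparse representation could look like is below. It is an illustration under assumptions, not Harry's design: the stand-in `similarity` uses `difflib.SequenceMatcher`, the hypothetical `sparse_similarities` treats the threshold as a lower bound on similarity, and entries are stored in a plain dictionary keyed by index pairs `(i, j)` with `i < j`.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in measure (difflib ratio in [0, 1]); an assumption, not one of Harry's measures."""
    return SequenceMatcher(None, a, b).ratio()


def sparse_similarities(strings, threshold):
    """Return {(i, j): score} for all i < j whose score is >= threshold.

    Every pair is still compared; only the storage and output are sparse.
    """
    sparse = {}
    for (i, a), (j, b) in combinations(enumerate(strings), 2):
        score = similarity(a, b)
        if score >= threshold:
            sparse[(i, j)] = score
    return sparse


if __name__ == "__main__":
    strings = ["abc", "abc", "xyz"]  # hypothetical input
    for (i, j), score in sparse_similarities(strings, 0.5).items():
        print(f"{i}\t{j}\t{score:.3f}")
```

Entries below the threshold are simply absent, so the output is a list of (index, index, score) triples rather than a dense matrix.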


2 participants