How can I see the result of simplifiers/tokenizers on Strings rather than just the result? #26
Comments
Sure. You can put a breakpoint in CosineSimilarity.java at line 62. Or, if you want to log what goes in, the builder relies on interfaces rather than concrete implementations, so you can wrap the metric in your own metric. But I think you should write unit tests to validate that your SpecialReplacementsSimplifier works as it should, rather than relying on visual inspection.
MultisetMetric<String> loggingMetric = new MultisetMetric<String>() {
final CosineSimilarity<String> cos = new CosineSimilarity<>();
@Override
public float compare(Multiset<String> a, Multiset<String> b) {
System.out.println("CosineSimilarity [");
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("]");
return cos.compare(a,b);
}
};
StringMetric metric = with(loggingMetric)
.simplify(Simplifiers.toLowerCase())
.simplify(Simplifiers.removeDiacritics())
.simplify(new SpecialReplacementsSimplifier())
.tokenize(Tokenizers.whitespace())
.build();
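The unit-test suggestion above could look roughly like the following. This is only a sketch in plain Java with no test framework: `SPECIAL_REPLACEMENTS` is a hypothetical stand-in for `SpecialReplacementsSimplifier` (its real rules are not shown in this thread), and the `&` replacement is an assumed example rule.

```java
import java.util.function.UnaryOperator;

public class SimplifierCheck {
    // Hypothetical stand-in for SpecialReplacementsSimplifier (assumption):
    // replaces '&' with "and" and collapses runs of whitespace.
    static final UnaryOperator<String> SPECIAL_REPLACEMENTS =
            s -> s.replace("&", " and ").replaceAll("\\s+", " ").trim();

    // Fails loudly with both the expected and the actual value.
    static void expect(String input, String expected) {
        String actual = SPECIAL_REPLACEMENTS.apply(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("for \"" + input + "\" expected \""
                    + expected + "\" but got \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        expect("Tom & Jerry", "Tom and Jerry");
        expect("a&b", "a and b");
        expect("no specials", "no specials");
        System.out.println("all simplifier checks passed");
    }
}
```

Checking the simplifier on its own inputs and outputs keeps the test independent of whichever metric and tokenizer it is later combined with.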
Thanks, that works, but ideally I would like it to output the two original strings as well. Of course I can output these myself before making the compare call, but in a multithreaded system other calls may get interleaved. I wanted this to check my whole simmetrics stack; access to the tokenized sets (as you've shown me above) is needed to write unit tests anyway.
Then you shouldn't use the builder. Its design relies on being indifferent towards the individual components as long as they adhere to their interface.
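For illustration, composing the steps by hand might look like the sketch below. It deliberately uses no simmetrics classes: `simplify`, `tokenize` and `cosine` are hypothetical stand-ins written in plain Java (a lower-casing simplifier, a whitespace tokenizer, and cosine similarity over token counts), so the original strings, the token multisets and the result can all be reported in one message. Emitting that message with a single `println` call typically keeps each comparison's output together when several threads are logging.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TracedCosine {
    // Hypothetical simplifier (assumption): lower-case and trim.
    static String simplify(String s) {
        return s.toLowerCase(Locale.ROOT).trim();
    }

    // Hypothetical whitespace tokenizer returning token -> count.
    static Map<String, Integer> tokenize(String s) {
        Map<String, Integer> counts = new TreeMap<>();
        if (s.isEmpty()) {
            return counts;
        }
        for (String token : s.split("\\s+")) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two token-count vectors.
    static float cosine(Map<String, Integer> a, Map<String, Integer> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0f;
        if (a.isEmpty() || b.isEmpty()) return 0.0f;
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Runs each step explicitly and reports inputs, tokens and result together.
    static float compare(String a, String b) {
        Map<String, Integer> tokensA = tokenize(simplify(a));
        Map<String, Integer> tokensB = tokenize(simplify(b));
        float result = cosine(tokensA, tokensB);
        // One println call per comparison, so the whole message is written at once.
        System.out.println("compare [a: \"" + a + "\" -> " + tokensA
                + ", b: \"" + b + "\" -> " + tokensB
                + ", result: " + result + "]");
        return result;
    }

    public static void main(String[] args) {
        compare("Hello World", "hello  world");
        compare("a a b", "a b c");
    }
}
```

The trade-off is the one described above: the hand-written pipeline knows about each step and can expose it, whereas the builder only sees components through their interfaces.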
If you say so, though it would seem quite useful to have a way of seeing the effects of a builder on some inputs without having to break down the individual steps.
What would you do with this information?
So may typically have@
What I would like for debugging is an easy way to see the final step before the cosine similarity, i.e. the contents of the sets created by applying the simplifiers and then the tokenizer(s). Is this possible?