Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TopK optimization to store not more than k values per window #121

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

arunkm
Copy link
Collaborator

@arunkm arunkm commented Dec 6, 2019

No description provided.

@arunkm arunkm requested review from peterfreiling and zoryn December 7, 2019 01:24
var combinedComparer = ThenOrderBy(Reverse(rankComparer), overallComparer);
var generator = combinedComparer.CreateSortedDictionaryGenerator<T, long>(container);
var replaced = template.ReplaceParametersInBody(generator);
initialState = Expression.Lambda<Func<ITopKState<T>>>(replaced);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialState [](start = 12, length = 12)

Did you disable the stylecop rules? instance members should be prefixed by "this.". Please make sure to follow style rules.


var combinedComparer = ThenOrderBy(Reverse(rankComparer), overallComparer);
var generator = combinedComparer.CreateSortedDictionaryGenerator<T, long>(container);
var replaced = template.ReplaceParametersInBody(generator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document what is going on here...

long Count { get; }
}

internal class SimpleTopKState<T> : ITopKState<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SimpleTopKState [](start = 19, length = 15)

Shouldn't this also be capped t the top K ranks rather than storing everything?

}

public void Remove(T input, long timestamp)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use expression bodies for small one-liners

/// State used by TopK Aggregate
/// </summary>
/// <typeparam name="T"></typeparam>
public interface ITopKState<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ITopKState [](start = 21, length = 10)

This will wrap every aggregation operation a virtual function call. I wonder if it would be better to consolidate them with a single class and differentiate behavior at runtime like the Min/Max state, especially if we optimize the simple top k state as well so it will also have a comparer, etc.

this.k = k;
this.rankComparer = rankComparer;
this.currentValues = new SortedMultiSet<T>(generator);
this.previousValues = new SortedMultiSet<T>(generator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previousValues [](start = 17, length = 14)

Are there cases where this is never needed? perhaps lazy init?

throw new ArgumentException("Cannot remove entries with current or future timestamp");
}
previousValues.RemoveAll(otherTopK.currentValues);
previousValues.RemoveAll(otherTopK.previousValues);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you only handle the case where otherTopK.currentTimestamp > currentTimestamp, but allow otherTopK.currentTimestamp == currentTimestamp

return currentValues;
else
{
MergeCurrentToPrevious();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MergeCurrentToPrevious [](start = 16, length = 22)

This seems to modify state significantly... please validate or at the very least comment why this isn't a problem.

if (!currentValues.IsEmpty)
{
// Swap so we merge small onto larger
if (previousValues.UniqueCount < currentValues.UniqueCount)
Copy link
Contributor

@peterfreiling peterfreiling Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (previousValues.UniqueCount < currentValues.UniqueCount) [](start = 16, length = 59)

When would this happen? Doesn't seem likely or significant enough to warrant the optimization #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh i guess most often on the first Merge?


In reply to: 356294107 [](ancestors = 356294107)

{
currentValues.AddAll(otherTopK.currentValues);
while (currentValues.TotalCount > k)
currentValues.Remove(currentValues.First());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why previousValues is ignored here...

@peterfreiling
Copy link
Contributor

Please make sure to honor all style cop (Roslyn analyzers) and VS style guidelines (should be red lines in your environment). If you are using VS and not seeing them let's figure out why. If not please find a way to enforce them locally.

@peterfreiling
Copy link
Contributor

I realize the old code could use better documentation but the new code you are adding is not trivial and needs to be properly commented/documented. Please take time to comment thoroughly, and improve existing code when appropriate.

Copy link
Contributor

@peterfreiling peterfreiling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@arunkm arunkm force-pushed the users/arunkm/topk-optimization/store-only-k-values-per-window branch from 5f60996 to 8e069d7 Compare December 16, 2019 19:30
@arunkm arunkm force-pushed the users/arunkm/topk-optimization/store-only-k-values-per-window branch from 8e069d7 to 74fc7b1 Compare December 21, 2019 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants