Cache linkable elements #1229
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
path_key_to_linkable_dimensions=path_key_to_linkable_dimensions,
path_key_to_linkable_entities=path_key_to_linkable_entities,
path_key_to_linkable_metrics=path_key_to_linkable_metrics,
)
logger.info(f"Filtering valid linkable elements took: {time.time() - start_time:.2f}s")
I would love to cache this function too, but that change would be a bit more involved since the base class (LinkableElementSet) relies heavily on dictionaries, which are not hashable.
Can we just store Sequence[Tuple[ElementPathKey, Sequence[LinkableDimension]]] or whatever instead of a dict? We go to some lengths to avoid mutation of these things anyway, so we might as well make it official.
We shouldn't do it in this PR, anyway, that sounds like a fairly involved change given the need to update callsites and all of those merge functions and whatnot.
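A minimal sketch of the idea above, not the project's code: a dict is replaced by a sorted tuple of (key, tuple-of-values) pairs so the whole structure becomes hashable. PathKey, Dimension, and freeze_mapping are hypothetical stand-ins (plain strings instead of ElementPathKey and LinkableDimension), and concrete tuples are used because a value merely annotated as Sequence could still be an unhashable list.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical stand-ins for ElementPathKey and LinkableDimension.
PathKey = str
Dimension = str
FrozenMapping = Tuple[Tuple[PathKey, Tuple[Dimension, ...]], ...]


def freeze_mapping(mapping: Dict[PathKey, Sequence[Dimension]]) -> FrozenMapping:
    """Convert a dict of sequences into a hashable tuple of pairs."""
    # Sorting by key makes the result independent of insertion order.
    return tuple((key, tuple(values)) for key, values in sorted(mapping.items()))


frozen = freeze_mapping({"b": ["y"], "a": ["x", "z"]})
frozen_hash = hash(frozen)  # works, whereas hash() of the original dict raises TypeError
as_dict = dict(frozen)      # a dict view can still be rebuilt when lookups are needed
```

The trade-off is that lookups by key are no longer constant time unless a dict is rebuilt, which is part of why the change would touch many call sites.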
Yeah that could work!
functools.cache is basically just a wrapper around functools.lru_cache, but it wasn't added until Python 3.9. Since we support Python 3.8, we need to use lru_cache instead.
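As a rough illustration of that equivalence, the sketch below picks functools.cache when it exists and otherwise falls back to lru_cache(maxsize=None), which behaves the same way. The names cache and expensive_lookup are made up for this example; the PR itself may simply use lru_cache directly.

```python
import functools

# functools.cache exists on Python 3.9+; on 3.8 fall back to the
# unbounded lru_cache decorator, which is what functools.cache wraps.
cache = getattr(functools, "cache", functools.lru_cache(maxsize=None))


@cache
def expensive_lookup(name: str) -> str:
    """Hypothetical stand-in for a slow, pure function."""
    return name.upper()


expensive_lookup("bookings")  # computed
expensive_lookup("bookings")  # served from the cache
info = expensive_lookup.cache_info()
```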
Sure, why not?
with_any_of: Optional[FrozenSet[LinkableElementProperty]] = None,
without_any_of: Optional[FrozenSet[LinkableElementProperty]] = None,
I kind of wondered why these weren't frozenset to begin with.
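One way to picture why frozenset helps here: a public function can accept any iterable, normalize it to a frozenset, and delegate to a cached helper. This is only a sketch under assumptions; ELEMENTS, _names_cached, and names are invented for illustration, and string property names stand in for LinkableElementProperty.

```python
import functools
from typing import FrozenSet, Iterable, Tuple

# Hypothetical data: element name paired with its properties.
ELEMENTS: Tuple[Tuple[str, FrozenSet[str]], ...] = (
    ("country", frozenset({"LOCAL"})),
    ("user__region", frozenset({"JOINED"})),
    ("user__org__tier", frozenset({"JOINED", "MULTI_HOP"})),
)


@functools.lru_cache(maxsize=None)
def _names_cached(with_any_of: FrozenSet[str], without_any_of: FrozenSet[str]) -> Tuple[str, ...]:
    # Keep elements that have at least one wanted property and no excluded one.
    return tuple(
        name for name, props in ELEMENTS
        if props & with_any_of and not props & without_any_of
    )


def names(with_any_of: Iterable[str], without_any_of: Iterable[str] = ()) -> Tuple[str, ...]:
    # Normalizing to frozenset makes the arguments hashable and order-insensitive.
    return _names_cached(frozenset(with_any_of), frozenset(without_any_of))


first = names(["LOCAL", "JOINED"], ["MULTI_HOP"])
second = names({"JOINED", "LOCAL"}, ("MULTI_HOP",))  # same frozensets, so a cache hit
hits = _names_cached.cache_info().hits
```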
@@ -80,6 +83,7 @@ def linkable_elements_for_no_metrics_query(
without_any_of=frozen_without_any_of,
)

@functools.cache
def linkable_elements_for_metrics(
self,
metric_references: Sequence[MetricReference],
Oof Sequence isn't counted as immutable. Should we update this to Tuple or does the typechecker catch it if we pass in a non-hashable type?
The type checker catches it! It fails if the type isn't hashable.
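Separately from what the type checker reports, lru_cache also rejects unhashable arguments at run time, so a list slipping through would fail loudly rather than silently bypass the cache. A small sketch, with count_refs as a hypothetical function and strings standing in for MetricReference:

```python
import functools
from typing import Sequence


@functools.lru_cache(maxsize=None)
def count_refs(metric_references: Sequence[str]) -> int:
    """Hypothetical cached function that takes a sequence argument."""
    return len(metric_references)


tuple_result = count_refs(("bookings", "revenue"))  # tuples are hashable, so this works

list_rejected = False
try:
    count_refs(["bookings", "revenue"])  # lists are unhashable
except TypeError:
    # lru_cache hashes its arguments to build the cache key, so this raises.
    list_rejected = True
```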
Completes SL-2262
Description
Recently we've seen an increase in compile-time latency for queries. The theory is that this might be caused by the addition of LinkableMetrics to the group-by options, which may have dramatically increased the number of group-by options available. Looping through them could add significant latency to each query.
I added some logs to see how long those loops were taking. Looking at data from the past 24 hours, the call to linkable_elements_for_measure took about 0.1 seconds per call for long-running queries. It can be called quite a few times for the same measure within a given query, resulting in a lot of duplicate work and unnecessary latency. To fix that, I decided to cache that method. I also cached a couple of similar methods along the way.
I tested this primarily by running queries in pytest and showing INFO logs. There was a major reduction in the logs for these functions after adding the cache (especially in scenarios where a given measure is used several times in one query).
The other changes in this PR are making params hashable, which is required for the cache decorator.
I did not add an expiration for the cache - since these classes are scoped to a given semantic manifest, the valid linkable elements for a given query should never change.
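To illustrate the scoping argument, here is a minimal sketch assuming a simplified class; ElementLookup and its fields are hypothetical and do not mirror the real classes. Because self is part of the cache key, each instance (and so each manifest) gets its own entries, and repeated calls for the same measure are computed once.

```python
import functools
from typing import Tuple


class ElementLookup:
    """Hypothetical stand-in for a class scoped to one semantic manifest."""

    def __init__(self, manifest_name: str) -> None:
        self.manifest_name = manifest_name
        self.compute_count = 0  # counts real computations, for demonstration only

    @functools.lru_cache(maxsize=None)
    def linkable_elements_for_measure(self, measure_name: str) -> Tuple[str, ...]:
        self.compute_count += 1
        return (f"{self.manifest_name}:{measure_name}__ds",)


lookup = ElementLookup("manifest_a")
for _ in range(3):
    result_a = lookup.linkable_elements_for_measure("bookings")

other = ElementLookup("manifest_b")
result_b = other.linkable_elements_for_measure("bookings")
```

One caveat with this pattern: the cache is stored on the function and keeps a strong reference to each instance it has seen, so instances stay alive for as long as their entries do. That is harmless when instances live for the whole process, but worth keeping in mind if many manifests are loaded over time.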