-
-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GSK-1911 Improve performance of the push feature & make CTA almost instant #1512
GSK-1911 Improve performance of the push feature & make CTA almost instant #1512
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to write some unit tests on you new class.
Also, I would prefer the caching to be done with a parametrize decorator, to be easily re-usable.
I think you can add this behaviour to websocket_actor, with a new parameter "cache" defaulting to false, using params as the object to hash
|
||
def add_result(self, obj, result): | ||
# Calculate the hash of the object | ||
obj_hash = hashlib.md5(repr(obj).encode()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add encoding explicitely to encode pls ?
giskard/ml_worker/utils/cache.py
Outdated
|
||
class SimpleCache: | ||
def __init__(self, max_results=20): | ||
self.results = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add typing hint here, for better readability ?
giskard/ml_worker/utils/cache.py
Outdated
self.results[obj_hash] = result | ||
|
||
# Add the object's hash to the order list and remove the oldest result if necessary | ||
self.order.append(obj_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should use LRU (least recently used) algorithm here, instead of oldest.
@@ -54,11 +55,9 @@ | |||
from giskard.utils.analytics_collector import analytics | |||
|
|||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
push_cache = SimpleCache(max_results=20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you write it as PUSH_CACHE instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, 20 is not enough I think. Maybe 128 ?
perturbs = create_perturbation_push(model, dataset, df) | ||
overconf = create_overconfidence_push(model, dataset, df) | ||
borderl = create_borderline_push(model, dataset, df) | ||
cached_result_tuple = push_cache.get_result(params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using it on the overall get_push instead ?
And also, please try to use type hinting for variable and methods |
SonarCloud Quality Gate failed. 0 Bugs 23.9% Coverage Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
Description
Includes a "SimpleCache" class that uses hashlib to cache anything we want inside it. It only stores the last 20 results.
This greatly improves the usability of the insights feature by:
It also includes a small rewording that JM asked me to do, because it's only text changes I did not want to create yet another PR for it.