Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSK-1911 Improve performance of the push feature & make CTA almost instant #1512

Conversation

Googleton
Copy link
Contributor

Description

Includes a "SimpleCache" class that uses hashlib to cache anything we want inside it. It only stores the last 20 results.

This greatly improves the usability of the insights feature by:

  1. Due to being the last-seen example, all calls to CTA will be cached making them much faster
  2. Since we store a few examples, if I'm navigating between a few examples to do comparisons, we always hit the cache and end up with ultra fast results

It also includes a small rewording that JM asked me to do, because it's only text changes I did not want to create yet another PR for it.

@Googleton Googleton requested a review from Hartorn October 26, 2023 13:04
Copy link
Member

@Hartorn Hartorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to write some unit tests on you new class.
Also, I would prefer the caching to be done with a parametrize decorator, to be easily re-usable.
I think you can add this behaviour to websocket_actor, with a new parameter "cache" defaulting to false, using params as the object to hash


def add_result(self, obj, result):
# Calculate the hash of the object
obj_hash = hashlib.md5(repr(obj).encode()).hexdigest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add encoding explicitely to encode pls ?


class SimpleCache:
def __init__(self, max_results=20):
self.results = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add typing hint here, for better readability ?

self.results[obj_hash] = result

# Add the object's hash to the order list and remove the oldest result if necessary
self.order.append(obj_hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use LRU (least recently used) algorithm here, instead of oldest.

@@ -54,11 +55,9 @@
from giskard.utils.analytics_collector import analytics

logger = logging.getLogger(__name__)


push_cache = SimpleCache(max_results=20)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write it as PUSH_CACHE instead ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, 20 is not enough I think. Maybe 128 ?

perturbs = create_perturbation_push(model, dataset, df)
overconf = create_overconfidence_push(model, dataset, df)
borderl = create_borderline_push(model, dataset, df)
cached_result_tuple = push_cache.get_result(params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using it on the overall get_push instead ?

@Hartorn
Copy link
Member

Hartorn commented Oct 26, 2023

You need to write some unit tests on you new class. Also, I would prefer the caching to be done with a parametrize decorator, to be easily re-usable. I think you can add this behaviour to websocket_actor, with a new parameter "cache" defaulting to false, using params as the object to hash

And also, please try to use type hinting for variable and methods

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot E 2 Security Hotspots
Code Smell A 0 Code Smells

23.9% 23.9% Coverage
0.0% 0.0% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@Hartorn Hartorn closed this Nov 3, 2023
@Hartorn Hartorn deleted the feature/gsk-1911-cta-add-test-takes-too-long-15-sec-just-to-show-all-the-test branch November 3, 2023 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants