Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache provider output per request #94

Merged
merged 1 commit into from
Jun 16, 2023

Conversation

Gallaecio
Copy link
Contributor

As a side effect, it allows to work around #91 by requesting indirect outputs directly on the spider callback as well.

@codecov
Copy link

codecov bot commented Jun 15, 2023

Codecov Report

Merging #94 (2023595) into main (3e03e87) will increase coverage by 0.07%.
The diff coverage is 100.00%.

❗ Current head 2023595 differs from pull request most recent head 7c59058. Consider uploading reports for the commit 7c59058 to get more accurate results

@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   97.44%   97.52%   +0.07%     
==========================================
  Files           9        9              
  Lines         627      647      +20     
==========================================
+ Hits          611      631      +20     
  Misses         16       16              
Impacted Files Coverage Δ
scrapy_zyte_api/providers.py 100.00% <100.00%> (ø)

@wRAR wRAR merged commit 3c82d07 into scrapy-plugins:main Jun 16, 2023
@@ -21,11 +22,29 @@ class ZyteApiProvider(PageObjectInputProvider):

provided_classes = {BrowserResponse, BrowserHtml, Product}

def __init__(self, injector):
super().__init__(injector)
self._cached_instances: WeakKeyDictionary[Request, Dict] = WeakKeyDictionary()
Copy link
Member

@kmike kmike Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar solution in ItemProvider. I wonder if it's possible to move this cache to Injector instead, so that there is a single cache which is shared across all providers.

The first step could be using the same cached_istances storage, and keeping all cache handling code as-is. But maybe it'd be possible to improve on this further, e.g. don't pass a class to to_provide if an instance is in the cache, skip a provider if all to_provide classes are in cache, and also automatically add results to the cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this in my to-do, but I wonder if we need it now that ItemProvider is gone. I am not sure in which scenario the injector a provider would get the same Request + deps now, given duplicate request filtering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants