Discussion: Handle data type of response in HeuristicsProductNavigationPage (BrowserResponse vs HttpResponse) #25
Comments
I think scrapinghub/andi#30 should make it possible to express this kind of dependency in page objects.
Yes, it would 👍 Though some new interface in page objects should be made to allow declaration of dynamic dependencies. For example, depending on the […]
Yeah, this case is more tricky than the original use case. I was thinking that it still can be handled implicitly: page object defines […]
@BurnzZ I've checked this approach with […]
After double checking I see that this is working, thanks!
Both […]
Yes, extract_from is httpResponseBody, but I mean that here https://github.com/scrapy-plugins/scrapy-zyte-api/pull/161/files#diff-67cea92ffa3989fe30ffd7f4bbe868f953810e0f4fdf2ddfe7f18bc54fba505eR81 in […]
The PR in scrapy-plugins/scrapy-zyte-api#161 has been updated so that providers can also use the instances created by earlier providers. That is, […] The new PR in scrapinghub/scrapy-poet#184 also made this possible.
Problem
Currently, HeuristicsProductNavigationPage uses HttpResponse as one of the dependencies (code ref). This causes the spider to send 2 separate requests to ZAPI if the extract_from parameter is set to "browserHtml":
{"productNavigationOptions": {"extractFrom": "browserHtml"}, "productNavigation": true, "url": "https://books.toscrape.com/"}
{"httpResponseBody": true, "url": "https://books.toscrape.com/"}
This is also true when it's left to the default value which is extract_from=None.
Note that this issue is not related to scrapy-plugins/scrapy-zyte-api#91 since scrapinghub/scrapy-poet#175 has addressed it.
The problem that we face is that the dependency of HeuristicsProductNavigationPage should be dynamic depending on the extract_from input value:
- response: BrowserResponse, if extract_from=browserHtml
- response: HttpResponse, if extract_from=httpResponseBody
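To make the desired mapping concrete, here is a minimal sketch of the selection logic only. It is not how web-poet, andi, or scrapy-poet resolve dependencies: the two response classes are simplified stand-ins for the real web-poet types, and select_response_type is a hypothetical helper name. The default extract_from=None is deliberately left unresolved, since the issue does not say which response type it should map to.

```python
from dataclasses import dataclass
from typing import Optional, Type, Union


@dataclass
class HttpResponse:
    """Simplified stand-in for web-poet's HttpResponse (assumption)."""
    url: str
    body: bytes


@dataclass
class BrowserResponse:
    """Simplified stand-in for web-poet's BrowserResponse (assumption)."""
    url: str
    html: str


ResponseType = Type[Union[HttpResponse, BrowserResponse]]

# The two cases listed in the issue.
RESPONSE_TYPE_BY_EXTRACT_FROM = {
    "browserHtml": BrowserResponse,
    "httpResponseBody": HttpResponse,
}


def select_response_type(extract_from: Optional[str]) -> Optional[ResponseType]:
    """Hypothetical helper: pick the response dependency for the page object.

    Returns None for extract_from=None because the issue does not state
    which response type the default should use.
    """
    if extract_from is None:
        return None
    try:
        return RESPONSE_TYPE_BY_EXTRACT_FROM[extract_from]
    except KeyError:
        raise ValueError(f"Unsupported extract_from value: {extract_from!r}") from None
```

With a mapping like this, only one response type would be requested per page, which is what avoids the second ZAPI request described above. How the chosen type is actually injected into the page object is the open question of this discussion.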