Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Splash responses #45

Merged
merged 36 commits into from
Apr 2, 2016
Merged

Custom Splash responses #45

merged 36 commits into from
Apr 2, 2016

Conversation

kmike
Copy link
Member

@kmike kmike commented Mar 29, 2016

This PR provides an easier API to process data returned by Splash.

TODO:

  • fix saving/loading Splash responses to/from cache;
  • magical JSON response handling:
    • set response.headers from 'headers';
    • set response.headers from 'cookies';
    • set response.body either from 'html' or 'body';
    • set response.status from 'http_status';
    • set response.url from 'url.
  • add shortcuts to access base64-encoded values (e.g. 'png' / 'jpeg' keys);
  • re-check that SplashRequest works correctly with persistent request queues: it works, but __repr__ is lost; see Disk queues don't preserve Request class scrapy/scrapy#1890.
  • check how custom Splash responses interact with builtin Scrapy middlewares, esp. when headers are restored from JSON response:
  • check that examples work;
  • allow to handle cookies transparently;
  • allow to handle headers transparently;

See #44, #14, #12, #10, #8.

@codecov-io
Copy link

Current coverage is 89.70%

Merging #45 into master will increase coverage by +0.33% as of e352f77

@@            master    #45   diff @@
=====================================
  Files            6      9     +3
  Stmts          160    379   +219
  Branches         0     69    +69
  Methods          0      0       
=====================================
+ Hit            143    340   +197
- Partial          0     20    +20
- Missed          17     19     +2

Review entire Coverage Diff as of e352f77

Powered by Codecov. Updated on successful CI builds.

@kmike kmike changed the title [WIP] custom Splash responses Custom Splash responses Apr 2, 2016
@kmike kmike merged commit b42fd60 into master Apr 2, 2016
``/execute`` endpoint and a compatible Lua rendering script.

If you want to start from the same set of cookies, but then 'fork' sessions
set ``request.meta['splash']['args']['new_session_id']`` in addition to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both session_id and new_session_id are currently (https://github.com/scrapy-plugins/scrapy-splash/pull/45/files#diff-93e5c0fca1f417cfa28d48c75408be45R59) searched for in request.meta['splash'], not in ```request.meta['splash']['args']`. Which is more correct, the docs or the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good catch; the code is correct. args are arguments sent to Splash; new_session_id is a SplashMiddleware option.

@kmike kmike deleted the responses branch April 11, 2016 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants