Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to send requests to Splash by default #11

Open
kmike opened this issue Feb 27, 2015 · 8 comments
Open

Add an option to send requests to Splash by default #11

kmike opened this issue Feb 27, 2015 · 8 comments

Comments

@kmike
Copy link
Member

kmike commented Feb 27, 2015

We could create a middleware which adds 'splash' meta key to all requests, or to all requests matching some pattern. It could also decode the results to make the whole thing more or less transparent.

Is it a good idea? Or are explicit requests enough?

@pawelmhm
Copy link

I would imagine this could work by just adding spider attribute splash_enabled=True. It attribute == True every request should have 'splash' meta key. Or we could even go without meta key just check spider attribute splash_enabled if True use splash for requests.

@chekunkov
Copy link

From my experience - it's convenient to be able to enable splash for entire spider. Maybe if we add support for 'splash' spider argument with splash options - it would be enough? For example

class MySpider(scrapy.Spider):
    start_urls = ["http://example.com", "http://example.com/foo"]
    splash = {
        'args': {
            # set rendering arguments here
            'html': 1,
            'png': 1,

            # 'url' is prefilled from request url
        },

        # optional parameters
        'endpoint': 'render.json',  # optional; default is render.json
        'splash_url': '<url>',      # overrides SPLASH_URL
        'slot_policy': scrapyjs.SlotPolicy.PER_DOMAIN,
    }


yield Request(url, self.parse_result, meta={
    # enable if True
    # disable splash if bool(response.meta('splash')) is False
    'splash': True, 
})

@chekunkov
Copy link

@pawelmhm if you just set Spider.splash = True it's not clear how to pass splash options per spider

@pawelmhm
Copy link

right @chekunkov if you only add splash_enabled argument we would also need splash_options (so two arguments needed) so maybe it would be easier to have splash_options and this will enable middleware, sounds good.

@kmike
Copy link
Member Author

kmike commented Apr 10, 2015

I like it, but I'm not sure this

yield Request(url, self.parse_result, meta={'splash': True})

is better than this:

yield Request(url, self.parse_result, meta={'splash': self.splash_options})

@chekunkov
Copy link

@kmike good point

main idea of meta={'splash': True/False} was ability to disable splash per request if it was enabled for entire spider. but I looked at #15 now - this is what 'dont_proxy' can be used for, so no need to use meta={'splash': True}. If developer wants to use default spider config - he doesn't use 'splash' key. If developer doesn't want to use splash for some request - he can use meta={'dont_proxy': True}. If developer wants to use different splash config - he can use meta={'splash': {'args': {}, ...} as usual. wdyt?

@pawelmhm
Copy link

also question is: do we need 'meta' key for all splash requests? I think we don't need it if splash middleware is enabled in spider attribute.

@chekunkov
Copy link

@pawelmhm please check comment above :)

If developer wants to use default spider config - he doesn't use 'splash' key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants