I am using Django + Dryscrape to provide web-based, on-demand crawling. The code is below:
from django.shortcuts import render
from django.http import HttpResponse
from django.views.generic import View
from .forms import ScrapeForm
import dryscrape


class Scrape(View):
    def get(self, request):
        form = ScrapeForm()
        return render(request, 'index.html', {'form': form})

    def post(self, request, *args, **kwargs):
        dryscrape.start_xvfb()
        form = ScrapeForm(request.POST)
        if form.is_valid():
            try:
                sess = dryscrape.Session(base_url=form.data['BASE_URL'])
                sess.set_attribute('auto_load_images', True)
                # sess.set_timeout(30)
                sess.visit(form.data['BASE_URL'] + form.data['URL'])
                x = sess.wait_for_safe(lambda: sess.at_xpath(form.data['XPATH']))
                # x = sess.at_xpath(form.data['XPATH'])
                if x:
                    return HttpResponse(x.text())
                else:
                    return HttpResponse('No Element Found with the given xpath')
            except Exception as e:
                if e.__doc__:
                    print e.__doc__
                if e.message:
                    print e.message
                if e.__doc__:
                    return HttpResponse('Scraping of page failed :: \n' + e.__doc__ + '\n' + e.message)
                else:
                    return HttpResponse('Scraping of page failed')
        return render(request, 'index.html', {'form': form})
The problem I am facing right now is that if the scraping fails for one URL, the webkit server does not seem to restart unless I kill all the services and start them again.
Is there a simple way that I can restart the webkit server when it crashes?
The error message is:

    Raised when the Webkit server closed the connection unexpectedly.
    Unexpected end of file
After the above error, scraping does not work until I restart the services. The web app itself, however, remains functional and keeps returning the error message "Scraping of page failed".
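For reference, the kind of recovery I am after looks roughly like the sketch below. It is untested on the EC2 instance, and it assumes that handing an explicit webkit_server.Server / ServerConnection to the dryscrape webkit driver gives each attempt its own webkit_server process instead of the module-level default server (which, once it has crashed, seems to stay dead):

import dryscrape
import dryscrape.driver.webkit
import webkit_server


def fresh_session(base_url):
    # Assumption: constructing Server() here spawns a brand-new webkit_server
    # process, so a crash during a previous request does not poison later ones.
    server = webkit_server.Server()
    conn = webkit_server.ServerConnection(server=server)
    driver = dryscrape.driver.webkit.Driver(connection=conn)
    return dryscrape.Session(driver=driver, base_url=base_url)


def scrape_once(base_url, path, xpath):
    sess = fresh_session(base_url)
    sess.set_attribute('auto_load_images', True)
    try:
        sess.visit(base_url + path)
        node = sess.wait_for_safe(lambda: sess.at_xpath(xpath))
        return node.text() if node else None
    except webkit_server.EndOfStreamError:
        # "Unexpected end of file" -- the backing webkit_server died; the next
        # call to fresh_session() starts a clean process instead of reusing it.
        return None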
Any solution would be highly helpful.
Note: this happens only on the AWS Ubuntu instance; it works fine on my MacBook Pro.
Thanks.
@arvindr21 EC2 won't block requests. @MRHarrison I ended up using a combination of Selenium and PhantomJS to get around these issues with webkit_server.
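For anyone hitting the same problem, the shape of that Selenium + PhantomJS workaround is roughly the sketch below (illustrative only, not the exact code from this thread; it assumes the phantomjs binary is installed and on the PATH):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def scrape(url, xpath, timeout=30):
    driver = webdriver.PhantomJS()  # headless on its own, so no Xvfb needed
    try:
        driver.get(url)
        # Wait for the element to appear, analogous to dryscrape's wait_for_safe()
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return element.text
    finally:
        # Always quit so orphaned phantomjs processes do not accumulate
        driver.quit()

Unlike the dryscrape setup above, this does not need dryscrape.start_xvfb(), since PhantomJS runs headless by itself.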