`'scrapy_splash.SplashMiddleware': 725`: I noticed different behavior with and without this setting. Can someone give me some advice?

With the setting enabled, nothing is crawled, and the log shows:
2024-10-20 15:45:00 [scrapy.downloadermiddlewares.offsite] DEBUG: Filtered offsite request to 'localhost': <GET https://www.adamchoi.co.uk/overs/detailed via http://localhost:8050/execute>
2024-10-20 15:45:00 [scrapy.core.engine] DEBUG: Signal handler scrapy.downloadermiddlewares.offsite.OffsiteMiddleware.request_scheduled dropped request <GET https://www.adamchoi.co.uk/overs/detailed via http://localhost:8050/execute> before it reached the scheduler.

With the setting disabled, I get the HTML source, but none of the JavaScript is rendered:
2024-10-20 15:39:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adamchoi.co.uk/overs/detailed> (referer: None) b'<!doctype html>\n<html class="no-js" lang="en">\n\n <head>\n <meta charset="utf-8">\n <title>Football Statistics For Betting</title>\n <meta name="description" content="The best football statistics for popular betting markets | BTTS | Corners | Cards | Booking Points | Over 2.5 Goals | Both Teams To Score | BTTS and Win">\n <meta name="keywords" content="bets prediction betting site football statistics stats btts both teams to score overs corners cards tips booking points team goals">\n <meta name="twitter:card" content="summary_large_image" />\n <meta name="twitter:site" content="https://www.adamchoi.co.uk" />\n <meta name="twitter:title" content="Football Statistics For Betting" />\n <meta name="twitter:description" content="BTTS, Corners, Cards, Booking Points, Overs, Team Goals, BTTS & Win statistics for betting. Many more markets covered across over 50 leagues around the world." />\n <meta name="twitter:image" content="https://www.adamchoi.co.uk/images/og.png?v=1" />\n <meta property="og:title" content="Football Statistics For Betting"/>\n <meta property="og:url" content="https://www.adamchoi.co.uk"/>\n <meta property="og:description" content="BTTS, Corners, Cards, Booking Points, Overs, Team Goals, BTTS & Win statistics for betting. 
Many more markets covered across over 50 leagues around the world."/>\n <meta property="og:image" content="https://www.adamchoi.co.uk/images/og.png?v=1"/>\n <meta property="og:locale" content="en_GB"/>\n <meta property="og:type" content="website"/>\n <meta name="viewport" content="width=device-width">\n\n <base href=\'/\'>\n <link rel="stylesheet" href="dist/css/vendor-bundle-599428b2b3.css">\n <link rel="stylesheet" href="dist/css/app-bundle-17088dbaef.css?v=1">\n\n <script src="dist/js/vendor-bundle-bebd0fdb69.js"></script>\n <script src="dist/js/app-bundle-798a12ba74.js"></script>\n <!-- endbuild -->\n\n <script>\n (function(i,s,o,g,r,a,m){i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n })(window,document,\'script\',\'//www.google-analytics.com/analytics.js\',\'ga\');\n </script>\n\n <!-- Google Analytics -->\n <script async src="https://www.googletagmanager.com/gtag/js?id=G-8MTGZ91RT2"></script>\n <script>\n window.dataLayer = window.dataLayer || [];\n function gtag(){dataLayer.push(arguments);}\n gtag(\'js\', new Date());\n\n gtag(\'config\', \'G-8MTGZ91RT2\');\n </script>\n\n <!-- Google Tag Manager -->\n <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\n new Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\n j=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n \'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n })(window,document,\'script\',\'dataLayer\',\'GTM-5GQQMBP\');</script>\n <!-- End Google Tag Manager -->\n\n <!-- Google Ad Manager -->\n <script async src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>\n <script>\n window.googletag = window.googletag || {cmd: []};\n\n googletag.cmd.push(function() {\n googletag.pubads().enableLazyLoad();\n 
googletag.pubads().setCentering(true);\n googletag.pubads().collapseEmptyDivs();\n setInterval(function(){ googletag.pubads().refresh(); }, 30000);\n });\n\n </script>\n\n </head>\n \n <body>\n <!-- Google Tag Manager (noscript) -->\n <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5GQQMBP"\n height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>\n <!-- End Google Tag Manager (noscript) -->\n\n <div data-ng-app="adamChoiStatsApp">\n\n <div data-ui-view="rootView">\n\n </div>\n </div>\n\n <script defer src="https://static.cloudflareinsights.com/beacon.min.js/vcd15cbe7772f49c399c6a5babf22c1241717689176015" integrity="sha512-ZpsOmlRQV6y907TI0dKBHq9Md29nnaEIPlkf84rnaERnq6zvWvPUqr2ft8M1aS28oN72PdrCzSjY4U6VaAw1EQ==" data-cf-beacon=\'{"rayId":"8d5759fa98982ab4","version":"2024.10.1","r":1,"serverTiming":{"name":{"cfExtPri":true,"cfL4":true,"cfSpeedBrain":true,"cfCacheStatus":true}},"token":"4a403f83ab324f8d9ddbdcd08ed7ae8d","b":1}\' crossorigin="anonymous"></script>\n</body>\n\n</html>\n'
My spider file:
`import scrapy
from scrapy_splash import SplashRequest
class AdamchoiSpider(scrapy.Spider):
    name = "adamchoi"
    allowed_domains = ["www.adamchoi.co.uk"]
    # start_urls = ["https://www.adamchoi.co.uk/overs/detailed"]
`
My settings file:
`SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
`
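
To illustrate what the first log appears to say, here is a minimal sketch that uses only the standard library. It assumes the offsite check simply compares the request's host against `allowed_domains`; the `is_offsite` helper is hypothetical and is not Scrapy's actual implementation. Under that assumption, the request that is rewritten to go through Splash at `http://localhost:8050/execute` has the host `localhost`, which is not in `allowed_domains`, so it would be filtered.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Simplified, hypothetical model of a host-based offsite check.

    This is NOT Scrapy's code; it only mirrors the idea that a request is
    dropped when its host matches none of the allowed domains.
    """
    host = (urlparse(url).hostname or "").lower()
    # A host is allowed if it equals an allowed domain or is a subdomain of it.
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


# The page I want to scrape, and the Splash endpoint seen in the log.
target_url = "https://www.adamchoi.co.uk/overs/detailed"
splash_url = "http://localhost:8050/execute"

# allowed_domains exactly as in my spider.
allowed = ["www.adamchoi.co.uk"]
print(is_offsite(target_url, allowed))  # False: the site itself is allowed
print(is_offsite(splash_url, allowed))  # True: 'localhost' is not allowed

# Assumption: listing the Splash host as well would let the request through.
allowed_with_splash = allowed + ["localhost"]
print(is_offsite(splash_url, allowed_with_splash))  # False
```

If this model matches what the middleware does, adding `"localhost"` to `allowed_domains` in the spider might be a workaround, though whether that is the right fix probably depends on the Scrapy and scrapy-splash versions in use.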