Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/cheerio (CheerioCrawler)
Issue description
It appears that the requestHandler keeps processing the same crawl request after the configured requestHandler timeout has elapsed. Although I set requestHandlerTimeoutSecs to 30 seconds, the requestHandler continues to run after the timeout, as the logs show.
Code sample
const { CheerioCrawler, Configuration, RequestQueue } = require('crawlee')

async function main () {
  const config = Configuration.getGlobalConfig()
  config.set('persistStorage', false)
  const requestQueue = await RequestQueue.open()
  const crawler = new CheerioCrawler({
    requestQueue,
    minConcurrency: 10,
    maxConcurrency: 50,
    maxRequestRetries: 0,
    requestHandlerTimeoutSecs: 30,
    maxRequestsPerCrawl: 10,
    keepAlive: true,
    async requestHandler ({ request, $ }) {
      console.log(`Processing ${request.url}...`)
      // Sleep for 50 s, which is longer than the 30 s handler timeout.
      await new Promise(resolve => setTimeout(resolve, 50000))
      console.log('finished')
      const response = await requestQueue.getInfo()
      // pendingRequestCount: 0
      // totalRequestCount: 1
      // No request is left in the request queue, but the requestHandler is still working.
      console.log(response)
      console.log('Processing Request')
      await new Promise(resolve => setTimeout(resolve, 60000))
      console.log('Request Finished')
    },
    failedRequestHandler ({ request }) {
      console.log(`Request ${request.url} failed.`)
    }
  })
  // Start the crawler without awaiting it: with keepAlive enabled, run() stays pending.
  crawler.run()
  await requestQueue.addRequest({ url: 'https://crawlee.dev' })
}

main()
Package version
3.5.4
Node.js version
18.17.0
Operating system
No response
Apify platform
I have tested this on the
Replies: 1 comment
Technically, you can't cancel a running promise; this is how JS works. Once it starts, it will finish eventually. The limit is only about not waiting for it (and canceling everything that happens afterward, as that is what we can do). That timeout is about failing the requestHandler, not canceling what's inside it.
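To make the point above concrete, here is a minimal sketch in plain Node.js. It is not Crawlee's actual implementation: `withTimeout`, `slowHandler`, and `demo` are hypothetical names, and the short millisecond values stand in for the 30 s and 50 s from the report. It shows that a timeout built on `Promise.race` only stops the caller from waiting, while the handler body keeps running unless it checks a signal itself (here an `AbortController` owned by the demo, which is an assumption of this sketch).

```javascript
// Hypothetical helper: rejects after `ms` unless `promise` settles first.
// The losing promise is NOT cancelled; it simply keeps running.
function withTimeout (promise, ms) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Stand-in for a slow requestHandler; `events` records how far it got.
async function slowHandler (events, signal) {
  events.push('started')
  await new Promise(resolve => setTimeout(resolve, 100))
  // Cooperative check: only the handler itself can decide to stop early.
  if (signal?.aborted) {
    events.push('stopped early')
    return
  }
  events.push('finished')
}

async function demo ({ cooperative }) {
  const events = []
  const controller = new AbortController()
  const work = slowHandler(events, cooperative ? controller.signal : undefined)
  try {
    await withTimeout(work, 20)
  } catch (err) {
    events.push('caller gave up')
    controller.abort()
  }
  // Awaited only so the demo can observe what the handler did afterwards.
  await work
  return events
}

demo({ cooperative: false }).then(events => console.log(events))
// [ 'started', 'caller gave up', 'finished' ]
demo({ cooperative: true }).then(events => console.log(events))
// [ 'started', 'caller gave up', 'stopped early' ]
```

In the first run the handler still reaches `finished` after the caller has given up, which matches the logs in the report. If work after the timeout is a problem, the handler would need some check of this kind between its long-running steps.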