CheerioCrawler rate limits? #2062
sharkstate started this conversation in General
Replies: 1 comment 1 reply
-
There is no limit (at least not enabled by default). The problem you are facing is that you are reusing the same crawler instance, with the same storage and its own state, and that state is persisted locally between runs. We actually made some improvements in this area in the latest version (#2056), so with v3.5.3 you can do this:

```js
const getCrawler = () => {
    return new CheerioCrawler({
        requestHandler: async () => {},
    }, new Configuration({
        storageClient: new MemoryStorage({
            persistStorage: false,
        }),
    }));
};

const a = getCrawler();
await a.run([
    { url: 'https://example.org/' },
]);

const b = getCrawler();
await b.run([
    { url: 'https://example.org/' },
]);
```

We are working on some guides around this topic.
-
I'm using CheerioCrawler and noticed that after a certain number of crawl loops (crawling 10 pages per crawl), CheerioCrawler stops processing the posted URLs. For example, 10 URLs are posted through `await crawler.run(urls);`, but it jumps directly to "CheerioCrawler: All requests from the queue have been processed, the crawler will shut down".
I can't find any info about this limit, how it works, or whether it can be adjusted.
Thanks!