RequestList and stateKeyPrefix #2440
-
I have a scraper whose state I want to maintain between runs. My goal is to not reprocess any requests that were already handled on a previous run. So I set the env var `CRAWLEE_PURGE_ON_START` to false.
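A minimal sketch of that setup (assuming the programmatic `purgeOnStart` option of Crawlee's `Configuration` is equivalent to the env var):

```ts
// Keep storages (the RequestQueue in particular) between runs instead of
// purging them on startup. Env-var route: CRAWLEE_PURGE_ON_START=false
// Programmatic route, assumed to mirror the env var:
import { Configuration } from 'crawlee';

Configuration.getGlobalConfig().set('purgeOnStart', false);
```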
I looked for `stateKeyPrefix` but found nothing; how is it used? My other issue is that when I don't purge the storage and maintain state between runs, the source URLs I start with obviously won't be processed again, since they already were. How can I still start with the source URLs, and add more as I currently do with the `enqueueLinks` function, while maintaining the state?
-
I found a way to do this by using the `crypto` module to generate a random UUID and passing in my URLs with a `uniqueKey`.
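A minimal sketch of what that looks like (the `CheerioCrawler` and the example start URL are placeholders for my actual setup):

```ts
import { randomUUID } from 'node:crypto';
import { CheerioCrawler } from 'crawlee';

const startUrls = ['https://example.com'];

const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        // Links enqueued here keep their default uniqueKey, so anything
        // handled on a previous run is still skipped.
        await enqueueLinks();
    },
});

// A fresh uniqueKey per run makes the queue treat each start URL as new,
// even though the storage was not purged.
await crawler.run(
    startUrls.map((url) => ({ url, uniqueKey: randomUUID() })),
);
```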
This works; the question is, is there a better way?
-
If you want to terminate a crawler and resume it, just disabling `CRAWLEE_PURGE_ON_START` should be enough. A `RequestList` is not necessary; the default `RequestQueue` will probably work better.

If you want to process some URLs every time you run the crawler, even if you already processed them before, then giving them a random unique key is a good solution.
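For the resume path, a minimal sketch (again assuming a `CheerioCrawler` and a placeholder start URL):

```ts
import { CheerioCrawler } from 'crawlee';

// With CRAWLEE_PURGE_ON_START=false, the default RequestQueue survives the
// process, so re-running this script resumes where the last run stopped.
const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        await enqueueLinks();
    },
});

// Plain string URLs get a deterministic uniqueKey derived from the URL, so
// start URLs already handled on a previous run are skipped automatically.
await crawler.run(['https://example.com']);
```

Combine this with the random `uniqueKey` mapping above if the start URLs themselves should be reprocessed on every run.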