Better indicate the interruption reason #584
Hi @benoit74, I'll follow up further tomorrow, but some of the rationale for the 11 exit code is here: #549. Essentially, it's useful to have exit codes that Browsertrix can pick up on to know whether or not to restart crawler pods. Of course, this could also be done by looking for several exit codes, and in general we could use a better rationale for which exit code is given when, so I think you're right that there is room for improvement here!
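As a rough illustration of "looking for several exit codes", here is a hypothetical TypeScript sketch of a restart decision on the orchestrator side. The function name `shouldRestart` is made up and this is not Browsertrix's actual logic; the set of restartable codes is passed in as a parameter because deciding which codes belong in it is exactly the open question of this issue.

```typescript
// Hypothetical sketch: an orchestrator decides whether to restart a crawler
// pod from its exit code. No real exit-code values are assumed here; the
// caller supplies the set of codes that should trigger a restart.
function shouldRestart(exitCode: number, restartable: ReadonlySet<number>): boolean {
  // A clean exit (0, the usual process convention) never triggers a restart.
  if (exitCode === 0) {
    return false;
  }
  // Checking membership in a set lets several codes share the same handling,
  // instead of overloading a single code with every meaning.
  return restartable.has(exitCode);
}
```

For example, `shouldRestart(11, new Set([11]))` returns `true`, while `shouldRestart(0, new Set([11]))` returns `false`.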
Yep, using the exit code for zimit is also our goal, but we realize we need more fine-grained details than a single, general exit code 11, especially since exit code 11 is now returned in far more situations than it originally was. Issue #549 also makes me realize that this part of the documentation seems to have been lost in the transition to MkDocs; this issue should probably add it back somewhere. All that being said, no rush: better to define the plan well than to rush into something that will not make it in the end.
After some thought, I propose that:
Proposed new stats format:
Are you OK with this idea? May I propose a PR?
We have three things which can stop the crawler in the middle of a run:
- `--sizeLimit`: the maximum WARC size
- `--timeLimit`: the maximum duration of the crawl
- `--diskUtilization`: the maximum disk usage (as a percentage); the crawler stops if the threshold is reached OR is expected to be reached
As the flag names show, the disk one does not end in `Limit`, which signals that it is different.
We understand the size and time limits as user requests to stop crawling once that point is reached.
We understand `--diskUtilization`, on the other hand, as a technical safety net.
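To make the distinction between user limits and the safety net concrete, here is a minimal TypeScript sketch of how the three stop conditions could be checked and classified. The names `CrawlLimits`, `CrawlState`, `checkStop` and `isUserLimit` are hypothetical and not browsertrix-crawler's internals; the units (bytes, seconds, percent) are assumptions. How the crawler estimates that disk usage is "expected to be reached" isn't described here, so the projected value is simply taken as an input.

```typescript
// Hypothetical sketch of the three stop conditions; not the crawler's real code.
type StopReason = "sizeLimit" | "timeLimit" | "diskUtilization";

interface CrawlLimits {
  sizeLimit?: number;       // assumed bytes; undefined means "no limit"
  timeLimit?: number;       // assumed seconds; undefined means "no limit"
  diskUtilization?: number; // percent threshold, e.g. 90
}

interface CrawlState {
  warcBytes: number;
  elapsedSeconds: number;
  diskUsedPercent: number;      // current disk usage
  projectedDiskPercent: number; // assumed: estimate supplied by the caller
}

// Returns the first condition that is met (checked in the order below),
// or null if the crawl may continue.
function checkStop(limits: CrawlLimits, state: CrawlState): StopReason | null {
  if (limits.sizeLimit !== undefined && state.warcBytes >= limits.sizeLimit) {
    return "sizeLimit";
  }
  if (limits.timeLimit !== undefined && state.elapsedSeconds >= limits.timeLimit) {
    return "timeLimit";
  }
  // Safety net: stop if the threshold is reached OR expected to be reached.
  if (
    limits.diskUtilization !== undefined &&
    (state.diskUsedPercent >= limits.diskUtilization ||
      state.projectedDiskPercent >= limits.diskUtilization)
  ) {
    return "diskUtilization";
  }
  return null;
}

// Size and time limits are user requests; disk utilization is a safety net.
function isUserLimit(reason: StopReason): boolean {
  return reason === "sizeLimit" || reason === "timeLimit";
}
```

Keeping `isUserLimit` separate from `checkStop` means the "why did we stop" answer and the "was this requested or a safeguard" answer can be reported independently.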
Currently, the two limits, the technical safety net, and a browser disconnection all lead to exit code 11, which makes it hard for users (especially zimit ^^) to diagnose the cause or automate around it.
Would it make sense from your PoV to use a different exit code for each limit, for the technical safety net, and for browser disconnection?
I can work on this issue if that's OK with you.
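One way to picture the proposal is a minimal TypeScript sketch that maps each interruption reason to its own exit code. Only 11 (today's shared code) comes from this discussion; the values 101 to 104 and the names `InterruptReason`, `EXIT_CODES`, `exitCodeFor` and `reasonFromExitCode` are hypothetical placeholders, not an agreed numbering.

```typescript
// Hypothetical: one exit code per interruption reason.
// Today all four reasons exit with the same code, 11, so a caller cannot
// tell them apart. The values below are placeholders, not an agreed scheme.
type InterruptReason =
  | "sizeLimit"
  | "timeLimit"
  | "diskUtilization"
  | "browserDisconnected";

const EXIT_CODES: Record<InterruptReason, number> = {
  sizeLimit: 101,           // placeholder: user-requested size limit reached
  timeLimit: 102,           // placeholder: user-requested time limit reached
  diskUtilization: 103,     // placeholder: technical safety net triggered
  browserDisconnected: 104, // placeholder: browser connection lost
};

function exitCodeFor(reason: InterruptReason): number {
  return EXIT_CODES[reason];
}

// Reverse lookup, e.g. for a downstream tool such as zimit.
// Returns undefined for codes that are not in the table.
function reasonFromExitCode(code: number): InterruptReason | undefined {
  const entries = Object.entries(EXIT_CODES) as [InterruptReason, number][];
  const match = entries.find(([, value]) => value === code);
  return match ? match[0] : undefined;
}
```

For example, `reasonFromExitCode(exitCodeFor("timeLimit"))` gives back `"timeLimit"`, while `reasonFromExitCode(11)` returns `undefined`. Because each code is distinct, the mapping can be inverted, which is what a downstream tool needs; with a single code 11 for every reason, that reverse lookup is impossible.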