Fix two minor bugs related to wpull usage (#85)
* Fix bug with overwriting wpull progress DB

* Don't base command exit code on wpull's exit code

Wpull's application class returns a non-zero exit code if there are
any URL failures (for example DNS failure). We don't want our entire
management command to return non-zero because of this.
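A minimal sketch of the exit-code change described above. `FakeApp` and its status value of 4 are hypothetical stand-ins for illustration, not wpull's actual API; the only point is the difference between returning the crawler's status and discarding it.

```python
class FakeApp:
    """Hypothetical stand-in for the wpull application object."""

    def __init__(self, exit_status):
        self._exit_status = exit_status

    def run_sync(self):
        # Pretend some URLs failed and report that through a non-zero status.
        return self._exit_status


def command_before(app):
    # Old behavior: the crawler's status becomes the command's result.
    exit_status = app.run_sync()
    return exit_status


def command_after(app):
    # New behavior: run the crawl and discard the crawler's status,
    # so per-URL failures no longer decide the command's result.
    app.run_sync()


app = FakeApp(exit_status=4)
print(command_before(app))
print(command_after(app))
```

With the status discarded, a DNS failure on one URL no longer decides the result of the whole command.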
chosak authored Nov 3, 2023
1 parent e550bdd commit c8ebafd
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions crawler/management/commands/crawl.py
@@ -42,7 +42,7 @@ def command(start_url, db_filename, max_pages, depth, recreate, resume):
     )

     if not resume and os.path.exists(wpull_progress_filename):
-        os.path.remove(wpull_progress_filename)
+        os.remove(wpull_progress_filename)

     arg_parser = AppArgumentParser()
     args = arg_parser.parse_args(
@@ -79,6 +79,4 @@ def command(start_url, db_filename, max_pages, depth, recreate, resume):
     # https://docs.djangoproject.com/en/3.2/topics/async/#async-safety
     os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"

-    exit_status = app.run_sync()
-    click.echo(f"done, exiting with status {exit_status}")
-    return exit_status
+    app.run_sync()
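
The first hunk replaces `os.path.remove` with `os.remove`. The sketch below shows why: `os.path` only provides path helpers and has no `remove` function, so the old line would raise `AttributeError` whenever a stale progress file existed. The helper name `reset_progress_file` and the file name `wpull.db` are hypothetical and used only for illustration.

```python
import os
import tempfile

# os.path has no remove(); file deletion lives on the os module itself.
print(hasattr(os.path, "remove"))
print(hasattr(os, "remove"))


def reset_progress_file(path, resume):
    """Delete a stale progress file unless resuming (hypothetical helper)."""
    if not resume and os.path.exists(path):
        os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    progress = os.path.join(tmp, "wpull.db")  # hypothetical file name
    open(progress, "w").close()

    # Resuming keeps the existing progress file.
    reset_progress_file(progress, resume=True)
    kept = os.path.exists(progress)

    # A fresh crawl removes it.
    reset_progress_file(progress, resume=False)
    removed = not os.path.exists(progress)

print(kept, removed)
```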
