diff --git a/docs/contributing.md b/docs/contributing.md
index 6324a2ad..945253c9 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -116,8 +116,8 @@ When coding a new scraper, there are a few important conventions to follow:
 - If it's a new state folder, add an empty `__init__.py` to the folder
 - Create a `Site` class inside the agency's scraper module with the following attributes/methods:
   - `name` - Official name of the agency
-  - `scrape_meta` - generates a CSV with metadata about videos and other available files (file name, URL, and size at minimum)
-  - `scrape` - uses the CSV generated by `scrape_meta` to download videos and other files
+  - `scrape_meta` - generates a JSON file with metadata about videos and other available files (file name, URL, and size at minimum)
+  - `download_agency` - uses the JSON file generated by `scrape_meta` to download videos and other files
 
 Below is a pared down version of San Diego's [Site](https://github.com/biglocalnews/clean-scraper/blob/main/clean/ca/san_diego_pd.py) class to illustrate these conventions.
 
@@ -278,6 +278,7 @@ Options:
 Commands:
   list         List all available agencies and their slugs.
   scrape-meta  Command-line interface for generating metadata CSV about...
+  download_agency  Downloads assets retrieved in scrape-meta
 ```
 
 Running a state is as simple as passing arguments to the appropriate subcommand.
@@ -292,7 +293,7 @@ pipenv run python -m clean.cli list
 pipenv run python -m clean.cli scrape-meta ca_san_diego_pd
 
 # Trigger file downloads using agency slug
-pipenv run python -m clean.cli scrape ca_san_diego_pd
+pipenv run python -m clean.cli download_agency ca_san_diego_pd
 ```
 
 For more verbose logging, you can ask the system to show debugging information.
diff --git a/docs/usage.md b/docs/usage.md
index eab910dc..9cb4c283 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -31,14 +31,14 @@ You can then run a scraper for an agency using its slug:
 clean-scraper scrape-meta ca_san_diego_pd
 ```
 
-> **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `scrape` subcommand.
+> **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `download_agency` subcommand.
 
 To use the `clean` library in Python, import an agency's scraper and run it directly.
 
 ```python
 from clean.ca import san_diego_pd
 
-san_diego_pd.scrape()
+san_diego_pd.download_agency()
 ```
 
 ## Configuration
@@ -56,6 +56,7 @@ Options:
   --help  Show this message and exit.
 
 Commands:
-  list         List all available agencies and their slugs.
-  scrape-meta  Command-line interface for downloading CLEAN files.
+  list             List all available agencies and their slugs.
+  scrape-meta      Command-line interface for generating metadata CSV about...
+  download_agency  Downloads assets retrieved in scrape-meta
 ```
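
The contributing.md hunk above references San Diego's pared-down `Site` class without showing it. For orientation, here is a minimal sketch of the shape those conventions imply — the constructor arguments, directory defaults, JSON path, and record handling are illustrative assumptions, not the project's actual internals (see the linked `san_diego_pd.py` for the real class):

```python
import json
from pathlib import Path


class Site:
    """Sketch of an agency scraper following the documented conventions.

    Mirrors the documented interface only; paths, defaults, and record
    handling are illustrative assumptions, not the real
    clean/ca/san_diego_pd.py implementation.
    """

    name = "San Diego Police Department"  # official agency name

    def __init__(self, data_dir: Path = Path("data"), cache_dir: Path = Path("cache")):
        self.data_dir = data_dir    # where scrape_meta writes its JSON (assumed layout)
        self.cache_dir = cache_dir  # where download_agency stores files (assumed layout)

    def scrape_meta(self) -> Path:
        """Gather metadata and write it to JSON: file name, URL, and size at minimum."""
        metadata = [
            # A real scraper collects these records from the agency's website.
            {"name": "example_video.mp4", "url": "https://example.com/example_video.mp4", "size": 1024},
        ]
        outfile = self.data_dir / "ca_san_diego_pd.json"
        outfile.parent.mkdir(parents=True, exist_ok=True)
        outfile.write_text(json.dumps(metadata, indent=2))
        return outfile

    def download_agency(self) -> list[Path]:
        """Download each asset listed in the JSON produced by scrape_meta."""
        records = json.loads((self.data_dir / "ca_san_diego_pd.json").read_text())
        targets = []
        for record in records:
            # A real implementation would stream record["url"] to disk here,
            # throttling between requests.
            targets.append(self.cache_dir / record["name"])
        return targets
```

Calling `Site().scrape_meta()` and then `Site().download_agency()` mirrors the CLI's `scrape-meta` → `download_agency` flow shown in the hunks above.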