A web crawler made using python3.5 with asyncio.
Make sure both pip and python correspond to a python of version >= 3.5
First, install the requirements in a virtualenv
or globally: pip install -r requirements.txt
.
In the project, simply run python crawler <args> > <name_of_map>.json
where <args>
are:
You can optionally run pip install .
and then you can use crawl ...
Usage:
crawl (--domain=<dom> | --local --basedir=<dir>)
crawl -h | --help
crawl --version
Options:
-h --help Show this screen.
--version Show version.
-d --domain=<dom> Domain of website.
-l --local Use local or http [default: false].
-b --basedir=<dir> Root directory of website.
NOTE: the domain must have the host in it (e.g. http://www.samcoope.com)
For example, map_of_blomfield.json
contains the sitemap of www.tomblomfield.com, the result of running: python crawler -d http://tomblomfield.com > map_of_blomfield.json
To run the tests, simply run pytest test