crawler_asyncio

A web crawler built with Python 3.5 and asyncio.
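asyncio lets a crawler keep many page fetches in flight at once. The following is a minimal sketch of a breadth-first crawl in which a pool of worker coroutines shares an asyncio.Queue. It is not the project's own implementation. fetch_links is a hypothetical stand-in for the real HTTP request and link parsing, and the fake in-memory site exists only for demonstration. The code avoids f-strings and asyncio.run so that it also runs on Python 3.5.

```python
import asyncio


async def crawl(start_url, fetch_links, max_workers=5):
    """Breadth-first crawl from start_url using a pool of asyncio workers.

    fetch_links is a hypothetical coroutine function: given a URL it returns
    the list of URLs linked from that page (a stand-in for real HTTP I/O).
    Returns a dict mapping each visited URL to the links found on it.
    """
    queue = asyncio.Queue()  # created inside the running loop
    seen = {start_url}
    site_map = {}
    queue.put_nowait(start_url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                try:
                    links = await fetch_links(url)
                except Exception:
                    links = []  # treat a failed fetch as a page with no links
                site_map[url] = links
                for link in links:
                    # No await between check and add, so no race between workers.
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.ensure_future(worker()) for _ in range(max_workers)]
    await queue.join()  # wait until every queued URL has been processed
    for w in workers:
        w.cancel()
    # Let the cancelled workers finish so no pending tasks are left behind.
    await asyncio.gather(*workers, return_exceptions=True)
    return site_map


# Hypothetical in-memory "website" used instead of real HTTP requests.
FAKE_SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/"],
    "http://example.com/b": ["http://example.com/a", "http://example.com/c"],
    "http://example.com/c": [],
}


async def fake_fetch_links(url):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return FAKE_SITE.get(url, [])


loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(crawl("http://example.com/", fake_fetch_links))
finally:
    loop.close()
print(sorted(result))
```

Each URL is enqueued at most once thanks to the seen set, and queue.join() returns only after every enqueued URL has been marked done, so the workers can then be cancelled safely.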

Running the Crawler

Make sure both pip and python point to a Python installation of version 3.5 or later.

First, install the requirements in a virtualenv or globally: pip install -r requirements.txt.

From the project directory, run python crawler <args> > <name_of_map>.json, where <args> are described in the usage below.

Alternatively, run pip install . to install the package; the crawler is then available as the crawl command (crawl <args>).

Usage:
crawl (--domain=<dom> | --local --basedir=<dir>)
crawl -h | --help
crawl --version

Options:
-h --help             Show this screen.
--version             Show version.
-d --domain=<dom>     Domain of website.
-l --local            Use local or http [default: false].
-b --basedir=<dir>    Root directory of website.
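The usage pattern above is written in docopt style and requires either --domain on its own, or --local together with --basedir. To make those constraints explicit, the following is a rough equivalent using the standard library's argparse instead of docopt. It is a hypothetical translation, not the project's actual argument handling. --version is omitted because the real version string is not given here, and "test_data" in the usage line is only an example directory.

```python
import argparse


def build_parser():
    """Hypothetical argparse translation of the docopt-style usage above."""
    parser = argparse.ArgumentParser(prog="crawl")
    # Exactly one of --domain or --local must be supplied.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-d", "--domain", help="Domain of website.")
    mode.add_argument("-l", "--local", action="store_true",
                      help="Crawl a local directory instead of over HTTP.")
    parser.add_argument("-b", "--basedir", help="Root directory of website.")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # docopt's "--local --basedir=<dir>" pairing has no direct argparse
    # equivalent, so it is enforced by hand (parser.error exits with status 2).
    if args.local and args.basedir is None:
        parser.error("--local requires --basedir")
    if args.basedir is not None and not args.local:
        parser.error("--basedir is only valid together with --local")
    return args


args = parse_args(["--local", "--basedir", "test_data"])
print(args.local, args.basedir)  # True test_data
```

docopt derives all of these rules from the usage text itself. argparse needs the mutually exclusive group plus the manual checks to reach the same behaviour.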

NOTE: the domain must be a full URL that includes the scheme as well as the host (e.g. http://www.samcoope.com).
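One plausible reason for this requirement: Python's urllib.parse.urlparse only recognises the host when a scheme is present. Without a scheme, the whole string is parsed as a path. The following is a small sketch of a check along those lines. has_scheme_and_host is a hypothetical helper, and restricting the scheme to http/https is an assumption.

```python
from urllib.parse import urlparse


def has_scheme_and_host(domain):
    """Hypothetical check: True only for values such as
    "http://www.samcoope.com" that carry both a scheme and a host."""
    parsed = urlparse(domain)
    # Assumption: only http and https are acceptable schemes.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(has_scheme_and_host("http://www.samcoope.com"))  # True
print(has_scheme_and_host("www.samcoope.com"))         # False
```

For "www.samcoope.com", urlparse returns an empty scheme and netloc and puts the whole value in path.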

For example, map_of_blomfield.json contains the sitemap of www.tomblomfield.com, the result of running: python crawler -d http://tomblomfield.com > map_of_blomfield.json
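A sitemap of a single site generally means following only links that stay on that site. The following sketch uses the standard library's html.parser to extract links and keep the same-host ones. It is an illustration under assumptions, not the project's parser. LinkExtractor and same_site_links are hypothetical names, and the exact-host comparison is a simplification. For example, it would treat tomblomfield.com and www.tomblomfield.com as different hosts, which a real crawler may need to reconcile.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def same_site_links(html, page_url):
    """Return the links on a page that stay on page_url's host (hypothetical helper)."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    host = urlparse(page_url).netloc
    return [link for link in parser.links if urlparse(link).netloc == host]


html = ('<a href="/about">About</a> <a href="post#top">Post</a> '
        '<a href="https://twitter.com/x">T</a>')
print(same_site_links(html, "http://tomblomfield.com/blog/"))
# ['http://tomblomfield.com/about', 'http://tomblomfield.com/blog/post']
```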

Tests

To run the tests, simply run pytest test
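pytest does not natively run async def test functions without a plugin such as pytest-asyncio. Because of that, a coroutine can be exercised from an ordinary test function by driving it on an event loop. The following is a minimal sketch of that pattern. double_after_delay and run are hypothetical and are not taken from this repository's test suite.

```python
import asyncio


async def double_after_delay(x):
    """Stand-in coroutine; a real test would target the crawler's own code."""
    await asyncio.sleep(0)
    return x * 2


def run(coro):
    """Run a coroutine to completion on a fresh event loop (works on 3.5+)."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


def test_double_after_delay():
    # pytest collects plain functions named test_*; run() drives the
    # coroutine, so no async pytest plugin is needed.
    assert run(double_after_delay(21)) == 42
```

Creating and closing a fresh loop per call keeps each test isolated from event-loop state left behind by other tests.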

coopie/crawler_asyncio

Folders and files

Latest commit

History

Repository files navigation

crawler_asyncio

Running the Crawler

Tests

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages