This application is a crawler that extracts data from the Wikipedia pages for São Paulo, Rio de Janeiro, and Minas Gerais.
As a result of the spider's execution, the files below contain information about the cities of those three states:
minas_gerais.csv and minas_gerais.json
rio_de_janeiro.csv and rio_de_janeiro.json
sao_paulo.csv and sao_paulo.json
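As a rough sketch of how the generated files could be read back, the helper below loads either format with the Python standard library. It assumes the CSV files start with a header row and the JSON files hold a single array of objects; the function name load_cities is hypothetical and not part of this project.

```python
import csv
import json
from pathlib import Path


def load_cities(path):
    """Load one output file and return its rows as a list of dicts."""
    path = Path(path)
    if path.suffix == ".json":
        # Assumption: the JSON file holds a single array of objects.
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if path.suffix == ".csv":
        # Assumption: the first CSV row is a header with the field names.
        with path.open(encoding="utf-8", newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported file type: {path.suffix}")


if __name__ == "__main__":
    # Report the row count of each CSV file that is present.
    for name in ("minas_gerais", "rio_de_janeiro", "sao_paulo"):
        csv_file = Path(f"{name}.csv")
        if csv_file.exists():
            print(name, len(load_cities(csv_file)), "rows")
```

Values read from CSV come back as strings, so any numeric fields would need converting by the caller.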
To run the application, first create and activate a virtual environment (the first command below is for Debian/Ubuntu systems):
sudo apt-get install python3-venv
python3 -m venv webscraping
source webscraping/bin/activate
Install the dependencies with the following command:
make install
Then, to run the spider, type the following command:
make run
or
scrapy crawl cities
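After a run, a quick check like the following could confirm that all six output files were produced. This is only a sketch: it assumes the spider writes its files to the directory it is run from, and the names expected_outputs and missing_outputs are hypothetical helpers, not part of this project.

```python
from pathlib import Path

# File names taken from the list of output files in this README.
STATES = ("minas_gerais", "rio_de_janeiro", "sao_paulo")
FORMATS = ("csv", "json")


def expected_outputs():
    """Return the six file names the spider is expected to produce."""
    return [f"{state}.{ext}" for state in STATES for ext in FORMATS]


def missing_outputs(directory="."):
    """Return the expected file names that are not present in directory."""
    base = Path(directory)
    return [name for name in expected_outputs() if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing output files:", ", ".join(missing))
    else:
        print("All output files are present.")
```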