AlwaysUpdate ~ Web Crawler and Scraper 📰

AlwaysUpdate is an e-newspaper with news from Argentina, Colombia, Venezuela and Mexico that updates every day.

Getting started 🚀

Things you need to have installed on your system: 🛠️

Python 3.7
pip
virtualenv
AlwaysUpdate ~ DataScience API

Configuration 🔧

Virtual environment

virtualenv venv --python=python3.7
source venv/bin/activate

Dependencies installation

pip install -r requirements.txt

System Variables

export API_URL="$DATASCIENCE_API_HOST/api/v1/"
export GOOGLE_APPLICATION_CREDENTIALS="credentials.json"
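
The variables above can be checked from Python before the crawler starts. The following is a minimal sketch, not the project's own code: the helper name `load_settings` and the returned dictionary keys are assumptions made for illustration. It only reads `API_URL` and `GOOGLE_APPLICATION_CREDENTIALS` and fails early if either one is missing.

```python
import os


def load_settings(env=os.environ):
    """Read the two variables from the System Variables section.

    `load_settings` is a hypothetical helper, not part of the project.
    """
    required = ("API_URL", "GOOGLE_APPLICATION_CREDENTIALS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Normalize the base URL so that it always ends with a slash.
    api_url = env["API_URL"]
    if not api_url.endswith("/"):
        api_url += "/"

    return {
        "api_url": api_url,
        "credentials_path": env["GOOGLE_APPLICATION_CREDENTIALS"],
    }
```

Calling `load_settings()` with no arguments reads the current process environment. Passing a plain dictionary instead makes the check easy to try without exporting anything.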

Execution

You can execute the crawler with a POST request. In that case, you must first start the uvicorn server:

cd news_crawler_scraper
uvicorn app.main:app --reload
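
Once the server is running, the POST request could be built as in the sketch below. This README does not state the route or the request body, so the path `/crawl` and the `{"journal": ...}` payload are placeholders assumed for illustration; check the `app.main` module for the real route. The base URL is uvicorn's default bind address, `127.0.0.1:8000`.

```python
import json
import urllib.request


def build_crawl_request(journal_name, base_url="http://127.0.0.1:8000", path="/crawl"):
    """Build (but do not send) a POST request for the crawler server.

    ASSUMPTION: `path` and the JSON body are hypothetical placeholders;
    the real endpoint is defined in the project's app.main module.
    """
    body = json.dumps({"journal": journal_name}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` after replacing the placeholder path and payload with the real ones.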

If you don't want to work with the server, you can use:

python go_spyder_$JOURNAL_NAME.py

Journals:

eltiempo
lanacion
eluniversal
xataka
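
To run every journal in one go, the script name can be derived from the list above. This is a sketch under the assumption that each script follows the `go_spyder_$JOURNAL_NAME.py` pattern and is run from the directory that contains it; `spider_command` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys

# Journal names listed in this README.
JOURNALS = ("eltiempo", "lanacion", "eluniversal", "xataka")


def spider_command(journal_name):
    """Return the command that runs the spider for one journal."""
    if journal_name not in JOURNALS:
        raise ValueError(f"Unknown journal: {journal_name!r}")
    # Use the current interpreter so the active virtualenv is respected.
    return [sys.executable, f"go_spyder_{journal_name}.py"]


def run_all():
    """Run each spider in turn, stopping at the first failure."""
    for name in JOURNALS:
        subprocess.run(spider_command(name), check=True)
```

`spider_command("xataka")` returns the interpreter path followed by `go_spyder_xataka.py`, and `run_all()` executes the four scripts sequentially.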

Contributing ✒️

Pull requests are welcome! If you have an idea for a feature but don't have time to implement it, feel free to open an issue.

Demo

License 📄

MIT
