Penguin International Data Engineer Assessment Test
This scraping bot was built with Python, Scrapy, and Selenium.
It scrapes data from the website "rewardsforjustice.net".
For each entry on the Terrorism page, the bot scrapes the page URL, category, title, reward amount, associated organization(s), associated location(s), about text, image URL(s), and date of birth (in ISO date format).
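As an illustration of the ISO date step, here is a minimal sketch of how a scraped date of birth could be normalized. The helper name `to_iso_date` and the list of input formats are assumptions made for this example; they are not taken from the spider's code, and the formats the site actually uses may differ.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats (hypothetical); adjust to what the site really serves.
# Month names are parsed using the current locale, which is English by default.
ASSUMED_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")


def to_iso_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    text = raw.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

Returning `None` for unparseable values keeps the field explicit in the JSON output instead of storing a half-converted string.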
The bot generates a JSON output file whose name combines the spider name, date, and time. Example: terrorism_20240402_135900.json (SpiderName_SpiderDate_SpiderTime.json)
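The naming pattern can be sketched as follows. The function name `output_filename` is hypothetical and used only for illustration; the spider may build the name differently, for example through Scrapy's feed settings.

```python
from datetime import datetime


def output_filename(spider_name: str, started_at: datetime) -> str:
    """Build SpiderName_SpiderDate_SpiderTime.json from a name and a timestamp."""
    # %Y%m%d gives the date part, %H%M%S the time part, both zero-padded.
    return f"{spider_name}_{started_at:%Y%m%d_%H%M%S}.json"
```

For example, `output_filename("terrorism", datetime(2024, 4, 2, 13, 59, 0))` gives `terrorism_20240402_135900.json`.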
- Clone the repo.
- Open the TerrorismSpider.py file.
- At line 38, change executable_path to the path of your Chrome driver.
- Open a terminal in the terrorism folder.
- Run the spider by typing "scrapy crawl terrorism" in the terminal.
- When the crawl finishes, a JSON file is generated in the terrorism folder.
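
To check the result after a run, the generated file can be loaded with a few lines of Python. This is a rough sketch, not part of the repo: the helper name `load_latest_output` is hypothetical, and it assumes the output file holds a single JSON array of items and follows the file naming pattern described above.

```python
import json
from pathlib import Path
from typing import Any, List


def load_latest_output(folder: str, spider_name: str = "terrorism") -> List[Any]:
    """Load the newest SpiderName_Date_Time.json file found in folder."""
    # The timestamp is zero-padded, so sorting by name also sorts by time.
    candidates = sorted(Path(folder).glob(f"{spider_name}_*.json"))
    if not candidates:
        raise FileNotFoundError(f"no {spider_name}_*.json file in {folder}")
    with candidates[-1].open(encoding="utf-8") as handle:
        # Assumption: the file contains one JSON array of scraped items.
        return json.load(handle)
```

Calling `load_latest_output(".")` from the terrorism folder would return the scraped items from the most recent crawl, under the assumptions above.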