Red

⚠️ Code is buggy ⚠️

1.0 Introduction:

2.0 Implementation

  1. The birds.csv and mammals.csv files list the species for which data has to be scraped.

  2. The permissions of the start.sh have to be changed before the first run of the code.

     user@computer:~/Red$ chmod +x start.sh
    
  3. The pipeline is triggered using the start.sh script, which in turn triggers the scraper.py code.

     user@computer:~/Red$ ./start.sh
    
  4. The scraped data is stored to disk in an X_WORKING.csv file, a copy of the original .csv, so that the originals are not tampered with.
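
Step 4 can be sketched in Python as follows. This is a minimal illustration, not the repository's code: the function name `make_working_copy` is hypothetical, and it assumes the working file for `birds.csv` is named `birds_WORKING.csv`, following the X_WORKING.csv pattern above.

```python
import shutil
from pathlib import Path


def make_working_copy(original: Path) -> Path:
    """Copy e.g. birds.csv to birds_WORKING.csv, leaving the original untouched.

    Hypothetical helper: the name and the exact naming rule are assumptions.
    """
    working = original.with_name(f"{original.stem}_WORKING{original.suffix}")
    # Only copy on the first run so later runs keep the data scraped so far.
    if not working.exists():
        shutil.copyfile(original, working)
    return working
```

Calling `make_working_copy(Path("birds.csv"))` would return the path to the working copy, which the rest of a pipeline could then read and update while the original stays unchanged.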

3.0 Model Overview:

Figure 2.1 Model to scrape data from the IUCN Red List

3.1 Interface

  1. Disk write/read operations are handled by the interface.py code.

  2. The pandas dataframe is saved to disk by the interface.py code after each run.

3.2 Scraper

  1. The scraper.py code interacts with the webpage using the Selenium browser automation framework.

  2. The HTML tags contained in the page_source gathered by the Selenium middleware code are made searchable using BeautifulSoup.

  3. The scraper.py pipeline collects the prescribed HTML tags from the website queried and updates a pandas dataframe with the information.

  4. The speciesCounter() function in scraper.py returns the sno (serial number) of the last species that is missing a stable, unknown or decline population trend tag; every scraped species must have one of these tags.
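
The check in item 4 could look roughly like the sketch below. It is not the repository's speciesCounter(): the function name `last_unscraped_sno`, the column names `sno` and `population_trend`, and the exact tag spellings are assumptions, and it reads the working file with the standard `csv` module instead of pandas to stay dependency-free.

```python
import csv
from typing import Optional

# Assumed spellings of the population trend tags named above.
TREND_TAGS = {"stable", "unknown", "decline"}


def last_unscraped_sno(csv_path: str) -> Optional[int]:
    """Return the sno of the last row lacking a population trend tag.

    Returns None when every row already carries one of TREND_TAGS.
    """
    last_missing = None
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing or empty cells count as "no tag yet".
            trend = (row.get("population_trend") or "").strip().lower()
            if trend not in TREND_TAGS:
                last_missing = int(row["sno"])
    return last_missing
```

A pipeline could use the returned value to decide which species still need scraping after an interrupted run.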

4.0 Known Issues:

  1. While writing elements to the pandas dataframe, an element may be shifted right by one or more columns. This error may lead to a pandas memory warning, because values of multiple data types then occupy the same column.

  2. Some species are not indexed by the IUCN Red List. This may cause the start.sh script to loop while trying to collect the species URL from the search page.
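
One way to keep issue 2 from stalling the pipeline is to bound the number of lookup attempts and skip species that never resolve. The sketch below is only an illustration under stated assumptions: `find_species_url`, `lookup`, and `max_attempts` are hypothetical names, and `lookup` stands in for whatever code queries the search page.

```python
from typing import Callable, Optional


def find_species_url(
    species: str,
    lookup: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Try `lookup` up to `max_attempts` times; return None if it never succeeds.

    `lookup` is a hypothetical callable returning the species URL, or None
    when the search page shows no result.
    """
    for _ in range(max_attempts):
        url = lookup(species)
        if url:
            return url
    # Give up so the caller can record the species as not indexed and move on.
    return None
```

A caller that receives `None` can mark the species as not indexed in the working file instead of retrying indefinitely.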

Citation:

If you decide to use our client, scraper or cleaner for your project, or as a means to interface with the IUCN database, please cite our 2021 Conservation Letters paper!

@article{mendiratta2021mammal,
  title={Mammal and bird species ranges overlap with armed conflicts and associated conservation threats},
  author={Mendiratta, Uttara and Osuri, Anand M and Shetty, Sarthak J and Harihar, Abishek},
  journal={Conservation Letters},
  volume={14},
  number={5},
  pages={e12815},
  year={2021},
  publisher={Wiley Online Library}
}