This repository contains scripts and tools for web scraping and API interaction with the Ecotaxa platform. The primary goal is to automate the extraction of data on plankton and other microorganisms, with a focus on image metadata, to support research and analysis work.
Ecotaxa is a web application dedicated to the visual exploration and management of planktonic data. Accessing the platform programmatically requires an understanding of the Ecotaxa API, its authentication mechanisms, and data extraction techniques. This project aims to wrap these aspects in a user-friendly set of scripts.
- Required Python libraries: requests, beautifulsoup4, selenium, json, csv, tqdm, os (json, csv, and os ship with the Python standard library and need no separate installation).
- Clone this repository to your local machine:
git clone https://github.com/PlanktoScope/Ecotaxa-webscraping.git
- Navigate to the cloned directory:
cd Ecotaxa-webscraping
- Web scraping using the Ecotaxa API (specify the project ID and your own Ecotaxa credentials):
ecotaxa_api_history.py
- Web scraping using Selenium (specify the project ID and your own Ecotaxa credentials):
ecotaxa_scraping_v3.py
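Both scripts need a project ID and Ecotaxa credentials. As a rough illustration of what token-based API access can look like, the sketch below builds a login request and an authenticated project request using only the standard library. It is not taken from the scripts in this repository: the base URL, the `/login` and `/projects/{id}` paths, the JSON field names, the bearer-token header, and the helper names `build_login_request`, `build_project_request`, and `send_json` are all assumptions to check against the API documentation of the Ecotaxa server you use.

```python
import json
import urllib.request

# Assumed base URL of the Ecotaxa API; adjust to the server you actually use.
BASE_URL = "https://ecotaxa.obs-vlfr.fr/api"


def build_login_request(username, password, base_url=BASE_URL):
    """Build (but do not send) a JSON login request.

    The '/login' path and the field names are assumptions.
    """
    payload = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        f"{base_url}/login",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_project_request(project_id, token, base_url=BASE_URL):
    """Build (but do not send) an authenticated request for one project.

    The '/projects/<id>' path and the bearer-token scheme are assumptions.
    """
    return urllib.request.Request(
        f"{base_url}/projects/{project_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def send_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Under these assumptions, a session would call `send_json(build_login_request(user, password))` to obtain a token and then pass that token to `build_project_request`. Reading the credentials from environment variables rather than hard-coding them in a script keeps them out of version control.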
This project is licensed under the Apache-2.0 license.
Copyright Wassim Chakroun and PlanktoScope project contributors.