Generate taxonomic checklists and occurrence collections from biodiversity data sources such as GBIF and iDigBio. Converts Darwin Core Archives (DwCA) tracked by Preston into Parquet and sequence files to enable parallel processing on a compute cluster.
This library relies on Apache Spark and Mesos/HDFS clusters to:
- generate checklists
- generate occurrence collections
- import Darwin Core Archives into the Apache Parquet data format
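
To illustrate what a checklist is, here is a minimal sketch in plain Python. It is not this library's API and does not use Spark. It assumes a checklist is the list of taxa observed within a geographic bounding box, with a record count per taxon. The function name `generate_checklist` and the sample records are hypothetical; the field names follow Darwin Core terms.

```python
from collections import Counter

# Hypothetical occurrence records keyed by Darwin Core term names.
occurrences = [
    {"scientificName": "Ursus arctos", "decimalLatitude": 61.2, "decimalLongitude": -149.9},
    {"scientificName": "Ursus arctos", "decimalLatitude": 60.5, "decimalLongitude": -150.1},
    {"scientificName": "Alces alces", "decimalLatitude": 61.0, "decimalLongitude": -149.5},
    {"scientificName": "Puma concolor", "decimalLatitude": 34.1, "decimalLongitude": -118.3},
]


def generate_checklist(records, min_lat, max_lat, min_lon, max_lon):
    """Return (scientificName, count) pairs for records inside the bounding box.

    Results are sorted by descending count, then by name, so the output
    is deterministic.
    """
    counts = Counter(
        r["scientificName"]
        for r in records
        if min_lat <= r["decimalLatitude"] <= max_lat
        and min_lon <= r["decimalLongitude"] <= max_lon
    )
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Checklist for a bounding box that covers three of the four sample records.
checklist = generate_checklist(occurrences, 55.0, 65.0, -155.0, -145.0)
for name, count in checklist:
    print(name, count)
```

On a cluster, the same filter-and-count step would run in parallel over occurrence records stored as Parquet rather than over an in-memory list.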
At the time of writing (June 2017), this library was used by the effechecka and freshdata projects. Note that these projects are no longer active.
This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.