numfah23 edited this page Apr 26, 2016 · 20 revisions

Week 10 (4/26/16): GBIF/IUCN Get Common Name and Taxonomy

Wrote a function that takes a search query (genus species) as input and extracts the common name and taxonomy, separately for GBIF and for IUCN. The GBIF version of getCommonNameTaxonomy makes an API call to retrieve the taxonKey, then a second call to look up common names using that taxonKey. I used a set to store the common names, filtered to keep only the English ones, so that there are no repeated common names. Currently, capitalized and lower-case versions of the same word are still treated as different names. However, this is not a problem, because we decided to use the IUCN version of getCommonNameTaxonomy, which is much cleaner: it requires only one API call and returns one main common name, which is enough for our purposes.
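As a sketch of the GBIF half of this flow (the endpoint paths and field names here are assumptions based on the public GBIF v1 API, not copied from the actual getCommonNameTaxonomy code):

```javascript
// Sketch of the GBIF common-name flow: /species/match to get a taxonKey,
// then /species/{key}/vernacularNames for the common names.
var GBIF_BASE = 'http://api.gbif.org/v1';

function buildMatchUrl(query) {
  // query is "Genus species", e.g. "Puma concolor"
  return GBIF_BASE + '/species/match?name=' + encodeURIComponent(query);
}

function buildVernacularUrl(taxonKey) {
  return GBIF_BASE + '/species/' + taxonKey + '/vernacularNames';
}

// Dedupe English common names with a Set, as described above.
// Lower-casing before insertion would also collapse the
// "Cougar" vs "cougar" style duplicates mentioned in the text.
function extractEnglishCommonNames(vernacularResults) {
  var names = new Set();
  vernacularResults.forEach(function (entry) {
    if (entry.language === 'eng') {
      names.add(entry.vernacularName);
    }
  });
  return Array.from(names);
}
```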

Week 9 (4/19/16): getCountryCode Function

To improve the previous getCountryCode function (which used to be a manually entered dictionary), I looked into different packages that could return the two-letter country code from a country name in a better way. I found that the Country Name Full Text search on REST Countries seemed to work pretty well. However, we decided that autofill can actually take care of the two-letter country code input, so this may not be needed.
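A minimal sketch of the REST Countries full-text lookup described above. The endpoint path (v1 here) and the alpha2Code field name are assumptions about how the REST Countries API looked at the time, not copied from the actual code:

```javascript
// Build a REST Countries full-text name lookup URL (assumed v1 endpoint).
function buildCountryLookupUrl(countryName) {
  return 'https://restcountries.eu/rest/v1/name/' +
    encodeURIComponent(countryName) + '?fullText=true';
}

// The API returns an array of matching countries; take the assumed
// alpha2Code field from the first match, or null if there is none.
function extractCountryCode(response) {
  if (!response || response.length === 0) {
    return null;
  }
  return response[0].alpha2Code;
}
```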

Week 8 (4/12/16): GBIF Get Count with Location

Implemented the search_gbif_location function, which gives the count for a country. Since GBIF's /occurrence/count GET method takes a taxonKey and a two-letter country code as input, I made an API call to extract the taxonKey from the scientific name and wrote an additional function that returns the ISO 3166-1 alpha-2 country code from the country name. Currently, the getCountryCode function looks the country name up in a dictionary that I entered manually, so it only works if the first letter of the country name is capitalized.
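The two-step flow above can be sketched as URL builders plus the dictionary lookup. The /species/match endpoint used to get the taxonKey is an assumption based on the GBIF v1 API, and the dictionary entries are illustrative, not the full table from the real getCountryCode:

```javascript
var GBIF_BASE = 'http://api.gbif.org/v1';

// Step 1: match the scientific name to a taxonKey (assumed endpoint).
function buildTaxonKeyUrl(scientificName) {
  return GBIF_BASE + '/species/match?name=' + encodeURIComponent(scientificName);
}

// Step 2: count occurrences by taxonKey and ISO 3166-1 alpha-2 code.
function buildCountUrl(taxonKey, countryCode) {
  return GBIF_BASE + '/occurrence/count?taxonKey=' + taxonKey +
    '&country=' + countryCode;
}

// Minimal stand-in for the manual dictionary version of getCountryCode;
// lookup is case-sensitive, as noted above.
var countryCodes = { 'United States': 'US', 'Thailand': 'TH' };
function getCountryCode(countryName) {
  return countryCodes[countryName] || null;
}
```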

Week 7 (4/5/16): GBIF Get Count Function

GBIF has an /occurrence/count GET method that I was trying to use last time (because it supports counting the number of records in a specific country). However, it supports neither scientificName nor the simple search parameter q. Searching by taxonKey would also have worked, but that would require an additional API call to extract the taxonKey from the scientific name. So I decided to just use the same occurrence/search?scientificName GET method as the other GBIF API function for returning taxonomy, since a count is already provided there. I fixed the formatting and removed unnecessary redundant parts from the last version. In the future, if we need to count by country, we should consider switching to the /occurrence/count GET method.
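A sketch of reusing the /occurrence/search response for the count. The top-level count field is how the GBIF v1 occurrence search response reports the total; treat the exact field names as assumptions rather than a copy of the real getCountGbif:

```javascript
// Occurrence search URL for a scientific name (genus species).
function buildSearchUrl(scientificName) {
  return 'http://api.gbif.org/v1/occurrence/search?scientificName=' +
    encodeURIComponent(scientificName);
}

// Pull the total count out of a parsed search response; a search with
// no hits still returns a well-formed body, so fall back to 0.
function extractCount(response) {
  return (response && typeof response.count === 'number') ? response.count : 0;
}
```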

Week 6 (3/29/16): Other Databases and GBIF Functions

I looked into the other databases from last week and found that none of them looks like a good candidate: VertNet gives location by continent instead of by country. ITIS requires multiple AJAX calls, which we don't want. BOLD has the same cross-origin restriction as Species+ and also provides the data in XML format instead of JSON.

So I worked on the GBIF functions instead. The api_gbif_functions.js file now has 2 functions. The first one, searchGbif, takes a scientific name as input, as a string in the form genus species. If the search finds results, it returns an object containing a) species: a string with genus species and b) taxonomy: an array with kingdom, phylum, order, family, and genus. If no results are found, the function returns null. The second function, getCountGbif, takes the same input as searchGbif but instead returns the number of entries (observations) found by the search (0 if the search finds no results).
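The searchGbif return shape described above can be sketched like this, assuming the standard GBIF occurrence record fields (kingdom, phylum, order, family, genus, species); this is an illustration of the shape, not the actual function body:

```javascript
// Turn a parsed /occurrence/search response into the searchGbif result:
// an object with species and taxonomy, or null when nothing was found.
function buildResult(searchResponse) {
  if (!searchResponse || !searchResponse.results ||
      searchResponse.results.length === 0) {
    return null;
  }
  var first = searchResponse.results[0];
  return {
    species: first.species,
    taxonomy: [first.kingdom, first.phylum, first.order,
               first.family, first.genus]
  };
}
```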

Note: Currently, the only relevant files in my branch are 1) api_gbif_functions.js and 2) index_gbif_function.html

Week 5 (3/15/16): More Databases

VertNet

http://vertnet.org/
Summary: Only has vertebrates, but pulls data from FishNet, MaNIS, HerpNET, and ORNIS (main databases for fish, mammals, amphibians and reptiles, and birds).
API: https://github.com/VertNet/webapp/wiki/Introduction-to-the-VertNet-API
API call example for Puma concolor: [http://api.vertnet-portal.appspot.com/api/search?q={"q":"genus:puma specificepithet:concolor"}](http://api.vertnet-portal.appspot.com/api/search?q={"q":"genus:puma%20specificepithet:concolor"})

Integrated Taxonomic Information System (ITIS)

http://www.itis.gov/
Summary: Requires multiple ajax calls (1 to get all the TSNs, and more to search for location of each data entry).
API: http://www.itis.gov/ws_description.html
API call example for Puma concolor:

  1. input = common name, output = extracted list of TSNs (taxon serial number)
    [http://www.itis.gov/ITISWebService/jsonservice/getITISTermsFromCommonName?srchKey=Puma concolor](http://www.itis.gov/ITISWebService/jsonservice/getITISTermsFromCommonName?srchKey=Puma%20concolor)
  2. input = a single TSN, output = location by continent(s) or null (needs to be repeated for each TSN)
    http://www.itis.gov/ITISWebService/jsonservice/getGeographicDivisionsFromTSN?tsn=552479
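The two ITIS calls above can be written as URL builders, following the example URLs exactly:

```javascript
// Step 1: common name -> list of TSNs.
function buildItisTermsUrl(commonName) {
  return 'http://www.itis.gov/ITISWebService/jsonservice/' +
    'getITISTermsFromCommonName?srchKey=' + encodeURIComponent(commonName);
}

// Step 2: single TSN -> geographic divisions (repeat per TSN).
function buildItisGeoUrl(tsn) {
  return 'http://www.itis.gov/ITISWebService/jsonservice/' +
    'getGeographicDivisionsFromTSN?tsn=' + tsn;
}
```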

Week 4 (3/8/16): Writing Search Function

I added 2 new JavaScript files, api_species+_function.js and api_gbif_function.js, which contain a search function for the Species+ and GBIF APIs, respectively. The Species+ API data has a total_entries field, which I used to determine whether the search completed successfully. When total_entries is not zero, the function, which waits until the ajax call is done and dfd is resolved, returns the data; when the search finds no results, it returns null. For the GBIF API, I had initially filtered out entries with inaturalist as a source and explicitly looped through the first 50 pages at 300 entries per page (I couldn't find the total number of entries anywhere in the response headers). For simplicity, I decided to extract the species and taxonomy data only from the first entry of the first page for now, since all the results are going to be identical anyway. The rest of the function works the same way as the Species+ one (it returns either an object with the species and taxonomy, or null).
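The total_entries success check described above reduces to a small function over the parsed response body (the real version resolves a jQuery deferred once the ajax call completes; this sketch only shows the decision logic):

```javascript
// Return the Species+ search data when total_entries is non-zero,
// otherwise null, mirroring the success check described above.
function parseSpeciesPlusSearch(data) {
  if (!data || !data.total_entries) {
    return null;
  }
  return data;
}
```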

Week 3 (3/1/16): Cleaning more API Data

I modified the JavaScript file from last week to query data from the GBIF API instead of the Species+ API (still searching for Puma concolor data). I collected data for the following fields: (1) taxa (kingdom, family, class, order, genus, and species; currently all separate), (2) location (country, latitude, and longitude), and (3) references (excluding data from inaturalist). Because we already have a script to query the inaturalist API, keeping GBIF data that references the inaturalist database would be redundant. By looking at the data extracted from the ajax call, I found that entries from the inaturalist database have a field labeled datasetName, whereas others do not, so I filtered out entries with this field to eliminate the redundant data. Additionally, GBIF API calls default to 20 entries per page and initially return only the first page, so I added a for loop over the first 50 pages and raised the limit to 300 entries per page (15000 entries total).
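The filtering and paging described above can be sketched as two helpers. The datasetName test follows the observation in the text; the limit/offset parameter names are an assumption based on the GBIF v1 occurrence search:

```javascript
// Drop entries sourced from inaturalist, identified by the presence of
// a datasetName field, as observed in the responses.
function filterOutInaturalist(results) {
  return results.filter(function (entry) {
    return !entry.hasOwnProperty('datasetName');
  });
}

// Page URLs for the loop over the first 50 pages, 300 entries per page.
function buildPageUrl(scientificName, page, limit) {
  return 'http://api.gbif.org/v1/occurrence/search?scientificName=' +
    encodeURIComponent(scientificName) +
    '&limit=' + limit + '&offset=' + (page * limit);
}
```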

Week 2 (2/23/16): Cleaning API Data

This week, I wrote a JavaScript script to query data from the Species+ API for Puma concolor. I requested the authentication code from the Species+ website and added it to the ajax call so it is passed via an HTTP header. After reading the API documentation, I found that the relevant API calls were for getting: (1) taxon concepts, (2) distributions, and (3) references. From taxon concepts, I collected the common name (important for users looking for information on a specific organism but searching with a synonymous term), the higher taxa (kingdom, family, class, and order), and the taxon ID (only needed as input for the Distributions and References calls; we don't actually need this data itself). From distributions, I found that I could get both the location (specifically, which country the organism was found in) and the reference that reported the observation. I had planned to call both distributions and references, but ended up only calling distributions, since the corresponding reference data was already available from that call. To summarize, I made 2 API calls to the Species+ database and collected data on common names, higher taxa, locations, and references for Puma concolor.
Note: It doesn't work on Google Chrome (I think it has something to do with a cross-origin restriction), but it works perfectly fine on Safari.
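A sketch of the Species+ calls described above. The base URL, endpoint paths, and the X-Authentication-Token header follow my reading of the Species+ API documentation and should be treated as assumptions; AUTH_TOKEN is a placeholder for the token requested from the Species+ website:

```javascript
var SPECIES_PLUS_BASE = 'https://api.speciesplus.net/api/v1';
var AUTH_TOKEN = 'YOUR-TOKEN-HERE'; // placeholder, not a real token

// Headers object to pass to the ajax call (assumed header name).
function speciesPlusHeaders() {
  return { 'X-Authentication-Token': AUTH_TOKEN };
}

// Call 1: taxon concepts, searched by scientific name.
function buildTaxonConceptsUrl(scientificName) {
  return SPECIES_PLUS_BASE + '/taxon_concepts?name=' +
    encodeURIComponent(scientificName);
}

// Call 2: distributions for a taxon ID. As noted above, this response
// also carries the reference that reported each observation, which is
// why a separate /references call was not needed.
function buildDistributionsUrl(taxonConceptId) {
  return SPECIES_PLUS_BASE + '/taxon_concepts/' + taxonConceptId +
    '/distributions';
}
```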

Week 1 (2/16/16): Insects/Bugs Database

AntWeb

https://www.antweb.org/
AntWeb is the world’s largest online database of images, specimen records, and natural history information on ants. AntWeb focuses on specimen level data and images linked to specimens. In addition, contributors can submit natural history information and field images that are linked directly to taxonomic names. Distribution maps and field guides are generated automatically. All data in AntWeb are downloadable by users. AntWeb also provides specimen-level data, images, and natural history content to the Global Biodiversity Information Facility (GBIF), the Encyclopedia of Life (EOL.org), and Wikipedia.
API: https://www.antweb.org/api/v2/

Symbiota: Symbiota Collections of Arthropods Network (SCAN)

http://symbiota.org/docs/
The Symbiota specimen search engine enables collections to be queried by taxonomic, geographic, and collector details.
http://symbiota4.acis.ufl.edu/scan/portal/index.php
The Symbiota Collections of Arthropods Network (SCAN) data portal houses arthropod occurrence records from the original Southwest Collections of Arthropods Network as well as an ever-growing number of collections, drawn from a much wider selection of geographic locations and arthropod taxa. SCAN is built on Symbiota, a web-based collections database system that is also used for other taxonomic data portals.
Data can be downloaded in either Symbiota Native or Darwin Core structure, as either a CSV or a tab-delimited file.

Biodiversity Information Serving Our Nation (BISON)

http://bison.usgs.ornl.gov/
Researchers collect species occurrence data, records of an organism at a particular time in a particular place, as a primary or ancillary function of many biological field investigations. Biodiversity Information Serving Our Nation (BISON) is committed to providing free and open access to primary species occurrence data. Data currently available through BISON are contributed by various U.S. Federal and State agencies, universities, and non-profit organizations, either directly to BISON or indirectly through their participation in the Global Biodiversity Information Facility (GBIF).
API: http://bison.usgs.ornl.gov/#api

Planetary Biodiversity Inventory (PBI)

http://research.amnh.org/pbi/
The Planetary Biodiversity Inventory (PBI) for Plant Bugs provides a resource about the global plant bug subfamilies Orthotylinae and Phylinae (Insecta: Heteroptera: Miridae) to systematists, insect ecologists, conservation biologists, the general public, and students via the Internet.
Searches are linked to: http://www.discoverlife.org/
Data can be downloaded as a CSV file.

San Diego Biodiversity Database

http://sdbiodiversity.ucsd.edu/info/index.html
We are creating a species inventory for invertebrate animals at the Scripps Coastal Reserve. The San Diego Biodiversity Database contains photos, ecological information, and sequence data for the organisms we have collected so far.
Data can be downloaded as either CSV or FASTA.
