-
Notifications
You must be signed in to change notification settings - Fork 0
/
insertThreatData.Rmd
113 lines (82 loc) · 5.26 KB
/
insertThreatData.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
title: "Inserting threatened species with the colSpList API"
output:
github_document:
number_section: true
---
************************
**Note**:
This document was created from a Rmarkdown document, with the output format "github_document".
In order to use this type of file, please install the packages *knitr* and *rmarkdown* in R.
1. If you want to compile the document as a markdown document for github, while applying all the code contained in the file
+ use ```rmarkdown::render("file.Rmd")```
2. If you want to extract the R code of the document as a R script
+ use ```knitr::purl("file.Rmd")```
***********************
In order to show how to work with the colSpList API and the threatened species, we will use the species list from the official "Resolución 1912 de 2017, from the Colombian ministry of environment (publicly available at <https://ipt.biodiversidad.co/sib/resource?r=resolucion1912-2017mads>)
# Basic usage : example of one only species
The first species of the list is *Philonotis striatula* Jaeger, 1875 and has the "VU" (vulnerable) status.
In order to send the status to the database, we use a json "dictionary" with the following elements:
* the identification of the species, with either:
+ *gbifkey* : integer corresponding to the taxonKey used in GBIF
+ *scientificname* : string corresponding to the scientificName in the GBIF backbone
+ *canonicalname* : string corresponding to the canonicalName in the GBIF backbone (formally, it is better to use the canonicalNameWithMarker) used by the name parser of the GBIF backbone)
* *threatstatus* : the status level as defined by the IUCN
* *ref_citation* : a list of the references on which are based the inclusion of the taxon and its threatened status in the API
* *link* : a list of the url links (corresponding in length and order to the *ref_citation*)
* *comments* : comment on the threatened status of the taxon
## In R
```{r}
require(httr)
require(jsonify)
sendJson <- to_json(list(gbifkey = as.integer(2676236), threatstatus = "VU", ref_citation = list("Ministerio de Ambiente y Desarrollo Sostenible . (2020): Lista de especies silvestres amenazadas de la diversidad biológica continental y marino-costera de Colombia - Resolución 1912 de 2017 expedida por el Ministerio de Ambiente y Desarrollo Sostenible. v2.5. Ministerio de Ambiente y Desarrollo Sostenible. Dataset/Checklist. https://doi.org/10.15472/5an5tz"),link =list( "https://ipt.biodiversidad.co/sib/resource?r=resolucion1912-2017mads#anchor-downloads" )),unbox=T)
baseURL <- 'http://localhost:5000'
baseResource <- "insertThreat"
POST(paste(baseURL,baseResource,sep="/"),body=sendJson, content_type("application/json"))
```
## In python
```{python}
import requests
import json
from flask import jsonify
url = 'http://localhost:5000/insertThreat'
dictTosend = {"gbifkey": 2676236, "threatstatus": "VU", "ref_citation":["Ministerio de Ambiente y Desarrollo Sostenible . (2020): Lista de especies silvestres amenazadas de la diversidad biológica continental y marino-costera de Colombia - Resolución 1912 de 2017 expedida por el Ministerio de Ambiente y Desarrollo Sostenible. v2.5. Ministerio de Ambiente y Desarrollo Sostenible. Dataset/Checklist. https://doi.org/10.15472/5an5tz"],"link": ["https://ipt.biodiversidad.co/sib/resource?r=resolucion1912-2017mads#anchor-downloads"]}
x = requests.post(url, json = dictTosend)
x.json()
```
# A list of species and their status
In order to insert all the taxa in one script, using the same POST endpoint of the API (another endpoint will come later, allowing to post directly a list of species and their statuses), we may use the following codes.
Note that since the "distribution.txt" file included in the dataset includes the gbif codes, we do not need to use the "taxon.txt" file
In the examples here, we will simply exclude the 2 taxa which are referenced from EOL
## In R
```{r}
file = "../../data/dwca-resolucion1912-2017mads-v2.5/distribution.txt"
listSpThreat <- read.csv(file,sep="\t")
regexExtractGbifkey <- "^gbif\\.org/species/([0-9]+)$"
listSpThreat <- listSpThreat [grepl(regexExtractGbifkey,listSpThreat$id),]
tabToSend <- data.frame(gbifkey = as.integer(sub(regexExtractGbifkey,"\\1",listSpThreat$id)), threatstatus= listSpThreat$threatStatus)
res = apply(tabToSend,1,function(x,bU,bR,ref,link)
{sendJson = to_json(list(gbifkey=x[1],threatstatus=x[2],ref_citation = list(ref),link = list(link)),unbox = T)
return(POST(paste(bU,bR,sep="/"),body=sendJson, content_type("application/json")))
},
bU=baseURL,
bR=baseResource,
ref="Ministerio de Ambiente y Desarrollo Sostenible . (2020): Lista de especies silvestres amenazadas de la diversidad biológica continental y marino-costera de Colombia - Resolución 1912 de 2017 expedida por el Ministerio de Ambiente y Desarrollo Sostenible. v2.5. Ministerio de Ambiente y Desarrollo Sostenible. Dataset/Checklist. https://doi.org/10.15472/5an5tz",
link = "https://ipt.biodiversidad.co/sib/resource?r=resolucion1912-2017mads#anchor-downloads" )
```
<!--
## In python
```python
import csv
import pandas as pd
import re
file = "../data/dwca-resolucion1912-2017mads-v2.5/distribution.txt"
def readAndInsertApi()
listSpStatus = pd.read_csv(file, sep='\t')
```
-->
# problems
```{r}
pbs <- !sapply(res,function(x)"cd_tax"%in%names(content(x)))
sum(pbs)
```