Citing GBIF data properly #374

Open
jhnwllr opened this issue May 17, 2022 · 3 comments
@jhnwllr

jhnwllr commented May 17, 2022

Hello, I am writing from GBIF.

I am reaching out to the maintainers of R packages that use the GBIF occurrence search.

Under the terms of the GBIF data user agreement, users who download data agree to cite a DOI. Good citation also rewards data-publishing institutions and individuals by reinforcing the value of sharing open data and demonstrating its impact to their funders.

https://docs.ropensci.org/rgbif/articles/gbif_citations.html
https://www.gbif.org/citation-guidelines

Unfortunately, when using the occurrence search, rather than the occurrence download, one does not receive a citable DOI.

Because occurrence search is easier for some users to use, we have created something called derived datasets, which allows users to create a citable DOI after they have pulled the data from the GBIF public API.

https://www.gbif.org/derived-dataset

GBIF would appreciate it if you, as a package maintainer, could remind users in the documentation or with warning messages to cite GBIF-mediated data properly, perhaps by linking to one of these articles:

https://docs.ropensci.org/rgbif/articles/gbif_citations.html
https://www.gbif.org/citation-guidelines
https://www.gbif.org/derived-dataset

It is also important to remind users to keep the datasetKey column, because it allows proper attribution to the original data providers.

@gepinillab
Member

Dear John (@jhnwllr),

The Wallace team is already aware that the recent update of the rgbif package makes it easier to obtain a citable DOI. The implementation of derived datasets is fantastic, and we will consider supporting them in our package in the future.

Over the last few years, we have worked on the second version of our package, and we will soon submit the manuscript for review. One of its new features is obtaining a citable DOI using the occCite package (thanks to the collaboration with @hannahlowens). So users will have an option to cite a DOI for GBIF data (#ToCiteDOI) when they want to download all the occurrences.

Thanks for writing to us about this critical topic. We will look into improving DOI citations for occurrence searches in future releases.

Regards,
Gonzalo Pinilla

@dnoesgaard

Hi Gonzalo,

I've been playing around with the v1.9 beta of Wallace, including the implementation of occCite for getting occurrences. I think this is a significant improvement, so thanks for that!

That being said, considering that citing GBIF using a DOI is a requirement of the terms of the GBIF data user agreement, I would love to see this implementation as the default behaviour in Wallace rather than optional.

Another solution (also mentioned by John) could be to retain the datasetKey column in the data pulled using spocc, allowing the user to create a derived dataset record for citing only the specific records used in the downstream analysis.

Thanks,
Daniel

@gepinillab
Member

Hi Daniel,

Thanks for checking the current implementation of occCite in Wallace. I am glad that you like it. Unfortunately, making this option the default is not possible because of i) the time it could take to download species with thousands of records and ii) the possibility that users' machines could not handle "massive" occurrence databases (R holds data in RAM).

I think that the way to go is with derived datasets. First, I believe it is fairly easy to create an option in Wallace to download a CSV file with datasetKey and occurrence counts, ready to upload to gbif.org/derived-dataset. Also, we can generate a template for the description field required on that website, mentioning how the data was obtained and processed in Wallace. This will make registering these datasets more accessible for users.
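
To make the CSV idea concrete, here is a minimal sketch of the counting step in Python. It is not Wallace code. The record dictionaries, the placeholder keys, and the function names `dataset_key_counts` and `write_counts_csv` are all hypothetical, and the two-column `datasetKey,count` layout without a header is an assumption that should be checked against the format gbif.org/derived-dataset actually accepts.

```python
import csv
import io
from collections import Counter


def dataset_key_counts(records):
    """Count occurrence records per datasetKey, skipping rows without one."""
    return Counter(r["datasetKey"] for r in records if r.get("datasetKey"))


def write_counts_csv(counts, handle):
    """Write one `datasetKey,count` row per dataset, largest count first."""
    writer = csv.writer(handle, lineterminator="\n")
    # Sort by descending count, then by key, so the output is deterministic.
    for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([key, n])


# Hypothetical occurrence records; real ones would come from the occurrence
# search, with the datasetKey column retained. The keys are placeholders.
records = [
    {"scientificName": "Puma concolor", "datasetKey": "key-a"},
    {"scientificName": "Puma concolor", "datasetKey": "key-b"},
    {"scientificName": "Puma concolor", "datasetKey": "key-a"},
    {"scientificName": "Puma concolor", "datasetKey": ""},  # no key: skipped
]

buffer = io.StringIO()
write_counts_csv(dataset_key_counts(records), buffer)
print(buffer.getvalue())
```

The counts should be computed after any filtering or cleaning, so that the derived dataset reflects only the records actually used in the downstream analysis.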

I will share this issue with the rest of our development team to get a potential timeline for its implementation in Wallace. We will keep you posted.

Best,
Gonzalo

@gepinillab gepinillab added this to the Someday milestone Oct 20, 2022