Conflict of interests missing from xml output #1142
Hi @mariadelmarq, thanks for reporting this problem. Could you please send me the PDFs for this issue and for #1143 to luca AT sciencialab.com? I'm not able to access them via the PubMed / publisher portal 😅
Sent, thanks heaps for looking into it!
Thanks for sending the files, and sorry, I did not have time to check them until now. For the file discussed in this issue, there are two problems:
Hello! Indeed, the Conflict of Interest section is not part of the funding section; it is considered a section of its own. However, Grobid does not yet identify it explicitly as such. This is something to do in the future: extending the segmentation and header models to explicitly recognize COI sections, which I don't think is complicated. I have already received this request, and COI statements are becoming more and more common. About the text lost in the header, what is labeled with
Thank you both so much for looking into this. For the other articles I'm looking at, Conflict of Interest statements tend to end up in the back matter tag, either one or two divs down, or sometimes within a note tag. Sometimes they end up in the body, though, which is OK for me, as long as they're somewhere.
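Until Grobid labels COI sections explicitly, one workaround is a keyword search over the TEI XML that finds the statements wherever they land (header, body, back, or a note). A minimal sketch follows, assuming only that Grobid writes standard TEI (namespace `http://www.tei-c.org/ns/1.0`) with `div`/`head`, `p` and `note` elements. `find_coi_statements` and the keyword list in `COI_PATTERN` are hypothetical, not part of Grobid, and the phrasings will need tuning per publisher.

```python
import re
import xml.etree.ElementTree as ET

# Grobid writes TEI XML in the standard TEI namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical keyword list: adjust to the phrasings your publishers use.
COI_PATTERN = re.compile(
    r"conflicts? of interests?|competing interests?|declarations? of interests?",
    re.IGNORECASE,
)


def _text(el: ET.Element) -> str:
    # Join all descendant text with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(el.itertext()).split())


def _walk(el: ET.Element, where: str, found: list) -> None:
    tag = el.tag.replace(TEI_NS, "")
    if tag in ("teiHeader", "body", "back"):
        where = tag  # remember which part of the document we are in
    if tag == "div":
        # A div counts as a COI section only if its heading matches.
        head = el.find(TEI_NS + "head")
        probe = _text(head) if head is not None else ""
    elif tag in ("p", "note"):
        probe = _text(el)
    else:
        probe = ""
    if probe and COI_PATTERN.search(probe):
        found.append((where, tag, _text(el)))
        return  # don't descend: the children belong to this statement
    for child in el:
        _walk(child, where, found)


def find_coi_statements(tei_xml: str) -> list[tuple[str, str, str]]:
    """Return (document part, element tag, text) for each COI-like statement."""
    found: list[tuple[str, str, str]] = []
    _walk(ET.fromstring(tei_xml), "other", found)
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_coi.py paper.tei.xml  (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8") as f:
        for part, tag, text in find_coi_statements(f.read()):
            print(f"[{part}/{tag}] {text}")
```

For a `div`, only the heading is tested. This avoids flagging a whole body section that merely mentions conflicts of interest in passing. Once a match is found, the walk stops descending, so the same statement is not reported twice. This happens, for example, when both a COI `div` and its `p` would match.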
Hi,
We are looking into using Grobid for a project examining conflict of interest, funding, and other transparency statements in published articles. These statements appear in different locations depending on the publisher: sometimes in footnotes, sometimes after the abstract, sometimes in the back matter, etc.
For the published PDF of this particular article (not the author manuscript, which is open access, but the actual PDF published by the APA): https://pubmed.ncbi.nlm.nih.gov/27819460/, Grobid correctly extracts the funding information from paragraph 4 of the footnote on page 1, but the conflict of interest statement, contained in paragraph 5 of the same footnote, is missing from the XML output. I suspect Grobid does not know where to put it in the XML. Is there any chance this has an easy fix?
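To spot this kind of loss across a batch of outputs, you can test whether a sentence copied from the PDF (for example, the COI sentence from the page-1 footnote) appears anywhere in the TEI text. A minimal sketch follows, assuming only that the output is well-formed XML. `snippet_in_tei` is a hypothetical helper. It normalizes only whitespace and case, so hyphenation or ligature differences introduced during PDF extraction will still produce false "missing" results.

```python
import xml.etree.ElementTree as ET


def _normalize(text: str) -> str:
    # Lower-case and collapse whitespace so PDF line breaks don't matter.
    return " ".join(text.split()).lower()


def snippet_in_tei(tei_xml: str, snippet: str) -> bool:
    """True if the snippet occurs anywhere in the text content of the TEI document."""
    root = ET.fromstring(tei_xml)
    # Join text nodes with spaces so words in adjacent elements don't fuse.
    full_text = _normalize(" ".join(root.itertext()))
    return _normalize(snippet) in full_text
```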