DOIs on multiple lines are split up #14

Open
michamos opened this issue Nov 17, 2016 · 5 comments

Comments

@michamos
Contributor

We noticed with @kaplun that a DOI is sometimes broken up by TeX and appears on multiple lines. Contrary to what @tsgit said earlier, this is the default behavior of the url LaTeX package (which is often used to typeset URLs and is used internally by the hyperref package). In that case, only the part of the DOI on the first line is extracted as the DOI, which is of course wrong.

The default behavior (in the \UrlBreaks and \UrlBigBreaks macros in url.sty) is to allow line breaks after:
. @ \ / ! _ | ; > ] ) , ? & ' + = # : and also after - if the [hyphens] option is passed to the package.
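A minimal Python sketch of how an extractor could flag a DOI that may have been cut at a line break, based on the character list above. The names `URL_BREAK_CHARS` and `may_be_truncated` are hypothetical and are not part of refextract; the check only says that a break was possible after the last character, not that one actually happened.

```python
# Characters after which url.sty allows a line break by default,
# copied from the list in this comment.
URL_BREAK_CHARS = frozenset(r".@\/!_|;>]),?&'+=#:")


def may_be_truncated(doi, hyphens=False):
    """Return True if `doi` ends in a character after which a break is allowed.

    `hyphens=True` mimics the [hyphens] package option, which also
    allows a break after "-".
    """
    breaks = (URL_BREAK_CHARS | {"-"}) if hyphens else URL_BREAK_CHARS
    return bool(doi) and doi[-1] in breaks
```

For example, a fragment ending in `/` or `.` would be flagged, while one ending in a digit would not.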

@michamos
Contributor Author

Possible solution:

  1. if the DOI ends in one of those characters (exact set to be specified), have refextract output both the naive DOI and the same DOI with the next word appended, recursively, as possible DOIs
  2. call bibrank on all those candidate DOIs, which are prefixes of each other, trying the longest first (to avoid the proceedings vs conference paper issue), and put the first match in the right MARC field
  3. if there is no match, try to resolve the DOIs to find the right one

I don't know if this is compatible with the way refextract and bibrank talk to each other.
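Steps 1 and 2 of the proposal could look roughly like the Python sketch below. Everything here is an assumption made for illustration: `candidate_dois`, `first_match` and the `is_known` callback are hypothetical names, not existing refextract or bibrank APIs, the break-character set omits the optional "-", and the DOI in the usage note is made up.

```python
# Default url.sty break characters, as listed earlier in this issue.
URL_BREAK_CHARS = frozenset(r".@\/!_|;>]),?&'+=#:")


def candidate_dois(naive_doi, following_words):
    """Step 1: return the naive DOI plus progressively longer candidates.

    While the current candidate ends in a break character, the next
    word of the reference is appended and kept as another candidate.
    """
    candidates = [naive_doi]
    current = naive_doi
    for word in following_words:
        if not current or current[-1] not in URL_BREAK_CHARS:
            break  # no line break was possible here, stop extending
        current += word
        candidates.append(current)
    return candidates


def first_match(candidates, is_known):
    """Step 2: try the longest candidate first and return the first hit.

    `is_known` stands in for whatever lookup decides that a DOI
    matches a record; it returns None when nothing matches.
    """
    for doi in sorted(candidates, key=len, reverse=True):
        if is_known(doi):
            return doi
    return None
```

With the made-up input `candidate_dois("10.1234/j.", ["test.2016.", "11.035", "and"])`, the candidates are the naive DOI, `10.1234/j.test.2016.` and `10.1234/j.test.2016.11.035`; the word `and` is never appended because the last candidate does not end in a break character. Step 3 (resolving the DOIs) would only be needed when `first_match` returns `None`.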

@tsgit
Contributor

tsgit commented Nov 17, 2016

I stand corrected: TeX does insert line breaks in DOIs when necessary in most circumstances.

@kaplun
Contributor

kaplun commented Nov 17, 2016

@michamos, was the case we were looking at today on arXiv? In that case we can look up the sources and better understand the circumstances.

@michamos
Contributor Author

@kaplun yes, we were looking at an arXiv paper with this problem the other day. I don't really understand your remark, though.

@kaplun
Contributor

kaplun commented Nov 21, 2016

Never mind: I misread @tsgit's "I stand corrected"... I.e. we all agree that LaTeX can indeed break DOIs across lines.
