set regex pattern for CURIE and add URL, fixes #400 #406

Open: wants to merge 3 commits into base 1.4; the diff below shows changes from 2 commits.
TranslatorReasonerAPI.yaml: 11 changes (9 additions, 2 deletions)
@@ -1103,7 +1103,7 @@ components:
CURIE for a Biolink 'qualifier' association slot, generally taken
from Biolink association slots designated for this purpose
(that is, association slots with names ending in 'qualifier')
e.g. biolink:subject_aspect_qualifier,
e.g. biolink:subject_aspect_qualifier,
biolink:subject_direction_qualifier,
biolink:object_aspect_qualifier, etc. Such qualifiers are used
to elaborate a second layer of meaning of a knowledge graph edge.
@@ -1181,9 +1181,16 @@ components:
by a colon, such as UniProtKB:P00738. Via an external context
definition, the CURIE prefix and colon may be replaced by a URI
prefix, such as http://identifiers.org/uniprot/, to form a full
URI.
URI. Conforms to https://www.w3.org/TR/curie/
externalDocs:
url: https://www.w3.org/TR/2010/NOTE-curie-20101216/
pattern: ^\S+:\S+$
URL:
type: string
description: >-
externalDocs:
url: https://www.ietf.org/rfc/rfc3987.txt
pattern: ^(http(s)?:\/\/.)?(www\.)?\S+$
Collaborator

Isn't this regular expression just "anything without a space"? Is that really helpful? Testing this regexp:

#!/usr/bin/env python3
import re

# Candidate URL pattern from this PR, copied verbatim
URL_PATTERN = r'^(http(s)?:\/\/.)?(www\.)?\S+$'

inputs = [ 'http://arax.ncats.io', 'foo', '@*$&@#', 'PMID:123', 'http://peptideatlas.org/tmp/hello world.txt' ]

for text in inputs:
    match = re.search(URL_PATTERN, text)
    if match:
        print(f"MATCHES {text}")
    else:
        print(f"    x   {text}")

yields

MATCHES http://arax.ncats.io
MATCHES foo
MATCHES @*$&@#
MATCHES PMID:123
    x   http://peptideatlas.org/tmp/hello world.txt

Only the last one fails.

Yet if you paste that into your browser, it works!
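The browser succeeds because it percent-encodes the space before sending the request. A minimal sketch, assuming Python's standard `urllib.parse.quote` as a stand-in for what a browser does (browsers' exact encoding rules differ in details), shows that the encoded form passes the proposed pattern while the raw form does not:

```python
import re
from urllib.parse import quote

# Candidate URL pattern from this PR
URL_PATTERN = re.compile(r'^(http(s)?:\/\/.)?(www\.)?\S+$')

raw = 'http://peptideatlas.org/tmp/hello world.txt'
# Leave ':' and '/' untouched; percent-encode unsafe characters such as the space
encoded = quote(raw, safe=':/')

print(encoded)                            # http://peptideatlas.org/tmp/hello%20world.txt
print(bool(URL_PATTERN.search(raw)))      # False
print(bool(URL_PATTERN.search(encoded)))  # True
```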

Maybe this would be useful with a more restrictive regular expression?
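One more restrictive option is to parse the string instead of pattern-matching it. This sketch uses Python's standard `urllib.parse.urlsplit`; the rule it enforces (an `http` or `https` scheme, a non-empty host, no whitespace) and the helper name `looks_like_web_url` are assumptions for illustration, not something the TRAPI schema specifies. Since a schema `pattern` can only hold a regex, a check like this would have to live in a validator rather than in the YAML.

```python
from urllib.parse import urlsplit

def looks_like_web_url(text: str) -> bool:
    """Hypothetical stricter check: http(s) scheme, a host, and no whitespace."""
    if any(ch.isspace() for ch in text):
        return False
    parts = urlsplit(text)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)

inputs = ['http://arax.ncats.io', 'foo', '@*$&@#', 'PMID:123',
          'http://peptideatlas.org/tmp/hello world.txt']

for text in inputs:
    # Only the first input has both an http(s) scheme and a host
    print(f"{'MATCHES' if looks_like_web_url(text) else '    x  '} {text}")
```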

Member Author

My first thought was also to find a more comprehensive one. The more restrictive pattern I found and tested failed this repo's line-length linting, so before trying to break it across many lines, I asked around about best practice. The feedback I got was that a very restrictive regex would need constant tweaking to accommodate "in the wild" URLs and CURIEs. However, if we want a URI (not URL) type that is much more restrictive, we can add one.
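If a separate, stricter URI type were added, one possible starting point is the scheme syntax from RFC 3986 section 3.1 (a letter followed by letters, digits, `+`, `-`, or `.`). The patterns below are hypothetical illustrations, not anything adopted in this PR. They also expose a design wrinkle: a CURIE such as `PMID:123` is itself a syntactically valid absolute URI, so the CURIE and URI types would overlap unless the URI pattern demanded more, for example `//` and a host after the scheme.

```python
import re

# CURIE pattern from this PR
CURIE_PATTERN = re.compile(r'^\S+:\S+$')
# Hypothetical URI pattern: RFC 3986 scheme syntax, ':', then no whitespace
URI_PATTERN = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:\S+$')
# Hypothetical variant that also requires an authority ("//host")
URI_WITH_AUTHORITY = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://[^\s/?#]+\S*$')

inputs = ['http://arax.ncats.io', 'PMID:123', 'UniProtKB:P00738', '@*$&@#', 'foo']

for text in inputs:
    curie = bool(CURIE_PATTERN.search(text))
    uri = bool(URI_PATTERN.search(text))
    auth = bool(URI_WITH_AUTHORITY.search(text))
    print(f"{text:<22} CURIE={curie!s:<5} URI={uri!s:<5} URI+authority={auth}")
```

Only the authority-requiring variant separates `http://arax.ncats.io` from the CURIEs.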

MetaKnowledgeGraph:
type: object
description: >-
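The CURIE description in this diff says the prefix and colon may be replaced, via an external context, by a URI prefix to form a full URI. A minimal sketch of that expansion, assuming a hypothetical one-entry prefix map built from the description's own UniProtKB example (a real map would come from the external context definition), with `expand_curie` as an illustrative helper name:

```python
import re
from typing import Mapping

# CURIE pattern from this PR
CURIE_PATTERN = re.compile(r'^\S+:\S+$')
# Hypothetical prefix map, taken from the example in the CURIE description
PREFIX_MAP = {'UniProtKB': 'http://identifiers.org/uniprot/'}

def expand_curie(curie: str, prefix_map: Mapping[str, str]) -> str:
    """Replace 'prefix:' with its URI prefix; split only on the first colon."""
    if not CURIE_PATTERN.search(curie):
        raise ValueError(f'not a CURIE: {curie!r}')
    prefix, local_id = curie.split(':', 1)
    if prefix not in prefix_map:
        raise ValueError(f'unknown prefix: {prefix!r}')
    return prefix_map[prefix] + local_id

print(expand_curie('UniProtKB:P00738', PREFIX_MAP))  # http://identifiers.org/uniprot/P00738
```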