Proper encoding of @ids in SequenceAnnotation and SequenceRange #28

Open
ivanmicetic opened this issue May 27, 2022 · 1 comment

@ivanmicetic
Member

While updating the markup of the resources, I stumbled upon the question of how to properly encode the @ids of SequenceAnnotation and SequenceRange. What I want to encode is this:

I have a protein ("@id": "https://disprot.org/DP03543") which has three hasSequenceAnnotation objects, each with its own SequenceRange:

  1. SequenceAnnotation ("@id": "https://disprot.org/DP03543#disorder-content")
    with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
    saying that this whole protein (1..96) has a disorder content of 0.99

  2. SequenceAnnotation ("@id": "https://disprot.org/DP03543r001")
    with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
    saying that this protein region (1..96) is disordered (ontology)

  3. SequenceAnnotation ("@id": "https://disprot.org/DP03543r003")
    with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.10_50")
    saying that this protein region (10..50) is modulated...

Note that the first two SequenceAnnotations share the same SequenceRange.
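
To make the first encoding concrete, here is a minimal Python sketch that builds the JSON-LD for it. The @id values and `hasSequenceAnnotation` are taken from the description above; the property names `sequenceLocation`, `rangeStart` and `rangeEnd`, the `@context` value, and the helper functions are assumptions made for illustration, and may not match the current profile exactly.

```python
import json

PROTEIN = "https://disprot.org/DP03543"


def seq_range(start, end):
    # The SequenceRange @id is derived from the protein @id only,
    # so identical ranges collapse to the same node in the graph.
    return {
        "@type": "SequenceRange",
        "@id": f"{PROTEIN}#sequence-location.{start}_{end}",
        "rangeStart": start,  # assumed property name
        "rangeEnd": end,      # assumed property name
    }


def annotation(ann_id, start, end):
    return {
        "@type": "SequenceAnnotation",
        "@id": ann_id,
        "sequenceLocation": seq_range(start, end),  # assumed property name
    }


protein = {
    "@context": "https://schema.org",  # assumed context
    "@type": "Protein",
    "@id": PROTEIN,
    "hasSequenceAnnotation": [
        annotation(f"{PROTEIN}#disorder-content", 1, 96),
        annotation(f"{PROTEIN}r001", 1, 96),
        annotation(f"{PROTEIN}r003", 10, 50),
    ],
}

# Collect the SequenceRange @ids to see which annotations share a node.
range_ids = [a["sequenceLocation"]["@id"] for a in protein["hasSequenceAnnotation"]]

print(json.dumps(protein, indent=2))
```

In this sketch the first two entries of `range_ids` are identical, which is the shared-node situation described above.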

An alternative version would be with modified SequenceRange @ids like this:

  1. SequenceAnnotation ("@id": "https://disprot.org/DP03543#disorder-content")
    with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")

  2. SequenceAnnotation ("@id": "https://disprot.org/DP03543r001")
    with SequenceRange ("@id": "https://disprot.org/DP03543r001#sequence-location.1_96")

  3. SequenceAnnotation ("@id": "https://disprot.org/DP03543r003")
    with SequenceRange ("@id": "https://disprot.org/DP03543r003#sequence-location.10_50")

where each SequenceAnnotation has its own SequenceRange, so the first two SequenceRanges become separate nodes in the graph. The bottom line is: should we treat SequenceRange as a child node of SequenceAnnotation, or somehow link it to the parent node of SequenceAnnotation (in this case the Protein node, with all the implied changes to the profile)?

Which solution is more correct conceptually? And, of course, which is easier to process in the IDP-KG?
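
The difference between the two schemes can be sketched by minting the SequenceRange @ids both ways and counting the distinct nodes that result. This is only an illustration: `range_id` is a hypothetical helper, and dropping an existing fragment from the owner @id is my assumption, chosen so that the output matches the @ids listed above.

```python
PROTEIN = "https://disprot.org/DP03543"

# (annotation @id, range start, range end), as listed in this issue.
ANNOTATIONS = [
    (f"{PROTEIN}#disorder-content", 1, 96),
    (f"{PROTEIN}r001", 1, 96),
    (f"{PROTEIN}r003", 10, 50),
]


def range_id(owner_id, start, end):
    # Hypothetical helper: drop any existing fragment from the owner @id,
    # then append the sequence-location fragment.
    base = owner_id.split("#", 1)[0]
    return f"{base}#sequence-location.{start}_{end}"


# Scheme 1: ranges hang off the Protein @id, so equal ranges share a node.
protein_scoped = {range_id(PROTEIN, s, e) for _, s, e in ANNOTATIONS}

# Scheme 2: ranges hang off each SequenceAnnotation @id, so every
# annotation gets its own range node.
annotation_scoped = {range_id(ann_id, s, e) for ann_id, s, e in ANNOTATIONS}

print(len(protein_scoped), sorted(protein_scoped))
print(len(annotation_scoped), sorted(annotation_scoped))
```

Under these assumptions the first scheme yields two SequenceRange nodes and the second yields three.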

@AlasdairGray
Collaborator

This is a question of what is being modelled here, and ultimately what is of interest.

The way things are currently modelled, it is the annotation that is of interest. We distinguish the annotation and then define properties on it, e.g. sequence range, disordered content of 0.99, ontology label, and link to the literature. We end up with multiple annotations on the same region each with their own identity.
Why are these annotations kept separate in the DisProt model?

However, if I've understood your proposal correctly, you are suggesting changing the emphasis here to the sub-sequence. We would then identify the sub-sequence and add properties to it that would currently come from multiple annotations, e.g. disordered content, links to the literature, tagging from the ontology. This would have the nice effect that details that are currently contained in multiple annotations are merged together into a single object.

What would be desirable for users of the IDP-KG?

  1. We could mimic the latter with the current model, but this would require more complex query processing (probably less desirable).
  2. We could do the processing during the transformation step.
  3. We could change the Bioschemas model.
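
As a rough illustration of option 2, the transformation step could group annotations by their range and emit one sub-sequence object per distinct region. The sketch below is not the actual IDP-KG pipeline; the flat input records and the `merge_by_range` function are hypothetical, and only the @ids and ranges come from this issue.

```python
from collections import defaultdict

# Hypothetical flattened input: one record per SequenceAnnotation.
annotations = [
    {"@id": "https://disprot.org/DP03543#disorder-content", "start": 1, "end": 96},
    {"@id": "https://disprot.org/DP03543r001", "start": 1, "end": 96},
    {"@id": "https://disprot.org/DP03543r003", "start": 10, "end": 50},
]


def merge_by_range(records):
    # Group annotation @ids under the (start, end) region they describe.
    merged = defaultdict(list)
    for rec in records:
        merged[(rec["start"], rec["end"])].append(rec["@id"])
    return dict(merged)


regions = merge_by_range(annotations)
for (start, end), ann_ids in regions.items():
    print(f"{start}..{end}: {ann_ids}")
```

Whether merging like this is appropriate when the annotations come from different providers is exactly the open question raised below.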

Would the distinguishing of sub-sequences hold across data sources, particularly those coming from autonomous organisations, i.e. would we be interested in merging content coming from different providers into a single sequence region annotation?
