Proper encoding of @ids in SequenceAnnotation and SequenceRange #28

ivanmicetic · 2022-05-27T09:44:45Z

While updating the markup of the resources, I ran into the question of how to properly encode the @ids of SequenceAnnotation and SequenceRange. This is what I want to encode:

I have a protein ("@id": "https://disprot.org/DP03543") which has three hasSequenceAnnotation objects, each with its own SequenceRange:

SequenceAnnotation ("@id": "https://disprot.org/DP03543#disorder-content")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
saying that this whole protein (1..96) has a disorder content of 0.99
SequenceAnnotation ("@id": "https://disprot.org/DP03543r001")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
saying that this protein region (1..96) is disordered (ontology)
SequenceAnnotation ("@id": "https://disprot.org/DP03543r003")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.10_50")
saying that this protein region (10..50) is modulated...

Note that the first two SequenceAnnotations share the same SequenceRange.
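
As a rough illustration of this first encoding, the sketch below builds the three annotations in Python and counts how many distinct SequenceRange nodes result. The @id values are taken from the list above; the property names (`sequenceLocation`, `rangeStart`, `rangeEnd`) and the helper `shared_range_id` are assumptions made for the example, not a confirmed part of the DisProt markup.

```python
import json

PROTEIN = "https://disprot.org/DP03543"

def shared_range_id(start, end):
    # Hypothetical helper: the range @id hangs off the protein,
    # so equal coordinates always give the same node.
    return f"{PROTEIN}#sequence-location.{start}_{end}"

# (annotation @id, range start, range end), as listed in the issue
annotations = [
    (f"{PROTEIN}#disorder-content", 1, 96),
    (f"{PROTEIN}r001", 1, 96),
    (f"{PROTEIN}r003", 10, 50),
]

protein = {
    "@id": PROTEIN,
    "@type": "Protein",
    "hasSequenceAnnotation": [
        {
            "@id": ann_id,
            "@type": "SequenceAnnotation",
            # Property names below are assumed for illustration.
            "sequenceLocation": {
                "@id": shared_range_id(start, end),
                "@type": "SequenceRange",
                "rangeStart": start,
                "rangeEnd": end,
            },
        }
        for ann_id, start, end in annotations
    ],
}

# Distinct SequenceRange nodes in the resulting graph
range_ids = {a["sequenceLocation"]["@id"] for a in protein["hasSequenceAnnotation"]}
print(json.dumps(protein, indent=2))
print(len(range_ids))
```

With protein-scoped range @ids, the first two annotations point at one and the same SequenceRange node.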

An alternative version would be with modified SequenceRange @ids like this:

SequenceAnnotation ("@id": "https://disprot.org/DP03543#disorder-content")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
SequenceAnnotation ("@id": "https://disprot.org/DP03543r001")
with SequenceRange ("@id": "https://disprot.org/DP03543r001#sequence-location.1_96")
SequenceAnnotation ("@id": "https://disprot.org/DP03543r003")
with SequenceRange ("@id": "https://disprot.org/DP03543r003#sequence-location.10_50")

where each SequenceAnnotation has its own SequenceRange, so the first two SequenceRanges become separate nodes in the graph. The bottom line is: should we treat SequenceRange as a child node of SequenceAnnotation, or somehow link it to the parent node of SequenceAnnotation (in this case the Protein node, with all the implied changes to the profile)?

Which solution is more correct conceptually, and which is easier to process in the IDP-KG?
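
A minimal sketch of this alternative, assuming the range @id is derived by dropping any existing fragment from the annotation @id and appending a `#sequence-location.<start>_<end>` fragment. The helper name `annotation_range_id` is hypothetical; the derivation rule is only inferred from the three @ids listed above.

```python
from urllib.parse import urldefrag

PROTEIN = "https://disprot.org/DP03543"

def annotation_range_id(annotation_id, start, end):
    # Hypothetical helper: scope the range @id to its annotation.
    # urldefrag drops an existing fragment such as "#disorder-content".
    base, _fragment = urldefrag(annotation_id)
    return f"{base}#sequence-location.{start}_{end}"

# (annotation @id, range start, range end), as listed in the issue
annotations = [
    (f"{PROTEIN}#disorder-content", 1, 96),
    (f"{PROTEIN}r001", 1, 96),
    (f"{PROTEIN}r003", 10, 50),
]

range_ids = [annotation_range_id(a, s, e) for a, s, e in annotations]
for rid in range_ids:
    print(rid)

# Each annotation now owns a separate SequenceRange node.
distinct = len(set(range_ids))
print(distinct)
```

Under this scheme two annotations covering the same residues no longer share a node, so finding "everything known about region 1..96" means comparing coordinates rather than @ids.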

AlasdairGray · 2022-06-20T13:04:38Z

This is a question of what is being modelled here, and ultimately what is of interest.

The way things are currently modelled, it is the annotation that is of interest. We distinguish the annotation and then define properties on it, e.g. sequence range, disordered content of 0.99, ontology label, and link to the literature. We end up with multiple annotations on the same region each with their own identity.
Why are these annotations kept separate in the DisProt model?

However, if I've understood your proposal correctly, you are suggesting changing the emphasis here to the sub-sequence. We would then identify the sub-sequence and add properties to it that would currently come from multiple annotations, e.g. disordered content, links to the literature, tagging from the ontology. This would have the nice effect that details that are currently contained in multiple annotations are merged together into a single object.

What would be desirable for users of the IDP-KG?

We could mimic the latter with the current model, but that would require more complex query processing (probably less desirable).
We could do the processing during the transformation step.
We could change the Bioschemas model.
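
To make the second option more concrete, here is one possible sketch of merging during the transformation step: annotations are grouped by protein and range coordinates, so that each sub-sequence ends up with all of its annotations attached. The function `merge_by_region` and the record layout are assumptions for illustration only, not the actual IDP-KG pipeline; the short notes are paraphrased from the example earlier in the thread.

```python
from collections import defaultdict

PROTEIN = "https://disprot.org/DP03543"

# (annotation @id, range start, range end, short note from the thread)
annotations = [
    (f"{PROTEIN}#disorder-content", 1, 96, "disorder content 0.99"),
    (f"{PROTEIN}r001", 1, 96, "disordered (ontology)"),
    (f"{PROTEIN}r003", 10, 50, "modulated"),
]

def merge_by_region(protein_id, rows):
    # Hypothetical transformation step: key each sub-sequence by
    # (protein, start, end) and collect the annotations that cover it.
    merged = defaultdict(list)
    for ann_id, start, end, note in rows:
        merged[(protein_id, start, end)].append({"annotation": ann_id, "note": note})
    return dict(merged)

regions = merge_by_region(PROTEIN, annotations)
for (protein_id, start, end), items in regions.items():
    print(f"{protein_id} {start}..{end}: {len(items)} annotation(s)")
```

Grouping on coordinates works regardless of which @id scheme the source uses for SequenceRange, which may matter if content from several providers is to be merged.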

Would the distinguishing of sub-sequences hold across data sources, particularly those coming from autonomous organisations, i.e. would we be interested in merging content coming from different providers into a single sequence region annotation?

ivanmicetic assigned AlasdairGray May 27, 2022
