While updating the markup of the resources, I stumbled on the question of how to properly encode the @ids of SequenceAnnotation and SequenceRange. What I want to encode is:
I have a protein ("@id": "https://disprot.org/DP03543") which has three hasSequenceAnnotation objects, each with its own SequenceRange:
SequenceAnnotation ("@id": "https://disprot.org/DP03543#disorder-content")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
saying that this whole protein (1..96) has a disorder content of 0.99
SequenceAnnotation ("@id": "https://disprot.org/DP03543r001")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
saying that this protein region (1..96) is disordered (ontology)
SequenceAnnotation ("@id": "https://disprot.org/DP03543r003")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.10_50")
saying that this protein region (10..50) is modulated...
Note that the first two SequenceAnnotations share the same SequenceRange.
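To make the shared-node effect concrete, here is a minimal sketch of this first encoding built as a Python dictionary and serialised as JSON-LD. The @ids are the ones listed above; the property names `sequenceLocation`, `rangeStart` and `rangeEnd`, the `@context` value, and the helper names `seq_range` and `annotation` are assumptions made for illustration and should be checked against the actual profile.

```python
import json

PROTEIN = "https://disprot.org/DP03543"


def seq_range(start, end):
    # The range @id hangs off the protein, so equal coordinates give equal @ids.
    return {
        "@type": "SequenceRange",
        "@id": f"{PROTEIN}#sequence-location.{start}_{end}",
        "rangeStart": start,  # assumed property name
        "rangeEnd": end,      # assumed property name
    }


def annotation(ann_id, start, end):
    return {
        "@type": "SequenceAnnotation",
        "@id": ann_id,
        "sequenceLocation": seq_range(start, end),  # assumed property name
    }


protein = {
    "@context": "https://schema.org",  # assumed context
    "@type": "Protein",
    "@id": PROTEIN,
    "hasSequenceAnnotation": [
        annotation(f"{PROTEIN}#disorder-content", 1, 96),
        annotation(f"{PROTEIN}r001", 1, 96),
        annotation(f"{PROTEIN}r003", 10, 50),
    ],
}

# Collect the distinct SequenceRange @ids referenced by the three annotations.
range_ids = {a["sequenceLocation"]["@id"] for a in protein["hasSequenceAnnotation"]}

print(json.dumps(protein, indent=2))
print(len(range_ids), "distinct SequenceRange nodes")
```

Three annotations reference only two distinct SequenceRange @ids, so once loaded into a graph the 1..96 range becomes a single node with two incoming links.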
An alternative version would use modified SequenceRange @ids, like this:
SequenceAnnotation ("@id": "https://disprot.org/DP03543#disorder-content")
with SequenceRange ("@id": "https://disprot.org/DP03543#sequence-location.1_96")
SequenceAnnotation ("@id": "https://disprot.org/DP03543r001")
with SequenceRange ("@id": "https://disprot.org/DP03543r001#sequence-location.1_96")
SequenceAnnotation ("@id": "https://disprot.org/DP03543r003")
with SequenceRange ("@id": "https://disprot.org/DP03543r003#sequence-location.10_50")
where each SequenceAnnotation has its own SequenceRange, so the first two SequenceRanges become separate nodes in the graph. The bottom line: should we treat SequenceRange as a child node of SequenceAnnotation, or somehow link it to the parent node of SequenceAnnotation (in this case the Protein node, with all the implied changes to the profile)?
Which solution is more correct conceptually and, of course, easier to process in the IDP-KG?
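The difference between the two schemes comes down to which @id the range fragment is attached to. The sketch below mints the SequenceRange @id under both schemes and counts the resulting distinct nodes. The function name `range_id` and the scheme labels `per-protein` and `per-annotation` are hypothetical, and stripping an existing fragment from the annotation @id is an assumption made so that the `#disorder-content` annotation keeps the range @id shown in the listing above.

```python
PROTEIN = "https://disprot.org/DP03543"

# (annotation @id, range start, range end), taken from the example above.
ANNOTATIONS = [
    (f"{PROTEIN}#disorder-content", 1, 96),
    (f"{PROTEIN}r001", 1, 96),
    (f"{PROTEIN}r003", 10, 50),
]


def range_id(protein_id, annotation_id, start, end, scheme):
    """Mint a SequenceRange @id under one of the two schemes discussed."""
    if scheme == "per-protein":
        base = protein_id
    elif scheme == "per-annotation":
        # Drop any existing fragment so the result contains a single '#'.
        base = annotation_id.split("#", 1)[0]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return f"{base}#sequence-location.{start}_{end}"


def distinct_ranges(scheme):
    return {range_id(PROTEIN, ann, s, e, scheme) for ann, s, e in ANNOTATIONS}


shared = distinct_ranges("per-protein")
separate = distinct_ranges("per-annotation")

print(len(shared), sorted(shared))
print(len(separate), sorted(separate))
```

Under the per-protein scheme the two 1..96 annotations collapse onto one range node (two nodes in total); under the per-annotation scheme every annotation gets its own range node (three in total).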
This is a question of what is being modelled here, and ultimately what is of interest.
The way things are currently modelled, it is the annotation that is of interest. We identify the annotation and then define properties on it, e.g. sequence range, disorder content of 0.99, ontology label, and link to the literature. We end up with multiple annotations on the same region, each with its own identity.
Why are these annotations kept separate in the DisProt model?
However, if I've understood your proposal correctly, you are suggesting shifting the emphasis to the sub-sequence. We would then identify the sub-sequence and add to it the properties that currently come from multiple annotations, e.g. disorder content, links to the literature, tagging from the ontology. This would have the nice effect that details currently spread across multiple annotations are merged into a single object.
What would be desirable for users of the IDP-KG?
We could mimic the latter with the current model, but this would require more complex query processing (probably less desirable).
We could do the processing during the transformation step.
We could change the Bioschemas model.
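As a rough illustration of the transformation-step option, the sketch below groups annotations that cover the same coordinates on the same protein, which is the merge that a sub-sequence-centred model would give for free. The flat record layout (`id`, `protein`, `start`, `end`) and the function name `merge_by_region` are hypothetical; the real pipeline would read these values from the harvested markup.

```python
from collections import defaultdict

PROTEIN = "https://disprot.org/DP03543"

# Hypothetical flattened form of the three annotations from the issue.
annotations = [
    {"id": f"{PROTEIN}#disorder-content", "protein": PROTEIN, "start": 1, "end": 96},
    {"id": f"{PROTEIN}r001", "protein": PROTEIN, "start": 1, "end": 96},
    {"id": f"{PROTEIN}r003", "protein": PROTEIN, "start": 10, "end": 50},
]


def merge_by_region(records):
    """Group annotation @ids by (protein, start, end), ignoring range @ids."""
    regions = defaultdict(list)
    for rec in records:
        key = (rec["protein"], rec["start"], rec["end"])
        regions[key].append(rec["id"])
    return dict(regions)


merged = merge_by_region(annotations)
for (protein, start, end), ids in merged.items():
    print(f"{protein} {start}..{end}: {ids}")
```

Because the grouping key is built from the coordinates rather than from the SequenceRange @id, this works with either @id scheme. Whether the same key is appropriate across providers is exactly the open question raised next.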
Would the distinguishing of sub-sequences hold across data sources, particularly those coming from autonomous organisations, i.e. would we be interested in merging content coming from different providers into a single sequence region annotation?