sbol3-10807 - Clarify which Component's sequence should be linked to locations #492

Open
goksel opened this issue Oct 18, 2022 · 8 comments

@goksel
Contributor

goksel commented Oct 18, 2022

According to validation rule 10807, the Location object's hasSequence property should point to subComponent.instanceOf.Sequence rather than to the sequence of the parent Component that contains the SubComponent. Can we please double-check whether this rule is correct?

sbol3-10807 - If a SubComponent object has at least one hasLocation and zero sourceLocation properties, and the Component linked by its instanceOf has precisely one hasSequence property whose Sequence has a value for its elements property, then the sum of the lengths of the Location objects referred to by the hasLocation properties MUST equal the length of the elements value of the Sequence.

Consider the following example. I assume Range1.hasSequence points to parent_Sequence1 rather than child_Sequence1.

:parent a sbol:Component ;
sbol:hasFeature <parent/SubComponent1> ;
sbol:hasSequence :parent_Sequence1 .

<parent/SubComponent1>
a sbol:SubComponent ;
sbol:hasLocation <parent/SubComponent1/Range1> ;
sbol:instanceOf :child .

:parent_Sequence1 a sbol:Sequence ;
sbol:elements "atgcgtaaaggagaagaacttttca" .

<parent/SubComponent1/Range1>
a sbol:Range ;
sbol:hasSequence :parent_Sequence1 .

:child a sbol:Component ;
sbol:hasSequence :child_Sequence1 .

:child_Sequence1 a sbol:Sequence ;
sbol:elements "atg" .

@jakebeal
Contributor

I'm having a hard time with your example: your range objects are missing their required start and end locations, so I can't tell if the example complies with the rule or not.

If Range1 had start=1 and end=3, however, that would comply with the rule, since then the length of the Range would be 3, and that is the same as the length of the elements of child_Sequence1.

Here's the key idea in this rule: in the prior rule, sbol3-10806, we can directly compare sequence lengths because we have both a sourceLocation (pointing to a Sequence associated with the child) and a hasLocation (pointing to a Sequence associated with the parent).

But sourceLocation is optional: what do we do if we don't have a sourceLocation? In this case, we fall back to using the child's whole sequence. Rule sbol3-10807 checks that the lengths are compatible for that case.
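The fallback described above can be sketched in Python. This is only an illustration of the rule text as quoted in this thread, not code from any SBOL library: the `Range` and `SubComponent` classes, their field names, and `check_10807` are all hypothetical, and the sketch assumes Range coordinates are 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class SubComponent:
    has_locations: List[Range]     # coordinates on the parent
    source_locations: List[Range]  # coordinates on the child
    # One entry per Sequence of the Component named by instanceOf;
    # None stands for a Sequence without an elements value.
    child_elements: List[Optional[str]]


def check_10807(sub: SubComponent) -> Optional[bool]:
    """Return None when sbol3-10807 does not apply, else whether it holds."""
    # The rule needs at least one hasLocation and no sourceLocation.
    if not sub.has_locations or sub.source_locations:
        return None
    # The child must have exactly one Sequence, and its elements must be known.
    if len(sub.child_elements) != 1 or sub.child_elements[0] is None:
        return None
    total = sum(loc.length() for loc in sub.has_locations)
    return total == len(sub.child_elements[0])


# Range1 with start=1 and end=3 against a child whose elements are "atg"
print(check_10807(SubComponent([Range(1, 3)], [], ["atg"])))  # True
```

Returning `None` rather than `True` for the "rule does not apply" case keeps "cannot check" distinct from "checked and passed".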

@cjmyers
Contributor

cjmyers commented Oct 19, 2022

I'm also confused by your example. I thought hasSequence is used to identify which sequence the SubComponent is found in when the Component referenced by the instanceOf property has more than one sequence linked to it. In this case, pointing to parent_Sequence1 should not be allowed, since it is not one of the sequences for child.

@goksel
Contributor Author

goksel commented Oct 19, 2022

I changed the example as shown below. Could you please check if it is correct now (including the range start and end locations)? Range.hasSequence now points to sequences from the child components.

I think the example below will now violate sbol3-11402 ("The value of the end property of a Range MUST be greater than zero and less than or equal to the length of the elements value of the Sequence referred to by its hasSequence property").

Additionally, according to sbol3-11302, Range.hasSequence should point to sequences from the parent. As a result, this example would also violate sbol3-11302 ("For every Location that is not an EntireSequence and that is the value of a hasLocation property of a Feature, the value of its hasSequence property MUST also either be a value of the hasSequence property of the parent Component or else be the value of some hasSequence property of an EntireSequence that is also a child of the same Component").

I think I am misinterpreting some of these validation rules, but I'm not sure which ones. Are you aware of any use cases or example diagrams I can look at? I probably need to revise all the validation rules that depend on the hasLocation and sourceLocation properties.

:parent a sbol:Component ;
sbol:hasFeature <parent/SubComponent1> ;
sbol:hasSequence :parent_Sequence1 .

:parent_Sequence1 a sbol:Sequence ;
sbol:elements "atgcgtaaaggagaagaacttttca" .

:child a sbol:Component ;
sbol:hasSequence :child_Sequence1 .

:child_Sequence1 a sbol:Sequence ;
sbol:elements "atgaaa" .

<parent/SubComponent1>
a sbol:SubComponent ;
sbol:hasLocation <parent/SubComponent1/Range2> , <parent/SubComponent1/Range1> ;
sbol:instanceOf :child .

<parent/SubComponent1/Range1>
a sbol:Range ;
sbol:hasSequence :child_Sequence1 ;
sbol:start "1" ;
sbol:end "3" .

<parent/SubComponent1/Range2>
a sbol:Range ;
sbol:hasSequence :child_Sequence1 ;
sbol:start "7" ;
sbol:end "9" .

@jakebeal
Contributor

You are correct: your example violates both of those rules, as expected:

  • "What is the 9th element of 'atgaaa'?" would get you an index error in any programming language.
  • The child's sequence should show up in sourceLocation (which talks about the child) and not hasLocation, which talks about the parent.
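The first bullet can be made concrete with a small Python sketch. It applies the sbol3-11402 bound as quoted earlier in this thread to the two ranges in the revised example; the helper name `range_end_in_bounds` is hypothetical and does not come from any SBOL library.

```python
def range_end_in_bounds(end: int, elements: str) -> bool:
    """sbol3-11402 as quoted in this thread: 0 < end <= len(elements)."""
    return 0 < end <= len(elements)


# Both ranges in the revised example point at child_Sequence1 ("atgaaa").
child_elements = "atgaaa"
ranges = {"Range1": (1, 3), "Range2": (7, 9)}

results = {
    name: range_end_in_bounds(end, child_elements)
    for name, (start, end) in ranges.items()
}
print(results)  # {'Range1': True, 'Range2': False}
```

Range2 fails because its end of 9 exceeds the 6 elements of the child sequence, which is the index error described in the first bullet.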

@goksel
Contributor Author

goksel commented Oct 21, 2022

Chris, Jake, apologies, I'm still not understanding the rule 10807 (If a SubComponent object has at least one hasLocation and zero sourceLocation properties, and the Component linked by its instanceOf has precisely one hasSequence property whose Sequence has a value for its elements property, then the sum of the lengths of the Location objects referred to by the hasLocation properties MUST equal the length of the elements value of the Sequence.).

I'm trying to come up with a case that satisfies this rule. But, I'm still struggling to understand this rule biologically. Would you be able to please send me a biologically valid SBOL example for this validation rule?

According to the rule 10807, I assume the following (I must misunderstand something here):

  • The SubComponent has zero sourceLocation properties ("...SubComponent object has at least one hasLocation and zero sourceLocation properties..."). This means we are not describing the child sequence.
  • There is at least one hasLocation property. To create a valid case, I'm trying to have two hasLocation properties with range properties. I assumed the range properties will be the coordinates in the parent. I initially assumed the hasSequence of a Location would point to the parent's sequence.
  • The child has got one sequence only ("... the Component linked by its instanceOf has precisely one hasSequence property...").

@jakebeal
Contributor

jakebeal commented Oct 21, 2022

The typical case of Rule 10807 is including a whole part.

For an example, take a look at the Anderson Promoters package in the iGEM distribution. Specifically, let's look at the plasmid insert for J23100, Anderson_Promoters_in_vector_ins_BBa_J23100.

  • This Component includes a SubComponent5 child whose instanceOf points to BBa_J23100.
  • The SubComponent5 has a hasLocation for its child SubComponent5/Range1, which has start=181 and end=215, for a length of 35 bp.
  • There is no sourceLocation for SubComponent5, which means we default to assuming that the entire sequence is being used.
  • When we look at BBa_J23100 we find it has precisely one hasSequence property, pointing to BBa_J23100_sequence.
  • When we look at BBa_J23100_sequence, we find that it has elements of ttgacggctagctcagtcctaggtacagtgctagc, which is 35 bp.

Thus, this construction says "the sequence for BBa_J23100 goes in location 181-215 on Anderson_Promoters_in_vector_ins_BBa_J23100." This is a valid construction because it takes a 35 bp sequence and puts it in a 35 bp location.
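The arithmetic behind the J23100 example can be checked directly. The snippet below is just a worked calculation using the numbers and the elements string quoted in the bullets above, assuming that Range coordinates are 1-based and inclusive; the variable names are illustrative.

```python
# SubComponent5/Range1 on the plasmid insert
start, end = 181, 215
location_length = end - start + 1  # inclusive coordinates

# elements of BBa_J23100_sequence, as quoted above
elements = "ttgacggctagctcagtcctaggtacagtgctagc"

print(location_length)                   # 35
print(len(elements))                     # 35
print(location_length == len(elements))  # True
```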

Here are some variations that are important for understanding the operation of the rule:

  • We might not yet know the sequence, in which case we can't check.
  • We might have more than one hasLocation, e.g., because we are inserting an intron or a recombinase site into the middle of the original sequence. In this case, we still need the whole thing to add up (e.g. "put 700 bp of sequence into a 300 bp location followed by a 400 bp location").

@gmisirli

Many thanks @jakebeal

Below is an example with multiple locations, which is where I was confused. I hope this is correct. I extended your promoter example to specify the RNAP binding feature using two ranges (e.g. -35 and -10 boxes). I only showed the necessary properties in the example. Each range has 5 bases, and the binding sequence is 10 bases in total.

<plasmid/SubComponent1/Range1>
a sbol:Range ;
sbol:end "185" ;
sbol:hasSequence :plasmid_Sequence1 ;
sbol:start "181" .

:BBa_J23100_RNAPbinding
a sbol:Component ;
sbol:hasSequence :BBa_J23100_RNAPbinding_Sequence1 .

<plasmid/SubComponent1>
a sbol:SubComponent ;
sbol:hasLocation <plasmid/SubComponent1/Range2> , <plasmid/SubComponent1/Range1> ;
sbol:instanceOf :BBa_J23100_RNAPbinding .

<plasmid/SubComponent1/Range2>
a sbol:Range ;
sbol:end "214" ;
sbol:hasSequence :plasmid_Sequence1 ;
sbol:start "210" .

:BBa_J23100_RNAPbinding_Sequence1
a sbol:Sequence ;
sbol:elements "ttgacctagc" .

:plasmid a sbol:Component ;
sbol:hasFeature <plasmid/SubComponent1> ;
sbol:hasSequence :plasmid_Sequence1 .

:plasmid_Sequence1 a sbol:Sequence ;
sbol:elements "aacgatgatgctcactc......" .
@jakebeal
Contributor

Yes, what you have written here is valid by rule sbol3-10807.
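As a rough cross-check of the two-range example, the Python sketch below tests two things discussed in this thread: that each Range's hasSequence is one of the parent's sequences (the first branch of sbol3-11302 only; the EntireSequence alternative is not modelled), and that the range lengths sum to the length of the child's elements (sbol3-10807). The dictionary layout and variable names are assumptions made for illustration, not an SBOL library API.

```python
# Sequences attached to the parent Component (:plasmid)
parent_sequences = {"plasmid_Sequence1"}

# elements of BBa_J23100_RNAPbinding_Sequence1
child_elements = "ttgacctagc"

# The two hasLocation Ranges of plasmid/SubComponent1
locations = [
    {"name": "Range1", "start": 181, "end": 185, "has_sequence": "plasmid_Sequence1"},
    {"name": "Range2", "start": 210, "end": 214, "has_sequence": "plasmid_Sequence1"},
]

# sbol3-11302 (first branch): every Location points at a parent Sequence
refs_parent = all(loc["has_sequence"] in parent_sequences for loc in locations)

# sbol3-10807: inclusive range lengths must add up to the child's length
total = sum(loc["end"] - loc["start"] + 1 for loc in locations)
lengths_match = total == len(child_elements)

print(refs_parent, total, lengths_match)  # True 10 True
```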


4 participants