Add redundant/missing legs to STAR assertions #30
Quick info concerning redundant/missing legs of STAR assertions: I implemented r11cli, which I intend to be a general command-line interface for running commands on the R11 triplestore. The tool now has a subcommand `starlegs` that runs a set of SPARQL CONSTRUCT queries (so far only for the gender assertions) to produce the missing/redundant assertions. See output. Options: The insert command will be implemented using named graphs.
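A minimal sketch of how a subcommand-based CLI like r11cli could be laid out with Python's standard `argparse`. The `--insert` and `--graph` options are hypothetical stand-ins, since the actual option list is not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical r11cli layout with a 'starlegs' subcommand."""
    parser = argparse.ArgumentParser(
        prog="r11cli", description="Run commands on the R11 triplestore."
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    starlegs = subparsers.add_parser(
        "starlegs", help="Construct missing legs of STAR assertions."
    )
    # Hypothetical option: write the constructed triples to a named graph
    # instead of only printing them.
    starlegs.add_argument(
        "--insert", action="store_true", help="Insert results into a named graph."
    )
    # Hypothetical option: IRI of the target named graph.
    starlegs.add_argument("--graph", default=None, help="Target named graph IRI.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

New commands would then be added as further `subparsers.add_parser(...)` calls, which keeps the tool general.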
Update: I implemented the insert command. The graph is already in the store: r11cli_starlegs (still with just the gender assertions). My proposal for the named graph metadata would be this:

```turtle
<graph_uri> a rdfg:Graph, sd:NamedGraph, crmdig:D9_Data_Object .

[ a crmdig:D10_Software_Execution ]
    crmdig:L11_had_output <graph_uri> ;
    crm:P82a_begin_of_the_begin "<time value>" ;
    crmdig:L23_used_software_or_firmware [
        a crmdig:D14_Software ;
        crm:P1_is_identified_by [
            a crm:E42_Identifier ;
            crm:P190_has_symbolic_content <script_uri>
        ]
    ] ;
    crmdig:L12_happened_on_device [
        a crmdig:D8_Digital_Device ;
        crm:P129i_is_subject_of [
            a crm:E73_Information_Object ;
            crm:P2_has_type <https://vocabs.sshopencloud.eu/browse/media-type/en/page/applicationslashjson> ;
            crm:P190_has_symbolic_content "<json system info>"
        ]
    ] .
```

The system info will be extracted dynamically; e.g. on my machine it would be:

```json
{
  "system": "Linux",
  "node": "arch-e14",
  "release": "6.7.1-arch1-1",
  "version": "#1 SMP PREEMPT_DYNAMIC Sun, 21 Jan 2024 22:14:10 +0000",
  "machine": "x86_64",
  "python_implementation": "CPython",
  "python_version": "3.11.6"
}
```

I will implement the metadata generation as soon as the model is approved. Todo:
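The fields of the JSON system info map one-to-one onto functions in Python's standard `platform` module, so the dynamic extraction and the filling of the proposed metadata could be sketched as follows. `system_info`, `render_metadata` and the example URIs are hypothetical; prefix declarations for `crm`, `crmdig`, `rdfg` and `sd` are assumed to be added elsewhere. The sketch uses `crm:P82a_begin_of_the_begin` (the CIDOC CRM name of the begin-of-the-begin property) and writes the timestamp as a plain literal, as in the proposal.

```python
import json
import platform
from datetime import datetime, timezone


def system_info() -> dict[str, str]:
    """Collect the same fields as the JSON system info example."""
    return {
        "system": platform.system(),
        "node": platform.node(),
        "release": platform.release(),
        "version": platform.version(),
        "machine": platform.machine(),
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
    }


# Mirrors the proposed metadata model; placeholders are str.format fields.
METADATA_TEMPLATE = """\
<{graph_uri}> a rdfg:Graph, sd:NamedGraph, crmdig:D9_Data_Object .

[ a crmdig:D10_Software_Execution ]
    crmdig:L11_had_output <{graph_uri}> ;
    crm:P82a_begin_of_the_begin {time_literal} ;
    crmdig:L23_used_software_or_firmware [
        a crmdig:D14_Software ;
        crm:P1_is_identified_by [
            a crm:E42_Identifier ;
            crm:P190_has_symbolic_content <{script_uri}>
        ]
    ] ;
    crmdig:L12_happened_on_device [
        a crmdig:D8_Digital_Device ;
        crm:P129i_is_subject_of [
            a crm:E73_Information_Object ;
            crm:P2_has_type <https://vocabs.sshopencloud.eu/browse/media-type/en/page/applicationslashjson> ;
            crm:P190_has_symbolic_content {info_literal}
        ]
    ] .
"""


def render_metadata(graph_uri: str, script_uri: str) -> str:
    """Fill the metadata template with the current UTC time and system info."""
    # The inner JSON is ASCII-only (json.dumps default), so the outer
    # json.dumps emits only \\ and \" escapes, which Turtle also accepts.
    info_json = json.dumps(system_info())
    return METADATA_TEMPLATE.format(
        graph_uri=graph_uri,
        script_uri=script_uri,
        time_literal=json.dumps(datetime.now(timezone.utc).isoformat()),
        info_literal=json.dumps(info_json),
    )


if __name__ == "__main__":
    print(render_metadata("https://example.org/graph/starlegs", "https://example.org/r11cli.py"))
```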
Thanks, the metadata schema looks fine, though we might need to think about back-porting the generation metadata for the original PBW script, and for the death/location factoid generation, to this model. Concerning the STAR legs you have generated in the named graph: many of them duplicate triples that already exist in the main data graph (probably because these triples were generated by the original PBW script instead of via WissKI). So it would be a good idea to check whether these triples already exist before creating them in the second graph.
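A minimal sketch of that check, assuming the constructed legs and the main data graph are both available in memory as sets of (subject, predicate, object) tuples; `split_constructed` is a hypothetical helper. Inside the triplestore, the equivalent would typically be a `FILTER NOT EXISTS { ... }` pattern against the main graph in the CONSTRUCT query's WHERE clause.

```python
Triple = tuple[str, str, str]


def split_constructed(
    constructed: set[Triple], main_graph: set[Triple]
) -> tuple[set[Triple], set[Triple]]:
    """Split constructed legs into (new, already_present) w.r.t. the main graph."""
    already_present = constructed & main_graph
    new = constructed - main_graph
    return new, already_present
```

Only `new` would then be written to the second graph, while the size of `already_present` gives a quick measure of how much the original PBW script already covered.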
Update: Starleg CONSTRUCT queries are now generated dynamically using a simple (and hopefully sufficiently generic) query builder based on a revised CONSTRUCT query template. I also implemented tests for the gender assertions, see tests_starlegs_queries. The tests first build a set of gender graphs with different constellations of missing legs (using combinatorics) and run SHACL constraints against every graph, expecting validation to fail. Then every data graph is updated with the results of the respective CONSTRUCT query and the SHACL validation is run again; this time validation is expected to pass.
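The test strategy can be sketched without an RDF library or a SHACL engine. In this hypothetical sketch, `validates` stands in for the SHACL shape, `construct_legs` stands in for the generated CONSTRUCT query (filling each missing leg independently from the top star), and `itertools.combinations` enumerates every non-empty set of missing legs. Node and predicate names are shorthand, not the real R11 IRIs.

```python
from itertools import combinations

Triple = tuple[str, str, str]

# Hypothetical shorthand identifiers; real tests would use R11/CRM IRIs.
P14, P17, P140 = "P14", "P17", "P140"
LEGS = (P14, P17)
TOP = "e13_top"
OTHERS = ("e13_b", "e13_c")
SLOTS = [(star, leg) for star in OTHERS for leg in LEGS]


def complete_graph() -> set[Triple]:
    """A gender-like star cluster in which every star has all legs."""
    stars = (TOP, *OTHERS)
    graph = {(star, P140, "person") for star in stars}
    graph |= {(star, P14, "agent") for star in stars}
    graph |= {(star, P17, "source") for star in stars}
    return graph


def constellations():
    """Yield every graph in which a non-empty subset of non-top legs is missing."""
    full = complete_graph()
    for r in range(1, len(SLOTS) + 1):
        for missing in combinations(SLOTS, r):
            yield {t for t in full if (t[0], t[1]) not in missing}


def has_leg(graph: set[Triple], star: str, leg: str) -> bool:
    return any(s == star and p == leg for (s, p, _) in graph)


def validates(graph: set[Triple]) -> bool:
    """Stand-in for the SHACL shape: every star needs both P14 and P17."""
    stars = {s for (s, p, _) in graph if p == P140}
    return all(has_leg(graph, star, leg) for star in stars for leg in LEGS)


def construct_legs(graph: set[Triple]) -> set[Triple]:
    """Stand-in for the CONSTRUCT: copy each missing leg from the top star."""
    targets = {o for (s, p, o) in graph if s == TOP and p == P140}
    stars = {s for (s, p, o) in graph if p == P140 and o in targets and s != TOP}
    constructed: set[Triple] = set()
    for leg in LEGS:
        values = {o for (s, p, o) in graph if s == TOP and p == leg}
        for star in stars:
            if not has_leg(graph, star, leg):
                constructed |= {(star, leg, value) for value in values}
    return constructed


for g in constellations():
    assert not validates(g)                  # missing legs must fail validation
    assert validates(g | construct_legs(g))  # the completed graph must pass
```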
Generated assertions for
See starlegs output.
Quick sketch for a very (probably overly) generic starlegs query:

```sparql
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX star: <https://r11.eu/ns/star/>

CONSTRUCT {
  ?o crm:P14_carried_out_by ?agent .
  ?o crm:P17_was_motivated_by ?source .
}
WHERE {
  ?e13_initial a crm:E13_Attribute_Assignment ;
    crm:P14_carried_out_by ?agent ;
    crm:P17_was_motivated_by ?source ;
    crm:P140_assigned_attribute_to | crm:P141_assigned ?common .
  ?common ^crm:P140_assigned_attribute_to ?o .
  FILTER (?o != ?e13_initial)
  MINUS { ?o crm:P14_carried_out_by ?_agent . }
  MINUS { ?o crm:P17_was_motivated_by ?_source . }
}
```

This finds star nodes connected to an initial E13 and asserts the initial P14/P17 statements if the connected nodes lack P14/P17 assertions altogether.
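To make the semantics of the generic query explicit, here is a pure-Python reading of it over an in-memory set of triples. It is a sketch for reasoning and testing, not a replacement for running the query in the triplestore; like the query without inference, it matches only an explicit `rdf:type crm:E13_Attribute_Assignment`. Note that the two MINUS clauses drop a node as soon as it has either a P14 or a P17, so a node with only one of the two legs is not completed.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
E13 = CRM + "E13_Attribute_Assignment"
P14 = CRM + "P14_carried_out_by"
P17 = CRM + "P17_was_motivated_by"
P140 = CRM + "P140_assigned_attribute_to"
P141 = CRM + "P141_assigned"

Triple = tuple[str, str, str]


def objects(graph: set[Triple], s: str, p: str) -> set[str]:
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def subjects(graph: set[Triple], p: str, o: str) -> set[str]:
    return {s for (s, p2, o2) in graph if p2 == p and o2 == o}


def generic_starlegs(graph: set[Triple]) -> set[Triple]:
    """Evaluate the generic starlegs CONSTRUCT on an in-memory graph."""
    constructed: set[Triple] = set()
    for e13_initial in subjects(graph, RDF_TYPE, E13):
        agents = objects(graph, e13_initial, P14)
        sources = objects(graph, e13_initial, P17)
        if not agents or not sources:
            continue  # both triple patterns are mandatory in the WHERE clause
        # crm:P140_assigned_attribute_to | crm:P141_assigned ?common
        commons = objects(graph, e13_initial, P140) | objects(graph, e13_initial, P141)
        for common in commons:
            # ?common ^crm:P140_assigned_attribute_to ?o
            for o in subjects(graph, P140, common):
                if o == e13_initial:
                    continue  # FILTER (?o != ?e13_initial)
                if objects(graph, o, P14) or objects(graph, o, P17):
                    continue  # removed by either MINUS clause
                constructed |= {(o, P14, a) for a in agents}
                constructed |= {(o, P17, s) for s in sources}
    return constructed
```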
The status of the starlegs problem is roughly this: the CONSTRUCT query for generating the missing assertions is actually rather simple, especially since it does not need OPTIONAL clauses (as Tara pointed out); the difficulty is reliably identifying the classes that actually need the missing legs constructed. My new approach is to extract the initial star pattern classes ("TopStars") from the pathbuilder XML dump obtained from the WissKI API and apply the CONSTRUCT query to those classes.

Quick digression/rant: the inconvenience of expressing graph patterns with paths is that edge adjacency is highly verbose and repetitive. E.g. for stating that instances of

```xml
<path_array>
  <x>http://www.cidoc-crm.org/cidoc-crm/E21_Person</x>
  <y>^http://www.cidoc-crm.org/cidoc-crm/P141_assigned</y>
  <x>https://r11.eu/ns/star/E13_sdhss_P36</x>
</path_array>
<!-- ... -->
<path_array>
  <x>http://www.cidoc-crm.org/cidoc-crm/E21_Person</x>
  <y>^http://www.cidoc-crm.org/cidoc-crm/P141_assigned</y>
  <x>https://r11.eu/ns/star/E13_sdhss_P36</x>
  <y>http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to</y>
  <x>https://r11.eu/ns/prosopography/C23</x>
  <y>^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to</y>
  <x>https://r11.eu/ns/star/E13_sdhss_P35</x>
  <y>http://www.cidoc-crm.org/cidoc-crm/P141_assigned</y>
  <x>https://r11.eu/ns/prosopography/C24</x>
</path_array>
<!-- ... -->
<path_array>
  <x>http://www.cidoc-crm.org/cidoc-crm/E21_Person</x>
  <y>^http://www.cidoc-crm.org/cidoc-crm/P141_assigned</y>
  <x>https://r11.eu/ns/star/E13_sdhss_P36</x>
  <y>http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by</y>
  <x>http://www.cidoc-crm.org/cidoc-crm/E21_Person</x>
</path_array>
<!-- ... -->
<path_array>
  <x>http://www.cidoc-crm.org/cidoc-crm/E21_Person</x>
  <y>^http://www.cidoc-crm.org/cidoc-crm/P141_assigned</y>
  <x>https://r11.eu/ns/star/E13_sdhss_P36</x>
  <y>http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by</y>
  <x>https://r11.eu/ns/spec/Author_Group</x>
</path_array>
<!-- ... -->
<path_array>
  <x>http://www.cidoc-crm.org/cidoc-crm/E21_Person</x>
  <y>^http://www.cidoc-crm.org/cidoc-crm/P141_assigned</y>
  <x>https://r11.eu/ns/star/E13_sdhss_P36</x>
  <y>http://www.cidoc-crm.org/cidoc-crm/P17_was_motivated_by</y>
  <x>http://www.cidoc-crm.org/cidoc-crm/E73_Information_Object</x>
</path_array>
```

Basically the same thing expressed in Turtle:

```turtle
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix star: <https://r11.eu/ns/star/> .
@prefix pros: <https://r11.eu/ns/prosopography/> .

[ a star:E13_sdhss_P36 ]
    crm:P140_assigned_attribute_to [ a pros:C23 ] ;
    # inferred: crm:P177_assigned_property_of_type P36 ;
    crm:P141_assigned [ a crm:E21_Person ] ;
    crm:P14_carried_out_by [ a crm:E21_Person ] ;
    crm:P17_was_motivated_by [ a crm:E73_Information_Object ] .
```

Paths are good at expressing depth but very bad at expressing breadth, and RDF graphs usually exhibit high edge adjacency, i.e. breadth. Anyway. So what I am doing now is to XPath-extract all first y nodes that are either

The advantage of using the WissKI API information for finding the applicable
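A sketch of the TopStar extraction with the standard library's ElementTree, using plain element iteration rather than a full XPath expression. Because the sentence about which first `y` nodes qualify is cut off, the sketch assumes they are the two inverse connectors `^crm:P141_assigned` and `^crm:P140_assigned_attribute_to`, the only ones appearing in the run result that follows. `WissKITopStar` is modelled on that result's repr; `extract_top_stars` and the `<paths>` wrapper element in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
# Assumption: only these inverse connectors mark a top star.
TOP_STAR_CONNECTORS = {
    "^" + CRM + "P141_assigned",
    "^" + CRM + "P140_assigned_attribute_to",
}


@dataclass(frozen=True)
class WissKITopStar:
    cls: str
    connector: str


def extract_top_stars(xml_text: str) -> set[WissKITopStar]:
    """Return the class reached via the first <y> step of each path_array."""
    root = ET.fromstring(xml_text)
    found: set[WissKITopStar] = set()
    for path_array in root.iter("path_array"):
        steps = list(path_array)
        # Steps alternate x (class), y (property), x (class), ...
        for i, step in enumerate(steps):
            if step.tag != "y":
                continue
            connector = (step.text or "").strip()
            if connector in TOP_STAR_CONNECTORS and i + 1 < len(steps):
                cls = (steps[i + 1].text or "").strip()
                found.add(WissKITopStar(cls=cls, connector=connector))
            break  # only the first y node of each path is relevant
    return found
```

Collecting into a set deduplicates the many path arrays that all start with the same top star.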
Result of a quick and dirty run of the logic explained above:

```
WissKITopStar(cls='https://r11.eu/ns/star/E13_lrmoo_R24', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P108', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P1', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P92', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P2', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_sdhss_P17', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P89', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P65', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_lrmoo_R15', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P51', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_sdhss_P36', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='http://www.cidoc-crm.org/cidoc-crm/E15_Identifier_Assignment', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_lrmoo_R17', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P100', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P128', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P196', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_spec_L1', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P56', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_sdhss_P13', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_sdhss_P26', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P107', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_lrmoo_R5', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P41', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P98', connector='^http://www.cidoc-crm.org/cidoc-crm/P141_assigned')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P45', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_sdhss_P38', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
WissKITopStar(cls='https://r11.eu/ns/star/E13_crm_P46', connector='^http://www.cidoc-crm.org/cidoc-crm/P140_assigned_attribute_to')
```

The Todos:
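Applying the CONSTRUCT query to each TopStar class could be done by substituting the class IRI into the generic starlegs query in place of `crm:E13_Attribute_Assignment`, for instance with `string.Template`. This is a sketch under that assumption only (the revised query template mentioned earlier is not shown here); `build_starlegs_query` is hypothetical, and its IRI check covers only some of the characters SPARQL forbids in IRI references.

```python
from string import Template

# Generic starlegs query with the initial star's class as a parameter.
STARLEGS_TEMPLATE = Template("""\
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
CONSTRUCT {
  ?o crm:P14_carried_out_by ?agent .
  ?o crm:P17_was_motivated_by ?source .
}
WHERE {
  ?e13_initial a <$cls> ;
    crm:P14_carried_out_by ?agent ;
    crm:P17_was_motivated_by ?source ;
    crm:P140_assigned_attribute_to | crm:P141_assigned ?common .
  ?common ^crm:P140_assigned_attribute_to ?o .
  FILTER (?o != ?e13_initial)
  MINUS { ?o crm:P14_carried_out_by ?_agent . }
  MINUS { ?o crm:P17_was_motivated_by ?_source . }
}
""")

# Partial list of characters not allowed inside a SPARQL IRIREF.
_FORBIDDEN = set('<>"{}|^`\\ ')


def build_starlegs_query(top_star_cls: str) -> str:
    """Render the starlegs CONSTRUCT for one top-star class IRI."""
    if not top_star_cls or _FORBIDDEN & set(top_star_cls):
        raise ValueError(f"not a usable IRI: {top_star_cls!r}")
    return STARLEGS_TEMPLATE.substitute(cls=top_star_cls)
```

Looping over the extracted TopStars and calling `build_starlegs_query(star.cls)` would yield one query per class, which keeps each query's result small enough to inspect.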
The initial problem outline was roughly this: in a "star chain", if only the "top star" has

I thought that by identifying the "top stars" using the WissKI API XML data, the problem could be solved by finding the P14/P17 assertions of a given top star and all connected stars in the chain and running a SPARQL CONSTRUCT query to generate those assertions. I looked at the WissKI pathbuilder data a bit more closely and noticed that this approach would almost certainly be insufficient. First, not all top stars in the pathbuilder chains even have P14/P17 assertions. Second, most top stars don't have both P14 and P17 assertions. Third, not only top stars have P14 or P17 assertions, but also stars further down the chain. E.g. I translated the (completely incomprehensible) WissKI path representation for the

```
[a E13_lrmoo_R17]
    p140 [a F28] := f28 ;
    p141 [a Text_Expression] := text_expression ;
    p14 [a E21] ;
    p14 [a Author_Group] .

[a E13_crm_p4]
    p140 f28 ;
    p141 [a E52] ;
    p14 [a E21] ;
    p14 [a Author_Group] ;
    p17 [a E73] .

[a E13_crm_p14]
    p140 f28 ;
    p141 [a E21] ;
    p141 [a Author_Group] ;
    p14 [a E21] ;
    p14 [a Author_Group] .
```

This shows much better what is actually going on: three E13 assertions are connected to the same F28. However, it remains unclear which entities need which P14/P17 assertions constructed. This cannot be generalized, though, as becomes clear if one looks e.g. at the

```
[a E15]
    p140 [a Boulloterion] := boulloterion ;
    p37 [a E42] ;
    p14 [a F11] .

[a E13_spec_L1]
    p140 boulloterion ;
    p141 [a Lead_Seal] ;
    p14 [a E21] ;
    p14 [a Author_Group] ;
    p17 [a E73] .
```

The first P14 assertion is meant to have an F11 object; the second P14 assertion is meant to have either an E21 or an Author_Group object. Generally, I feel I do not have enough information to come up with a generic solution. Looking at the WissKI pathbuilder data showed that (my) previous assumptions about the data shapes and the actual problem at hand were faulty. So I think we need to discuss how exactly the task should be defined.
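For inspecting chains like the R17 example, a small diagnostic that groups stars by their shared `p140` target and lists the leg predicates each one already carries might help frame that discussion. This sketch uses the shorthand predicate names from the translation above, with hypothetical node labels in place of the blank nodes; it only reports coverage and does not decide which legs are missing.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def legs_by_anchor(
    graph: set[Triple],
    anchor: str = "p140",
    legs: tuple[str, ...] = ("p14", "p17"),
) -> dict[str, dict[str, list[str]]]:
    """Group stars by their `anchor` target; list the leg predicates each has."""
    groups: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for s, p, o in graph:
        if p == anchor:
            groups[o][s] = sorted({p2 for (s2, p2, _) in graph if s2 == s and p2 in legs})
    return dict(groups)


# The R17 chain from above, with hypothetical labels for the star nodes.
r17_chain = {
    ("e13_r17", "p140", "f28"), ("e13_r17", "p141", "text_expression"),
    ("e13_r17", "p14", "e21"),
    ("e13_p4", "p140", "f28"), ("e13_p4", "p141", "e52"),
    ("e13_p4", "p14", "e21"), ("e13_p4", "p17", "e73"),
    ("e13_p14", "p140", "f28"), ("e13_p14", "p141", "e21"),
    ("e13_p14", "p14", "author_group"),
}
print(legs_by_anchor(r17_chain))
```

For this chain, only the `e13_p4` star carries a `p17`, which matches the observation that legs are spread across the chain rather than concentrated on the top star.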
Many of the things that get claimed (statements) require more than one triple/STAR object (assertion) in the data models we are using. Each of these assertions will have the same authority and source. In order to preserve the sanity of people using WissKI, I configured it so that the authority and source only get specified once per statement, which means that all but one of the assertions will technically be incomplete. We need a maintenance script that completes the missing 'legs' of the STAR assertions.