Possible data extraction from Consilia et Narrationes of Kekaumenos #36
I queried the SAWS endpoint for Kekaumenos data; the graph appears to be a more or less direct projection of the underlying XML sources. E.g. the following query extracts all the text nodes (spans) of the English Kekaumenos translation:

PREFIX saws: <http://purl.org/saws/ontology#>
select ?object ?p ?o
where {
  ?object a saws:LinguisticObject ;
    saws:fallsWithin* <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsEng01> ;
    <http://www.homermultitext.org/cts/rdf/hasTextContent> ?text .
  minus {?object ^saws:fallsWithin ?upper .}
  ?object ?p ?o .
}
order by ?object

Unfortunately I haven't found a public SPARQL endpoint yet, and at this point I'm not sure one even exists. This obviously complicates machine processing of the data. Since the graph basically mirrors the XML sources, analyzing the XML data directly would be an option, but the XML source download links give me 404s. 🤨 I will probably contact the people from SAWS about this.
Question: Would this be sufficient to extract all Kekaumenos text nodes? I would like to verify the "Consilia et Narrationes" provenance, which should be connected as an F2 (?).

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
select distinct ?e33 ?text
where {
  ?e13 a crm:E13_Attribute_Assignment ;
    crm:P14_carried_out_by <https://r11.eu/rdf/resource/773a32fb-f5ab-4898-926c-354bfe0171ba> ;
    crm:P17_was_motivated_by ?e33 .
  ?e33 a crm:E33_Linguistic_Object ;
    crm:P190_has_symbolic_content ?text .
}
Sorry for the late reply - that query would get you all the text nodes that were used as sources for statements we say Kekaumenos made. While I think it would work, it is possible that someone else is credited with something from Kekaumenos' text, or that the author is credited with assertions that don't come from this text. So I would rather suggest starting from the text, then getting the publication, the passage, and thence the assertion. (You can drop the query for
I was able to query the SAWS SPARQL endpoint for annotated text nodes with Python (httpx), group the SPARQL results, and generate JSON output for further processing. Next steps: compare the JSON outputs to each other and work out the SPARQL query for extracting Kekaumenos text literals from the GraphDB store.
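For illustration, the grouping step could look roughly like the sketch below. It assumes the endpoint returns the standard SPARQL JSON results format for a `?object ?p ?o` query; the function name `group_bindings`, the `node_id`/`data` output shape, and the `example.org` IRIs are assumptions made for this example, not the script actually used. The HTTP request itself is left out because the endpoint URL is not given in this thread.

```python
import json
from collections import defaultdict


def group_bindings(sparql_json: dict) -> list[dict]:
    """Group SPARQL JSON result rows (?object ?p ?o) into one record per ?object."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for row in sparql_json["results"]["bindings"]:
        node_id = row["object"]["value"]
        predicate = row["p"]["value"]
        # A predicate may occur more than once per node, so values are kept in a list.
        grouped[node_id][predicate].append(row["o"]["value"])
    return [
        {"node_id": node_id, "data": dict(props)}
        for node_id, props in sorted(grouped.items())
    ]


# Hypothetical response with placeholder IRIs, shaped like a SPARQL JSON result.
sample = {
    "head": {"vars": ["object", "p", "o"]},
    "results": {
        "bindings": [
            {
                "object": {"type": "uri", "value": "http://example.org/node/a1"},
                "p": {"type": "uri", "value": "http://example.org/hasTextContent"},
                "o": {"type": "literal", "value": "first span"},
            },
            {
                "object": {"type": "uri", "value": "http://example.org/node/a1"},
                "p": {"type": "uri", "value": "http://example.org/fallsWithin"},
                "o": {"type": "uri", "value": "http://example.org/node/section1"},
            },
            {
                "object": {"type": "uri", "value": "http://example.org/node/a2"},
                "p": {"type": "uri", "value": "http://example.org/hasTextContent"},
                "o": {"type": "literal", "value": "second span"},
            },
        ]
    },
}

nodes = group_bindings(sample)
# ensure_ascii=False keeps Greek text readable in the persisted JSON.
serialized = json.dumps(nodes, ensure_ascii=False, indent=2)
```

Keeping full predicate IRIs as keys avoids guessing at short names; a mapping to friendlier keys can be applied afterwards.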
Two things need consideration:
If the actual goal is to establish a connection between the Releven graph and the SAWS graph (which seems reasonable), then no further data processing is needed. E.g. mappings from Greek to English LinguisticObjects are already established in the SAWS graph.
Identity conditions for equivalent text passages and string matching parameters for text nodes in both graphs need to be carefully evaluated. For example the E33/E73 node
{
"node_id": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection3.o101.a91",
"data": {
"is_variant_of": "http://titus.fkidg1.uni-frankfurt.de/texte/etcs/grie/sept/sept.htm?sept858.htm#VT_Sir._7_27",
"a": "http://purl.org/saws/ontology#LinguisticObject",
"falls_within": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection3.o101",
"is_close_translation_of": null,
"has_text_content": "περὶ τιμὴσ γονέων Μνήσθητι ὠδῖνος μητρός",
"provenance": "CMR",
"rdf_schema_label": "divedition.divsection3.o101.a91"
}
}
But obviously there aren't any strong markers that would allow asserting a text passage match with reasonable certainty.
Actually, given the literal text data in the Releven graph, I guess it would be quite correct to assert a match for "μητρός" in that case. It certainly is not wrong.
I would use thefuzz for string matching btw; it implements and neatly wraps Levenshtein distance string comparison.
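To make the matching idea concrete, here is a minimal pure-Python sketch of Levenshtein-based similarity. It is a stand-in for illustration rather than thefuzz itself (with thefuzz installed, `fuzz.ratio` or `fuzz.partial_ratio` would play the role of `similarity`). The helper names and the NFC-plus-casefold normalization are assumptions; the right normalization for the Greek text nodes still needs to be evaluated.

```python
import unicodedata


def normalize(text: str) -> str:
    """Casefold and NFC-normalize so equivalent Unicode spellings compare equal."""
    return unicodedata.normalize("NFC", text.casefold())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-wise dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1.0 for identical strings after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Text content of the SAWS node quoted earlier in this thread.
passage = "περὶ τιμὴσ γονέων Μνήσθητι ὠδῖνος μητρός"
contains_literal = normalize("μητρός") in normalize(passage)
```

A whole-string score is low when a short literal is compared with a long passage, so for cases like "μητρός" a containment check or a partial (best-substring) score is the more meaningful measure.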
Currently getting 404s from the SAWS SPARQL endpoint. 🤷♂️ Might be due to the Temporary Archive Notice:
Just emailed KDL support. In the worst case, the JSON data I persisted is all we have for some time. Guess being paranoid and persisting data paid off! :D
SAWS Kekaumenos edition gives me a "Problem loading document list from server" prompt.
Update: King’s Digital Laboratory answered my mail; they are currently putting up a static version of the site until the migration is completed. The Greek and English Kekaumenos editions still give me "Problem loading document list from server". I will check again what might be possible with the data I pulled when the triplestore was still up, but I am not sure whether it is sufficient to link the two graphs.
Rouche's edition of the Consilia et Narrationes of Kekaumenos