Possible data extraction from Consilia et Narrationes of Kekaumenos #36

laletuver1 · 2024-03-21T13:40:10Z

Roueché's edition of the Consilia et Narrationes of Kekaumenos

lu-pl · 2024-10-08T08:14:13Z

I queried the SAWS endpoint for Kekaumenos data. The graph appears to be a more or less direct projection of the underlying XML sources.

E.g. the following query extracts all the text nodes (spans) of the English Kekaumenos translation:

PREFIX saws: <http://purl.org/saws/ontology#>

select ?object ?p ?o
where {
  ?object a saws:LinguisticObject ;
          saws:fallsWithin* <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsEng01> ;
          <http://www.homermultitext.org/cts/rdf/hasTextContent> ?text .

  minus {?object ^saws:fallsWithin ?upper .}

  ?object ?p ?o .
}
order by ?object

Link to SNORQL interface

Unfortunately I haven't found a public SPARQL endpoint yet and at this point I'm not sure if it even exists. This obviously complicates machine processing of the data.

Since the graph basically mirrors the XML sources, analyzing the XML data directly would be an option, but the XML source download links give me 404s. 🤨

I will probably contact the people from SAWS about this.

lu-pl · 2024-10-08T09:45:39Z

Question: Would this be sufficient to extract all Kekaumenos text nodes? I would like to verify the "Consilia et Narrationes" provenance, which should be connected as an F2 (?).

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>

select distinct ?e33 ?text
where {
    ?e13 a crm:E13_Attribute_Assignment ;
        crm:P14_carried_out_by <https://r11.eu/rdf/resource/773a32fb-f5ab-4898-926c-354bfe0171ba> ;
        crm:P17_was_motivated_by ?e33 .
    ?e33 a crm:E33_Linguistic_Object ;
        crm:P190_has_symbolic_content ?text .
}

tla · 2024-10-15T07:11:59Z

Sorry for the late reply. That query would get you all the text nodes that were used as sources for statements we are saying Kekaumenos made. While I think it would work, there is the possibility that someone else is credited with something from Kekaumenos' text, or that the author is credited with assertions that don't come from this text. So I would rather suggest starting from the text, getting the publication, then the passage, and thence the assertion. (You can drop the pattern for ?c if you don't need the assertion where the text node is used.)

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX star: <https://r11.eu/ns/star/>
PREFIX data: <https://r11.eu/rdf/resource/>

select distinct ?e33 ?text
where {
    ?a crm:P141_assigned data:38da8bdf-c317-4796-be0f-3b6a3ff783c7 ;
       a star:E13_lrmoo_R5 ;
       crm:P140_assigned_attribute_to ?pub .
    ?b crm:P140_assigned_attribute_to ?pub ;
       a star:E13_lrmoo_R15 ;
       crm:P141_assigned ?e33 .
    ?c crm:P17_was_motivated_by ?e33 ;
       a crm:E13_Attribute_Assignment .
    ?e33 a crm:E33_Linguistic_Object ;
        crm:P190_has_symbolic_content ?text .
}

lu-pl · 2024-10-22T08:57:47Z

I am now able to query the SAWS SPARQL endpoint for annotated text nodes with Python (httpx). I processed and grouped the SPARQL results and generated JSON output for further processing.
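A minimal sketch of that step, assuming the endpoint follows the standard SPARQL 1.1 Protocol and returns SPARQL JSON results. It uses the standard library's urllib instead of httpx to stay dependency-free; the endpoint URL is a placeholder and all function names are illustrative, not taken from the actual script.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Placeholder, not the real SAWS endpoint URL.
ENDPOINT = "https://example.org/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request asking for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def group_by_object(results: dict) -> dict:
    """Group ?object ?p ?o rows into one predicate -> values mapping per node."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in results["results"]["bindings"]:
        node = row["object"]["value"]
        grouped[node][row["p"]["value"]].append(row["o"]["value"])
    # Return plain dicts so later lookups of missing keys raise KeyError
    # instead of silently creating empty entries.
    return {node: dict(preds) for node, preds in grouped.items()}


def fetch_grouped(query: str) -> str:
    """Run the query and return the grouped result as a JSON string."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        grouped = group_by_object(json.load(resp))
    return json.dumps(grouped, ensure_ascii=False, indent=2)
```

Writing the string returned by `fetch_grouped` to disk gives a local copy that stays usable if the endpoint goes away.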

Next steps include comparing the JSON outputs to each other and looking at/figuring out the SPARQL query for extracting Kekaumenos text literals from the GraphDB store.

lu-pl · 2024-11-25T12:31:12Z

Two things need consideration:

What should be achieved?

If the actual goal is to establish a connection between the Releven graph and the SAWS graph (which seems reasonable), then no further data processing is needed. E.g. mappings from Greek to English LinguisticObjects are already established in the SAWS graph.

Identity conditions for equivalent text passages

Both the identity conditions and the string matching parameters for text nodes in the two graphs need to be carefully evaluated. For example, the E33/E73 node https://r11.eu/rdf/resource/d4deff31-2933-41b1-93a7-17ff71f91f6f in the Releven graph has a symbolic content of "μητρός" asserted about it. Such a short literal is highly unspecific and would in general match many passages. In the case of "μητρός", however, only one text node in the SAWS graph actually matches:

{
    "node_id": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection3.o101.a91",
    "data": {
        "is_variant_of": "http://titus.fkidg1.uni-frankfurt.de/texte/etcs/grie/sept/sept.htm?sept858.htm#VT_Sir._7_27",
        "a": "http://purl.org/saws/ontology#LinguisticObject",
        "falls_within": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection3.o101",
        "is_close_translation_of": null,
        "has_text_content": "περὶ τιμὴσ γονέων Μνήσθητι ὠδῖνος μητρός",
        "provenance": "CMR",
        "rdf_schema_label": "divedition.divsection3.o101.a91"
    }
}

But obviously there aren't any strong markers that would allow asserting a text passage match with reasonable certainty.
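To make the check concrete, here is a rough sketch of how candidate nodes for a short literal could be collected from the persisted data. It assumes a simple mapping of node id to text content; the function names and the whole-token rule are illustrative choices, not an agreed identity condition.

```python
import unicodedata


def normalize(text: str) -> str:
    """Casefold and NFC-normalize so equivalent Greek spellings compare equal."""
    # casefold() also maps final sigma to regular sigma, unlike lower().
    return unicodedata.normalize("NFC", text.casefold())


def candidate_nodes(literal: str, nodes: dict) -> list:
    """Return ids of nodes whose text contains every token of the literal."""
    wanted = normalize(literal).split()
    hits = []
    for node_id, text in nodes.items():
        tokens = set(normalize(text).split())
        if wanted and all(tok in tokens for tok in wanted):
            hits.append(node_id)
    return hits
```

A single hit, as for "μητρός", is a necessary condition for a match but not a sufficient one. It only says that no other persisted node contains the token.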

lu-pl · 2024-11-25T12:45:37Z

Actually, given the literal text data in the Releven graph, I guess it would be quite correct to assert a match for "μητρός" in that case. It certainly is not wrong.

lu-pl · 2024-11-25T12:47:15Z

I would use thefuzz for string matching, by the way. It neatly wraps string comparison based on Levenshtein distance.
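For reference, a dependency-free sketch of the underlying measure: plain Levenshtein distance plus a normalized similarity. This is not thefuzz's implementation, and its scores should not be expected to equal those of `fuzz.ratio`; the function names and the 0 to 1 scale are choices made here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

For polytonic Greek, both strings should be brought to the same Unicode normalization form first, since a precomposed character and its decomposed equivalent otherwise count as edits.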

lu-pl · 2024-11-29T07:55:11Z

Currently getting 404s from the SAWS SPARQL endpoint. 🤷‍♂️

Might be due to the Temporary Archive Notice:

"This site has been temporarily archived as part of King's Digital Lab (KDL) infrastructure migration and upgrade process.

We are in the final stages of a major project to migrate approximately 85 research websites to King's College London e-Research infrastructure. This modernisation will improve stability and long-term sustainability of our digital estate.

The site will be available with limited functionality from December 2024, and we expect it to be available again by spring 2025. For more information, please see the migration blog post."

Just emailed KDL support.

In the worst case, the JSON data I persisted is all we have for some time.

Guess being paranoid and persisting data paid off! :D

lu-pl · 2024-11-29T08:10:59Z

SAWS Kekaumenos edition gives me a "Problem loading document list from server" prompt.

lu-pl · 2024-12-02T08:45:09Z

Update: King’s Digital Lab answered my mail. They are currently putting up a static version of the site until the migration is completed.

The Greek and English Kekaumenos editions still give me "Problem loading document list from server".

I will check again what might be possible with the data I pulled when the triplestore was still up, but I am not sure if this is sufficient to link the two graphs.

laletuver1 added the Texts label Mar 21, 2024

laletuver1 assigned lu-pl Mar 26, 2024

laletuver1 assigned Aaleks93 and unassigned lu-pl Jul 16, 2024
