Possible data extraction from Consilia et Narrationes of Kekaumenos #36

Open
laletuver1 opened this issue Mar 21, 2024 · 10 comments

@laletuver1

Rouche's edition of the Consilia et Narrationes of Kekaumenos

lu-pl commented Oct 8, 2024

I queried the SAWS endpoint for Kekaumenos data. The graph appears to be a more or less direct projection of the underlying XML sources.

For example, the following query extracts all the text nodes (spans) of the English Kekaumenos translation:

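PREFIX saws: <http://purl.org/saws/ontology#>

# All properties of the leaf text nodes (spans) of the English translation
select ?object ?p ?o
where {
  ?object a saws:LinguisticObject ;
          saws:fallsWithin* <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsEng01> ;
          <http://www.homermultitext.org/cts/rdf/hasTextContent> ?text .

  # Keep only leaf nodes: drop any object that has another node falling within it
  minus {?object ^saws:fallsWithin ?upper .}

  ?object ?p ?o .
}
order by ?object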

Link to SNORQL interface

Unfortunately I haven't found a public SPARQL endpoint yet and at this point I'm not sure if it even exists. This obviously complicates machine processing of the data.

Since the graph basically mirrors the XML sources, analyzing the XML data directly would be an option, but the XML source download links give me 404s. 🤨

I will probably contact the people from SAWS about this.

lu-pl commented Oct 8, 2024

Question: Would this be sufficient to extract all Kekaumenos text nodes? I would like to verify the "Consilia et Narrationes" provenance, which should be connected as an F2 (?).

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>

select distinct ?e33 ?text
where {
    ?e13 a crm:E13_Attribute_Assignment ;
        crm:P14_carried_out_by <https://r11.eu/rdf/resource/773a32fb-f5ab-4898-926c-354bfe0171ba> ;
        crm:P17_was_motivated_by ?e33 .
    ?e33 a crm:E33_Linguistic_Object ;
        crm:P190_has_symbolic_content ?text .
}

tla (Member) commented Oct 15, 2024

Sorry for the late reply. That query would get you all the text nodes that were used as sources for statements we are saying Kekaumenos made. While I think it would work, there is the possibility that someone else is credited with something from Kekaumenos' text, or that the author is credited with assertions that don't come from this text. So I would rather suggest starting from the text, then getting the publication, the passage, and thence the assertion. (You can drop the query for ?c if you don't need the assertion where the text node is used.)

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX star: <https://r11.eu/ns/star/>
PREFIX data: <https://r11.eu/rdf/resource/>

select distinct ?e33 ?text
where {
    ?a crm:P141_assigned data:38da8bdf-c317-4796-be0f-3b6a3ff783c7 ;
       a star:E13_lrmoo_R5 ;
       crm:P140_assigned_attribute_to ?pub .
    ?b crm:P140_assigned_attribute_to ?pub ;
       a star:E13_lrmoo_R15 ;
       crm:P141_assigned ?e33 .
    ?c crm:P17_was_motivated_by ?e33 ;
       a crm:E13_Attribute_Assignment .
    ?e33 a crm:E33_Linguistic_Object ;
        crm:P190_has_symbolic_content ?text .
}

lu-pl commented Oct 22, 2024

I was able to query the SAWS SPARQL endpoint for annotated text nodes with Python now (httpx), processed/grouped the SPARQL results and generated JSON output for further processing.

Next steps include comparing the JSON outputs to each other and looking at/figuring out the SPARQL query for extracting Kekaumenos text literals from the GraphDB store.
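
A minimal sketch of what such a fetch-and-group step could look like, not the code actually used for this issue. It assumes the result variables are named ?object, ?p and ?o as in the SNORQL query earlier in this thread, and that the endpoint returns the standard SPARQL JSON results format. The endpoint URL is left as a parameter because it is not given here, and the function names are hypothetical.

```python
from collections import defaultdict


def fetch_bindings(endpoint_url: str, query: str) -> list[dict]:
    """POST a SPARQL query and return the raw result bindings."""
    # Third-party dependency; imported lazily so the helpers below work without it.
    import httpx

    response = httpx.post(
        endpoint_url,  # assumption: supplied by the caller, not known from this thread
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=60.0,
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]


def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def group_by_node(bindings: list[dict]) -> list[dict]:
    """Collect all (?p, ?o) pairs of each ?object into one record per node."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for row in bindings:
        node_id = row["object"]["value"]
        key = local_name(row["p"]["value"])
        grouped[node_id][key].append(row["o"]["value"])
    # Sort by node id so the output is stable between runs.
    return [
        {"node_id": node_id, "data": dict(data)}
        for node_id, data in sorted(grouped.items())
    ]
```

Values are kept as lists here because a node may carry the same predicate more than once; flattening single-element lists or renaming keys is a further design choice.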

lu-pl commented Nov 25, 2024

Two things need consideration:

  1. What should be achieved?

If the actual goal is to establish a connection between the Releven graph and the SAWS graph (which seems reasonable), then no further data processing is needed. For example, mappings from Greek to English LinguisticObjects are already established in the SAWS graph.

  2. Identity conditions for equivalent text passages

Identity conditions for equivalent text passages and string matching parameters for text nodes in both graphs need to be carefully evaluated. For example, the E33/E73 node https://r11.eu/rdf/resource/d4deff31-2933-41b1-93a7-17ff71f91f6f in the Releven graph has a symbolic content of "μητρός" asserted about it. Obviously, such a short string is highly unspecific and is therefore likely to match many passages. In the case of "μητρός", however, only one text node in the SAWS graph actually matches:

{
    "node_id": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection3.o101.a91",
    "data": {
        "is_variant_of": "http://titus.fkidg1.uni-frankfurt.de/texte/etcs/grie/sept/sept.htm?sept858.htm#VT_Sir._7_27",
        "a": "http://purl.org/saws/ontology#LinguisticObject",
        "falls_within": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection3.o101",
        "is_close_translation_of": null,
        "has_text_content": "περὶ τιμὴσ γονέων Μνήσθητι ὠδῖνος μητρός",
        "provenance": "CMR",
        "rdf_schema_label": "divedition.divsection3.o101.a91"
    }
}

But obviously there aren't any strong markers that would allow asserting a text passage match with reasonable certainty.
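
One way to make the string side of such identity conditions explicit is to normalize both strings before comparing them, since polytonic Greek can encode the same accented letter with different code points. The following is only a sketch under stated assumptions: it expects records shaped like the JSON above (with "node_id" and "data.has_text_content"), the function names are hypothetical, and it says nothing about whether a hit is a genuine passage match.

```python
import unicodedata


def normalize(text: str) -> str:
    """Casefold, NFC-normalize and collapse whitespace."""
    folded = unicodedata.normalize("NFC", text.casefold())
    return " ".join(folded.split())


def find_candidates(needle: str, nodes: list[dict]) -> list[str]:
    """Return ids of nodes whose text contains the needle as whole token(s)."""
    target = f" {normalize(needle)} "
    hits = []
    for node in nodes:
        content = node["data"].get("has_text_content") or ""
        # Pad with spaces so that only whole-word occurrences count.
        if target in f" {normalize(content)} ":
            hits.append(node["node_id"])
    return hits
```

Matching on whole tokens rather than raw substrings keeps a short literal from matching inside longer words, but it does not address the underlying problem that a single word carries little evidence.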

lu-pl commented Nov 25, 2024

Actually, given the literal text data in the Releven graph, I guess it would be quite correct to assert a match for "μητρός" in that case. It certainly is not wrong.

lu-pl commented Nov 25, 2024

I would use thefuzz for string matching, by the way; it implements and neatly wraps Levenshtein distance string comparison.
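
To illustrate the kind of comparison meant here, the following is a plain standard-library sketch of Levenshtein distance with a simple length-normalized similarity. It is not thefuzz's implementation, the function names are hypothetical, and thefuzz's own scores (reported on a 0 to 100 scale) are computed differently, so the numbers are not interchangeable.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Any threshold on such a score is a parameter that would need evaluating against known matching and non-matching passages.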

lu-pl commented Nov 29, 2024

Currently getting 404s from the SAWS SPARQL endpoint.. 🤷‍♂️

Might be due to the Temporary Archive Notice:

"This site has been temporarily archived as part of King's Digital Lab (KDL) infrastructure migration and upgrade process.

We are in the final stages of a major project to migrate approximately 85 research websites to King's College London e-Research infrastructure. This modernisation will improve stability and long-term sustainability of our digital estate.

The site will be available with limited functionality from December 2024, and we expect it to be available again by spring 2025. For more information, please see the migration blog post."

Just emailed KDL support.

In the worst case, the JSON data I persisted is all we got for some time.

Guess being paranoid and persisting data paid off! :D
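
For reference, a small sketch of persisting query results so that they survive an endpoint outage. This is an assumption about how it could be done, not the script used here; the function names and the file name in the usage note are hypothetical.

```python
import json
from pathlib import Path


def save_nodes(nodes: list[dict], path: str) -> None:
    """Write node records as UTF-8 JSON, replacing the target only when complete."""
    target = Path(path)
    tmp = target.with_suffix(target.suffix + ".tmp")
    # ensure_ascii=False keeps the Greek text readable instead of \u escapes.
    tmp.write_text(json.dumps(nodes, ensure_ascii=False, indent=4), encoding="utf-8")
    tmp.replace(target)


def load_nodes(path: str) -> list[dict]:
    """Read node records back from a JSON file written by save_nodes."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `save_nodes(nodes, "saws_nodes.json")` right after each successful query run keeps a local snapshot that can be reloaded with `load_nodes`.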

lu-pl commented Nov 29, 2024

SAWS Kekaumenos edition gives me a "Problem loading document list from server" prompt.

lu-pl commented Dec 2, 2024

Update: King’s Digital Laboratory answered my mail; they are currently putting up a static version of the site until the migration is completed.

The Greek and English Kekaumenos editions still give me "Problem loading document list from server".

I will check again what might be possible with the data I pulled when the triplestore was still up, but I am not sure if this is sufficient to link the two graphs.
