Import death factoids to RDF data #1

tla · 2022-10-12T11:43:05Z

No description provided.

tla · 2023-08-28T14:23:55Z

Factoid data attached

tla · 2023-08-28T14:25:09Z

The deaths themselves are already in the database; here we are parsing and adding the date information. We can discuss the details further on Wednesday and make notes in this issue.

Aaleks93 · 2023-11-21T11:37:28Z

Also related to issue #2: the revised version of the death factoids is completed. You can access the updated spreadsheet through this link.

Aaleks93 · 2024-01-09T15:08:02Z

The spreadsheet with death records has been updated with the sources on which I based the dates where my name is the authority. The file from 21.11.2023 has therefore been superseded by the file named "C11 PBW Death records, AA_revised version_09.01.2024.xlsx", accessible here: https://ucloud.univie.ac.at/index.php/f/797833040

tla · 2024-01-23T10:09:04Z

Report from @lu-pl 💯
I implemented the table conversion for the editor rows, see example output.
The P14 assertions assigning Aleks or Marton as the authority are still missing; I will add them today (plus some minor fixes).

Note that some SPARQL queries return empty, in which case no RDF is generated. See the logs.
I haven't really looked into that (yet) because I think you said you would like to investigate the empty queries yourself.

lu-pl · 2024-01-29T12:39:59Z

Update: Implemented the missing P14 assertions, see output.

tla · 2024-02-13T08:28:29Z

> Note that some SPARQL queries return empty, in which case no RDF is generated. See the logs. I haven't really looked into that (yet) because I think you said you would like to investigate the empty queries yourself.

Some of these are expected (where they are based on sources that we ended up not using), but others have to do with the fact that the Name column has something added in parentheses. So for example Ioannes (Smbat) 106 should just be queried as Ioannes 106. I don't know where the parenthetical text came from, but it needs to be stripped / ignored in all cases.
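A minimal sketch of the name clean-up, assuming the Name cell is a plain string and that parentheticals are never nested; `normalize_pbw_name` is a hypothetical helper, not the function in the actual conversion script.

```python
import re

# Matches one parenthetical group such as "(Smbat)" together with the whitespace before it.
_PARENTHETICAL = re.compile(r"\s*\([^)]*\)")


def normalize_pbw_name(name: str) -> str:
    """Drop parenthetical text from a Name cell and collapse the remaining whitespace."""
    without_parens = _PARENTHETICAL.sub("", name)
    return " ".join(without_parens.split())


if __name__ == "__main__":
    print(normalize_pbw_name("Ioannes (Smbat) 106"))
```

Collapsing whitespace afterwards keeps the result in the same "Name Number" shape that the PBW identifier query expects.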

For sanity-checking purposes, it might be helpful to keep a list of the sources we aren't using; these include at least Council of 1157; Italikos; Niketas Choniates, Historia; Pantokrator Typikon; Prodromos, Historische Gedichte; and Tzetzes, Letters. If you could implement these as exclusions (i.e. if the Source canonical name is one of these, just skip the row) and output in the log what the source was every time a query returns nothing, this would help me audit a new run.
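The exclusion-and-logging behaviour requested here could look roughly like the sketch below. It assumes each row is a mapping with `Name` and `Source` keys holding the canonical names verbatim, and that `run_query` is some caller-supplied function returning the query results for a row; `rows_to_convert`, `EXCLUDED_SOURCES` and the logger name are all hypothetical.

```python
import logging
from typing import Callable, Iterable, Iterator, Mapping, Tuple

logger = logging.getLogger("deaths")  # hypothetical logger name

# Canonical source names listed as unused (assumed to match the Source column exactly).
EXCLUDED_SOURCES = frozenset({
    "Council of 1157",
    "Italikos",
    "Niketas Choniates, Historia",
    "Pantokrator Typikon",
    "Prodromos, Historische Gedichte",
    "Tzetzes, Letters",
})

Row = Mapping[str, str]


def rows_to_convert(
    rows: Iterable[Row], run_query: Callable[[Row], list]
) -> Iterator[Tuple[Row, list]]:
    """Yield (row, results) pairs, skipping excluded sources and empty queries."""
    for row in rows:
        source = row["Source"]
        if source in EXCLUDED_SOURCES:
            logger.info("Skipped %s: excluded source %r", row["Name"], source)
            continue
        results = run_query(row)
        if not results:
            # Log the source so that every empty query can be audited after a run.
            logger.warning("Empty query for %s (source: %r)", row["Name"], source)
            continue
        yield row, results
```

With `logging.basicConfig(level=logging.INFO)` both the skipped rows and the empty queries end up in the run log, each with its source name.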

lu-pl · 2024-02-19T15:20:15Z

Update:

Parenthetical text in Name fields gets ignored now and unused Source values are skipped (see the log).

The script now generates a trig file deaths.trig with a named graph for every table partition.
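For readers unfamiliar with TriG: it is Turtle plus `<graph> { ... }` blocks, so one file can hold one named graph per table partition. The stdlib sketch below only illustrates that layout; in practice an RDF library (e.g. rdflib's `Dataset.serialize(format="trig")`) would do the serialization. Terms are assumed to be pre-serialized Turtle strings, and the graph IRIs in any usage are made up.

```python
from typing import Iterable, Mapping, Tuple

# Each term is assumed to be already written in Turtle syntax (IRI, prefixed name or literal).
Triple = Tuple[str, str, str]


def to_trig(prefixes: Mapping[str, str], graphs: Mapping[str, Iterable[Triple]]) -> str:
    """Serialize one named graph per table partition into a single TriG document."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in prefixes.items()]
    for graph_iri, triples in graphs.items():
        lines.append("")
        lines.append(f"<{graph_iri}> {{")
        for subject, predicate, obj in triples:
            lines.append(f"    {subject} {predicate} {obj} .")
        lines.append("}")
    return "\n".join(lines) + "\n"
```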

I also investigated the empty queries; some of those were caused by typos or incomplete PBW strings in the tables.
I queried the store for the correct PBW strings and manually updated the tables in the r11tab/tables/xlsx folder.

For most of the remaining empty queries, the PBW data is missing from the triplestore, so I don't really know what to do about that.

lu-pl · 2024-02-19T17:25:50Z

Note: I would like to/will port the metadata schema used in the r11cli application to the table conversion at some point, if that is alright.

tla · 2024-02-20T12:29:34Z

I've now looked at the empty queries, which have three causes:

1. They are about Basileios 2 (Basil II), who is not in our database except insofar as he was kin to others.
2. The source should have been Pantokrator Typikon, but the string was modified.
3. The source is not exactly a primary source (in this case it is always "Christos Philanthropos, note") and so has a slightly different modeling structure: we didn't create a Text Expression for this publication, but instead created a Manifestation Creation event whose authority is the publication author, i.e. the editor of the text.

The following query should work for the third case.

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX star: <https://r11.eu/ns/star/>

select ?pub ?d ?a4 ?e
where { 
    ?a1 a star:E13_crm_P3 ;
        crm:P140_assigned_attribute_to ?d ;
        crm:P141_assigned """She died on a November 1 [shortly after 1100, a year before <Isaakios 61>]"""@en ;
        crm:P14_carried_out_by ?authority ;
        crm:P17_was_motivated_by ?source .
    ?d a crm:E69_Death .
    ?a2 a star:E13_crm_P100 ;
        crm:P140_assigned_attribute_to ?d ;
        crm:P141_assigned ?p .
    ?p a crm:E21_Person .
    ?id a crm:E15_Identifier_Assignment ;
        crm:P140_assigned_attribute_to ?p ;
        crm:P37_assigned ?e42 .
    ?e42 a crm:E42_Identifier ;
         crm:P190_has_symbolic_content "Anna 61" .
    ?a3 a star:E13_lrmoo_R15 ;
        crm:P140_assigned_attribute_to ?pub ;
        crm:P141_assigned ?source .
    ?a4 a star:E13_lrmoo_R24 ;
        crm:P140_assigned_attribute_to ?pubcreation ;
        crm:P141_assigned ?pub ;
        crm:P14_carried_out_by ?e .
    ?e crm:P3_has_note ?editor . 
} limit 1
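The query above is written for a single row (Anna 61). To reuse it for every affected row, the note text and PBW identifier could be substituted in as escaped string literals. A sketch, assuming Python's `string.Template` and that every note carries an `@en` tag as in the example; `build_editor_query` and `sparql_string` are hypothetical helpers, and the triple-quoted literal is replaced by an equivalent escaped `"..."` literal.

```python
from string import Template
from typing import Optional

# The query above, with the note text and PBW identifier turned into placeholders.
EDITOR_QUERY = Template("""\
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX star: <https://r11.eu/ns/star/>

select ?pub ?d ?a4 ?e
where {
    ?a1 a star:E13_crm_P3 ;
        crm:P140_assigned_attribute_to ?d ;
        crm:P141_assigned $note ;
        crm:P14_carried_out_by ?authority ;
        crm:P17_was_motivated_by ?source .
    ?d a crm:E69_Death .
    ?a2 a star:E13_crm_P100 ;
        crm:P140_assigned_attribute_to ?d ;
        crm:P141_assigned ?p .
    ?p a crm:E21_Person .
    ?id a crm:E15_Identifier_Assignment ;
        crm:P140_assigned_attribute_to ?p ;
        crm:P37_assigned ?e42 .
    ?e42 a crm:E42_Identifier ;
         crm:P190_has_symbolic_content $identifier .
    ?a3 a star:E13_lrmoo_R15 ;
        crm:P140_assigned_attribute_to ?pub ;
        crm:P141_assigned ?source .
    ?a4 a star:E13_lrmoo_R24 ;
        crm:P140_assigned_attribute_to ?pubcreation ;
        crm:P141_assigned ?pub ;
        crm:P14_carried_out_by ?e .
    ?e crm:P3_has_note ?editor .
} limit 1
""")


def sparql_string(value: str, lang: Optional[str] = None) -> str:
    """Serialize a Python string as a SPARQL string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def build_editor_query(note: str, identifier: str) -> str:
    """Fill the note (tagged @en) and the PBW identifier into the query template."""
    return EDITOR_QUERY.substitute(
        note=sparql_string(note, lang="en"),
        identifier=sparql_string(identifier),
    )
```

`string.Template` is used here because its `$name` placeholders do not clash with the many `{`/`}` braces in the query, which `str.format` would require doubling.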

tla · 2024-02-20T13:21:26Z

I forgot the fourth case, which was a death record for Symbatios 101 from Iveron 2.178.5; this is from a document in the Iveron archive that was produced in 1098, which is past our cutoff point of 1095.
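A cutoff check of this kind could be applied before conversion, as in the sketch below. It assumes (hypothetically) that each record carries a `document_year` value for its source document, that records without one are kept, and that 1095 itself is still within range; `split_by_cutoff` is an illustrative name only.

```python
import logging
from typing import Iterable, List, Mapping, Tuple

logger = logging.getLogger("deaths")  # hypothetical logger name

CUTOFF_YEAR = 1095  # project cutoff stated above; 1095 itself is treated as in range

Record = Mapping[str, object]


def split_by_cutoff(
    records: Iterable[Record], cutoff_year: int = CUTOFF_YEAR
) -> Tuple[List[Record], List[Record]]:
    """Separate records whose source document postdates the cutoff year."""
    kept: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        year = record.get("document_year")  # hypothetical field; may be missing
        if isinstance(year, int) and year > cutoff_year:
            logger.info("Skipped %s: source %s dates to %d, after %d",
                        record.get("Name"), record.get("Source"), year, cutoff_year)
            rejected.append(record)
        else:
            kept.append(record)
    return kept, rejected
```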

lu-pl · 2024-03-11T15:46:21Z

All empty query cases are handled now (see logs), and I updated the script to the new metadata schema.

The way this is implemented now, a named graph plus metadata is generated for every table partition; see deaths.trig. Another option would be to merge all graphs into a single named graph and generate metadata only for that graph.

lu-pl · 2024-03-11T15:49:34Z

Note: metadata is, of course, generated only once per software execution, but every named graph is registered as an output of that software execution; see the metadata graph.
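The "one execution, many outputs" pattern described here can be pictured as a single `crmdig:D10_Software_Execution` node with one `crmdig:L11_had_output` object per named graph. A stdlib sketch that emits such a Turtle fragment; it assumes the `star:` and `crmdig:` prefixes are declared elsewhere, and both the helper name and the 10-character hex node id are hypothetical choices, not the actual r11 schema code.

```python
import uuid
from typing import Iterable, Optional


def execution_metadata(graph_iris: Iterable[str], execution_id: Optional[str] = None) -> str:
    """Describe one software execution in Turtle, listing every named graph as its output."""
    outputs = [f"<{iri}>" for iri in graph_iris]
    if not outputs:
        raise ValueError("a software execution needs at least one output graph")
    # Hypothetical node naming: a short random hex id under the star: prefix.
    node = f"star:{execution_id or uuid.uuid4().hex[:10]}"
    joined = " ,\n        ".join(outputs)
    return (
        f"{node} a crmdig:D10_Software_Execution ;\n"
        f"    crmdig:L11_had_output {joined} .\n"
    )
```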

lu-pl · 2024-03-13T09:51:40Z

The script now produces a single turtle file with all subgraphs merged, see deaths.ttl.

I had to slightly modify the metadata schema: metadata assertions now point to E13 subject nodes instead of named graphs along L11_had_output. Since the range of L11 is D1_Digital_Object, this implies (and a reasoner would infer) that the E13 assertions are D1s, i.e. E73_Information_Objects, which is not wrong but maybe worth pointing out.

laletuver1 · 2024-03-26T14:40:29Z

Meeting notes: Lukas has changed the metadata schema, which Tara will put on the graph database. A new issue might be necessary for converting all old metadata to the new metadata schema.

lu-pl · 2024-05-21T08:09:19Z

Ingested deaths data to https://r11.eu/rdf/resource/deaths.

lu-pl · 2024-05-21T14:02:41Z

Note: Consolidation/merging of named graphs into another named graph can be automated using SPARQL update (INSERT) requests.

This should be implemented in r11cli.

Edit: DROPping a named graph would not be reflected in the merged graph, though, so one would need to SPARQL the merged triples out of the target graph before deleting the named graph!

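# remove the named graph's triples from the merged (target) graph first
delete { graph <target_graph> { ?s ?p ?o . } }
where {
    graph <named_graph> {
        ?s ?p ?o .
    }
} ;

# then drop the named graph itself
drop graph <named_graph>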
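Since the merge should eventually live in r11cli, here is a rough sketch of how the two update requests could be generated and sent; it is not r11cli code. The delete template names the target graph explicitly, because a template without a `graph` clause deletes from the default graph. Sending uses the SPARQL 1.1 Protocol (POST with `Content-Type: application/sparql-update`); the endpoint URL is an assumption and authentication is not handled. SPARQL 1.1 Update's `ADD <source> TO <target>` would be an equivalent shorthand for the insert.

```python
import urllib.request


def merge_update(source_graph: str, target_graph: str) -> str:
    """SPARQL Update copying every triple of source_graph into target_graph."""
    return (
        f"insert {{ graph <{target_graph}> {{ ?s ?p ?o . }} }}\n"
        f"where {{ graph <{source_graph}> {{ ?s ?p ?o . }} }}"
    )


def unmerge_update(source_graph: str, target_graph: str) -> str:
    """SPARQL Update removing source_graph's triples from target_graph, then dropping it."""
    return (
        f"delete {{ graph <{target_graph}> {{ ?s ?p ?o . }} }}\n"
        f"where {{ graph <{source_graph}> {{ ?s ?p ?o . }} }} ;\n"
        f"drop graph <{source_graph}>"
    )


def post_update(endpoint: str, update: str) -> int:
    """Send an update request via HTTP POST and return the response status code."""
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

One caveat with the unmerge step: if a triple was contributed by two source graphs, deleting one source's triples also removes it from the merged graph, so the remaining sources would need to be re-inserted afterwards.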

tla · 2024-07-02T12:00:18Z

Hi @lu-pl , concerning the metadata schema, I've just noticed a problem with the timestamps...

star:cd81994d8e a crmdig:D10_Software_Execution ;
    crm:P82_begin_of_the_begin "2024-03-25T08:07:23.267077"^^xsd:dateTime ;

The first issue is that begin_of_the_begin is actually P82a, not P82 itself; the second issue is that a crmdig:D10_Software_Execution is a subclass of E7, not E52, which is what the domain of P82* is supposed to be. So this would need to be rewritten to something like

star:cd81994d8e a crmdig:D10_Software_Execution ;
    crm:P4_has_time-span [ crm:P82a_begin_of_the_begin "2024-03-25T08:07:23.267077"^^xsd:dateTime ] ;
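The corrected pattern (time data hung on an E52 Time-Span reached via P4, not on the execution itself) could be emitted as in the sketch below. The optional end time using `crm:P82b_end_of_the_end` is an addition for illustration, the `crm:`, `crmdig:` and `xsd:` prefixes are assumed to be declared elsewhere, and `software_execution_timespan` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional


def software_execution_timespan(
    node: str, started: datetime, finished: Optional[datetime] = None
) -> str:
    """Emit a D10 node whose timestamps sit on a blank-node E52 Time-Span via P4."""
    parts = [f'crm:P82a_begin_of_the_begin "{started.isoformat()}"^^xsd:dateTime']
    if finished is not None:
        parts.append(f'crm:P82b_end_of_the_end "{finished.isoformat()}"^^xsd:dateTime')
    span = " ; ".join(parts)
    return (
        f"{node} a crmdig:D10_Software_Execution ;\n"
        f"    crm:P4_has_time-span [ {span} ] .\n"
    )
```

`datetime.isoformat()` yields the lexical form seen in the example (`2024-03-25T08:07:23.267077`); passing a timezone-aware `datetime` would add an offset, which `xsd:dateTime` also accepts.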

lu-pl · 2024-07-08T09:18:13Z

hi @tla, the metadata issue should be fixed, see deaths.ttl.

LODKit now has a feature for Ontology derived ClosedNamespaces, so at least typos won't be an issue anymore.
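To illustrate why a closed namespace catches typos such as `P82_begin_of_the_begin` vs `P82a_begin_of_the_begin`: only the listed terms resolve and anything else raises. rdflib ships a class of the same name; the stand-alone version below is a stdlib illustration of the principle, not LODKit's or rdflib's API, and its term list is a hand-picked subset rather than derived from the ontology.

```python
from typing import Iterable


class ClosedNamespace:
    """Resolve only the terms given at construction time; anything else raises."""

    def __init__(self, base: str, terms: Iterable[str]) -> None:
        self._base = base
        self._terms = frozenset(terms)

    def term(self, name: str) -> str:
        """Return the full IRI for name, or raise ValueError for unknown terms."""
        if name not in self._terms:
            raise ValueError(f"{name!r} is not a term of <{self._base}>")
        return self._base + name

    def __getattr__(self, name: str) -> str:
        # Only called when normal attribute lookup fails; private names are never terms.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self.term(name)
        except ValueError as err:
            raise AttributeError(str(err)) from None


CRM = ClosedNamespace(
    "http://www.cidoc-crm.org/cidoc-crm/",
    ["E69_Death", "P4_has_time-span", "P82a_begin_of_the_begin"],
)
```

Hyphenated terms like `P4_has_time-span` are not valid Python attribute names, so they go through `CRM.term(...)` instead of attribute access.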

tla self-assigned this Oct 12, 2022

tla added the People label Oct 12, 2022

tla assigned lu-pl Aug 28, 2023

tla changed the title ~~Import death factoids to Neo4J data~~ Import death factoids to RDF data Aug 28, 2023

laletuver1 mentioned this issue Nov 20, 2023: Monthly Blog Posts #25 (Open)

Aaleks93 mentioned this issue Nov 21, 2023: Curation of death-factoid data from PBW #2 (Closed)

laletuver1 added the Data model label Dec 11, 2023

tla mentioned this issue Jul 2, 2024: Convert PBW-import script to use RDF directly instead of going through Neo4J erc-releven/PBWgraph#1 (Open)
