Make metadata IDs persistent #269
Comments
I suggest we use If the birthday isn't available, we would use |
People change names, so this might be confusing in the long term. Maybe just use a UUID? That we know will be persistent. |
I would say you should have IDs for everything (parties, MPs, departments, electoral districts, subjects, ...) and do as Wikidata does: just an ID with no meaning (the Q comes from the name of Denny's wife, Qamarniso, Q61768970).
Redirects: another lesson learned is to support redirects. When, as in #88, Riksdagen makes a mistake and adds two IDs for the same person (and never fixes it 😢), it is easy for you to end up with "two people" as well. They should be merged on your side, and if an end user still has the old ID, they should find the merged target (owl:sameAs). |
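A minimal sketch of the redirect idea above, assuming merged IDs are kept in a simple mapping from old ID to merge target. The names `resolve_id` and `redirects` are hypothetical and not part of any existing SWERIK or pyriksdagen code.

```python
def resolve_id(identifier, redirects):
    """Follow redirects from a possibly merged ID to its current target.

    `redirects` maps an old (merged-away) ID to the ID it was merged into.
    IDs with no entry are already current and are returned unchanged.
    """
    seen = set()
    while identifier in redirects:
        if identifier in seen:
            # A cycle would mean the redirect table itself is corrupt.
            raise ValueError(f"redirect cycle detected at {identifier!r}")
        seen.add(identifier)
        identifier = redirects[identifier]
    return identifier


# Two records for the same person were merged twice over time.
redirects = {"id-old": "id-mid", "id-mid": "id-new"}
print(resolve_id("id-old", redirects))  # follows both hops
```

Keeping the old IDs in the table, rather than deleting them, is what lets a user holding a stale ID still reach the merged record.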
That sounds like a good idea. Best of both worlds. =) |
Why are the wiki_ids not persistent? It seems like the least expensive solution (for us, since we used the QIDs in protocol documents) would be to convince wikidata to make the QIDs persistent. |
@salgo60 knows this better than I do. But I think the core problem is that anyone can create a new person (and hence a new ID), which can later be merged. So it is a "flaw" in the Wikidata structure. In addition, Wikidata would like us to have persistent IDs that they can refer to, i.e. our corpus will (after 1.0) be a reference for the quality control of Wikidata. I hope this explains why. |
@MansMeg @ninpnin maybe it's time to start the process of getting persistent, unique Welfare state analytics IDs (#269). See how Nobelprize.org redesigned its data with an API, after which @miroli proposed a Wikidata ID, P8024. We can now access the WD object using the Nobelprize unique ID...
|
I would say that Wikidata is not designed to be the source, and it is better, as I describe above, that you have a unique persistent ID: the update frequency in WD is crazy, and it is an open system with its strengths and weaknesses. Supporting more than 200 languages also makes this equation nearly impossible, and we merge a lot (see the real-time stream). The design, as I understand it, is not about the truth but about what other sources claim, so Wikidata can also store contradicting facts...
|
@MansMeg @ninpnin @fredrik1984 @liamtabib We discussed persistent IDs this morning. There's already an open issue, so I didn't want to start a new one. Regardless of the format we use for the IDs, it seems we need to obtain/create a property item on Wikidata, something like SWERIK_MP_ID. According to this, such a property needs to be proposed and discussed "for some time" before it can be approved. @salgo60, do we know if it has already been proposed, and how long "some time" is? Maybe we should decide on the property name and propose it ASAP if that hasn't been done already. There has been discussion about whether to use name/birth date or a UUID. I see the sense in using a UUID, but also the sense in having a deterministic ID. I suggest that we create a UUID deterministically, using the primary name/surname and birth date as a seed (we can use pyriksdagen.utils.get_formatted_uuid as a starting point). Best of both worlds? What do you all say? |
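A minimal sketch of the deterministic-UUID suggestion above, using only the standard library's name-based `uuid.uuid5`. It does not call `pyriksdagen.utils.get_formatted_uuid`, whose signature is not shown in this thread. The namespace string, the function name, and the seed format are all assumptions made for illustration.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
SWERIK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "swerik.example.org")


def deterministic_person_uuid(name, surname, birth_date):
    """Derive a UUID from primary name, surname and birth date.

    The same seed always yields the same UUID, so the ID can be
    regenerated from the metadata at the moment it is first assigned.
    """
    # Normalise whitespace and case so trivial spelling variants agree.
    parts = (name, surname, birth_date)
    seed = "|".join(part.strip().lower() for part in parts)
    return str(uuid.uuid5(SWERIK_NAMESPACE, seed))


first = deterministic_person_uuid("Selma", "Lagerlöf", "1858-11-20")
second = deterministic_person_uuid(" selma ", "LAGERLÖF", "1858-11-20")
print(first == second)  # same seed after normalisation
```

Because people change names, the seed would only be used once, when the ID is assigned. After that the stored UUID is the identifier and is never recomputed.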
Good idea! |
That works for me. The only important thing is that the IDs are persistent, i.e. we need to commit to the IDs, and they must never change after they are assigned to an individual. How we create them is less important, as long as they are UUIDs. I think the discussions on Wikidata will be less of a problem if we set up a persistent ID, since these will probably be the only persistent IDs for MPs going far back in time. |
WD needs a formatter string and some examples. See what a proposal looks like in the one I created at 11:39, 21 September 2016: https://www.wikidata.org/wiki/Wikidata:Property_proposal/SBL Anyone can create a proposal, and everyone can comment and vote on it. My experience is that it takes some weeks to get it approved. I am out kayaking this week and can help you when I am back, but it is no rocket science, so give it a try. One thought I had is whether we could use a Libris URI or the one Riksdagen has, depending on where you will store your data.
Landing pages: it would be nice if you had landing pages, so we could link to you from Swedish Wikipedia objects like
It's easy to extract text and pictures from Swedish Wikipedia; see the examples I did for people building an app for Swedish cemeteries.
OT, there is a WD conference: it would be interesting if you shared your experience as researchers of working with Wikidata (see tweet), what is missing and what could be better. UPDATE: Wikidata modelling days 2023. It looks like a researcher, Daniel Mietchen, is taking part; he is also involved in designing Scholia (see video). |
I'll draft a text for the Motivation part of the wikidata proposal in the next couple of days and post it here for commentary before submitting it. I think there's one unsettled issue, though. There's some consensus on using a UUID solution, but do we want to add some kind of human readable segment so it's clear that these are our UUIDs? E.g.: " |
Extra bonus can be done when approved
b) URL match pattern (Property:P8966): we have tools that use the URL to work out which Wikidata property it relates to, e.g. ^http?://(?:www.)?fossilworks.org/cgi-bin/bridge.pl?a=taxonInfo&taxon_no=(([1-9]\d{0,5})) relates to Property:P842 |
It would be cool if we could do linked data for your push/release tests; we have a Software_quality_assurance property, Property:P2992.
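To illustrate how a formatter string and a URL match pattern fit together, here is a small sketch. The landing-page address `example.org/person/` is a placeholder, not an actual SWERIK URL, and the assumption that IDs are lowercase UUID strings follows the discussion above rather than any decided format. The `$1` placeholder is the convention Wikidata formatter URLs use.

```python
import re

# Hypothetical formatter URL; "$1" stands for the external ID.
FORMATTER_URL = "https://example.org/person/$1"

# Hypothetical URL match pattern: the capture group recovers the ID.
URL_MATCH_PATTERN = re.compile(
    r"^https?://(?:www\.)?example\.org/person/"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def build_url(person_id):
    """Expand the formatter URL for one ID."""
    return FORMATTER_URL.replace("$1", person_id)


def extract_person_id(url):
    """Return the ID embedded in a landing-page URL, or None if no match."""
    match = URL_MATCH_PATTERN.match(url)
    return match.group(1) if match else None


example_id = "123e4567-e89b-12d3-a456-426614174000"
print(extract_person_id(build_url(example_id)) == example_id)
```

The two are inverses of each other: the formatter turns an ID into a link, and the match pattern lets a tool recognise such a link and recover the ID.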
|
Good documents about persistent identifiers. See also my "The Magnus list" (created 2021), "One way to design a system to be a good external identifier in Wikidata"; this list was mentioned by David Shorthouse at 27:50 in the Stanford video (slides: "Keepin 'N Sync... with wikidata ... and ORCID...and GBIF").
A Persistent Identifier (PID) policy for the European Open Science Cloud (EOSC)
Good design pattern: use tombstone pages
How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data |
I have also tried to get Riksarkivet to support archived documents and PIDs. Status: work in progress 😭 Maybe your project can explain that PID support in archives is very important for researchers. Today, my impression is that there is no one on the other end of the line when it comes to discussing persistent identifiers and how they should be supported in archives. DIGG's project does not seem to firmly decide that the National Archives and the Royal Library (KB) should handle this. |
@BobBorges doi.org/10.1101/117812 states in Lesson 3: "Opt for simple, durable web resolution"
|
I think going with a pure UUID is probably the simplest. I don't see the value of adding swerik as a slug. Ideally the PID will live longer (with the corpus) than the swerik project name. |
@MansMeg I hope we in Sweden will move in the direction of creating our own resolving service, something like a Swedish DOI, maybe SWEDOI. Maybe related: I read this paper, Introducing Innovative Indicators to Track Sweden's Open Research Data Objective: How to Measure Progress? Defining Indicators to Track Open Research Data Across Swedish Universities.
Observer pattern: I think loosely coupled systems should implement the observer pattern, so that you can perhaps more easily show citation graphs. See my suggestions to the DIGG people, "Best practice needed for understanding who is referencing my PID" and "#17 Vem använder en identifierare" |
I see that point. But I doubt the swerik name will live long enough. Whatever slug we use, we will have this or similar problems. Just going with a UUID is probably the easiest minimal viable solution and would carry the least long-term risk, I think. |
There's some motivation for a persistent SWERIK person ID here: https://docs.google.com/document/d/10_SEVNI7dF46hhnucTps242ntSr1nm_R3EHC7_9Mkjk/edit?usp=sharing Modeled on @salgo60's example in scope/length/level of detail. Feel free to add any commentary directly to that google document. |
This is excellent @BobBorges! I will read and comment. I think this is an issue we can discuss now, and then have a discussion with the TAB next Friday as a last pair of eyes before we go forward and implement. |
I think one good motivation is that with your own persistent identifier you can VERY easily start using SKOS and explain a difference with Wikidata, Riksdagens Oppna data, Riksarkivet SBL, the book "Tvåkammar Riksdagen".....
Wikidata merges a lot, maybe too much.... |
@BobBorges The best motivation, I feel, is FAIRDATA F1: as you produce research data, it should be FAIRDATA.
See also DOI 10.1101/117812. Other good resources
|
Thanks @salgo60! FAIR is a good thing to mention in the motivation. As someone with a research background, the R in FAIR seems the most problematic in our case now without persistent IDs -- How can we reuse and verify research findings when the primary keys of our database change regularly? |
@BobBorges as a Wikidata addict I would also like to see the provenance (PROV) of every single data point, i.e. something like a more advanced version history combined with the role of whoever made the change: what trust does the agent have, and what data is the change based on? I feel we see that problem with "party" vilde in #139 and chatGPT using PROV.
One Wikidata anti-pattern: I see in Wikidata that "every" source is expected to confirm the birth of Selma Lagerlöf, Q44519#P569, which right now has 23 references. The Wikidata model lacks a trust dimension. I asked Denny, the WD designer, for his point of view and wrote a blog post about it: WikidataCon 2019: We need a better model communicating quality/relevance of sources in Wikidata / Provenance |
I did a small test using PROV with chatGPT, which also shows how good the change tracking in SPA (Svensk Porträttarkiv) is when you use the API: link 139#issuecomment-1806804671 |
If you have a Wiki account don’t hesitate to support it syntax
https://www.wikidata.org/wiki/Wikidata:Property_proposal/SWERIK_Person_ID |
@BobBorges I heard comments from your statement
As I have said several times before: should I show you WD? What can happen is that two IDs are merged. A merge leaves a redirect from the old ID to the new one, and, if we speak semantics, that is a SKOS exactMatch. The problem with Wikidata is that most people are not domain experts, and as it is an open system we also get anonymous edits and vandalism. |
@BobBorges wait and see; I guess we now have enough people to get this approved. The next step is to get the attention of a wiki admin, which could take one minute or several weeks :sad: |
FYI: I added P12192 to Template:Sweden_properties / diff and Template:Politician_properties / diff. It feels like it is set up wrongly: I guess you will have persistent identifiers for everything, not just people, as P31 indicates. |
@BobBorges can we close this? |
There is a need from the wikidata people to refer to our corpus (from version 1.0) as a reference on the data. Hence we should make our ids persistent.