Problems with metadata updates from Wikidata #345
Re 2: they're not new people, but new attributes (or similar) that cause more rows to be created in the input/matching/*.csv files.
I will try to outline a procedure for updates from Wikidata before the next time we do it, hopefully to avoid some of the trouble we ran into this time.
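Because new attributes show up as extra rows rather than new people, one quick check during an update is to compare, per Wikidata ID, how many rows a matching file had before and after the update. The sketch below assumes each input/matching/*.csv row carries the Q-ID in a column named `wiki_id`; that column name and the helper names are hypothetical and should be adapted to the actual files.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical column name holding the Wikidata Q-ID in each matching CSV row.
ID_COLUMN = "wiki_id"


def rows_per_id(rows: Iterable[Mapping[str, str]], key: str = ID_COLUMN) -> Counter:
    """Count how many rows each Wikidata ID contributes to a file."""
    return Counter(row[key] for row in rows)


def grown_ids(old_rows, new_rows, key: str = ID_COLUMN) -> Dict[str, Tuple[int, int]]:
    """Return {id: (old_count, new_count)} for IDs that already existed before
    the update but now have more rows, i.e. existing people with new attributes."""
    old_counts = rows_per_id(old_rows, key)
    new_counts = rows_per_id(new_rows, key)
    return {
        qid: (old_counts[qid], new_counts[qid])
        for qid in new_counts
        if qid in old_counts and new_counts[qid] > old_counts[qid]
    }


def read_rows(path: str) -> List[Dict[str, str]]:
    """Load one CSV file as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Usage would be along the lines of `grown_ids(read_rows(old_path), read_rows(new_path))`, where the two paths point at the previous and the freshly generated version of the same matching file. IDs that are entirely new are deliberately left out, since they belong to the duplicate check instead.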
Let me know if WD has errors.
Add sources when changing values
I saw some earlier edits done by the project in Wikidata without sources....
alias vs. Name
In WD we can only have sources on properties, e.g. change Q5792849 1940037453 on the Name property.
In this particular case, two individuals were in question. My previous edits (with sources) were subsequently changed, and I restored my changes yesterday. The edits in question concern apparent spelling variants of iort. I don't know myself which variant is correct; my edits follow the spelling in the bio books. If there are sources for the other spelling, then I guess both variants should be on Wikidata.
Yes. But the spelling in the bio books should be the one used when the reference is the bio books. Right, @salgo60?
That's what I was trying to say: my edits have bio book sources and the bio book spelling. If alternative spellings are also entered, they should get their own sources.
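To see which spelling variants on an item are sourced, one can inspect the item's JSON as served by Wikidata (for example from `Special:EntityData/<Q-ID>.json`, under `entities[<Q-ID>]`). In that format each statement sits under `claims[<property>]`, has an `id` and a `mainsnak`, and carries a `references` list only when it is sourced. The sketch below assumes an already-downloaded entity dict; the property ID to check (for iort or Name) is left as a parameter, and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def unsourced_statements(entity: Dict[str, Any], prop: str) -> List[str]:
    """Return the statement IDs for `prop` that have no references."""
    statements = entity.get("claims", {}).get(prop, [])
    return [s["id"] for s in statements if not s.get("references")]


def spelling_variants(entity: Dict[str, Any], prop: str) -> Dict[str, bool]:
    """Map each value of `prop` to True if at least one statement with that
    value is sourced, so every variant can be checked for its own source."""
    result: Dict[str, bool] = {}
    for s in entity.get("claims", {}).get(prop, []):
        value = s["mainsnak"].get("datavalue", {}).get("value")
        if isinstance(value, dict):  # monolingual text: {"text": ..., "language": ...}
            value = value.get("text")
        if value is None:  # "no value" / "unknown value" snaks carry no datavalue
            continue
        result[value] = result.get(value, False) or bool(s.get("references"))
    return result
```

A variant mapped to `False` is one that was entered without its own source, which is exactly the case discussed above.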
Yes, as mentioned before:
Volume 1, page 436: skånska p / centern
Volume 2, page 158: centern
Volume 4, page 92: skånska p / AK:s center
It would be interesting if we could check what is stated in the books against where it appears in your corpus, and improve understanding and quality by adding Property:P4584 "first appearance" based on your corpus.
Sources
I would also like to see sources in your data.
Examples where "Tvåkammar-riksdagen 1867–1970" is wrong
My suggestion: step up and use sources and persistent identifiers.
I like the idea of persistent identifiers. Until then, I think we can solve (close) this issue with a metadata update procedure.
The issues last time around would have been spotted and fixed very quickly if I had been following this as a guide.
That sounds like a good solution. Maybe put this in the repo wiki for now?
FYI: We have a suspected duplicate in Wikidata. I have asked other people for a second opinion but have had no feedback yet. I used Property:P460 "said to be the same as". The sv:Wikipedia article is marked
Done.
In the discussion of Pull Request #344 we identified four different problems:
1. Using the unit test, we caught incorrect revisions that Wikidata users had made to iort (they changed iort values that were correct). However, the mapping algorithms had already been run on the incorrect data. Running some of the tests before the mapping algorithm might be a solution.
2. Wikidata users continuously add new people. This means that when we do the metadata update there can be new duplicates (the same person as multiple Wikidata entries). A solution is to list these potential duplicates, i.e. new names or persons that the mapping algorithm is likely to confuse or find difficult, and check them quickly before we run the algorithm.
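A crude way to produce such a list is to group people by a normalized name and report names shared by several Wikidata IDs where at least one ID is new since the last update. This is only a sketch under assumptions: people are given as hypothetical `(wiki_id, name)` pairs, "new" means "not in the previous snapshot's IDs", and an exact match on the normalized name stands in for whatever the real mapping algorithm finds confusing (it ignores iort, party, and dates).

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize(name: str) -> str:
    """Casefold, drop diacritics (so "Åke Ström" matches "Ake Strom"), and
    collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.split())


def new_potential_duplicates(
    people: Iterable[Tuple[str, str]], known_ids: Iterable[str]
) -> Dict[str, List[str]]:
    """Return {normalized_name: sorted IDs} for names shared by more than one
    Wikidata ID, keeping only groups that contain at least one new ID."""
    known: Set[str] = set(known_ids)
    by_name: Dict[str, Set[str]] = defaultdict(set)
    for qid, name in people:
        by_name[normalize(name)].add(qid)
    return {
        name: sorted(ids)
        for name, ids in by_name.items()
        if len(ids) > 1 and ids - known
    }
```

Groups made up only of already-known IDs are skipped, on the assumption that they were reviewed in an earlier update; the output is meant as a short list to check by hand before the mapping step.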
A potential solution is to structure the metadata updates more than we currently do, so that potential problems are caught earlier and more efficiently.
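One way to add that structure is to make every check run before the mapping step and to skip the mapping entirely if any check reports a problem, so that incorrect data (such as an edited iort) never reaches the mapping algorithms. The Python below is a hypothetical sketch, not the project's actual scripts: `run_update`, `changed_values`, and the "reviewed snapshot" of iort values are illustrative names and assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A check returns human-readable problems; an empty list means "OK".
Check = Callable[[], List[str]]


def changed_values(reviewed: Dict[str, str], fetched: Dict[str, str]) -> List[str]:
    """IDs whose freshly fetched value differs from the last reviewed snapshot
    (e.g. an iort that someone edited on Wikidata since the previous update)."""
    return sorted(k for k, v in reviewed.items() if k in fetched and fetched[k] != v)


def run_update(checks: List[Tuple[str, Check]], run_mapping: Callable[[], None]) -> List[str]:
    """Run all pre-mapping checks first and run the mapping step only if none
    of them reports a problem. Returns the problems found (empty if mapping ran)."""
    problems: List[str] = []
    for name, check in checks:
        problems.extend(f"{name}: {p}" for p in check())
    if not problems:
        run_mapping()
    return problems


# Example: a changed iort blocks the mapping step until someone reviews it.
reviewed = {"Q1": "Malmö"}
fetched = {"Q1": "Malmoe"}
problems = run_update(
    [("iort changed", lambda: changed_values(reviewed, fetched))],
    run_mapping=lambda: print("mapping..."),
)
# problems == ["iort changed: Q1"]; the mapping step was not run
```

Checks such as the row-count diff or the duplicate list can be added to the same `checks` list, which keeps the order of the procedure explicit: fetch, check, review, and only then map.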