Some wrong ROR IDs #6
Comments
I also found some problems here (Paris, France) with the ROR matching, where these are two different universities (you can find 13 universities in Paris whose names start with "univ of paris" but end differently). Kudos for the tools and data! I really appreciate them.
Hi @psmukhopadhyay and @ml4rrieu and @andreaspacher -- apologies for belated commenting and out of the blue tagging! I'm the new Technical Community Manager for ROR, and we're beta testing some improvements to our API's ?affiliation matching parameter that I think would help the issues listed here. Also figured it'd be wise to give users a heads up about the forthcoming changes in any case.
One of many changes is that we've removed a lot of false positives. See for instance the difference between the same search on the production server and the staging server, where we're beta testing the changes:
https://api.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore
https://api.staging.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore
The request for feedback and link to more documentation and examples is at https://github.com/ror-community/ror-roadmap/discussions/77 -- let us know what you think!
Thanks Amanda.
It is now much improved, and I tested it already the moment you posted this
news in ***@***.***
I'll post in the ror forum in case any further issues arise.
Best regards
…On Thu, Sep 8, 2022 at 9:15 PM Amanda French ***@***.***> wrote:
Hi @psmukhopadhyay <https://github.com/psmukhopadhyay> and @ml4rrieu
<https://github.com/ml4rrieu> and @andreaspacher
<https://github.com/andreaspacher> -- apologies for belated commenting
and out of the blue tagging! I'm the new Technical Community Manager for
ROR, and we're beta testing some improvements to our API's ?affiliation
matching parameter that I think would help the issues listed here. Also
figured it'd be wise to give users a heads up about the forthcoming changes
in any case.
One of many changes is that we've removed a lot of false positives. See
for instance the difference between the same search on the production
server and the staging server, where we're beta testing the changes:
https://api.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore
https://api.staging.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore
The request for feedback and link to more documentation and examples is at
ror-community/ror-roadmap#77
<https://github.com/ror-community/ror-roadmap/discussions/77> -- let us
know what you think!
We noticed a number of wrong ROR IDs while creating a subset of India-specific results from the OpenEditor dataset (editors1_ror_and_countries.csv and editors2_ror_and_countries.csv).
Two typical examples are as follows:
A) Indian Institute of Science, Bangalore: all rows/records for this premier Indian institute show the wrong ROR ID - https://ror.org/05j873a45 - This ROR ID actually belongs to the Indian Institute of Soil Science (IISS, भाकृअनुप-भारतीय मृदा विज्ञान संस्थान, Website - http://www.iiss.nic.in/index.html)
B) Christian Medical College Vellore, Vellore, India: all rows/records for this institute show the wrong ROR ID - https://ror.org/01vj9qy35 - This ROR ID actually belongs to Christian Medical College, Ludhiana (another CMC in another city and state in India) (Website - http://cmcludhiana.in/medical_college/)
Possible reasons:
An API call to the ROR database (using the affiliation parameter) for Indian Institute of Science, such as https://api.ror.org/organizations?filter=country.country_code:IN&affiliation=Indian+Institute+of+Science, returns a few results (around 14) with the following data in JSON format:
++++++++++
{"number_of_results":10,"items":[{"substring":"Indian Institute of Science","score":0.92,"matching_type":"COMMON TERMS","chosen":true,"organization":{"id":"https://ror.org/05j873a45","name":"Indian Institute of Soil Science","email_address":null,"ip_addresses":[],"established":1988,"types":["Facility"],"relationships":[{"label":"Indian Council of Agricultural Research","type":"Parent","id":"https://ror.org/04fw54a43"}],"addresses":[{"lat":23.309722,"lng":77.403056,"state":null,"state_code":null,"city":"Bhopal","geonames_city":{"id":1275841,"city":"Bhopal","geonames_admin1":{"name":"Madhya Pradesh","id":1264542,"ascii_name":"Madhya Pradesh","code":"IN.35"},"geonames_admin2":{"name":"Bhopāl","id":1275842,"ascii_name":"Bhopal","code":"IN.35.444"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iiss.nic.in/index.html"],"aliases":[],"acronyms":["IISS"],"status":"active","wikipedia_url":"https://en.wikipedia.org/wiki/Indian_Institute_of_Soil_Science","labels":[{"label":"भाकृअनुप-भारतीय मृदा विज्ञान संस्थान","iso639":"hi"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0000 9288 3664"]},"Wikidata":{"preferred":null,"all":["Q18125957"]},"GRID":{"preferred":"grid.464869.1","all":"grid.464869.1"}}}},{"substring":"Indian Institute of Science","score":0.84,"matching_type":"PHRASE","chosen":false,"organization":{"id":"https://ror.org/04dese585","name":"Indian Institute of Science 
Bangalore","email_address":null,"ip_addresses":[],"established":1909,"types":["Education"],"relationships":[],"addresses":[{"lat":13.021275,"lng":77.565769,"state":null,"state_code":null,"city":"Bengaluru","geonames_city":{"id":1277333,"city":"Bengaluru","geonames_admin1":{"name":"Karnataka","id":1267701,"ascii_name":"Karnataka","code":"IN.19"},"geonames_admin2":{"name":"Bangalore Urban","id":1277331,"ascii_name":"Bangalore Urban","code":"IN.19.572"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iisc.ernet.in/"],"aliases":[],"acronyms":["IISc"],"status":"active","wikipedia_url":"http://en.wikipedia.org/wiki/Indian_Institute_of_Science","labels":[{"label":"ఇండియన్ ఇన్ స్టిట్యూట్ ఆఫ్ సైన్స్","iso639":"te"},{"label":"இந்திய அறிவியல் கழகம்","iso639":"ta"},{"label":"ਭਾਰਤੀ ਵਿਗਿਆਨ ਅਦਾਰਾ","iso639":"pa"},{"label":"ഇന്ത്യൻ ഇൻസ്റ്റിറ്റ്യൂട്ട് ഓഫ് സയൻസ്","iso639":"ml"},{"label":"ಭಾರತೀಯ ವಿಜ್ಞಾನ ಸಂಸ್ಥೆ","iso639":"kn"},{"label":"भारतीय विज्ञान संस्थान","iso639":"hi"},{"label":"ભારતીય વિજ્ઞાન સંસ્થા","iso639":"gu"},{"label":"ভারতীয় বিজ্ঞান সংস্থা","iso639":"bn"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0001 0482 5067"]},"FundRef":{"preferred":"100007780","all":["100007780","100007871","100008044","100009935"]},"OrgRef":{"preferred":null,"all":["37533"]},"Wikidata":{"preferred":null,"all":["Q948720"]},"GRID":{"preferred":"grid.34980.36","all":"grid.34980.36"}}}},........
++++++++++++++++++++
The reason for the wrong ROR ID in this case is now easy to see: the process picked up the first result, Indian Institute of Soil Science, which is marked chosen=true with a score of 0.92, while the correct record (Indian Institute of Science Bangalore) comes second with a score of 0.84 and chosen=false. We have also observed that, to be on the safe side, score=1.0 is a better condition than chosen==true for extracting ROR IDs through an API call (though I am not sure whether you fetched the ROR IDs through the API or by some other means).
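To make the difference between the two conditions concrete, here is a minimal Python sketch. It assumes the response has already been parsed into a list of items shaped like the JSON above; the function names `pick_chosen` and `pick_by_score` are hypothetical, and the sample items are trimmed from the response shown in this issue. It is an illustration only, not how the OpenEditor dataset was actually built.

```python
from typing import Optional


def pick_chosen(items: list) -> Optional[str]:
    """Return the ROR ID of the first item flagged chosen=true, if any."""
    for item in items:
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


def pick_by_score(items: list, min_score: float = 1.0) -> Optional[str]:
    """Return the ROR ID of the highest-scoring item at or above min_score.

    Returns None when no candidate reaches the threshold, so the record
    can be left unmatched or sent for manual review.
    """
    trusted = [item for item in items if item.get("score", 0.0) >= min_score]
    if not trusted:
        return None
    best = max(trusted, key=lambda item: item["score"])
    return best["organization"]["id"]


# Two items trimmed from the API response quoted above.
items = [
    {
        "score": 0.92,
        "matching_type": "COMMON TERMS",
        "chosen": True,
        "organization": {
            "id": "https://ror.org/05j873a45",
            "name": "Indian Institute of Soil Science",
        },
    },
    {
        "score": 0.84,
        "matching_type": "PHRASE",
        "chosen": False,
        "organization": {
            "id": "https://ror.org/04dese585",
            "name": "Indian Institute of Science Bangalore",
        },
    },
]

print(pick_chosen(items))    # the Soil Science record, i.e. the wrong match
print(pick_by_score(items))  # None: no candidate reaches score 1.0
```

With the stricter condition this affiliation stays unmatched instead of receiving a wrong ID, at the cost of fewer records getting a ROR ID automatically.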
Initially we found 455 records (India-specific only) with wrong ROR IDs, out of 8170 records that have ROR IDs (from a total of 10316 records with India as the affiliated country).
I am attaching a CSV file containing these 455 records (the rorORI column is the ROR ID as available in the dataset, and rorOEM is the corrected ROR ID as fetched for our subset of data):
no-match-report.csv
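For anyone who wants to reproduce this kind of comparison, a rough Python sketch follows. Only the column names rorORI and rorOEM come from the attached file; the `affiliation` column, the `find_mismatches` helper, and the two sample rows are assumptions made up for illustration, so the real CSV layout may differ.

```python
import csv
import io


def find_mismatches(csv_file, original_col="rorORI", corrected_col="rorOEM"):
    """Return the rows whose original and corrected ROR IDs differ."""
    reader = csv.DictReader(csv_file)
    return [
        row
        for row in reader
        if row[original_col].strip() != row[corrected_col].strip()
    ]


# Hypothetical two-row sample; in practice, pass open("no-match-report.csv", newline="").
sample = io.StringIO(
    "affiliation,rorORI,rorOEM\n"
    '"Indian Institute of Science, Bangalore",'
    "https://ror.org/05j873a45,https://ror.org/04dese585\n"
    "Indian Institute of Soil Science,"
    "https://ror.org/05j873a45,https://ror.org/05j873a45\n"
)

mismatches = find_mismatches(sample)
for row in mismatches:
    print(row["affiliation"], row["rorORI"], "->", row["rorOEM"])
```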