Armadillos, ratites, and pill bugs: feedback #74

Open
hyanwong opened this issue Jul 2, 2024 · 2 comments
Comments

Member

hyanwong commented Jul 2, 2024

I tried experimenting with the automatic wiki harvester, using armadillos as a test case:

get_wiki_images clade data/Wiki/wd_JSON/OneZoom_latest-all.json 847764

Here are the pictures. They aren't quite as good as I had hoped, but that might reflect how unusual the taxon is. There are some better Wikimedia images (e.g. [image]), but it does appear that some hand curation might be needed for some of these non-European groups. I guess the main question is whether assigning all these images a value of 35000 will displace existing, better OneZoom images on the tree:

75070
111846
148752
203033
244043
649549
743510
752691
902876
968416
1042139
1052814
1761577
1764523
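
To make the displacement question concrete, here is a minimal sketch of how a flat value of 35000 would interact with existing ratings. It assumes that 35000 is an image rating and that the tree shows the highest-rated image for a taxon; neither assumption is confirmed above. The `Image` record, the `best_image` and `is_displaced` helpers, and the example ratings of 30000 and 40000 are hypothetical and not taken from the OneZoom codebase.

```python
from dataclasses import dataclass

# Flat value given to every harvested wiki image (the 35000 mentioned above).
HARVESTED_RATING = 35000


@dataclass(frozen=True)
class Image:
    source: str  # hypothetical label: "onezoom" or "wiki"
    rating: int  # assumption: higher means better


def best_image(images):
    """Return the highest-rated image; on a tie, the earliest in the list wins."""
    return max(images, key=lambda img: img.rating)


def is_displaced(existing, harvested_rating=HARVESTED_RATING):
    """True if a harvested image would outrank the existing one (assumed rule)."""
    return harvested_rating > existing.rating


existing_low = Image("onezoom", 30000)   # made-up rating
existing_high = Image("onezoom", 40000)  # made-up rating
harvested = Image("wiki", HARVESTED_RATING)

print(best_image([existing_low, harvested]).source)
print(best_image([existing_high, harvested]).source)
```

Under these assumptions only existing images rated below 35000 would be displaced, so the question reduces to how many good existing images currently sit below that value.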

Member Author

hyanwong commented Jul 2, 2024

Here's another test, using

get_wiki_images clade OneZoom_latest-all.json Palaeognathae

For comparison, here's what we have for that clade on OneZoom:

Screenshot 2024-07-02 at 17 11 57

A lot of these images don't have the artist/author information in a format we can ingest easily, e.g.

WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_duidae.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_obsoletus.jpg': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_strigulosus.jpg': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Tinamus_solitarius.jpg': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Tinamus_guttatus.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_parvirostris.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_noctivagus.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Crypturellus_undulatus.JPG': using 'Unknown artist'
WARNING:get_wiki_images.py:Artist not found for 'Nothura_minor.jpg': using 'Unknown artist'

The pages for these usually have something like "Given to the wikipedia by the author, Renato Caniatti" written on them. I assume that someone will eventually figure out a way to make this more machine-readable, and that we just have to wait until it is sorted.
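
As a stopgap, free-text credits of this kind could in principle be matched with a few patterns. The sketch below is only an illustration and is not how get_wiki_images.py works: the `guess_artist` function and both patterns are hypothetical, and real credit lines vary far more than two regular expressions can cover. It falls back to the same 'Unknown artist' string that appears in the warnings.

```python
import re

# Hypothetical patterns for free-text credits; real pages vary widely.
CREDIT_PATTERNS = [
    re.compile(r"by the author,\s*(?P<name>[^.;\n]+)", re.IGNORECASE),
    re.compile(r"(?:photo|photograph|image)\s+by\s+(?P<name>[^.;\n]+)", re.IGNORECASE),
]


def guess_artist(description, default="Unknown artist"):
    """Return the first credited name found in the text, else the default."""
    for pattern in CREDIT_PATTERNS:
        match = pattern.search(description)
        if match:
            name = match.group("name").strip()
            if name:
                return name
    return default


print(guess_artist("Given to the wikipedia by the author, Renato Caniatti"))
print(guess_artist("A bird standing in leaf litter"))
```

Any name guessed this way would still need a human check before being used as an attribution, since a wrong credit is worse than 'Unknown artist'.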

My impression is that the wiki images are, on average, of roughly the same quality as what we have (maybe very slightly better), but that the ratings on our existing images probably make our existing stock a bit more useful, because we can pick the ones we know to be high quality for percolating upwards in the tree.
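
The point about percolating can be sketched as follows, assuming each internal node shows the few best-rated images found among the leaves below it. The `percolate` function, the dictionary-based tree, and the node labels and ratings are all hypothetical; this is not OneZoom's actual algorithm.

```python
import heapq


def percolate(tree, ratings, node, k=2):
    """Return up to k (rating, leaf) pairs, best first, from the leaves below node."""
    children = tree.get(node, [])
    if not children:
        # A leaf contributes its own image, if it has one.
        return [(ratings[node], node)] if node in ratings else []
    candidates = []
    for child in children:
        # The top k below each child are enough to find the top k overall.
        candidates.extend(percolate(tree, ratings, child, k))
    return heapq.nlargest(k, candidates)


# Made-up tree: internal nodes map to their children, leaves are absent from the dict.
tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
# Made-up ratings; leaf "b2" has no image at all.
ratings = {"a1": 42000, "a2": 35000, "b1": 38000}

print(percolate(tree, ratings, "root"))
```

If every image carried the same flat value, the selection would fall back to the tie-break on the leaf label, which is arbitrary. That is why a stock of individually rated images is more useful for this step.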

1265542
1266282
1267803
1268546
1270229
1270445
1271939
3501588
11179834
1262693
1264146
1017426
1089078
1262031
933682
935204
971229
998385
870092
870099
916853
847190
849184
852723
860730
730091
733120
733485
734829
742093
790940
793068
793573
834414
843182
843185
843266
375790
388464
428441
510118
602080
609761
667778
17592
93208
192044
244197
248520
251765

hyanwong changed the title from "Armadillos: feedback" to "Armadillos and ratites: feedback" on Jul 2, 2024
Member Author

hyanwong commented Jul 2, 2024

Finally, here are pill bugs (get_wiki_images clade OneZoom_latest-all.json Armadillidiidae). OneZoom only has 2 images in this taxon, so the 13 images that we can get from Wikidata are a distinct improvement, and the pictures are all pretty good quality, I think:

1300629
1646723
1813667
1891394
1927292
2126857
2224976
2331928
2585227
2610354
2682942
2946078
3433338

hyanwong changed the title from "Armadillos and ratites: feedback" to "Armadillos, ratites, and pill bugs: feedback" on Jul 2, 2024