For some reason the 'filename' elements in the Bookworm use a value that replaces the colons and slashes in the HathiTrust id with other characters. (E.g., psia.ark:/13960/t5z623168 becomes psia.ark+=13960=t5z623168.) I assume this has something to do with certain ids not working as file paths on some operating systems. But can it be corrected before the Bookworm receives the filenames? It creates a number of problems all through the pipeline whenever we interface with Hathi resources, and it seems to me it would be much better if Bookworm just received the canonical HathiTrust id.
That's the clean id, which is part of the PairTree structure HathiTrust uses. If we have something labelled 'filename', the clean id is correct.
I'm in favour of using the htid as often as possible and keeping the clean id behind the scenes. In Bookworm, we could store both filename and htid, emphasizing the latter.
I think, to keep things simplest, the files hitting Bookworm should never even know of the clean id; it's easily derived from htid, and I haven't yet seen a use case for it. It's true filename is a required key in bookworm, but we shouldn't use cleanid for it: bookworm.filename (as opposed to filename in a hathi context) is just a synonym for 'unique document id.' And that's better served through htid.
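To illustrate how the clean id and the htid relate, here is a minimal sketch of converting between them. It assumes the htid has the form `namespace.local_id` and that only the part after the first dot is cleaned, using the single-character substitutions visible in the example above (`:` to `+`, `/` to `=`) plus the Pairtree substitution of `.` to `,`. It skips the hex-encoding step that Pairtree applies to some other characters, and the function names are hypothetical, not part of Bookworm or any HathiTrust library.

```python
# Hypothetical helpers for illustration only; not an existing Bookworm or HathiTrust API.
# Assumption: ids contain none of the characters that Pairtree hex-encodes.

_TO_CLEAN = str.maketrans({":": "+", "/": "=", ".": ","})
_TO_HTID = str.maketrans({"+": ":", "=": "/", ",": "."})


def htid_to_clean(htid: str) -> str:
    """Turn a canonical htid into its filesystem-safe clean id."""
    # The namespace prefix (before the first dot) is left untouched.
    namespace, sep, local = htid.partition(".")
    return namespace + sep + local.translate(_TO_CLEAN)


def clean_to_htid(clean_id: str) -> str:
    """Recover the canonical htid from a clean id."""
    namespace, sep, local = clean_id.partition(".")
    return namespace + sep + local.translate(_TO_HTID)


print(htid_to_clean("psia.ark:/13960/t5z623168"))
print(clean_to_htid("psia.ark+=13960=t5z623168"))
```

Because the mapping is reversible under these assumptions, either form could be stored and the other derived on demand, which is why keeping only the htid in Bookworm would lose nothing.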