Annif: Find existing content in Scholar to be used as a training set #3
Comments
I think this is the method used to get full text from files for indexing; it could be useful when building a dataset.
Question: Should we send the document itself to the AI, or have Scholar extract the full text and send that to the AI?
Some Scholar collections that may have useful content for a training set:
- CEAS Electrical Engineering and Computing Systems (EECS) Senior Design Projects 2017
- CECH Information Technology Senior Design Projects
- The Lucille M. Schultz 19th Century Composition Archive
- Modernnati: Archiving & Preserving Cincinnati's Modernist Architecture
- Nature of Black Holes
- Cincinnati Romance Review
- 2019 Information Technology Research Symposium
Create a list of works and/or collections in Scholar that have embedded text and existing metadata. We want a good mix of examples: different file types, and files with minimal metadata as well as well-described files.
Probably best to start with works that have only 1 file attached to avoid confusion.
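A rough sketch of how that selection could be done once we have an export of works. The `Work` record, its field names, and the `rich_threshold` cutoff are all made up for illustration; Scholar's actual data model will differ. It keeps only works with exactly one file and non-empty extracted text, then groups them by file type and metadata level so we can check the mix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Work:
    # Hypothetical record shape, not Scholar's real schema.
    work_id: str
    file_types: List[str]  # one entry per attached file, e.g. ["pdf"]
    full_text: str         # extracted text, empty string if none
    metadata: Dict[str, str] = field(default_factory=dict)


def metadata_level(work: Work, rich_threshold: int = 5) -> str:
    """Label a work by how many metadata fields are filled in.

    The threshold is an arbitrary assumption; tune it after looking at real records.
    """
    filled = sum(1 for value in work.metadata.values() if value and value.strip())
    return "well-described" if filled >= rich_threshold else "minimal"


def select_candidates(works: List[Work]) -> Dict[Tuple[str, str], List[str]]:
    """Group usable work IDs by (file type, metadata level)."""
    groups: Dict[Tuple[str, str], List[str]] = {}
    for work in works:
        if len(work.file_types) != 1:
            continue  # start with single-file works only
        if not work.full_text.strip():
            continue  # no embedded or extracted text
        key = (work.file_types[0].lower(), metadata_level(work))
        groups.setdefault(key, []).append(work.work_id)
    return groups
```

Printing the keys of the result gives a quick view of which file type and metadata combinations are still missing from the list.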
https://github.com/NatLibFi/Annif-corpora
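For reference, a minimal sketch of writing selected works out in a layout like the full-text corpora in that repository. As I understand the Annif documentation, a full-text document corpus is a directory of `.txt` files, each paired with a subject file of the same base name; the `.key` variant lists one subject label per line. This should be checked against the Annif docs before relying on it. The `write_fulltext_corpus` helper and the `(doc_id, text, subjects)` tuple shape are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def write_fulltext_corpus(
    docs: Iterable[Tuple[str, str, List[str]]], out_dir: str
) -> List[str]:
    """Write one .txt and one .key file per document; return the IDs written.

    Each item in docs is (doc_id, text, subjects). doc_id is assumed to be
    safe to use as a file name. Documents with no text or no subjects are
    skipped, since they are no use as training examples.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for doc_id, text, subjects in docs:
        if not text.strip() or not subjects:
            continue
        (out / f"{doc_id}.txt").write_text(text, encoding="utf-8")
        # One subject label per line (assumed .key layout).
        (out / f"{doc_id}.key").write_text("\n".join(subjects) + "\n", encoding="utf-8")
        written.append(doc_id)
    return written
```

If the subjects in Scholar are free-text keywords rather than vocabulary URIs, they would still need to be matched to whatever vocabulary the Annif project is set up with.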