Environmental scan of potential AI/ML models #4

hortongn · 2023-01-30T20:00:56Z

Create a list of models that we could potentially use to extract text from documents and suggest metadata. We will start with basic metadata like title, description, etc. and eventually move on to optional metadata fields found in Scholar.

We ideally want to use "machine learning as a service" options that will host things for us, but we can also explore open source options.

haitzlm · 2023-02-02T19:09:42Z

- https://www.aiforlibrarians.com/ai-cases/
- https://annif.org/
- https://pubmed.ncbi.nlm.nih.gov/30153250/
- https://iris.ai/
- Annif: DIY automated subject indexing using multiple algorithms: https://liberquarterly.eu/article/view/10732
- Automated Classification to Improve the Efficiency of Weeding Library Collections: https://www.sciencedirect.com/science/article/pii/S0099133317304160?via%3Dihub
- OpenAI on GitHub: https://github.com/openai
- https://github.com/openai/openai-cookbook
- Can Machine Learning be used to assign managed metadata attributes for items? https://learn.microsoft.com/en-us/microsoft-365/community/machine-learning-and-managed-metadata
- Apache Mahout: https://mahout.apache.org//
- Apache Spark MLlib: https://spark.apache.org/mllib/
- https://library.stanford.edu/blogs/stanford-libraries-blog/2022/07/working-students-library-collections-data

hortongn · 2023-02-09T19:15:27Z

Next steps:

- Categorize which models can be used for specific metadata fields.
- Expand the existing list with more examples and resources.

hortongn · 2023-02-13T18:00:32Z

Consider making use of the metadata tags that may already be embedded in a document (PDF, Word, etc.)
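
As a rough illustration of that idea for Word files: a .docx is a ZIP package, and its embedded properties normally live in `docProps/core.xml`. The sketch below reads a few of them using only the Python standard library. The function name `read_docx_metadata` and the choice of fields are assumptions made for illustration, not something decided in this issue, and PDFs would need a separate approach.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical mapping from our field names to core-property tags.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "description": "dc:description",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
}


def read_docx_metadata(source):
    """Return embedded core properties from a .docx path or file-like object.

    Fields that are missing or empty in the document are left out of the result.
    """
    with zipfile.ZipFile(source) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))

    found = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[name] = element.text.strip()
    return found
```

Calling `read_docx_metadata("thesis.docx")` (a hypothetical file name) returns a plain dict, so anything found this way could be used to pre-fill a field before a model is asked to suggest a value.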

hortongn · 2023-02-21T21:04:58Z

An AI toolkit for libraries (paper)
https://insights.uksg.org/articles/10.1629/uksg.592

Integrating Ruby with OpenAI: A Beginner’s Guide
https://ai.plainenglish.io/integrating-ruby-with-openai-a-beginners-guide-88ffaa10f202

GPT-JT is an open source GPT-3 alternative with a decentralized approach
https://the-decoder.com/gpt-jt-is-an-open-source-gpt-3-alternative-with-a-decentralized-approach/

hortongn · 2023-02-23T18:03:32Z

How to use Microsoft AI Builder to Extract Data from PDF
https://www.youtube.com/watch?v=J3d6bx3i4l0&ab_channel=KevinStratvert

Microsoft Power Automate (part of Office 365)
https://powerautomate.microsoft.com

haitzlm · 2023-04-14T21:00:20Z

Interesting:
Text Analytics APIs are machine learning-powered services that allow developers to analyze and extract insights from text-based data. These APIs use natural language processing (NLP) techniques to automatically identify and extract entities, sentiments, topics, and other relevant information from text.

Here's a high-level overview of how Text Analytics APIs work:

1. Data Input: The API accepts text-based data as input, such as documents, social media posts, or customer feedback.
2. Preprocessing: The API preprocesses the input data to clean and normalize it. This may include tasks such as tokenization, stop-word removal, and stemming.
3. Feature Extraction: The API uses NLP techniques to extract features from the text data. This may include identifying entities such as people, organizations, and locations; extracting sentiments such as positive or negative; and identifying topics or themes.
4. Analysis and Output: The API analyzes the extracted features and generates insights or summaries based on the input data. The output may include visualizations, reports, or structured data that can be easily consumed by applications.

Some common use cases for Text Analytics APIs include sentiment analysis of social media data, entity extraction from news articles, and topic modeling for customer feedback.
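
To make the preprocessing and feature-extraction steps above more concrete, here is a toy sketch in plain Python. It is not how any of the hosted APIs work internally; the stop-word list, the `crude_stem` suffix rule, and the function names are all simplified assumptions for illustration, and a real service would use far more capable models.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little topical meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def top_terms(text, n=3):
    """Return the n most frequent stems as (stem, count) pairs."""
    stems = [crude_stem(t) for t in remove_stop_words(tokenize(text))]
    return Counter(stems).most_common(n)
```

For a document's extracted text, `top_terms(text, 5)` gives a crude list of candidate keywords, which is roughly the kind of signal a subject or keyword field would need, though the hosted services listed below would be expected to do considerably better.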

Some popular Text Analytics APIs include:

- Google Cloud Natural Language API
- Microsoft Azure Cognitive Services Text Analytics API
- Amazon Comprehend
- IBM Watson Natural Language Understanding

By using Text Analytics APIs, developers can leverage the power of machine learning to extract valuable insights from text-based data with minimal effort and expertise.

hortongn added this to App Dev AI Project Jan 30, 2023

github-project-automation bot moved this to Triage in App Dev AI Project Jan 30, 2023

hortongn mentioned this issue Jan 30, 2023

Choose a model (algorithm) and a training set #5

Open

hortongn moved this from Triage to Todo in App Dev AI Project Jan 30, 2023

haitzlm self-assigned this Jan 31, 2023

hortongn moved this from Todo to In Progress in App Dev AI Project Feb 3, 2023

hortongn moved this from In Progress to On Hold in App Dev AI Project Mar 16, 2023
