Environmental scan of potential AI/ML models #4

Open
hortongn opened this issue Jan 30, 2023 · 6 comments

@hortongn
Member

Create a list of models that we could potentially use to extract text from documents and suggest metadata. We will start with basic metadata like title, description, etc. and eventually move on to optional metadata fields found in Scholar.

Ideally we want to use "machine learning as a service" options that host the models for us, but we can also explore open-source options.

@haitzlm
Contributor

haitzlm commented Feb 2, 2023

@hortongn hortongn moved this from Todo to In Progress in App Dev AI Project Feb 3, 2023
@hortongn
Member Author

hortongn commented Feb 9, 2023

Next steps:

  • Categorize which models can be used for specific metadata fields.
  • Expand the existing list with more examples and resources.

@hortongn
Member Author

Consider making use of the metadata tags that may already be embedded in a document (PDF, Word, etc.)
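
As one possible starting point for the embedded-metadata idea: a .docx file is a ZIP archive, and Word normally stores Dublin Core style properties in `docProps/core.xml` inside it. The sketch below reads a few of those properties using only the Python standard library. The function name `read_docx_core_properties` and the `FIELDS` mapping are illustrative assumptions, not part of Scholar, and PDFs would need a separate library that is not covered here.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical mapping from our own field names to elements in core.xml.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "description": "dc:description",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
}


def read_docx_core_properties(source):
    """Return the non-empty embedded properties of a .docx file.

    `source` may be a file path or a binary file-like object.
    An empty dict is returned when the archive has no core.xml part.
    """
    with zipfile.ZipFile(source) as archive:
        try:
            xml_bytes = archive.read("docProps/core.xml")
        except KeyError:
            return {}

    root = ET.fromstring(xml_bytes)
    found = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Skip properties that are absent or contain only whitespace.
        if element is not None and element.text and element.text.strip():
            found[name] = element.text.strip()
    return found
```

Embedded values are often missing or stale (for example, a template's title left in place), so they are probably best treated as suggestions to be checked alongside whatever a model proposes.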

@hortongn
Member Author

An AI toolkit for libraries (paper)
https://insights.uksg.org/articles/10.1629/uksg.592

Integrating Ruby with OpenAI: A Beginner’s Guide
https://ai.plainenglish.io/integrating-ruby-with-openai-a-beginners-guide-88ffaa10f202

GPT-JT is an open source GPT-3 alternative with a decentralized approach
https://the-decoder.com/gpt-jt-is-an-open-source-gpt-3-alternative-with-a-decentralized-approach/

@hortongn
Member Author

hortongn commented Feb 23, 2023

How to use Microsoft AI Builder to Extract Data from PDF
https://www.youtube.com/watch?v=J3d6bx3i4l0&ab_channel=KevinStratvert

MS Power Automate (part of Office 365)
https://powerautomate.microsoft.com

@hortongn hortongn moved this from In Progress to On Hold in App Dev AI Project Mar 16, 2023
@haitzlm
Contributor

haitzlm commented Apr 14, 2023

Interesting:
Text Analytics APIs are machine learning-powered services that allow developers to analyze and extract insights from text-based data. These APIs use natural language processing (NLP) techniques to automatically identify and extract entities, sentiments, topics, and other relevant information from text.

Here's a high-level overview of how Text Analytics APIs work:

  • Data Input: The API accepts text-based data as input, such as documents, social media posts, or customer feedback.
  • Preprocessing: The API preprocesses the input data to clean and normalize it. This may include tasks such as tokenization, stop-word removal, and stemming.
  • Feature Extraction: The API uses NLP techniques to extract features from the text data. This may include identifying entities such as people, organizations, and locations; extracting sentiments such as positive or negative; and identifying topics or themes.
  • Analysis and Output: The API analyzes the extracted features and generates insights or summaries based on the input data. The output may include visualizations, reports, or structured data that can be easily consumed by applications.

Some common use cases for Text Analytics APIs include sentiment analysis of social media data, entity extraction from news articles, and topic modeling for customer feedback.
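
To make the preprocessing step above more concrete, here is a minimal sketch of tokenization, stop-word removal, and stemming in plain Python. It is not how any of the hosted APIs are implemented: the stop-word list is a tiny hand-picked sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and all function names are illustrative.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real services use much larger ones.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little topical meaning."""
    return [token for token in tokens if token not in STOP_WORDS]


def naive_stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def top_terms(text, limit=5):
    """Return the most frequent stems as (stem, count) pairs."""
    stems = [naive_stem(token) for token in remove_stop_words(tokenize(text))]
    return Counter(stems).most_common(limit)
```

Calling `top_terms` on the extracted text of a document gives a rough list of candidate keywords, which could serve as a simple baseline to compare against what a hosted service returns.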

Some popular Text Analytics APIs include:

  • Google Cloud Natural Language API
  • Microsoft Azure Cognitive Services Text Analytics API
  • Amazon Comprehend
  • IBM Watson Natural Language Understanding

With a Text Analytics API, developers can extract entities, sentiment, and topics from text without training or hosting their own models.
