From 1b0213eb51f469e87680d8419976e879f8a4fde2 Mon Sep 17 00:00:00 2001
From: R0bL <133535059+R0bL@users.noreply.github.com>
Date: Tue, 16 Apr 2024 15:35:45 -0400
Subject: [PATCH] Update README.md

---
 README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 533e256..2e5e44f 100644
--- a/README.md
+++ b/README.md
@@ -53,17 +53,22 @@
 Link to API : https://www.nbim.no/en/responsible-investment/voting/our-voting-re
 
 2. Data Collection from SEC EDGAR System to get Corprate 10-K filings:
 
-Used sec-api.io Link: https://sec-api.io/docs/sec-filings-item-extraction-api
+Link to sec-api.io: https://sec-api.io/docs/sec-filings-item-extraction-api
 
 3. Data preprocessing: Ingesting text into a dictionary, split into chunks and report on token count.
 
-see link for open source nlp preprocesser spaCy: https://spacy.io/api/sentencizer
+Link to the open-source NLP preprocessor spaCy: https://spacy.io/api/sentencizer
 
 4. Embedding the chunks: use a pretrained model mpnet-base model
 
-5. Creating a sematic search pipeline
+Link to Hugging Face: https://huggingface.co/sentence-transformers/all-mpnet-base-v2
+
+5. Creating a semantic search pipeline between a user query and the text
+
 6. Loading an LLM locally
+
+Link to LLM: https://huggingface.co/google/gemma-7b-it
 
 7. Generating text with an LLM
 
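The preprocessing step described in the patch (step 3: split text into chunks and report a token count) might be sketched roughly as below. This is a minimal stand-in, not the project's actual code: it uses a naive regex sentence split where the project links to spaCy's Sentencizer, and `chunk_sentences` plus the ~4-characters-per-token estimate are hypothetical choices.

```python
import re

def chunk_sentences(text: str, sentences_per_chunk: int = 10) -> list[dict]:
    """Split text into sentences, group them into chunks, and report
    an approximate token count per chunk.

    Naive sentence split on ., !, ? followed by whitespace; the project
    links to spaCy's Sentencizer for this step instead.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        joined = " ".join(sentences[i:i + sentences_per_chunk])
        # Rough heuristic: ~4 characters per token (an assumption, not a rule
        # from the project).
        chunks.append({"text": joined, "approx_token_count": len(joined) // 4})
    return chunks
```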
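Steps 4-5 (embed the chunks, then run semantic search against a user query) amount to a nearest-neighbour lookup by cosine similarity. A minimal sketch, assuming the vectors come from the all-mpnet-base-v2 sentence-transformers model linked in the patch; here `query_vec`, `chunk_vecs`, and `semantic_search` are illustrative names, and the test uses toy vectors rather than real embeddings.

```python
import numpy as np

def cosine_scores(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of chunk vectors."""
    query_norm = query_vec / np.linalg.norm(query_vec)
    chunk_norms = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return chunk_norms @ query_norm

def semantic_search(query_vec, chunk_vecs, chunks, top_k=3):
    """Return the top_k chunks most similar to the query, with scores."""
    scores = cosine_scores(query_vec, chunk_vecs)
    top = np.argsort(scores)[::-1][:top_k]
    return [(chunks[i], float(scores[i])) for i in top]
```

The retrieved chunks would then be inserted into the prompt for the locally loaded LLM (step 6, the linked gemma-7b-it model) before generation.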