Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
R0bL authored Apr 16, 2024
1 parent 4d93011 commit 1b0213e
Showing 1 changed file with 8 additions and 3 deletions.
11 changes: 8 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,17 +53,22 @@ Link to API : https://www.nbim.no/en/responsible-investment/voting/our-voting-re

2. Data Collection from SEC EDGAR System to get Corprate 10-K filings:

Used sec-api.io Link: https://sec-api.io/docs/sec-filings-item-extraction-api
Link to sec-api.io : https://sec-api.io/docs/sec-filings-item-extraction-api

3. Data preprocessing: Ingesting text into a dictionary, split into chunks and report on token count.

see link for open source nlp preprocesser spaCy: https://spacy.io/api/sentencizer
Link to open source nlp preprocesser spaCy: https://spacy.io/api/sentencizer

4. Embedding the chunks: use a pretrained model mpnet-base model

5. Creating a sematic search pipeline
Link to hugging face: https://huggingface.co/sentence-transformers/all-mpnet-base-v2

5. Creating a sematic search pipeline between a user query and the text


6. Loading an LLM locally

Link to LLM: https://huggingface.co/google/gemma-7b-it

7. Generating text with an LLM

Expand Down

0 comments on commit 1b0213e

Please sign in to comment.