feat(datasets): add synthetic data generation pipeline #165

Merged

@akotyla (Collaborator) commented Oct 31, 2024

No description provided.

@akotyla added the feature, document search, and evals labels Oct 31, 2024
@akotyla added this to the Ragbits 0.4 milestone Oct 31, 2024
@akotyla linked an issue Oct 31, 2024 that may be closed by this pull request

github-actions bot (Contributor) commented Oct 31, 2024

Code Coverage Summary

Filename                                                                                                     Stmts    Miss  Cover    Missing
---------------------------------------------------------------------------------------------------------  -------  ------  -------  -------------------------------------
packages/ragbits-core/src/ragbits/core/__init__.py                                                               0       0  100.00%
packages/ragbits-core/src/ragbits/core/config.py                                                                 7       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/__init__.py                                                        67       6  91.04%   40-48
packages/ragbits-core/src/ragbits/core/audit/base.py                                                            31       0  100.00%
packages/ragbits-core/src/ragbits/core/audit/otel.py                                                            36      16  55.56%   20-21, 35-41, 51-54, 64-67, 96
packages/ragbits-core/src/ragbits/core/embeddings/__init__.py                                                   11       0  100.00%
packages/ragbits-core/src/ragbits/core/embeddings/base.py                                                        8       2  75.00%   28, 40
packages/ragbits-core/src/ragbits/core/embeddings/noop.py                                                        6       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/__init__.py                                                         14       8  42.86%   27-39
packages/ragbits-core/src/ragbits/core/llms/base.py                                                             47       9  80.85%   46, 65, 149-157, 160-162
packages/ragbits-core/src/ragbits/core/llms/factory.py                                                          23       4  82.61%   50, 53, 66, 77
packages/ragbits-core/src/ragbits/core/llms/litellm.py                                                          40      13  67.50%   9-10, 55, 86, 92-109
packages/ragbits-core/src/ragbits/core/llms/types.py                                                             9       2  77.78%   25, 29
packages/ragbits-core/src/ragbits/core/llms/clients/__init__.py                                                  4       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/clients/base.py                                                     26       0  100.00%
packages/ragbits-core/src/ragbits/core/llms/clients/exceptions.py                                               17       7  58.82%   7-8, 17, 26-27, 36, 45
packages/ragbits-core/src/ragbits/core/llms/clients/litellm.py                                                  77      21  72.73%   11-12, 72, 120, 155-176, 196-201, 212
packages/ragbits-core/src/ragbits/core/llms/clients/local.py                                                    52      24  53.85%   10-13, 65-73, 94-105, 126-142
packages/ragbits-core/src/ragbits/core/metadata_stores/__init__.py                                              12       3  75.00%   27-30
packages/ragbits-core/src/ragbits/core/metadata_stores/base.py                                                   6       0  100.00%
packages/ragbits-core/src/ragbits/core/metadata_stores/exceptions.py                                             4       0  100.00%
packages/ragbits-core/src/ragbits/core/metadata_stores/in_memory.py                                             16       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/__init__.py                                                        2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/base.py                                                           20       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/parsers.py                                                        35       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/prompt.py                                                        124       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/__init__.py                                              2       0  100.00%
packages/ragbits-core/src/ragbits/core/prompt/discovery/prompt_discovery.py                                     33       2  93.94%   54-55
packages/ragbits-core/src/ragbits/core/utils/_pyproject.py                                                      37       0  100.00%
packages/ragbits-core/src/ragbits/core/utils/config_handling.py                                                 16       4  75.00%   32-33, 37-38
packages/ragbits-core/src/ragbits/core/utils/decorators.py                                                      29       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/__init__.py                                                 8       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/base.py                                                    26       1  96.15%   65
packages/ragbits-core/src/ragbits/core/vector_stores/chroma.py                                                  49       3  93.88%   57-58, 75
packages/ragbits-core/src/ragbits/core/vector_stores/in_memory.py                                               32       0  100.00%
packages/ragbits-core/src/ragbits/core/vector_stores/qdrant.py                                                  51       3  94.12%   53-54, 74
packages/ragbits-core/tests/unit/__init__.py                                                                     0       0  100.00%
packages/ragbits-core/tests/unit/audit/__init__.py                                                               0       0  100.00%
packages/ragbits-core/tests/unit/audit/test_otel.py                                                              7       0  100.00%
packages/ragbits-core/tests/unit/audit/test_trace.py                                                            88       3  96.59%   13, 16, 19
packages/ragbits-core/tests/unit/llms/__init__.py                                                                0       0  100.00%
packages/ragbits-core/tests/unit/llms/test_litellm.py                                                           64       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/__init__.py                                                        3       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_default_llm.py                                           10       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_get_llm_from_factory.py                                       8       0  100.00%
packages/ragbits-core/tests/unit/llms/factory/test_has_default_llm.py                                           10       0  100.00%
packages/ragbits-core/tests/unit/metadata_stores/__init__.py                                                     0       0  100.00%
packages/ragbits-core/tests/unit/metadata_stores/test_in_memory.py                                              22       0  100.00%
packages/ragbits-core/tests/unit/prompts/__init__.py                                                             0       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_parsers.py                                                        65       0  100.00%
packages/ragbits-core/tests/unit/prompts/test_prompt.py                                                        155       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/__init__.py                                                   0       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/prompt_classes_for_tests.py                                  30       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/test_prompt_discovery.py                                     18       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/__init__.py                    2       1  50.00%   3
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/__init__.py            3       2  33.33%   2-4
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt1.py       14       0  100.00%
packages/ragbits-core/tests/unit/prompts/discovery/ragbits_tests_pkg_with_prompts/prompts/temp_prompt2.py       14       0  100.00%
packages/ragbits-core/tests/unit/utils/test_decorators.py                                                       26       2  92.31%   17, 39
packages/ragbits-core/tests/unit/utils/pyproject/test_find.py                                                   13       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_config.py                                              9       0  100.00%
packages/ragbits-core/tests/unit/utils/pyproject/test_get_instace.py                                            37       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/__init__.py                                                       0       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_chroma.py                                                   38       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_in_memory.py                                                77       0  100.00%
packages/ragbits-core/tests/unit/vector_stores/test_qdrant.py                                                   33       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/__init__.py                                         2       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/_main.py                                           73       3  95.89%   159-160, 167
packages/ragbits-document-search/src/ragbits/document_search/documents/__init__.py                               0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/documents/document.py                              61       2  96.72%   99, 146
packages/ragbits-document-search/src/ragbits/document_search/documents/element.py                               56       2  96.43%   77, 152
packages/ragbits-document-search/src/ragbits/document_search/documents/exceptions.py                            11       5  54.55%   7-8, 17, 26-27
packages/ragbits-document-search/src/ragbits/document_search/documents/sources.py                              116      15  87.07%   15-16, 129, 212-217, 254-257, 261-262
packages/ragbits-document-search/src/ragbits/document_search/ingestion/__init__.py                               0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/document_processor.py                    28       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/__init__.py         11       1  90.91%   30
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/base.py             25       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/batched.py          18       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/processor_strategies/sequential.py       13       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/__init__.py                    10       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/base.py                        14       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/dummy.py                       20       7  65.00%   33, 54-60
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/__init__.py        4       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/default.py        46       4  91.30%   98, 103-104, 137
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/images.py         58      24  58.62%   76-83, 90-110, 122, 135
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/pdf.py            19       6  68.42%   23, 35-43
packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured/utils.py          38      11  71.05%   71, 82-83, 98-101, 110, 121-123
packages/ragbits-document-search/src/ragbits/document_search/retrieval/__init__.py                               0       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/__init__.py                   15       4  73.33%   39-44
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/base.py                        7       1  85.71%   32
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/llm.py                        22       9  59.09%   28-29, 47-50, 67-69
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/noop.py                        6       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rephrasers/prompts.py                    16       4  75.00%   49-54
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/__init__.py                    10       1  90.00%   27
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/base.py                        15       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/litellm.py                     18       0  100.00%
packages/ragbits-document-search/src/ragbits/document_search/retrieval/rerankers/noop.py                        11       0  100.00%
packages/ragbits-document-search/tests/__init__.py                                                               0       0  100.00%
packages/ragbits-document-search/tests/helpers.py                                                                3       0  100.00%
packages/ragbits-document-search/tests/integration/__init__.py                                                   0       0  100.00%
packages/ragbits-document-search/tests/integration/test_rerankers.py                                            15       6  60.00%   18-38
packages/ragbits-document-search/tests/integration/test_sources.py                                              23      10  56.52%   22-32, 40-45
packages/ragbits-document-search/tests/integration/test_unstructured.py                                         47      10  78.72%   46-52, 65-71
packages/ragbits-document-search/tests/unit/test_document_processor.py                                          17       0  100.00%
packages/ragbits-document-search/tests/unit/test_document_search.py                                             87       0  100.00%
packages/ragbits-document-search/tests/unit/test_documents.py                                                   13       0  100.00%
packages/ragbits-document-search/tests/unit/test_elements.py                                                    19       0  100.00%
packages/ragbits-document-search/tests/unit/test_local_file_source.py                                           13       0  100.00%
packages/ragbits-document-search/tests/unit/test_processing_strategies.py                                       19       0  100.00%
packages/ragbits-document-search/tests/unit/test_providers.py                                                   31       0  100.00%
packages/ragbits-document-search/tests/unit/test_rerankers.py                                                   31       1  96.77%   21
packages/ragbits-document-search/tests/unit/test_source_discriminator.py                                        35       0  100.00%
packages/ragbits-document-search/tests/unit/test_sources.py                                                     25       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/__init__.py                                                   0       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/base.py                                                      15       0  100.00%
packages/ragbits-guardrails/src/ragbits/guardrails/openai_moderation.py                                         19       5  73.68%   29-33
packages/ragbits-guardrails/tests/unit/test_openai_moderation.py                                                35       0  100.00%
TOTAL                                                                                                         2913     267  90.83%

Diff against main

Filename      Stmts    Miss  Cover
----------  -------  ------  --------
TOTAL             0       0  +100.00%

Results for commit: a81144f

Minimum allowed coverage is 60%
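
The Cover column and the 60% gate can be reproduced by hand. The following is a minimal sketch, assuming coverage is computed as (Stmts - Miss) / Stmts, which agrees with the figures reported in the table; the function name `coverage_percent` and the constant `MIN_COVERAGE` are hypothetical and not taken from the project's CI configuration.

```python
def coverage_percent(stmts: int, miss: int) -> float:
    """Return statement coverage as a percentage, rounded to two decimals."""
    if stmts == 0:
        # Files with no statements appear as 100.00% in the table above.
        return 100.0
    return round((stmts - miss) / stmts * 100, 2)


# Threshold quoted in the bot comment (name is an assumption for illustration).
MIN_COVERAGE = 60.0

# TOTAL row of the summary: 2913 statements, 267 missed.
total = coverage_percent(2913, 267)
print(f"total coverage: {total}%, passes gate: {total >= MIN_COVERAGE}")
```

Applied to the TOTAL row this gives the reported 90.83%, well above the 60% minimum; individual files such as `audit/otel.py` (36 statements, 16 missed) fall below the threshold on their own, but the gate as stated applies to the overall figure.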

♻️ This comment has been updated with latest results

@kdziedzic68 force-pushed the 127-featdatasets-add-synthetic-data-generation-pipeline branch from 53ac0c2 to 1448a81 November 12, 2024 11:56

github-actions bot (Contributor) commented Nov 12, 2024

Trivy scanning results.

.venv/lib/python3.10/site-packages/litellm/llms/huggingface_llms_metadata/hf_text_generation_models.txt (secrets)

Total: 1 (MEDIUM: 0, HIGH: 0, CRITICAL: 1)

CRITICAL: HuggingFace (hugging-face-access-token)
════════════════════════════════════════
Hugging Face Access Token
────────────────────────────────────────
.venv/lib/python3.10/site-packages/litellm/llms/huggingface_llms_metadata/hf_text_generation_models.txt:36162
────────────────────────────────────────
36160 mncai/Llama2-7B-Active_3rd-floor-LoRA-dim64_epoch4
36161 ajcdp/CM
36162 [ Nagharjun17/*************************************
36163 BigSalmon/InformalToFormalLincoln114Paraphrase
────────────────────────────────────────

.venv/lib/python3.10/site-packages/litellm/proxy/_types.py (secrets)

Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)

MEDIUM: Slack (slack-web-hook)
════════════════════════════════════════
Slack Webhook
────────────────────────────────────────
.venv/lib/python3.10/site-packages/litellm/proxy/_types.py:1288
────────────────────────────────────────
1286 alert_to_webhook_url: Optional[Dict] = Field(
1287 None,
1288 [ bhook_url: {'budget_alerts': '*****************************************************************************'}`",
1289 )
────────────────────────────────────────

.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA (secrets)

Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)

MEDIUM: JWT (jwt-token)
════════════════════════════════════════
JWT token
────────────────────────────────────────
.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA:80
────────────────────────────────────────
78 >>> encoded = jwt.encode({"some": "payload"}, "secret", algorithm="HS256")
79 >>> print(encoded)
80 [ *********************************************************************************************************
81 >>> jwt.decode(encoded, "secret", algorithms=["HS256"])
────────────────────────────────────────

@kdziedzic68 marked this pull request as ready for review November 12, 2024 15:27
@kdziedzic68 force-pushed the 127-featdatasets-add-synthetic-data-generation-pipeline branch from 37dfe7c to e035bb4 November 19, 2024 19:29
@kdziedzic68 merged commit d11ff73 into main Nov 20, 2024
6 checks passed
@mhordynski deleted the 127-featdatasets-add-synthetic-data-generation-pipeline branch November 21, 2024 21:55
Labels
document search, evals, feature
Development

Successfully merging this pull request may close these issues.

feat(datasets): add synthetic data generation pipeline
3 participants