This repo contains experiments that use LLMs as judges. The following experiments have been run:
- context relevance
The repository is structured as follows:
- data: contains datasets used to evaluate experiments
- notebooks: contains Jupyter notebooks used to create and evaluate experiments
Haystack v2 is used for invocation. Wherever API tokens or credentials are required, the notebooks contain INSERT_TOKEN_HERE or similar placeholders; replace these with your own values before running.
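
If you would rather not paste a token directly into a notebook, one option is to read it from an environment variable and fall back to that whenever the placeholder is still in place. The following is a minimal sketch, not part of this repo: the helper name `resolve_token` and the environment variable name `LLM_API_TOKEN` are hypothetical and chosen only for illustration.

```python
import os

# Placeholder string as it appears in the notebooks.
PLACEHOLDER = "INSERT_TOKEN_HERE"


def resolve_token(value: str, env_var: str = "LLM_API_TOKEN") -> str:
    """Return a usable API token.

    If `value` has already been replaced with a real token, return it as-is.
    Otherwise read the token from the environment variable `env_var`
    (a hypothetical name; use whatever your provider or setup expects).
    """
    if value and value != PLACEHOLDER:
        return value

    token = os.environ.get(env_var, "")
    if not token or token == PLACEHOLDER:
        raise ValueError(
            f"No API token found: set {env_var} or replace {PLACEHOLDER}."
        )
    return token
```

In a notebook cell this could be used as `api_token = resolve_token("INSERT_TOKEN_HERE")`, with the resulting value passed to whichever component needs the credential. Failing early with a clear error avoids sending the literal placeholder to an API.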