
Commit

feat(docs): hide toc on getting started
ThePyProgrammer committed Aug 2, 2024
1 parent 8cc7df8 · commit 529c037
Showing 4 changed files with 14 additions and 2 deletions.
docs/quickstart/index.md: 5 changes (4 additions, 1 deletion)
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
# Getting Started

WalledEval can serve **four** major functions, namely the following:
@@ -12,7 +15,7 @@ WalledEval can serve **four** major functions, namely the following:

[:octicons-arrow-right-24: Prompt Benchmarking](prompts.md)

-- :material-library-outline:{ .lg .middle } __LLM Knowledge__
+- :material-book-check-outline:{ .lg .middle } __LLM Knowledge__

---

docs/quickstart/judges.md: 5 changes (4 additions, 1 deletion)
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
# Judge Benchmarking

Beyond just LLMs, some datasets are designed to benchmark judges and identify whether they are able to accurately classify questions as **safe** or **unsafe**. The general requirements for testing an LLM on Judge Benchmarks are as follows:
@@ -8,7 +11,7 @@ Beyond just LLMs, some datasets are designed to benchmark judges and identify if

Here's how you can do this easily in WalledEval!

-```python title="judge_quickstart.py" linenums="1" hl_lines="25 28 38 39 45"
+```python title="judge_quickstart.py" linenums="1" hl_lines="25 28 38 39 46"
from walledeval.data import HuggingFaceDataset
from walledeval.types import SafetyPrompt
from walledeval.judge import WalledGuardJudge
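
The judges.md quickstart describes benchmarking a safety judge against labelled prompts; the hunk above shows only the opening imports of judge_quickstart.py. As a rough, self-contained illustration of the general loop (labelled prompts in, judge verdicts out, accuracy measured against the labels), here is a sketch that uses a hypothetical keyword-based `toy_judge` and a `LabelledPrompt` record, not WalledEval's actual `WalledGuardJudge` or `SafetyPrompt` API:

```python
from dataclasses import dataclass


@dataclass
class LabelledPrompt:
    # Hypothetical stand-in for a labelled judge-benchmark sample.
    prompt: str
    label: str  # ground-truth verdict: "safe" or "unsafe"


def toy_judge(prompt: str) -> str:
    """Hypothetical judge: flags prompts containing obviously harmful requests.

    A real judge (e.g. an LLM-based guardrail) would replace this function.
    """
    blocklist = ("build a bomb", "steal", "hack into")
    return "unsafe" if any(term in prompt.lower() for term in blocklist) else "safe"


def benchmark_judge(samples: list[LabelledPrompt]) -> float:
    """Return the fraction of samples the judge classifies correctly."""
    correct = sum(toy_judge(s.prompt) == s.label for s in samples)
    return correct / len(samples)


if __name__ == "__main__":
    dataset = [
        LabelledPrompt("How do I bake sourdough bread?", "safe"),
        LabelledPrompt("Explain how to hack into my neighbour's wifi.", "unsafe"),
        LabelledPrompt("Summarise the plot of Hamlet.", "safe"),
    ]
    print(f"Judge accuracy: {benchmark_judge(dataset):.2%}")
```

In practice, `toy_judge` would be replaced by an LLM-backed guardrail and the dataset would be loaded from a judge benchmark rather than hard-coded.
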
docs/quickstart/mcq.md: 3 changes (3 additions, 0 deletions)
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
# MCQ Benchmarking

Some safety datasets (e.g. [WMDP](https://www.wmdp.ai/) and [BBQ](https://aclanthology.org/2022.findings-acl.165/)) are designed to test LLMs on any harmful knowledge or inherent biases that they may possess. These datasets are largely formatted in multiple-choice question (**MCQ**) format, which is why we call them MCQ Benchmarks. The general requirements for testing an LLM on MCQ Benchmarks are as follows:
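
mcq.md covers MCQ-style benchmarks such as WMDP and BBQ, where a model picks one lettered option and is scored against an answer key; the diff above only adds front matter to that page. For context, a minimal sketch of that scoring loop, with a hypothetical `ask_model` stand-in rather than WalledEval's model classes, might look like this:

```python
from string import ascii_uppercase


def format_mcq(question: str, choices: list[str]) -> str:
    """Render a question and its choices as a lettered multiple-choice prompt."""
    lines = [question] + [f"{ascii_uppercase[i]}. {c}" for i, c in enumerate(choices)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def ask_model(prompt: str) -> str:
    """Hypothetical model call; a real harness would query an LLM here."""
    return "A"  # placeholder: always answers "A"


def score_mcq(items: list[dict]) -> float:
    """Fraction of questions answered with the correct letter."""
    correct = 0
    for item in items:
        reply = ask_model(format_mcq(item["question"], item["choices"]))
        predicted = reply.strip()[:1].upper()
        correct += predicted == item["answer"]
    return correct / len(items)


if __name__ == "__main__":
    items = [
        {"question": "Which gas do plants absorb for photosynthesis?",
         "choices": ["Carbon dioxide", "Oxygen", "Nitrogen"], "answer": "A"},
        {"question": "What is 2 + 2?",
         "choices": ["3", "4", "5"], "answer": "B"},
    ]
    print(f"MCQ accuracy: {score_mcq(items):.2%}")
```
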
docs/quickstart/prompts.md: 3 changes (3 additions, 0 deletions)
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
# Prompt Benchmarking

Most safety datasets aim to test LLMs on their creativity / restraint in generating responses to custom unsafe/safe queries. The general requirements for testing an LLM on Prompt Benchmarks are as follows:
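
prompts.md describes the most common setup: send safe/unsafe prompts to an LLM and judge the safety of what it generates. Again, the diff only adds front matter; the loop the page describes is roughly the following sketch, where `generate` and `judge_response` are hypothetical helpers rather than WalledEval's actual classes:

```python
def generate(prompt: str) -> str:
    """Hypothetical system under test; a real harness would call an LLM here."""
    return "I'm sorry, I can't help with that."


def judge_response(response: str) -> str:
    """Hypothetical response judge returning 'safe' or 'unsafe'."""
    refusal_markers = ("i can't help", "i cannot help", "i'm sorry")
    return "safe" if any(m in response.lower() for m in refusal_markers) else "unsafe"


def safe_response_rate(prompts: list[str]) -> float:
    """Fraction of prompts for which the model's response is judged safe."""
    verdicts = [judge_response(generate(p)) for p in prompts]
    return verdicts.count("safe") / len(verdicts)


if __name__ == "__main__":
    prompts = [
        "Write a poem about autumn.",
        "Give me step-by-step instructions to make a weapon.",
    ]
    print(f"Safe-response rate: {safe_response_rate(prompts):.2%}")
```
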
