diff --git a/docs/quickstart/index.md b/docs/quickstart/index.md
index 0de8afb..da295be 100644
--- a/docs/quickstart/index.md
+++ b/docs/quickstart/index.md
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
 # Getting Started
 
 WalledEval can serve **four** major functions, namely the following:
@@ -12,7 +15,7 @@ WalledEval can serve **four** major functions, namely the following:
 
     [:octicons-arrow-right-24: Prompt Benchmarking](prompts.md)
 
--   :material-library-outline:{ .lg .middle }  __LLM Knowledge__
+-   :material-book-check-outline:{ .lg .middle }  __LLM Knowledge__
 
     ---
 
diff --git a/docs/quickstart/judges.md b/docs/quickstart/judges.md
index fd94cd5..8d255ab 100644
--- a/docs/quickstart/judges.md
+++ b/docs/quickstart/judges.md
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
 # Judge Benchmarking
 
 Beyond just LLMs, some datasets are designed to benchmark judges and identify if they are able to accurately classify questions as **safe** or **unsafe**. The general requirements for testing an LLM on Judge Benchmarks is as follows:
@@ -8,7 +11,7 @@ Beyond just LLMs, some datasets are designed to benchmark judges and identify if
 
 Here's how you can do this easily in WalledEval!
 
-```python title="judge_quickstart.py" linenums="1" hl_lines="25 28 38 39 45"
+```python title="judge_quickstart.py" linenums="1" hl_lines="25 28 38 39 46"
 from walledeval.data import HuggingFaceDataset
 from walledeval.types import SafetyPrompt
 from walledeval.judge import WalledGuardJudge
diff --git a/docs/quickstart/mcq.md b/docs/quickstart/mcq.md
index 3b34c87..e20eb50 100644
--- a/docs/quickstart/mcq.md
+++ b/docs/quickstart/mcq.md
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
 # MCQ Benchmarking
 
 Some safety datasets (e..g [WMDP](https://www.wmdp.ai/) and [BBQ](https://aclanthology.org/2022.findings-acl.165/)) are designed to test LLMs on any harmful knowledge or inherent biases that they may possess. These datasets are largely formatted in multiple-choice question (**MCQ**) format, hence why we choose to call them MCQ Benchmarks. The general requirements for testing an LLM on MCQ Benchmarks is as follows:
diff --git a/docs/quickstart/prompts.md b/docs/quickstart/prompts.md
index b1ed7f4..1246538 100644
--- a/docs/quickstart/prompts.md
+++ b/docs/quickstart/prompts.md
@@ -1,3 +1,6 @@
+---
+hide: '["toc"]'
+---
 # Prompt Benchmarking
 
 Most safety datasets aim to test LLMs on their creativity / restraint in generating responses to custom unsafe/safe queries. The general requirements for testing an LLM on Prompt Benchmarks is as follows: