diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 9b59c61817..4672ce0960 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -36,7 +36,22 @@
title: Overview
- local: clickhouse
title: ClickHouse
- - local: duckdb
+ - isExpanded: false
+ sections:
+ - local: duckdb
+ title: General Usage
+ - local: duckdb_cli
+ title: DuckDB CLI
+ - local: duckdb_cli_auth
+ title: Authentication for private and gated datasets
+ - local: duckdb_cli_select
+ title: Query datasets
+ - local: duckdb_cli_sql
+ title: Perform SQL operations
+ - local: duckdb_cli_combine_and_export
+ title: Combine datasets and export
+ - local: duckdb_cli_vector_similarity_search
+ title: Perform vector similarity search
title: DuckDB
- local: pandas
title: Pandas
diff --git a/docs/source/duckdb_cli.md b/docs/source/duckdb_cli.md
new file mode 100644
index 0000000000..1ec43419a0
--- /dev/null
+++ b/docs/source/duckdb_cli.md
@@ -0,0 +1,57 @@
+# DuckDB CLI
+
+The [DuckDB CLI](https://duckdb.org/docs/api/cli/overview.html) (Command Line Interface) is a single, dependency-free executable.
+
+For installation details, visit the [installation page](https://duckdb.org/docs/installation).
+
+Starting from version `v0.10.3`, the DuckDB CLI includes native support for accessing datasets on the Hugging Face Hub via `hf://` URLs. Here are some of the things you can do with it:
+
+- Query public datasets and your own gated and private datasets
+- Analyze datasets and perform SQL operations
+- Combine datasets and export the result to different formats
+- Conduct vector similarity search on embedding datasets
+- Implement full-text search on datasets
+
+For a complete list of DuckDB features, visit the DuckDB [documentation](https://duckdb.org/docs/).
+
+To start the CLI, execute the following command in the installation folder:
+
+```bash
+./duckdb
+```
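+
+You can also run a single query non-interactively by passing a database and the query as arguments (a small sketch; `:memory:` avoids creating a database file on disk):
+
+```bash
+./duckdb :memory: "SELECT 42 AS answer;"
+```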
+
+## Forming the Hugging Face URL
+
+To access Hugging Face datasets, use the following URL format:
+
+```plaintext
+hf://datasets/{my-username}/{my-dataset}/{path_to_parquet_file}
+```
+
+- **my-username**, the user or organization that owns the dataset, e.g. `ibm`
+- **my-dataset**, the dataset name, e.g. `duorc`
+- **path_to_parquet_file**, the path to the Parquet file, which supports glob patterns, e.g. `**/*.parquet` to query all Parquet files
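+
+For example, querying all the Parquet files of the `ParaphraseRC` configuration of the `ibm/duorc` dataset uses:
+
+```plaintext
+hf://datasets/ibm/duorc/ParaphraseRC/*.parquet
+```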
+
+You can query auto-converted Parquet files using the `@~parquet` branch, which corresponds to the `refs/convert/parquet` revision. For more details, refer to the [Parquet conversion documentation](https://huggingface.co/docs/datasets-server/en/parquet#conversion-to-parquet).
+
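+For instance, this samples the auto-converted Parquet files of the same dataset (a sketch; the `{config}/{split}` directory layout under the `@~parquet` branch is assumed here):
+
+```sql
+FROM 'hf://datasets/ibm/duorc@~parquet/ParaphraseRC/**/*.parquet' LIMIT 3;
+```
+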
+Let's run a quick demo to query the first 3 rows of a dataset:
+
+```sql
+FROM 'hf://datasets/ibm/duorc/ParaphraseRC/*.parquet' LIMIT 3;
+```
+
+Or using traditional SQL syntax:
+
+```sql
+SELECT * FROM 'hf://datasets/ibm/duorc/ParaphraseRC/*.parquet' LIMIT 3;
+```
+
+In the following sections, we will cover more complex operations you can perform with DuckDB on Hugging Face datasets.
diff --git a/docs/source/duckdb_cli_auth.md b/docs/source/duckdb_cli_auth.md
new file mode 100644
index 0000000000..32c2d37a24
--- /dev/null
+++ b/docs/source/duckdb_cli_auth.md
@@ -0,0 +1,46 @@
+# Authentication for private and gated datasets
+
+To access private or gated datasets, you need to configure your Hugging Face Token in the DuckDB Secrets Manager.
+
+Visit [Hugging Face Settings - Tokens](https://huggingface.co/settings/tokens) to obtain your access token.
+
+DuckDB supports two providers for managing secrets:
+
+- `CONFIG`: Requires the user to pass all configuration information into the `CREATE SECRET` statement.
+- `CREDENTIAL_CHAIN`: Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from `~/.cache/huggingface/token`.
+
+For more information about DuckDB Secrets, visit the [Secrets Manager](https://duckdb.org/docs/configuration/secrets_manager.html) guide.
+
+## Creating a secret with `CONFIG` provider
+
+To create a secret using the `CONFIG` provider, use the following command:
+
+```sql
+CREATE SECRET hf_token (TYPE HUGGINGFACE, TOKEN 'your_hf_token');
+```
+
+Replace `your_hf_token` with your actual Hugging Face token.
+
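+You can verify that the secret was created using DuckDB's `duckdb_secrets()` table function (sensitive values are redacted in its output):
+
+```sql
+FROM duckdb_secrets();
+```
+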
+## Creating a secret with `CREDENTIAL_CHAIN` provider
+
+To create a secret using the `CREDENTIAL_CHAIN` provider, use the following command:
+
+```sql
+CREATE SECRET hf_token (TYPE HUGGINGFACE, PROVIDER credential_chain);
+```
+
+This command automatically retrieves the stored token from `~/.cache/huggingface/token`.
+
+If you haven't configured your token, execute the following command in the terminal:
+
+```bash
+huggingface-cli login
+```
+
+Alternatively, you can set your Hugging Face token as an environment variable:
+
+```bash
+export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
+```
+
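+Once a secret is configured, you can query your private or gated datasets directly. A minimal sketch, using a hypothetical dataset name as a stand-in for your own:
+
+```sql
+SELECT * FROM 'hf://datasets/your-username/your-private-dataset/**/*.parquet' LIMIT 3;
+```
+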
+For more information on authentication, see the [Hugging Face authentication](https://huggingface.co/docs/huggingface_hub/main/en/quick-start#authentication) documentation.
diff --git a/docs/source/duckdb_cli_combine_and_export.md b/docs/source/duckdb_cli_combine_and_export.md
new file mode 100644
index 0000000000..c0d504b87e
--- /dev/null
+++ b/docs/source/duckdb_cli_combine_and_export.md
@@ -0,0 +1,105 @@
+# Combine datasets and export
+
+In this section, we'll combine two datasets and export the result. Let's start with our datasets:
+
+The first will be [TheFusion21/PokemonCards](https://huggingface.co/datasets/TheFusion21/PokemonCards):
+
+```bash
+FROM 'hf://datasets/TheFusion21/PokemonCards/train.csv' LIMIT 3;
+┌─────────┬──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┬───────┬─────────────────┐
+│ id │ image_url │ caption │ name │ hp │ set_name │
+│ varchar │ varchar │ varchar │ varchar │ int64 │ varchar │
+├─────────┼──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼───────┼─────────────────┤
+│ pl3-1 │ https://images.pok… │ A Basic, SP Pokemon Card of type Darkness with the title Absol G and 70 HP of rarity Rare Holo from the set Supreme Victors. It has … │ Absol G │ 70 │ Supreme Victors │
+│ ex12-1 │ https://images.pok… │ A Stage 1 Pokemon Card of type Colorless with the title Aerodactyl and 70 HP of rarity Rare Holo evolved from Mysterious Fossil from … │ Aerodactyl │ 70 │ Legend Maker │
+│ xy5-1 │ https://images.pok… │ A Basic Pokemon Card of type Grass with the title Weedle and 50 HP of rarity Common from the set Primal Clash and the flavor text: It… │ Weedle │ 50 │ Primal Clash │
+└─────────┴──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┴───────┴─────────────────┘
+```
+
+And the second one will be [wanghaofan/pokemon-wiki-captions](https://huggingface.co/datasets/wanghaofan/pokemon-wiki-captions):
+
+```bash
+FROM 'hf://datasets/wanghaofan/pokemon-wiki-captions/data/*.parquet' LIMIT 3;
+
+┌──────────────────────┬───────────┬──────────┬──────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ image │ name_en │ name_zh │ text_en │ text_zh │
+│ struct(bytes blob,… │ varchar │ varchar │ varchar │ varchar │
+├──────────────────────┼───────────┼──────────┼──────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ {'bytes': \x89PNG\… │ abomasnow │ 暴雪王 │ Grass attributes,Blizzard King standing on two feet, with … │ 草属性,双脚站立的暴雪王,全身白色的绒毛,淡紫色的眼睛,几缕长条装的毛皮盖着它的嘴巴 │
+│ {'bytes': \x89PNG\… │ abra │ 凯西 │ Super power attributes, the whole body is yellow, the head… │ 超能力属性,通体黄色,头部外形类似狐狸,尖尖鼻子,手和脚上都有三个指头,长尾巴末端带着一个褐色圆环 │
+│ {'bytes': \x89PNG\… │ absol │ 阿勃梭鲁 │ Evil attribute, with white hair, blue-gray part without ha… │ 恶属性,有白色毛发,没毛发的部分是蓝灰色,头右边类似弓的角,红色眼睛 │
+└──────────────────────┴───────────┴──────────┴──────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+Now, let's try to combine these two datasets by joining on the `name` column:
+
+```bash
+SELECT a.image_url
+ , a.caption AS card_caption
+ , a.name
+ , a.hp
+ , b.text_en as wiki_caption
+FROM 'hf://datasets/TheFusion21/PokemonCards/train.csv' a
+JOIN 'hf://datasets/wanghaofan/pokemon-wiki-captions/data/*.parquet' b
+ON LOWER(a.name) = b.name_en
+LIMIT 3;
+
+┌──────────────────────┬──────────────────────┬────────────┬───────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ image_url │ card_caption │ name │ hp │ wiki_caption │
+│ varchar │ varchar │ varchar │ int64 │ varchar │
+├──────────────────────┼──────────────────────┼────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ https://images.pok… │ A Stage 1 Pokemon … │ Aerodactyl │ 70 │ A Pokémon with rock attributes, gray body, blue pupils, purple inner wings, two sharp claws on the wings, jagged teeth, and an arrow-like … │
+│ https://images.pok… │ A Basic Pokemon Ca… │ Weedle │ 50 │ Insect-like, caterpillar-like in appearance, with a khaki-yellow body, seven pairs of pink gastropods, a pink nose, a sharp poisonous need… │
+│ https://images.pok… │ A Basic Pokemon Ca… │ Caterpie │ 50 │ Insect attributes, caterpillar appearance, green back, white abdomen, Y-shaped red antennae on the head, yellow spindle-shaped tail, two p… │
+└──────────────────────┴──────────────────────┴────────────┴───────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+We can export the result to a Parquet file using the `COPY` command:
+
+```bash
+COPY (SELECT a.image_url
+ , a.caption AS card_caption
+ , a.name
+ , a.hp
+ , b.text_en as wiki_caption
+FROM 'hf://datasets/TheFusion21/PokemonCards/train.csv' a
+JOIN 'hf://datasets/wanghaofan/pokemon-wiki-captions/data/*.parquet' b
+ON LOWER(a.name) = b.name_en)
+TO 'output.parquet' (FORMAT PARQUET);
+```
+
+Let's validate the new Parquet file:
+
+```bash
+SELECT COUNT(*) FROM 'output.parquet';
+
+┌──────────────┐
+│ count_star() │
+│ int64 │
+├──────────────┤
+│ 9460 │
+└──────────────┘
+
+```
+
+You can also export to [CSV](https://duckdb.org/docs/guides/file_formats/csv_export), [Excel](https://duckdb.org/docs/guides/file_formats/excel_export) and [JSON](https://duckdb.org/docs/guides/file_formats/json_export) formats.
+
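+For instance, the same `COPY` pattern can write CSV or newline-delimited JSON (a minimal sketch reusing the `output.parquet` file created above):
+
+```sql
+-- Re-export the combined result as CSV with a header row
+COPY (SELECT * FROM 'output.parquet') TO 'output.csv' (FORMAT CSV, HEADER);
+
+-- Or as newline-delimited JSON
+COPY (SELECT * FROM 'output.parquet') TO 'output.jsonl' (FORMAT JSON);
+```
+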
+Finally, let's push the resulting dataset to the Hub using the [Datasets](https://huggingface.co/docs/datasets/index) library in Python:
+
+```python
+from datasets import load_dataset
+
+dataset = load_dataset("parquet", data_files="output.parquet")
+dataset.push_to_hub("asoria/duckdb_combine_demo")
+```
+
+And that's it! You've successfully combined two datasets, exported the result, and uploaded it to the Hugging Face Hub.
diff --git a/docs/source/duckdb_cli_select.md b/docs/source/duckdb_cli_select.md
new file mode 100644
index 0000000000..d126737c04
--- /dev/null
+++ b/docs/source/duckdb_cli_select.md
@@ -0,0 +1,150 @@
+# Query datasets
+
+Querying datasets is a fundamental step in data analysis. Here, we'll guide you through querying datasets using various methods.
+
+There are several [different ways](https://duckdb.org/docs/data/parquet/overview.html) to select your data.
+
+Using the `FROM` syntax:
+```bash
+FROM 'hf://datasets/jamescalam/world-cities-geo/train.jsonl' SELECT city, country, region LIMIT 3;
+
+┌────────────────┬─────────────┬───────────────┐
+│ city │ country │ region │
+│ varchar │ varchar │ varchar │
+├────────────────┼─────────────┼───────────────┤
+│ Kabul │ Afghanistan │ Southern Asia │
+│ Kandahar │ Afghanistan │ Southern Asia │
+│ Mazar-e Sharif │ Afghanistan │ Southern Asia │
+└────────────────┴─────────────┴───────────────┘
+
+```
+
+Using the `SELECT` and `FROM` syntax:
+
+```bash
+SELECT city, country, region FROM 'hf://datasets/jamescalam/world-cities-geo/train.jsonl' USING SAMPLE 3;
+
+┌──────────┬─────────┬────────────────┐
+│ city │ country │ region │
+│ varchar │ varchar │ varchar │
+├──────────┼─────────┼────────────────┤
+│ Wenzhou │ China │ Eastern Asia │
+│ Valdez │ Ecuador │ South America │
+│ Aplahoue │ Benin │ Western Africa │
+└──────────┴─────────┴────────────────┘
+
+```
+
+Count the rows across all the JSONL files matching a glob pattern:
+
+```bash
+SELECT COUNT(*) FROM 'hf://datasets/jamescalam/world-cities-geo/*.jsonl';
+
+┌──────────────┐
+│ count_star() │
+│ int64 │
+├──────────────┤
+│ 9083 │
+└──────────────┘
+
+```
+
+You can also query Parquet files using the `read_parquet` and `parquet_scan` functions. Let's explore these functions using the auto-converted Parquet files from the same dataset.
+
+Select using [read_parquet](https://duckdb.org/docs/guides/file_formats/query_parquet.html) function:
+
+```bash
+SELECT * FROM read_parquet('hf://datasets/jamescalam/world-cities-geo@~parquet/default/**/*.parquet') LIMIT 3;
+```
+
+Read all files that match a glob pattern and include a filename column specifying which file each row came from:
+
+```bash
+SELECT * FROM read_parquet('hf://datasets/jamescalam/world-cities-geo@~parquet/default/**/*.parquet', filename = true) LIMIT 3;
+```
+
+Using [`parquet_scan`](https://duckdb.org/docs/data/parquet/overview) function:
+
+```bash
+SELECT * FROM parquet_scan('hf://datasets/jamescalam/world-cities-geo@~parquet/default/**/*.parquet') LIMIT 3;
+```
+
+## Get metadata and schema
+
+The [parquet_metadata](https://duckdb.org/docs/data/parquet/metadata.html) function can be used to query the metadata contained within a Parquet file.
+
+```bash
+SELECT * FROM parquet_metadata('hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet');
+
+┌───────────────────────────────────────────────────────────────────────────────┬──────────────┬────────────────────┬─────────────┐
+│ file_name │ row_group_id │ row_group_num_rows │ compression │
+│ varchar │ int64 │ int64 │ varchar │
+├───────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────────────┼─────────────┤
+│ hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet │ 0 │ 1000 │ SNAPPY │
+│ hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet │ 0 │ 1000 │ SNAPPY │
+│ hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet │ 0 │ 1000 │ SNAPPY │
+└───────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────────┘
+
+```
+
+Fetch the column names and column types:
+
+```bash
+DESCRIBE SELECT * FROM 'hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet';
+
+┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
+│ column_name │ column_type │ null │ key │ default │ extra │
+│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
+├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
+│ city │ VARCHAR │ YES │ │ │ │
+│ country │ VARCHAR │ YES │ │ │ │
+│ region │ VARCHAR │ YES │ │ │ │
+│ continent │ VARCHAR │ YES │ │ │ │
+│ latitude │ DOUBLE │ YES │ │ │ │
+│ longitude │ DOUBLE │ YES │ │ │ │
+│ x │ DOUBLE │ YES │ │ │ │
+│ y │ DOUBLE │ YES │ │ │ │
+│ z │ DOUBLE │ YES │ │ │ │
+└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
+
+```
+
+Fetch the internal schema (excluding the file name):
+
+```bash
+SELECT * EXCLUDE (file_name) FROM parquet_schema('hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet');
+
+┌───────────┬────────────┬─────────────┬─────────────────┬──────────────┬────────────────┬───────┬───────────┬──────────┬──────────────┐
+│ name │ type │ type_length │ repetition_type │ num_children │ converted_type │ scale │ precision │ field_id │ logical_type │
+│ varchar │ varchar │ varchar │ varchar │ int64 │ varchar │ int64 │ int64 │ int64 │ varchar │
+├───────────┼────────────┼─────────────┼─────────────────┼──────────────┼────────────────┼───────┼───────────┼──────────┼──────────────┤
+│ schema │ │ │ REQUIRED │ 9 │ │ │ │ │ │
+│ city │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ country │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ region │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ continent │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ latitude │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ longitude │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ x │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ y │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ z │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+└───────────┴────────────┴─────────────┴─────────────────┴──────────────┴────────────────┴───────┴───────────┴──────────┴──────────────┘
+
+```
+
+## Get statistics
+
+The `SUMMARIZE` command can be used to get various aggregates over a query (min, max, approx_unique, avg, std, q25, q50, q75, count). It returns these statistics along with the column name, column type, and the percentage of NULL values.
+
+```bash
+SUMMARIZE SELECT latitude, longitude FROM 'hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet';
+
+┌─────────────┬─────────────┬──────────────┬─────────────┬───────────────┬────────────────────┬───────────────────┬────────────────────┬───────────────────┬────────────────────┬───────┐
+│ column_name │ column_type │ min │ max │ approx_unique │ avg │ std │ q25 │ q50 │ q75 │ count │
+│ varchar │ varchar │ varchar │ varchar │ int64 │ varchar │ varchar │ varchar │ varchar │ varchar │ int64 │
+├─────────────┼─────────────┼──────────────┼─────────────┼───────────────┼────────────────────┼───────────────────┼────────────────────┼───────────────────┼────────────────────┼───────┤
+│ latitude │ DOUBLE │ -54.8 │ 67.8557214 │ 7324 │ 22.5004568364307 │ 26.77045468469093 │ 6.065424395863388 │ 29.33687520478191 │ 44.88357641321427 │ 9083 │
+│ longitude │ DOUBLE │ -175.2166595 │ 179.3833313 │ 7802 │ 14.699333721953098 │ 63.93672742608224 │ -7.077471714978484 │ 19.19758476462836 │ 43.782932169927165 │ 9083 │
+└─────────────┴─────────────┴──────────────┴─────────────┴───────────────┴────────────────────┴───────────────────┴────────────────────┴───────────────────┴────────────────────┴───────┘
+
+```
diff --git a/docs/source/duckdb_cli_sql.md b/docs/source/duckdb_cli_sql.md
new file mode 100644
index 0000000000..33714d1e49
--- /dev/null
+++ b/docs/source/duckdb_cli_sql.md
@@ -0,0 +1,159 @@
+# Perform SQL operations
+
+Performing SQL operations with DuckDB opens up a world of possibilities for querying datasets efficiently. Let's dive into some examples showcasing the power of DuckDB functions.
+
+For our demonstration, we'll use the [MMLU](https://huggingface.co/datasets/cais/mmlu) dataset, a multitask test containing multiple-choice questions spanning various knowledge domains.
+
+To preview the dataset, let's select a sample of 3 rows:
+
+```bash
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet' USING SAMPLE 3;
+
+┌──────────────────────┬──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────┐
+│ question │ subject │ choices │ answer │
+│ varchar │ varchar │ varchar[] │ int64 │
+├──────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┤
+│ Dr. Harry Holliday… │ professional_psych… │ [discuss his vacation plans with his current clients ahead of time so that they know he’ll be unavailable during that time., give his clients a phone … │ 2 │
+│ A resident of a st… │ professional_law │ [The resident would succeed, because the logging company's selling of the timber would entitle the resident to re-enter and terminate the grant to the… │ 2 │
+│ Moderate and frequ… │ miscellaneous │ [dispersed alluvial fan soil, heavy-textured soil, such as silty clay, light-textured soil, such as loamy sand, region of low humidity] │ 2 │
+└──────────────────────┴──────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┘
+
+```
+
+This command retrieves a random sample of 3 rows from the dataset for us to examine.
+
+Next, let's examine the schema of our dataset. The following table outlines its structure:
+
+```bash
+DESCRIBE FROM 'hf://datasets/cais/mmlu/all/test-*.parquet' USING SAMPLE 3;
+┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
+│ column_name │ column_type │ null │ key │ default │ extra │
+│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
+├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
+│ question │ VARCHAR │ YES │ │ │ │
+│ subject │ VARCHAR │ YES │ │ │ │
+│ choices │ VARCHAR[] │ YES │ │ │ │
+│ answer │ BIGINT │ YES │ │ │ │
+└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
+
+```
+Next, let's check whether there are any duplicated records in our dataset:
+
+```bash
+SELECT *,
+ COUNT(*) AS counts
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+GROUP BY ALL
+HAVING counts > 1;
+
+┌──────────┬─────────┬───────────┬────────┬────────┐
+│ question │ subject │ choices │ answer │ counts │
+│ varchar │ varchar │ varchar[] │ int64 │ int64 │
+├──────────┴─────────┴───────────┴────────┴────────┤
+│ 0 rows │
+└──────────────────────────────────────────────────┘
+
+```
+
+Fortunately, our dataset doesn't contain any duplicate records.
+
+Let's see the proportion of questions per subject, displayed as a bar chart:
+
+```bash
+SELECT
+ subject,
+ COUNT(*) AS counts,
+ BAR(COUNT(*), 0, (SELECT COUNT(*) FROM 'hf://datasets/cais/mmlu/all/test-*.parquet')) AS percentage
+FROM
+ 'hf://datasets/cais/mmlu/all/test-*.parquet'
+GROUP BY
+ subject
+ORDER BY
+ counts DESC;
+
+┌──────────────────────────────┬────────┬────────────────────────────────────────────────────────────────────────────────┐
+│ subject │ counts │ percentage │
+│ varchar │ int64 │ varchar │
+├──────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────────────────┤
+│ professional_law │ 1534 │ ████████▋ │
+│ moral_scenarios │ 895 │ █████ │
+│ miscellaneous │ 783 │ ████▍ │
+│ professional_psychology │ 612 │ ███▍ │
+│ high_school_psychology │ 545 │ ███ │
+│ high_school_macroeconomics │ 390 │ ██▏ │
+│ elementary_mathematics │ 378 │ ██▏ │
+│ moral_disputes │ 346 │ █▉ │
+├──────────────────────────────┴────────┴────────────────────────────────────────────────────────────────────────────────┤
+│ 57 rows (8 shown) 3 columns │
+└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+Now, let's prepare a subset of the dataset containing questions related to **nutrition** and create a mapping of questions to correct answers.
+Notice that we have the column **choices** from which we can get the correct answer using the **answer** column as an index.
+
+```bash
+SELECT *
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+WHERE subject = 'nutrition' LIMIT 3;
+
+┌──────────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────┐
+│ question │ subject │ choices │ answer │
+│ varchar │ varchar │ varchar[] │ int64 │
+├──────────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┤
+│ Which foods tend t… │ nutrition │ [Meat, Confectionary, Fruits and vegetables, Potatoes] │ 2 │
+│ In which one of th… │ nutrition │ [If the incidence rate of the disease falls., If survival time with the disease increases., If recovery of the disease is faster., If the population in which the… │ 1 │
+│ Which of the follo… │ nutrition │ [The flavonoid class comprises flavonoids and isoflavonoids., The digestibility and bioavailability of isoflavones in soya food products are not changed by proce… │ 0 │
+└──────────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┘
+
+```
+
+```bash
+SELECT question,
+ choices[answer] AS correct_answer
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+WHERE subject = 'nutrition' LIMIT 3;
+
+┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────┐
+│ question │ correct_answer │
+│ varchar │ varchar │
+├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────┤
+│ Which foods tend to be consumed in lower quantities in Wales and Scotland (as of 2020)?\n │ Confectionary │
+│ In which one of the following circumstances will the prevalence of a disease in the population increase, all else being constant?\n │ If the incidence rate of the disease falls. │
+│ Which of the following statements is correct?\n │ │
+└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────┘
+
+```
+
+To ensure data cleanliness, let's remove any newline characters at the end of the questions and filter out any empty answers:
+
+```bash
+SELECT regexp_replace(question, '\n', '') AS question,
+ choices[answer] AS correct_answer
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+WHERE subject = 'nutrition' AND LENGTH(correct_answer) > 0 LIMIT 3;
+
+┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────┐
+│ question │ correct_answer │
+│ varchar │ varchar │
+├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────┤
+│ Which foods tend to be consumed in lower quantities in Wales and Scotland (as of 2020)? │ Confectionary │
+│ In which one of the following circumstances will the prevalence of a disease in the population increase, all else being constant? │ If the incidence rate of the disease falls. │
+│ Which vitamin is a major lipid-soluble antioxidant in cell membranes? │ Vitamin D │
+└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────┘
+
+```
+
+Finally, let's highlight some of the DuckDB functions used in this section:
+- `DESCRIBE`, returns the table schema.
+- `USING SAMPLE`, randomly selects a subset of rows from a dataset.
+- `BAR`, draws a band whose width is proportional to (x - min) and equal to width characters when x = max. Width defaults to 80.
+- `list[index]`, bracket notation extracts a single element from a list; slice conventions (`list[begin:end]`) are also supported, where a missing begin or end is interpreted as the beginning or end of the list, and negative values are accepted.
+- `regexp_replace`, if the string contains the regexp pattern, replaces the matching part with the replacement.
+- `LENGTH`, gets the number of characters in the string.
+
+There are plenty of useful functions available in DuckDB's [SQL functions overview](https://duckdb.org/docs/sql/functions/overview). The best part is that you can use them directly on Hugging Face datasets.
diff --git a/docs/source/duckdb_cli_vector_similarity_search.md b/docs/source/duckdb_cli_vector_similarity_search.md
new file mode 100644
index 0000000000..ef6aed3907
--- /dev/null
+++ b/docs/source/duckdb_cli_vector_similarity_search.md
@@ -0,0 +1,63 @@
+# Perform vector similarity search
+
+The Fixed-Length Arrays feature was added in DuckDB version 0.10.0. This lets you use vector embeddings in DuckDB tables, making your data analysis even more powerful.
+
+Additionally, the `array_cosine_similarity` function was introduced. This function measures the cosine of the angle between two vectors, indicating their similarity: a value of 1 means they’re perfectly aligned, 0 means they’re perpendicular, and -1 means they’re completely opposite.
+
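+As a quick sanity check, you can call it directly on two small fixed-length arrays (values chosen arbitrarily for illustration):
+
+```sql
+SELECT array_cosine_similarity([1.0, 2.0, 3.0]::FLOAT[3], [-1.0, -2.0, -3.0]::FLOAT[3]) AS similarity;
+-- returns -1.0: the two vectors point in exactly opposite directions
+```
+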
+Let's explore how to use this function to perform similarity searches in DuckDB.
+
+We will use the [asoria/awesome-chatgpt-prompts-embeddings](https://huggingface.co/datasets/asoria/awesome-chatgpt-prompts-embeddings) dataset.
+
+First, let's preview a few records from the dataset:
+
+```bash
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet' SELECT act, prompt, len(embedding) as embed_len LIMIT 3;
+
+┌──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────┐
+│ act │ prompt │ embed_len │
+│ varchar │ varchar │ int64 │
+├──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
+│ Linux Terminal │ I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output insid… │ 384 │
+│ English Translator… │ I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language and you will detect the language, translate it and answer… │ 384 │
+│ `position` Intervi… │ I want you to act as an interviewer. I will be the candidate and you will ask me the interview questions for the `position` position. I want you to only reply as the inte… │ 384 │
+└──────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────┘
+
+```
+
+Next, let's choose an embedding to use for the similarity search:
+
+```bash
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet' SELECT embedding WHERE act = 'Linux Terminal';
+
+┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ embedding │
+│ float[] │
+├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ [-0.020781303, -0.029143505, -0.0660217, -0.00932716, -0.02601602, -0.011426172, 0.06627567, 0.11941507, 0.0013917526, 0.012889079, 0.053234346, -0.07380514, 0.04871567, -0.043601237, -0.0025319182, 0.0448… │
+└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+Now, let's use the selected embedding to find similar records:
+
+
+```bash
+SELECT act,
+ prompt,
+ array_cosine_similarity(embedding::float[384], (SELECT embedding FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet' WHERE act = 'Linux Terminal')::float[384]) AS similarity
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet'
+ORDER BY similarity DESC
+LIMIT 3;
+
+┌──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┐
+│ act │ prompt │ similarity │
+│ varchar │ varchar │ float │
+├──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
+│ Linux Terminal │ I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output insi… │ 1.0 │
+│ JavaScript Console │ I want you to act as a javascript console. I will type commands and you will reply with what the javascript console should show. I want you to only reply with the termin… │ 0.7599728 │
+│ R programming Inte… │ I want you to act as a R interpreter. I'll type commands and you'll reply with what the terminal should show. I want you to only reply with the terminal output inside on… │ 0.7303775 │
+└──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┘
+
+```
+
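+To speed up top-k searches on larger embedding datasets, you can also try DuckDB's experimental `vss` extension, which builds HNSW indexes over fixed-length arrays. The following is a hedged sketch (extension availability and which queries the index accelerates depend on your DuckDB version):
+
+```sql
+INSTALL vss;
+LOAD vss;
+
+-- Materialize the embeddings locally with a fixed-length array type
+CREATE TABLE prompts AS
+SELECT act, prompt, embedding::FLOAT[384] AS embedding
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet';
+
+-- Build an HNSW index over the embedding column using the cosine metric
+CREATE INDEX prompts_hnsw ON prompts USING HNSW (embedding) WITH (metric = 'cosine');
+
+-- Top-3 most similar prompts to the 'Linux Terminal' embedding
+SELECT act,
+       array_cosine_similarity(embedding, (SELECT embedding FROM prompts WHERE act = 'Linux Terminal')) AS similarity
+FROM prompts
+ORDER BY similarity DESC
+LIMIT 3;
+```
+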
+That's it! You have successfully performed a vector similarity search using DuckDB.