diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 9b59c61817..4672ce0960 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -36,7 +36,22 @@
       title: Overview
     - local: clickhouse
       title: ClickHouse
-    - local: duckdb
+    - isExpanded: false
+      sections:
+        - local: duckdb
+          title: General Usage
+        - local: duckdb_cli
+          title: DuckDB CLI
+        - local: duckdb_cli_auth
+          title: Authentication for private and gated datasets
+        - local: duckdb_cli_select
+          title: Query datasets
+        - local: duckdb_cli_sql
+          title: Perform SQL operations
+        - local: duckdb_cli_combine_and_export
+          title: Combine datasets and export
+        - local: duckdb_cli_vector_similarity_search
+          title: Perform vector similarity search
       title: DuckDB
     - local: pandas
       title: Pandas
diff --git a/docs/source/duckdb_cli.md b/docs/source/duckdb_cli.md
new file mode 100644
index 0000000000..1ec43419a0
--- /dev/null
+++ b/docs/source/duckdb_cli.md
@@ -0,0 +1,57 @@
+# DuckDB CLI
+
+The [DuckDB CLI](https://duckdb.org/docs/api/cli/overview.html) (Command Line Interface) is a single, dependency-free executable.
+
+<Tip>
+
+For installation details, visit the [installation page](https://duckdb.org/docs/installation).
+
+</Tip>
+
+Starting from version `v0.10.3`, the DuckDB CLI includes native support for accessing datasets on the Hugging Face Hub via URLs. Here are some features you can leverage with this powerful tool:
+
+- Query public datasets and your own gated and private datasets
+- Analyze datasets and perform SQL operations
+- Combine datasets and export them to different formats
+- Conduct vector similarity search on embedding datasets
+- Implement full-text search on datasets
+
+For a complete list of DuckDB features, visit the DuckDB [documentation](https://duckdb.org/docs/).
+
+To start the CLI, execute the following command in the installation folder:
+
+```bash
+./duckdb
+```
+
+## Forming the Hugging Face URL
+
+To access Hugging Face datasets, use the following URL format:
+
+```plaintext
+hf://datasets/{my-username}/{my-dataset}/{path_to_parquet_file}
+```
+
+- **my-username**, the user or organization of the dataset, e.g. `ibm`
+- **my-dataset**, the dataset name, e.g. `duorc`
+- **path_to_parquet_file**, the path to the parquet file, which supports glob patterns, e.g. `**/*.parquet` to query all parquet files
+
+<Tip>
+
+You can query auto-converted Parquet files using the `@~parquet` branch, which corresponds to the `refs/convert/parquet` revision. For more details, refer to the [Parquet conversion documentation](https://huggingface.co/docs/datasets-server/en/parquet#conversion-to-parquet).
+
+</Tip>
+
+Let's start with a quick demo to query all the rows of a dataset:
+
+```sql
+FROM 'hf://datasets/ibm/duorc/ParaphraseRC/*.parquet' LIMIT 3;
+```
+
+Or using traditional SQL syntax:
+
+```sql
+SELECT * FROM 'hf://datasets/ibm/duorc/ParaphraseRC/*.parquet' LIMIT 3;
+```
+
+In the following sections, we will cover more complex operations you can perform with DuckDB on Hugging Face datasets.
diff --git a/docs/source/duckdb_cli_auth.md b/docs/source/duckdb_cli_auth.md
new file mode 100644
index 0000000000..32c2d37a24
--- /dev/null
+++ b/docs/source/duckdb_cli_auth.md
@@ -0,0 +1,46 @@
+# Authentication for private and gated datasets
+
+To access private or gated datasets, you need to configure your Hugging Face token in the DuckDB Secrets Manager.
+
+Visit [Hugging Face Settings - Tokens](https://huggingface.co/settings/tokens) to obtain your access token.
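+
+To see how the pieces fit together before we cover each provider, here is a minimal end-to-end sketch. The token and dataset path are placeholders, not a real token or dataset:
+
+```bash
+-- Register a token, then query a private dataset with it.
+-- Both steps are explained in detail below.
+CREATE SECRET hf_token (TYPE HUGGINGFACE, TOKEN 'hf_XXXXXXXXXXXXX');
+SELECT * FROM 'hf://datasets/your-username/your-private-dataset/**/*.parquet' LIMIT 3;
+```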
+
+DuckDB supports two providers for managing secrets:
+
+- `CONFIG`: Requires the user to pass all configuration information into the `CREATE SECRET` statement.
+- `CREDENTIAL_CHAIN`: Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from `~/.cache/huggingface/token`.
+
+For more information about DuckDB secrets, visit the [Secrets Manager](https://duckdb.org/docs/configuration/secrets_manager.html) guide.
+
+## Creating a secret with `CONFIG` provider
+
+To create a secret using the `CONFIG` provider, use the following command:
+
+```bash
+CREATE SECRET hf_token (TYPE HUGGINGFACE, TOKEN 'your_hf_token');
+```
+
+Replace `your_hf_token` with your actual Hugging Face token.
+
+## Creating a secret with `CREDENTIAL_CHAIN` provider
+
+To create a secret using the `CREDENTIAL_CHAIN` provider, use the following command:
+
+```bash
+CREATE SECRET hf_token (TYPE HUGGINGFACE, PROVIDER credential_chain);
+```
+
+This command automatically retrieves the stored token from `~/.cache/huggingface/token`.
+
+If you haven't configured your token, execute the following command in the terminal:
+
+```bash
+huggingface-cli login
+```
+
+Alternatively, you can set your Hugging Face token as an environment variable:
+
+```bash
+export HF_TOKEN="hf_XXXXXXXXXXXXX"
+```
+
+For more information on authentication, see the [Hugging Face authentication](https://huggingface.co/docs/huggingface_hub/main/en/quick-start#authentication) documentation.
diff --git a/docs/source/duckdb_cli_combine_and_export.md b/docs/source/duckdb_cli_combine_and_export.md
new file mode 100644
index 0000000000..c0d504b87e
--- /dev/null
+++ b/docs/source/duckdb_cli_combine_and_export.md
@@ -0,0 +1,105 @@
+# Combine datasets and export
+
+In this section, we'll combine two datasets and export the result. Let's start with our datasets:
+
+The first will be [TheFusion21/PokemonCards](https://huggingface.co/datasets/TheFusion21/PokemonCards):
+
+```bash
+FROM 'hf://datasets/TheFusion21/PokemonCards/train.csv' LIMIT 3;
+┌─────────┬──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┬───────┬─────────────────┐
+│ id │ image_url │ caption │ name │ hp │ set_name │
+│ varchar │ varchar │ varchar │ varchar │ int64 │ varchar │
+├─────────┼──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼───────┼─────────────────┤
+│ pl3-1 │ https://images.pok… │ A Basic, SP Pokemon Card of type Darkness with the title Absol G and 70 HP of rarity Rare Holo from the set Supreme Victors. It has … │ Absol G │ 70 │ Supreme Victors │
+│ ex12-1 │ https://images.pok… │ A Stage 1 Pokemon Card of type Colorless with the title Aerodactyl and 70 HP of rarity Rare Holo evolved from Mysterious Fossil from … │ Aerodactyl │ 70 │ Legend Maker │
+│ xy5-1 │ https://images.pok… │ A Basic Pokemon Card of type Grass with the title Weedle and 50 HP of rarity Common from the set Primal Clash and the flavor text: It… │ Weedle │ 50 │ Primal Clash │
+└─────────┴──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┴───────┴─────────────────┘
+```
+
+And the second one will be [wanghaofan/pokemon-wiki-captions](https://huggingface.co/datasets/wanghaofan/pokemon-wiki-captions):
+
+```bash
+FROM 'hf://datasets/wanghaofan/pokemon-wiki-captions/data/*.parquet' LIMIT 3;
+
+┌──────────────────────┬───────────┬──────────┬──────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ image │ name_en │ name_zh │ text_en │ text_zh │
+│ struct(bytes blob,… │ varchar │ varchar │ varchar │ varchar │
+├──────────────────────┼───────────┼──────────┼──────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ {'bytes': \x89PNG\… │ abomasnow │ 暴雪王 │ Grass attributes,Blizzard King standing on two feet, with … │ 草属性,双脚站立的暴雪王,全身白色的绒毛,淡紫色的眼睛,几缕长条装的毛皮盖着它的嘴巴 │
+│ {'bytes': \x89PNG\… │ abra │ 凯西 │ Super power attributes, the whole body is yellow, the head… │ 超能力属性,通体黄色,头部外形类似狐狸,尖尖鼻子,手和脚上都有三个指头,长尾巴末端带着一个褐色圆环 │
+│ {'bytes': \x89PNG\… │ absol │ 阿勃梭鲁 │ Evil attribute, with white hair, blue-gray part without ha… │ 恶属性,有白色毛发,没毛发的部分是蓝灰色,头右边类似弓的角,红色眼睛 │
+└──────────────────────┴───────────┴──────────┴──────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+Now, let's try to combine these two datasets by joining on the `name` column:
+
+```bash
+SELECT a.image_url
+     , a.caption AS card_caption
+     , a.name
+     , a.hp
+     , b.text_en AS wiki_caption
+FROM 'hf://datasets/TheFusion21/PokemonCards/train.csv' a
+JOIN 'hf://datasets/wanghaofan/pokemon-wiki-captions/data/*.parquet' b
+ON LOWER(a.name) = b.name_en
+LIMIT 3;
+
+┌──────────────────────┬──────────────────────┬────────────┬───────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ image_url │ card_caption │ name │ hp │ wiki_caption │
+│ varchar │ varchar │ varchar │ int64 │ varchar │
+├──────────────────────┼──────────────────────┼────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ https://images.pok… │ A Stage 1 Pokemon … │ Aerodactyl │ 70 │ A Pokémon with rock attributes, gray body, blue pupils, purple inner wings, two sharp claws on the wings, jagged teeth, and an arrow-like … │
+│ https://images.pok… │ A Basic Pokemon Ca… │ Weedle │ 50 │ Insect-like, caterpillar-like in appearance, with a khaki-yellow body, seven pairs of pink gastropods, a pink nose, a sharp poisonous need… │
+│ https://images.pok… │ A Basic Pokemon Ca… │ Caterpie │ 50 │ Insect attributes, caterpillar appearance, green back, white abdomen, Y-shaped red antennae on the head, yellow spindle-shaped tail, two p… │
+└──────────────────────┴──────────────────────┴────────────┴───────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+We can export the result to a Parquet file using the `COPY` command:
+
+```bash
+COPY (SELECT a.image_url
+          , a.caption AS card_caption
+          , a.name
+          , a.hp
+          , b.text_en AS wiki_caption
+FROM 'hf://datasets/TheFusion21/PokemonCards/train.csv' a
+JOIN 'hf://datasets/wanghaofan/pokemon-wiki-captions/data/*.parquet' b
+ON LOWER(a.name) = b.name_en)
+TO 'output.parquet' (FORMAT PARQUET);
+```
+
+Let's validate the new Parquet file:
+
+```bash
+SELECT COUNT(*) FROM 'output.parquet';
+
+┌──────────────┐
+│ count_star() │
+│    int64     │
+├──────────────┤
+│         9460 │
+└──────────────┘
+
+```
+
+<Tip>
+
+You can also export to [CSV](https://duckdb.org/docs/guides/file_formats/csv_export), [Excel](https://duckdb.org/docs/guides/file_formats/excel_export) and [JSON](https://duckdb.org/docs/guides/file_formats/json_export) formats.
+
+</Tip>
+
+Finally, let's push the resulting dataset to the Hub using the [Datasets](https://huggingface.co/docs/datasets/index) library in Python:
+
+```python
+from datasets import load_dataset
+
+dataset = load_dataset("parquet", data_files="output.parquet")
+dataset.push_to_hub("asoria/duckdb_combine_demo")
+```
+
+And that's it! You've successfully combined two datasets, exported the result, and uploaded it to the Hugging Face Hub.
diff --git a/docs/source/duckdb_cli_select.md b/docs/source/duckdb_cli_select.md
new file mode 100644
index 0000000000..d126737c04
--- /dev/null
+++ b/docs/source/duckdb_cli_select.md
@@ -0,0 +1,150 @@
+# Query datasets
+
+Querying datasets is a fundamental step in data analysis. Here, we'll guide you through querying datasets using various methods.
+
+There are several [different ways](https://duckdb.org/docs/data/parquet/overview.html) to select your data.
+
+Using the `FROM` syntax:
+
+```bash
+FROM 'hf://datasets/jamescalam/world-cities-geo/train.jsonl' SELECT city, country, region LIMIT 3;
+
+┌────────────────┬─────────────┬───────────────┐
+│      city      │   country   │    region     │
+│    varchar     │   varchar   │    varchar    │
+├────────────────┼─────────────┼───────────────┤
+│ Kabul          │ Afghanistan │ Southern Asia │
+│ Kandahar       │ Afghanistan │ Southern Asia │
+│ Mazar-e Sharif │ Afghanistan │ Southern Asia │
+└────────────────┴─────────────┴───────────────┘
+
+```
+
+Using the `SELECT` and `FROM` syntax:
+
+```bash
+SELECT city, country, region FROM 'hf://datasets/jamescalam/world-cities-geo/train.jsonl' USING SAMPLE 3;
+
+┌──────────┬─────────┬────────────────┐
+│   city   │ country │     region     │
+│ varchar  │ varchar │    varchar     │
+├──────────┼─────────┼────────────────┤
+│ Wenzhou  │ China   │ Eastern Asia   │
+│ Valdez   │ Ecuador │ South America  │
+│ Aplahoue │ Benin   │ Western Africa │
+└──────────┴─────────┴────────────────┘
+
+```
+
+Count the rows in all the JSONL files matching a glob pattern:
+
+```bash
+SELECT COUNT(*) FROM 'hf://datasets/jamescalam/world-cities-geo/*.jsonl';
+
+┌──────────────┐
+│ count_star() │
+│    int64     │
+├──────────────┤
+│         9083 │
+└──────────────┘
+
+```
+
+You can also query Parquet files using the `read_parquet` and `parquet_scan` functions. Let's explore these functions using the auto-converted Parquet files from the same dataset.
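+
+The same `hf://` URL style also works with DuckDB's explicit reader functions for the other formats used above. Here is a brief sketch, assuming `read_json_auto` resolves `hf://` paths the same way the direct `FROM` queries do:
+
+```bash
+-- Explicit reader function for the JSONL file queried earlier
+SELECT city, country, region FROM read_json_auto('hf://datasets/jamescalam/world-cities-geo/train.jsonl') LIMIT 3;
+```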
+
+Select using the [`read_parquet`](https://duckdb.org/docs/guides/file_formats/query_parquet.html) function:
+
+```bash
+SELECT * FROM read_parquet('hf://datasets/jamescalam/world-cities-geo@~parquet/default/**/*.parquet') LIMIT 3;
+```
+
+Read all files that match a glob pattern and include a filename column specifying which file each row came from:
+
+```bash
+SELECT * FROM read_parquet('hf://datasets/jamescalam/world-cities-geo@~parquet/default/**/*.parquet', filename = true) LIMIT 3;
+```
+
+Using the [`parquet_scan`](https://duckdb.org/docs/data/parquet/overview) function:
+
+```bash
+SELECT * FROM parquet_scan('hf://datasets/jamescalam/world-cities-geo@~parquet/default/**/*.parquet') LIMIT 3;
+```
+
+## Get metadata and schema
+
+The [parquet_metadata](https://duckdb.org/docs/data/parquet/metadata.html) function can be used to query the metadata contained within a Parquet file.
+
+```bash
+SELECT * FROM parquet_metadata('hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet');
+
+┌───────────────────────────────────────────────────────────────────────────────┬──────────────┬────────────────────┬─────────────┐
+│ file_name │ row_group_id │ row_group_num_rows │ compression │
+│ varchar │ int64 │ int64 │ varchar │
+├───────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────────────┼─────────────┤
+│ hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet │ 0 │ 1000 │ SNAPPY │
+│ hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet │ 0 │ 1000 │ SNAPPY │
+│ hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet │ 0 │ 1000 │ SNAPPY │
+└───────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────────┘
+
+```
+
+Fetch the column names and column types:
+
+```bash
+DESCRIBE SELECT * FROM 'hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet';
+
+┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
+│ column_name │ column_type │  null   │   key   │ default │  extra  │
+│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
+├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
+│ city        │ VARCHAR     │ YES     │         │         │         │
+│ country     │ VARCHAR     │ YES     │         │         │         │
+│ region      │ VARCHAR     │ YES     │         │         │         │
+│ continent   │ VARCHAR     │ YES     │         │         │         │
+│ latitude    │ DOUBLE      │ YES     │         │         │         │
+│ longitude   │ DOUBLE      │ YES     │         │         │         │
+│ x           │ DOUBLE      │ YES     │         │         │         │
+│ y           │ DOUBLE      │ YES     │         │         │         │
+│ z           │ DOUBLE      │ YES     │         │         │         │
+└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
+
+```
+
+Fetch the internal schema (excluding the file name):
+
+```bash
+SELECT * EXCLUDE (file_name) FROM parquet_schema('hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet');
+
+┌───────────┬────────────┬─────────────┬─────────────────┬──────────────┬────────────────┬───────┬───────────┬──────────┬──────────────┐
+│ name │ type │ type_length │ repetition_type │ num_children │ converted_type │ scale │ precision │ field_id │ logical_type │
+│ varchar │ varchar │ varchar │ varchar │ int64 │ varchar │ int64 │ int64 │ int64 │ varchar │
+├───────────┼────────────┼─────────────┼─────────────────┼──────────────┼────────────────┼───────┼───────────┼──────────┼──────────────┤
+│ schema │ │ │ REQUIRED │ 9 │ │ │ │ │ │
+│ city │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ country │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ region │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ continent │ BYTE_ARRAY │ │ OPTIONAL │ │ UTF8 │ │ │ │ StringType() │
+│ latitude │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ longitude │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ x │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ y │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+│ z │ DOUBLE │ │ OPTIONAL │ │ │ │ │ │ │
+└───────────┴────────────┴─────────────┴─────────────────┴──────────────┴────────────────┴───────┴───────────┴──────────┴──────────────┘
+
+```
+
+## Get statistics
+
+The `SUMMARIZE` command can be used to get various aggregates over a query (min, max, approx_unique, avg, std, q25, q50, q75, count). It returns these statistics along with the column name, column type, and the percentage of NULL values.
+
+```bash
+SUMMARIZE SELECT latitude, longitude FROM 'hf://datasets/jamescalam/world-cities-geo@~parquet/default/train/0000.parquet';
+
+┌─────────────┬─────────────┬──────────────┬─────────────┬───────────────┬────────────────────┬───────────────────┬────────────────────┬───────────────────┬────────────────────┬───────┐
+│ column_name │ column_type │ min │ max │ approx_unique │ avg │ std │ q25 │ q50 │ q75 │ count │
+│ varchar │ varchar │ varchar │ varchar │ int64 │ varchar │ varchar │ varchar │ varchar │ varchar │ int64 │
+├─────────────┼─────────────┼──────────────┼─────────────┼───────────────┼────────────────────┼───────────────────┼────────────────────┼───────────────────┼────────────────────┼───────┤
+│ latitude │ DOUBLE │ -54.8 │ 67.8557214 │ 7324 │ 22.5004568364307 │ 26.77045468469093 │ 6.065424395863388 │ 29.33687520478191 │ 44.88357641321427 │ 9083 │
+│ longitude │ DOUBLE │ -175.2166595 │ 179.3833313 │ 7802 │ 14.699333721953098 │ 63.93672742608224 │ -7.077471714978484 │ 19.19758476462836 │ 43.782932169927165 │ 9083 │
+└─────────────┴─────────────┴──────────────┴─────────────┴───────────────┴────────────────────┴───────────────────┴────────────────────┴───────────────────┴────────────────────┴───────┘
+
+```
diff --git a/docs/source/duckdb_cli_sql.md b/docs/source/duckdb_cli_sql.md
new file mode 100644
index 0000000000..33714d1e49
--- /dev/null
+++ b/docs/source/duckdb_cli_sql.md
@@ -0,0 +1,159 @@
+# Perform SQL operations
+
+Performing SQL operations with DuckDB opens up a world of possibilities for querying datasets efficiently. Let's dive into some examples showcasing the power of DuckDB functions.
+
+For our demonstration, we'll explore a fascinating dataset. The [MMLU](https://huggingface.co/datasets/cais/mmlu) dataset is a multitask test containing multiple-choice questions spanning various knowledge domains.
+
+To preview the dataset, let's select a sample of 3 rows:
+
+```bash
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet' USING SAMPLE 3;
+
+┌──────────────────────┬──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────┐
+│ question │ subject │ choices │ answer │
+│ varchar │ varchar │ varchar[] │ int64 │
+├──────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┤
+│ Dr. Harry Holliday… │ professional_psych… │ [discuss his vacation plans with his current clients ahead of time so that they know he'll be unavailable during that time., give his clients a phone … │ 2 │
+│ A resident of a st… │ professional_law │ [The resident would succeed, because the logging company's selling of the timber would entitle the resident to re-enter and terminate the grant to the… │ 2 │
+│ Moderate and frequ… │ miscellaneous │ [dispersed alluvial fan soil, heavy-textured soil, such as silty clay, light-textured soil, such as loamy sand, region of low humidity] │ 2 │
+└──────────────────────┴──────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┘
+
+```
+
+This command retrieves a random sample of 3 rows from the dataset for us to examine.
+
+Let's start by examining the schema of our dataset. The following table outlines its structure:
+
+```bash
+DESCRIBE FROM 'hf://datasets/cais/mmlu/all/test-*.parquet' USING SAMPLE 3;
+
+┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
+│ column_name │ column_type │  null   │   key   │ default │  extra  │
+│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
+├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
+│ question    │ VARCHAR     │ YES     │         │         │         │
+│ subject     │ VARCHAR     │ YES     │         │         │         │
+│ choices     │ VARCHAR[]   │ YES     │         │         │         │
+│ answer      │ BIGINT      │ YES     │         │         │         │
+└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
+
+```
+
+Next, let's check whether there are any duplicated records in our dataset (a duplicate is any record that appears more than once):
+
+```bash
+SELECT *,
+       COUNT(*) AS counts
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+GROUP BY ALL
+HAVING counts > 1;
+
+┌──────────┬─────────┬───────────┬────────┬────────┐
+│ question │ subject │  choices  │ answer │ counts │
+│ varchar  │ varchar │ varchar[] │ int64  │ int64  │
+├──────────┴─────────┴───────────┴────────┴────────┤
+│                      0 rows                      │
+└──────────────────────────────────────────────────┘
+
+```
+
+Fortunately, our dataset doesn't contain any duplicate records.
+
+Let's see the proportion of questions per subject, rendered as a bar chart:
+
+```bash
+SELECT
+    subject,
+    COUNT(*) AS counts,
+    BAR(COUNT(*), 0, (SELECT COUNT(*) FROM 'hf://datasets/cais/mmlu/all/test-*.parquet')) AS percentage
+FROM
+    'hf://datasets/cais/mmlu/all/test-*.parquet'
+GROUP BY
+    subject
+ORDER BY
+    counts DESC;
+
+┌──────────────────────────────┬────────┬────────────────────────────────────────────────────────────────────────────────┐
+│ subject │ counts │ percentage │
+│ varchar │ int64 │ varchar │
+├──────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────────────────┤
+│ professional_law │ 1534 │ ████████▋ │
+│ moral_scenarios │ 895 │ █████ │
+│ miscellaneous │ 783 │ ████▍ │
+│ professional_psychology │ 612 │ ███▍ │
+│ high_school_psychology │ 545 │ ███ │
+│ high_school_macroeconomics │ 390 │ ██▏ │
+│ elementary_mathematics │ 378 │ ██▏ │
+│ moral_disputes │ 346 │ █▉ │
+├──────────────────────────────┴────────┴────────────────────────────────────────────────────────────────────────────────┤
+│ 57 rows (8 shown)                                                                                             3 columns │
+└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+Now, let's prepare a subset of the dataset containing questions related to **nutrition** and create a mapping of questions to correct answers.
+Notice that we have the column **choices**, from which we can get the correct answer using the **answer** column as an index.
+
+```bash
+SELECT *
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+WHERE subject = 'nutrition' LIMIT 3;
+
+┌──────────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────┐
+│ question │ subject │ choices │ answer │
+│ varchar │ varchar │ varchar[] │ int64 │
+├──────────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┤
+│ Which foods tend t… │ nutrition │ [Meat, Confectionary, Fruits and vegetables, Potatoes] │ 2 │
+│ In which one of th… │ nutrition │ [If the incidence rate of the disease falls., If survival time with the disease increases., If recovery of the disease is faster., If the population in which the… │ 1 │
+│ Which of the follo… │ nutrition │ [The flavonoid class comprises flavonoids and isoflavonoids., The digestibility and bioavailability of isoflavones in soya food products are not changed by proce… │ 0 │
+└──────────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┘
+
+```
+
+```bash
+SELECT question,
+       choices[answer] AS correct_answer
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+WHERE subject = 'nutrition' LIMIT 3;
+
+┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────┐
+│ question │ correct_answer │
+│ varchar │ varchar │
+├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────┤
+│ Which foods tend to be consumed in lower quantities in Wales and Scotland (as of 2020)?\n │ Confectionary │
+│ In which one of the following circumstances will the prevalence of a disease in the population increase, all else being constant?\n │ If the incidence rate of the disease falls. │
+│ Which of the following statements is correct?\n │ │
+└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────┘
+
+```
+
+To ensure data cleanliness, let's remove any newline characters at the end of the questions and filter out any empty answers:
+
+```bash
+SELECT regexp_replace(question, '\n', '') AS question,
+       choices[answer] AS correct_answer
+FROM 'hf://datasets/cais/mmlu/all/test-*.parquet'
+WHERE subject = 'nutrition' AND LENGTH(correct_answer) > 0 LIMIT 3;
+
+┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────┐
+│ question │ correct_answer │
+│ varchar │ varchar │
+├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────┤
+│ Which foods tend to be consumed in lower quantities in Wales and Scotland (as of 2020)? │ Confectionary │
+│ In which one of the following circumstances will the prevalence of a disease in the population increase, all else being constant? │ If the incidence rate of the disease falls. │
+│ Which vitamin is a major lipid-soluble antioxidant in cell membranes? │ Vitamin D │
+└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────┘
+
+```
+
+Finally, let's highlight some of the DuckDB functions used in this section:
+
+- `DESCRIBE`, returns the table schema.
+- `USING SAMPLE`, randomly selects a subset of rows from a dataset.
+- `BAR`, draws a band whose width is proportional to (x - min) and equal to width characters when x = max. Width defaults to 80.
+- `string[begin:end]`, extracts a string using slice conventions. Missing begin or end arguments are interpreted as the beginning or end of the list, respectively. Negative values are accepted.
+- `regexp_replace`, if the string contains the regexp pattern, replaces the matching part with the replacement.
+- `LENGTH`, gets the number of characters in the string.
+
+<Tip>
+
+There are plenty of useful functions available in DuckDB's [SQL functions overview](https://duckdb.org/docs/sql/functions/overview). The best part is that you can use them directly on Hugging Face datasets.
+
+</Tip>
diff --git a/docs/source/duckdb_cli_vector_similarity_search.md b/docs/source/duckdb_cli_vector_similarity_search.md
new file mode 100644
index 0000000000..ef6aed3907
--- /dev/null
+++ b/docs/source/duckdb_cli_vector_similarity_search.md
@@ -0,0 +1,63 @@
+# Perform vector similarity search
+
+The Fixed-Length Arrays feature was added in DuckDB version 0.10.0. This feature lets you store vector embeddings in DuckDB tables, making your data analysis even more powerful.
+
+Additionally, the `array_cosine_similarity` function was introduced. This function measures the cosine of the angle between two vectors, indicating their similarity. A value of 1 means they're perfectly aligned, 0 means they're perpendicular, and -1 means they're completely opposite.
+
+In this section, we'll show you how to use this function to perform similarity searches with DuckDB.
+
+We will use the [asoria/awesome-chatgpt-prompts-embeddings](https://huggingface.co/datasets/asoria/awesome-chatgpt-prompts-embeddings) dataset.
+
+First, let's preview a few records from the dataset:
+
+```bash
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet' SELECT act, prompt, len(embedding) AS embed_len LIMIT 3;
+
+┌──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────┐
+│ act │ prompt │ embed_len │
+│ varchar │ varchar │ int64 │
+├──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────┤
+│ Linux Terminal │ I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output insid… │ 384 │
+│ English Translator… │ I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language and you will detect the language, translate it and answer… │ 384 │
+│ `position` Intervi… │ I want you to act as an interviewer. I will be the candidate and you will ask me the interview questions for the `position` position. I want you to only reply as the inte… │ 384 │
+└──────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────┘
+
+```
+
+Next, let's choose an embedding to use for the similarity search:
+
+```bash
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet' SELECT embedding WHERE act = 'Linux Terminal';
+
+┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ embedding │
+│ float[] │
+├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ [-0.020781303, -0.029143505, -0.0660217, -0.00932716, -0.02601602, -0.011426172, 0.06627567, 0.11941507, 0.0013917526, 0.012889079, 0.053234346, -0.07380514, 0.04871567, -0.043601237, -0.0025319182, 0.0448… │
+└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+```
+
+Now, let's use the selected embedding to find similar records:
+
+```bash
+SELECT act,
+       prompt,
+       array_cosine_similarity(embedding::float[384], (SELECT embedding FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet' WHERE act = 'Linux Terminal')::float[384]) AS similarity
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet'
+ORDER BY similarity DESC
+LIMIT 3;
+
+┌──────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┐
+│ act │ prompt │ similarity │
+│ varchar │ varchar │ float │
+├──────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤
+│ Linux Terminal │ I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output insi… │ 1.0 │
+│ JavaScript Console │ I want you to act as a javascript console. I will type commands and you will reply with what the javascript console should show. I want you to only reply with the termin… │ 0.7599728 │
+│ R programming Inte… │ I want you to act as a R interpreter. I'll type commands and you'll reply with what the terminal should show. I want you to only reply with the terminal output inside on… │ 0.7303775 │
+└──────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┘
+
+```
+
+That's it! You have successfully performed a vector similarity search using DuckDB.
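+
+As a final variation, you can factor the probe embedding into a CTE so that the dataset path for the probe is written only once. This is a minimal sketch that should be equivalent to the query above:
+
+```bash
+-- Same similarity search, with the probe embedding in a CTE
+WITH probe AS (
+  SELECT embedding
+  FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet'
+  WHERE act = 'Linux Terminal'
+)
+SELECT act,
+       array_cosine_similarity(embedding::float[384], (SELECT embedding FROM probe)::float[384]) AS similarity
+FROM 'hf://datasets/asoria/awesome-chatgpt-prompts-embeddings/data/*.parquet'
+ORDER BY similarity DESC
+LIMIT 3;
+```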