diff --git a/README.md b/README.md
index 5439658c5..b39bb9a63 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ An overview of folder structure and files
| `/assets/images` | Site image files found in logically named subfolders | As of 2023-02-21 contains images from /molecula/documentation which will be removed in future PRs |
| `/docs` | All content pages found in this folder. | |
| `/docs/cloud` | All FeatureBase Cloud help pages | `old` prefix folders and files are originals to be rewritten |
-| `/docs/concepts` | High level conceptual information regarding FeatureBase applications | Updates in progress |
+| `/docs/cloud/cloud-faq` | High-level conceptual information regarding FeatureBase applications | Updates in progress |
| `/docs/pql-guide` | All FeatureBase PQL-Guide help pages | Files largely unchanged from `/molecula/documentation` |
| `/docs/sql-guide` | All FeatureBase SQL-Guide help pages | Was `/sql-preview` |
| `/help-on-help` | Internal only help on FeatureBase Docs | Folder excluded from production build in `/_config.yml` |
diff --git a/_includes/concepts/concept-anti-entropy.md b/_includes/faq/concept-anti-entropy.md
similarity index 100%
rename from _includes/concepts/concept-anti-entropy.md
rename to _includes/faq/concept-anti-entropy.md
diff --git a/_includes/concepts/concept-bitmap-index-summary.md b/_includes/faq/concept-bitmap-index-summary.md
similarity index 100%
rename from _includes/concepts/concept-bitmap-index-summary.md
rename to _includes/faq/concept-bitmap-index-summary.md
diff --git a/_includes/concepts/concept-bitmap-source-data-table.md b/_includes/faq/concept-bitmap-source-data-table.md
similarity index 100%
rename from _includes/concepts/concept-bitmap-source-data-table.md
rename to _includes/faq/concept-bitmap-source-data-table.md
diff --git a/_includes/concepts/concept-bitmap-storage-overhead-table.md b/_includes/faq/concept-bitmap-storage-overhead-table.md
similarity index 100%
rename from _includes/concepts/concept-bitmap-storage-overhead-table.md
rename to _includes/faq/concept-bitmap-storage-overhead-table.md
diff --git a/_includes/concepts/concept-data-modeling-summary.md b/_includes/faq/concept-data-modeling-summary.md
similarity index 100%
rename from _includes/concepts/concept-data-modeling-summary.md
rename to _includes/faq/concept-data-modeling-summary.md
diff --git a/_includes/concepts/concept-ingest-summary.md b/_includes/faq/concept-ingest-summary.md
similarity index 100%
rename from _includes/concepts/concept-ingest-summary.md
rename to _includes/faq/concept-ingest-summary.md
diff --git a/_includes/concepts/concept-table-def-save-to-disk.md b/_includes/faq/concept-table-def-save-to-disk.md
similarity index 100%
rename from _includes/concepts/concept-table-def-save-to-disk.md
rename to _includes/faq/concept-table-def-save-to-disk.md
diff --git a/_includes/concepts/standard-naming-obj.md b/_includes/faq/standard-naming-obj.md
similarity index 100%
rename from _includes/concepts/standard-naming-obj.md
rename to _includes/faq/standard-naming-obj.md
diff --git a/_includes/concepts/summary-data-modeling.md b/_includes/faq/summary-data-modeling.md
similarity index 100%
rename from _includes/concepts/summary-data-modeling.md
rename to _includes/faq/summary-data-modeling.md
diff --git a/_includes/concepts/summary-db-states.md b/_includes/faq/summary-db-states.md
similarity index 100%
rename from _includes/concepts/summary-db-states.md
rename to _includes/faq/summary-db-states.md
diff --git a/_includes/concepts/summary-table-create.md b/_includes/faq/summary-table-create.md
similarity index 100%
rename from _includes/concepts/summary-table-create.md
rename to _includes/faq/summary-table-create.md
diff --git a/docs/cloud/cloud-databases/cloud-db-create-custom.md b/docs/cloud/cloud-databases/cloud-db-create-custom.md
index 18c34b430..bfd7bc38f 100644
--- a/docs/cloud/cloud-databases/cloud-db-create-custom.md
+++ b/docs/cloud/cloud-databases/cloud-db-create-custom.md
@@ -22,7 +22,7 @@ nav_order: 3
## Naming standards
-{% include /concepts/standard-naming-obj.md %}
+{% include /faq/standard-naming-obj.md %}
{: .note}
FeatureBase Cloud database names can be up to 300 characters in length
diff --git a/docs/cloud/cloud-databases/cloud-db-create-sample.md b/docs/cloud/cloud-databases/cloud-db-create-sample.md
index 798c80673..797589c1f 100644
--- a/docs/cloud/cloud-databases/cloud-db-create-sample.md
+++ b/docs/cloud/cloud-databases/cloud-db-create-sample.md
@@ -20,7 +20,7 @@ The sample database uses 32GB and costs $1USD/hour. Remember to [Delete the data
## Naming standards
-{% include /concepts/standard-naming-obj.md %}
+{% include /faq/standard-naming-obj.md %}
## How do I provision a sample database?
diff --git a/docs/cloud/cloud-faq/cloud-faq-bitmaps-bit-slice.md b/docs/cloud/cloud-faq/cloud-faq-bitmaps-bit-slice.md
index 25ace477b..2ea237caf 100644
--- a/docs/cloud/cloud-faq/cloud-faq-bitmaps-bit-slice.md
+++ b/docs/cloud/cloud-faq/cloud-faq-bitmaps-bit-slice.md
@@ -30,7 +30,7 @@ User data mapped to the following data types is converted to bit-sliced bitmaps:
## How does FeatureBase bit-slice integer data?
-{% include /concepts/concept-bitmap-source-data-table.md %}
+{% include /faq/concept-bitmap-source-data-table.md %}
By bit-slicing, the `downloads` data can be encoded:
* as 3 bits
@@ -95,7 +95,7 @@ The bit-slice columns can now be saved as individual bitmaps. For example:
## Bitmap storage overheads
-{% include /concepts/concept-bitmap-storage-overhead-table.md %}
+{% include /faq/concept-bitmap-storage-overhead-table.md %}
* [Learn about Roaring Bitmap Format](/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format)
diff --git a/docs/cloud/cloud-faq/cloud-faq-bitmaps-equality-encoded.md b/docs/cloud/cloud-faq/cloud-faq-bitmaps-equality-encoded.md
index 27e47cfd5..c7c3b9519 100644
--- a/docs/cloud/cloud-faq/cloud-faq-bitmaps-equality-encoded.md
+++ b/docs/cloud/cloud-faq/cloud-faq-bitmaps-equality-encoded.md
@@ -39,7 +39,7 @@ FeatureBase equality encoding:
## How does FeatureBase equality encode data?
-{% include /concepts/concept-bitmap-source-data-table.md %}
+{% include /faq/concept-bitmap-source-data-table.md %}
The `historical_name` data can be equality-encoded as follows:
@@ -112,7 +112,7 @@ FeatureBase avoids these issues by bit-slicing integer values.
## Bitmap storage overheads
-{% include /concepts/concept-bitmap-storage-overhead-table.md %}
+{% include /faq/concept-bitmap-storage-overhead-table.md %}
* [Learn about Roaring Bitmap Format](/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format)
* [Learn about bitmap compression with Roaring Bitmap Format](/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format)
diff --git a/docs/cloud/cloud-faq/cloud-faq-bitmaps.md b/docs/cloud/cloud-faq/cloud-faq-bitmaps.md
index e2ddad905..e5688421d 100644
--- a/docs/cloud/cloud-faq/cloud-faq-bitmaps.md
+++ b/docs/cloud/cloud-faq/cloud-faq-bitmaps.md
@@ -60,7 +60,7 @@ FeatureBase overcomes low-cardinality issues with four unique data types suitabl
### Data storage overheads
-{% include /concepts/concept-bitmap-storage-overhead-table.md %}
+{% include /faq/concept-bitmap-storage-overhead-table.md %}
* [Learn about Roaring Bitmap Format](/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format)
@@ -82,10 +82,11 @@ FeatureBase Cloud stores shards on disk in the `etc` directory.
## Are column names converted to bitmaps?
-{% include /concepts/concept-table-def-save-to-disk.md %}
+{% include /faq/concept-table-def-save-to-disk.md %}
## Further information
* [Learn about equality-encoded bitmaps](/docs/cloud/cloud-faq/cloud-faq-bitmaps-equality-encoded)
* [Learn about bit-sliced bitmaps](/docs/cloud/cloud-faq/cloud-faq-bitmaps-bit-slice)
-* [Learn about importing data to FeatureBase](/docs/concepts/overview-data-modeling)
+* [Learn about data modeling for FeatureBase](/docs/cloud/cloud-faq/cloud-faq-data-modeling)
+* [Learn about importing data to FeatureBase](/docs/cloud/cloud-ingest/cloud-ingest-manage)
diff --git a/docs/cloud/cloud-faq/cloud-faq-home.md b/docs/cloud/cloud-faq/cloud-faq-home.md
index ff7b2f550..47c8d2ca8 100644
--- a/docs/cloud/cloud-faq/cloud-faq-home.md
+++ b/docs/cloud/cloud-faq/cloud-faq-home.md
@@ -16,13 +16,9 @@ has_toc: false
## Looking for older documentation?
-{: .important}
-FeatureBase Community help has been moved.
-
| Question | Answer |
|---|---|
-Where do I find older documentation? | [Link to FB Community help repo](url here) |
-| GRPC endpoint has gone boom | [GRPC Help]()
+| Where do I find older documentation? | [Link to FB Community help repo](https://featurebasedb.github.io/FB-community-help/){:target="_blank"} |
## Conceptual FAQ
diff --git a/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format.md b/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format.md
index 56df98bbe..909ea5975 100644
--- a/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format.md
+++ b/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format.md
@@ -8,7 +8,7 @@ nav_order: 3
# How does FeatureBase reduce storage overheads?
{: .no_toc }
-{% include /concepts/concept-bitmap-storage-overhead-table.md %}
+{% include /faq/concept-bitmap-storage-overhead-table.md %}
{% include page-toc.md %}
diff --git a/docs/cloud/cloud-faq/glossary.md b/docs/cloud/cloud-faq/glossary.md
index f09c3ed22..9ee2d7894 100644
--- a/docs/cloud/cloud-faq/glossary.md
+++ b/docs/cloud/cloud-faq/glossary.md
@@ -15,9 +15,6 @@ nav_order: 1
| Term | Context | Additional information |
|---|---|---|---|
-| ALL | PQL query | [PQL ALL read query](/docs/pql-guide/pql-read-all) |
-| APPLY | PQL query | [PQL APPLY read query](/docs/pql-guide/pql-read-apply) |
-| ARROW | PQL query | [PQL ARROW read query](/docs/pql-guide/pql-read-arrow) |
| Authentication | FeatureBase Cloud | [Manage cloud users](/docs/cloud/cloud-users/cloud-users-manage) |
| Authentication | FeatureBase Cloud | [Cloud authentication](/docs/cloud/cloud-authentication/cloud-auth-manage) |
@@ -26,35 +23,22 @@ nav_order: 1
| Term | Context | Additional information |
|---|---|---|
| Batch (ingest) | Data import to FeatureBase | [BULK INSERT statement](/docs/sql-guide/statements/statement-insert-bulk) |
-| Bitmap<br>Bitmap Index (BMI)<br>Roaring B-Tree format (RBT) | FeatureBase database table rows | FeatureBase uses the [Roaring Bitmap](https://roaringbitmap.org/){:target="_blank"} format to store data. |
-| Bit Sliced Indexing (BSI) | Multi-bit integer and timestamp data types used for Range, Min, Max and Sum queries | * [INT data type](/docs/sql-guide/data-types/data-type-int)<br>* [TIMESTAMP data type](/docs/sql-guide/data-types/data-type-timestamp)<br>* [MIN query](/docs/pql-guide/pql-read-min)<br>* [MAX query](/docs/pql-guide/pql-read-max)<br>* [SUM query](/docs/pql-guide/pql-read-sum) |
+| Bitmap<br>Bitmap Index (BMI)<br>Roaring B-Tree format (RBT) | FeatureBase database table rows | [Learn about FeatureBase Bitmaps](/docs/cloud/cloud-faq/cloud-faq-bitmaps) |
## C
| Term | Context | Additional information |
|---|---|---|
-| CLEAR | PQL query | [PQL CLEAR write query](/docs/pql-guide/pql-write-clear) |
-| CLEARROW | PQL query | [PQL CLEARROW write query](/docs/pql-guide/pql-write-clearrow) |
| Concurrency | SQL/PQL Queries | Number of concurrent users running queries on data and how this may affect query latency |
-| CONSTROW | PQL query | [PQL CONSTROW read query](/docs/pql-guide/pql-read-constrow) |
-| COUNT | PQL query | [PQL COUNT read query](/docs/pql-guide/pql-read-count) |
## D
| Term | Context | Additional information |
|---|---|---|
| Database | FeatureBase database | Dedicated resources which contain tables and data. [Manage Cloud databases](/docs/cloud/cloud-databases/cloud-db-manage) |
-| Data source | Source of data imported to FeatureBase | FeatureBase imports data from external data sources via HTTPS, Kafka, SQL or CSV ingest processing |
-| Data types | Table columns | [Data types and constraints](/docs/sql-guide/data-types/data-types-home) |
-| DELETE | PQL query | [PQL DELETE write query](/docs/pql-guide/pql-write-delete) |
-| DIFFERENCE | PQL query | [PQL DIFFERENCE read query](/docs/pql-guide/pql-read-difference) |
-| DISTINCT | PQL query | [PQL DISTINCT read query](/docs/pql-guide/pql-read-distinct) |
-
-## E
-
-| Term | Context | Additional information |
-|---|---|---|
-| EXTRACT | PQL query | [PQL EXTRACT read query](/docs/pql-guide/pql-read-extract) |
+| Data modeling | Data curation prior to ingestion | [Learn about data modeling in FeatureBase](/docs/cloud/cloud-faq/cloud-faq-data-modeling) |
+| Data source | External source of data which includes CSV files, inline data and other sources | [BULK INSERT](/docs/sql-guide/statements/statement-insert-bulk) |
+| Data type | Table columns | [Data types and constraints](/docs/sql-guide/data-types/data-types-home) |
## F
@@ -68,33 +52,14 @@ nav_order: 1
| Term | Context | Additional information |
|---|---|---|
-| Group By | PQL Query | [PQL Group By Query](/docs/pql-guide/pql-read-groupby) |
-
-## H
-
-| Term | Context | Additional information |
-|---|---|---|
-| | | |
+| Group By | SELECT statement | [SELECT statement](/docs/sql-guide/statements/statement-select) |
## I
| Term | Context | Additional information |
|---|---|---|
-| INCLUDESCOLUMN | PQL query | [PQL INCLUDESCOLUMN read query](/docs/pql-guide/pql-read-includescolumn) |
-| INTERSECT | PQL query | [PQL INTERSECT read query](/docs/pql-guide/pql-read-intersect) |
-| Index | FeatureBase tables | Denormalized top-level container roughly the same as an RDBMS table. |
-
-## J
-
-| Term | Context | Additional information |
-|---|---|---|
-| | | |
-
-## K
-
-| Term | Context | Additional information |
-|---|---|---|
-| | | |
+| `_id` | FeatureBase tables | [CREATE TABLE statement](/docs/sql-guide/statements/statement-table-create) |
+| Index | FeatureBase tables | [Learn about FeatureBase bitmap indexes](/docs/cloud/cloud-faq/cloud-faq-bitmaps) |
## L
@@ -102,45 +67,36 @@ nav_order: 1
|---|---|---|
| Latency | SQL/PQL Queries | How much time elapses between when a query is sent to a system and when the results return to the client. |
-
## M
| Term | Context | Additional information |
|---|---|---|
-| MAX | PQL Read query | [PQL MAX Read query](/docs/pql-guide/pql-read-max) |
-| MAX | SQL `int` constraint | [INT data type](/docs/sql-guide/data-types/data-type-int) |
-| MIN | PQL Read query | [PQL MIN Read query](/docs/pql-guide/pql-read-min) |
-| Min | SQL `int` constraint | [INT data type](/docs/sql-guide/data-types/data-type-int) |
+| MAX | SQL `INT` constraint | [INT data type](/docs/sql-guide/data-types/data-type-int) |
+| MIN | SQL `INT` constraint | [INT data type](/docs/sql-guide/data-types/data-type-int) |
| Mutex | String Data type | A FeatureBase field type similar to the Set type, in which only a single value can be set at any time. Conceptually similar to an enum type, but implemented on top of Set fields, with a performance cost from the single-value constraint. Not to be confused with the mutex synchronization primitive. |
## N
| Term | Context | Additional information |
|---|---|---|
-| Normalizing | The act of identifying the cardinality of your data in order to design the relationships between different tables. | [FeatureBase concepts](/docs/cloud/cloud-faq/cloud-faq-data-cardinality) |
-| NOT | PQL query | [PQL NOT read query](/docs/pql-guide/pql-read-not) |
+| Normalizing | Data relationships | [Learn about data cardinality](/docs/cloud/cloud-faq/cloud-faq-data-cardinality) |
## O
| Term | Context | Additional information |
|---|---|---|
-| Options | PQL Options query | [PQL OPTIONS](/docs/pql-guide/pql-options) |
-| Organization | FeatureBase Cloud | [FeatureBase Organization](/docs/cloud/cloud-org/cloud-org-manage) |
+| Organization | FeatureBase Cloud account | [FeatureBase Organization](/docs/cloud/cloud-org/cloud-org-manage) |
## P
| Term | Context | Additional information |
|---|---|---|
-| PERCENTILE | PQL query | [PQL PERCENTILE read query](/docs/pql-guide/pql-read-percentile) |
-| Pilosa | Former name of FeatureBase | [Pilosa + Molecula = FeatureBase blog post](https://www.featurebase.com/blog/pilosa-molecula-featurebase-a-story-of-evolution) |
-| Pilosa Query Language (PQL) | Database queries | [PQL-Guide](/docs/pql-guide/pql-home) |
| Protobuf | | Binary serialization format used for internal messages which can be used by clients as an alternative to JSON. [Protobuf](https://developers.google.com/protocol-buffers/) |
## Q
| Term | Context | Additional information |
|---|---|---|
-| Query (PQL) | Pilosa Query Language | [PQL Guide](/docs/pql-guide/pql-home) |
| Query (SQL) | Structured Query Language | [SQL Guide](/docs/sql-guide/sql-guide-home) |
## R
@@ -149,50 +105,27 @@ nav_order: 1
|---|---|---|
| Record<br>Row | Database table row | Equivalent to RDBMS table row. FeatureBase uses "Record" to avoid confusion |
| Roaring Bitmap | FeatureBase database | [roaringbitmap.org](https://roaringbitmap.org/){:target="_blank"} |
-| Row | | Rows are the fundamental vertical data axis within FeatureBase. Rows are namespaced by field so the same row ID in a different field refers to a different row. |
-| Row `_id` | | |
-| Row (Ranged) | PQL query | [PQL Row read query](/docs/pql-guide/pql-read-row) |
-| Row (Timestamp) | PQL query | [PQL Row read query](/docs/pql-guide/pql-read-row) |
-| Rows | PQL query | [PQL Rows read query](/docs/pql-guide/pql-read-rows) |
## S
| Term | Context | Additional information |
|---|---|---|
-| SET | PQL query | [PQL SET write query](/docs/pql-guide/pql-write-set) |
+| `SET` and `SETQ` | SQL data types | [Low cardinality data types](/docs/sql-guide/data-types/data-types-home/#low-cardinality-data-types) |
| Shard | Roaring Bitmap format | [Roaring Bitmap Format](/docs/cloud/cloud-faq/cloud-faq-roaring-bitmap-format) |
-| SORT | PQL query | [PQL SORT read query](/docs/pql-guide/pql-read-sort) |
-| STORE | PQL query | [PQL STORE write query](/docs/pql-guide/pql-write-store) |
-| SUM | PQL query | [PQL SUM read query](/docs/pql-guide/pql-read-sum) |
## T
| Term | Context | Additional information |
|---|---|---|
| Throughput | Data import/ingestion | Quantity of data that can be imported/ingested in a given time. May involve trade-off between Latency and Freshness |
-| Time Quantum | SQL IDSET and STRINGSET constraint | [IDSET data type](/docs/sql-guide/data-types/data-type-idset)<br>[STRINGSET data type](/docs/sql-guide/data-types/data-type-stringset) |
+| Time Quantum | `SETQ` constraints | [IDSETQ data type](/docs/sql-guide/data-types/data-type-idsetq)<br>[STRINGSETQ data type](/docs/sql-guide/data-types/data-type-stringsetq) |
| Timestamp | Data type | [Timestamp data type](/docs/sql-guide/data-types/data-type-timestamp) |
-| TTL (Time To Live) | IDSET and STRINGSET constraint | [IDSET data type](/docs/sql-guide/data-types/data-type-idset)<br>[STRINGSET data type](/docs/sql-guide/data-types/data-type-stringset) |
-| TopK | PQL query | [PQL TOPK read query](/docs/pql-guide/pql-read-topk) |
-| TopN | PQL query | [PQL TOPN read query](/docs/pql-guide/pql-read-topn) |
-
-## U
-
-| Term | Context | Additional information |
-|---|---|---|
-| UNION | PQL query | [PQL UNION read query](/docs/pql-guide/pql-read-union) |
-| UNIONROWS | PQL query | [PQL UNIONROWS read query](/docs/pql-guide/pql-read-unionrows) |
+| TTL (Time To Live) | `SETQ` constraints | [IDSETQ data type](/docs/sql-guide/data-types/data-type-idsetq)<br>[STRINGSETQ data type](/docs/sql-guide/data-types/data-type-stringsetq) |
## V
| Term | Context | Additional information |
|---|---|---|
-| View | FeatureBase fields | Internally managed method to separate data layouts within a field. Not exposed by the API |
+| View | FeatureBase fields | [CREATE VIEW statement](/docs/sql-guide/statements/statement-view-create) |
| View (Primary) | FeatureBase fields | Standard view that represents typical base data |
| View (Time-based) | FeatureBase fields | Automatically generated view for time quantum fields |
-
-## W - X - Y - Z
-
-| Term | Context | Additional information |
-|---|---|---|
-| XOR | PQL query | [PQL XOR read query](/docs/pql-guide/pql-read-xor) |
diff --git a/docs/cloud/cloud-ingest/cloud-table-upload-data.md b/docs/cloud/cloud-ingest/cloud-table-upload-data.md
index 5537bce4a..fad006c7b 100644
--- a/docs/cloud/cloud-ingest/cloud-table-upload-data.md
+++ b/docs/cloud/cloud-ingest/cloud-table-upload-data.md
@@ -21,7 +21,7 @@ You can also upload CSV data using the [BULK INSERT statement](/docs/sql-guide/s
## Naming standards
-{% include /concepts/standard-naming-obj.md%}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-table.md %}
## CSV file structure
diff --git a/docs/cloud/cloud-tables/cloud-table-add-column.md b/docs/cloud/cloud-tables/cloud-table-add-column.md
index 7a2b99694..c9127a24c 100644
--- a/docs/cloud/cloud-tables/cloud-table-add-column.md
+++ b/docs/cloud/cloud-tables/cloud-table-add-column.md
@@ -21,7 +21,7 @@ Add a column to an existing table and set constraints if required.
## Naming standard
-{% include /concepts/standard-naming-obj.md%}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-col.md %}
{: .note }
diff --git a/docs/cloud/cloud-tables/cloud-table-create.md b/docs/cloud/cloud-tables/cloud-table-create.md
index 540136611..78cc6005c 100644
--- a/docs/cloud/cloud-tables/cloud-table-create.md
+++ b/docs/cloud/cloud-tables/cloud-table-create.md
@@ -9,7 +9,7 @@ nav_order: 1
# How do I create a table in FeatureBase Cloud?
{: .no_toc }
-{% include /concepts/summary-table-create.md %}
+{% include /faq/summary-table-create.md %}
{% include page-toc.md %}
@@ -22,7 +22,7 @@ nav_order: 1
## Naming standards
-{% include /concepts/standard-naming-obj.md%}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-table.md %}
## Step 1: create table
diff --git a/docs/cloud/cloud-tables/cloud-table-manage.md b/docs/cloud/cloud-tables/cloud-table-manage.md
index 60b52f468..49d6f10f8 100644
--- a/docs/cloud/cloud-tables/cloud-table-manage.md
+++ b/docs/cloud/cloud-tables/cloud-table-manage.md
@@ -11,23 +11,19 @@ has_toc: false
This page provides an overview of FeatureBase tables and links to guide you through creating, altering and dropping tables.
-{% include /concepts/summary-table-create.md %}
+{% include /faq/summary-table-create.md %}
{% include page-toc.md %}
## Before you begin
{% include /cloud/cloud-before-begin.md %}
+* [Learn about FeatureBase bitmaps](/docs/cloud/cloud-faq/cloud-faq-bitmaps)
* [Learn how to manage Cloud databases](/docs/cloud/cloud-databases/cloud-db-manage)
## Data modeling
-{% include /concepts/summary-data-modeling.md %}
-
-{: .important}
-Perform data modeling **before** creating tables to avoid issues.
-
-* [Learn about data modeling](/docs/concepts/overview-data-modeling)
+{% include /faq/summary-data-modeling.md %}
## Table primary key
@@ -39,7 +35,7 @@ Perform data modeling **before** creating tables to avoid issues.
## Naming standard
-{% include /concepts/standard-naming-obj.md%}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-table.md %}
{% include /cloud-table/cloud-standard-naming-col.md%}
diff --git a/docs/concepts/WIP-data-modeling b/docs/concepts/WIP-data-modeling
deleted file mode 100644
index bea6abf47..000000000
--- a/docs/concepts/WIP-data-modeling
+++ /dev/null
@@ -1,146 +0,0 @@
----
-Data Modeling
-nav_exclude: true
----
-
-
-# This excercise will help with data modeling in FeatureBase
-
-# We'll begin with how to "key" data when ingesting into FeatureBase, in short which values from data will be used as the primary key, unqiue key, or primary entity.
-
-
-# Create a table named data_modeling that has the following schema:
-# _id = Primary Key (ID Type)
-# unqiue_id = Secondary Unique ID
-# device_name = STRING information
-# total_bandwidth = Mutex INT Data
-# product_type = Mutex STRING Data
-# device_type = Non-Mutex STRING Data (String Arrays)
-# bandwidth_channel = Non-Mutex ID Data (ID Array)
-
-
-create table if not exists data_modeling (
- _id ID,
- unique_id ID,
- device_name STRING,
- total_bandwidth INT,
- product_type STRING,
- device_type STRINGSET,
- bandwidth_channel IDSET
-);
-
-# The following is generated comma delimited data for wifi devices containing general information about each.
-# In the first scenario we will key on monotonically increasing ID value from 1-20, this will create 20 records with each containing one row of data.
-
-```sql
-INSERT INTO data_modeling (_id, unique_id, device_name, total_bandwidth, product_type, device_type, bandwidth_channel) VALUES
-(1, 123456, 'BananaPhone', 54321, 'WiFi Router', ['Smart', 'Wireless'], [1, 6, 11]),
-(2, 789012, 'ToasterBot', 98765, 'WiFi Extender', ['Powerful', 'Range','Wireless'], [2, 7, 12]),
-(3, 345678, 'SockPuppet', 23456, 'WiFi Camera', ['HD', 'Night Vision','Wireless'], [3, 8, 13]),
-(4, 901234, 'PogoStick', 67890, 'Smart Home Hub', ['Voice Control', 'Automation'], [4, 9, 14]),
-(5, 567890, 'BubbleWrap', 12345, 'WiFi Speaker', ['Portable', 'Bluetooth','Wireless','Voice Control'], [5, 10, 15]),
-(6, 234567, 'SquirrelCage', 76543, 'WiFi Thermostat', ['Programmable', 'Energy Saving','Wireless','Energy Monitoring'], [6, 11, 16]),
-(7, 890123, 'RubberDucky', 32145, 'WiFi Doorbell', ['HD Video', 'Two-Way Audio'], [7, 12, 17]),
-(8, 456789, 'CrazyStraw', 65432, 'WiFi Light Bulb', ['Color Changing', 'Dimmable'], [8, 13, 18]),
-(9, 123123, 'PotatoLauncher', 98765, 'WiFi Security Camera', ['Motion Detection', 'Cloud Storage'], [9, 14, 19]),
-(10, 321321, 'MarshmallowGun', 54321, 'Smart Plug', ['Voice Control', 'Energy Monitoring'], [10, 15, 20]),
-(11, 111111, 'ToothbrushBot', 22222, 'WiFi Scale', ['Body Composition', 'Syncs Data','Wireless'], [11, 16, 21]),
-(12, 222222, 'CheeseGrater', 33333, 'WiFi Vacuum Cleaner', ['Mapping', 'App Control','Wireless'], [12, 17, 22]),
-(13, 333333, 'FryingPan', 44444, 'WiFi Coffee Maker', ['Programmable', 'Brew Strength','Wireless'], [13, 18, 23]),
-(14, 444444, 'TeaInfuser', 55555, 'WiFi Blender', ['Variable Speed', 'Pulse Function','Wireless'], [14, 19, 24]),
-(15, 555555, 'PillowFortress', 66600, 'WiFi Smart Lock', ['Keyless Entry', 'Remote Access','Wireless'], [15, 20, 25]),
-(16, 908743, 'TangoMango', 543210, 'WiFi Router', ['Smart', 'Wireless'], [5, 7, 14]),
-(17, 567986, 'JumboMumbo', 12300, 'WiFi Speaker', ['Portable', 'Bluetooth','Wireless','Voice Control'], [9, 11, 15]),
-(18, 987421, 'ZippyFondue', 44354, 'Smart Plug', ['Voice Control', 'Energy Monitoring'], [4, 17, 20]),
-(19, 128885, 'NumbPancake', 98789, 'WiFi Security Camera', ['Motion Detection', 'Cloud Storage'], [8, 11, 20]),
-(20, 489112, 'WillowPunch', 666253, 'WiFi Smart Lock', ['Keyless Entry', 'Remote Access','Wireless'], [14, 15, 20]);
-```
-
-# Now that the data is inserted into our table:
-
-# Upon running the following query:
-```sql
-select count(*) from data_modeling;
-```
-
-# We return a result of 20, as expected.
-
-# Taking a look at a groupby on the device_type which is a stringset, we see results are <20 as a few of our records have identical set arrays: ['Smart', 'Wireless']
-```sql
-select count(*), device_type from data_modeling group by device_type;
-```
-
-# Now altering this query slightly:
-```sql
-select count(*), device_type from data_modeling WITH (flatten(device_type)) group by device_type;
-
-```
-# We can flatten out the array in this set field to group over the individual elements, producing 28 groupings with their respective counts,
-
-
-# Now, let's ingest this same data into a new table and shift which column we key the data:
-
-create table if not exists data_modeling_2 (
- _id STRING,
- unique_id ID,
- device_name STRING,
- total_bandwidth INT,
- product_type ID,
- device_type STRINGSET,
- bandwidth_channel IDSET
-);
-
-# Notice we now are using a STRING instead of an ID type as the primary key (_id) allowing us to use arbitrary strings found in the records.
-
-
-# For this new insert statement the _id is now in the 5th ordinal position(instead of the first) and we've moved product_type to the first position.
-```sql
-INSERT INTO data_modeling_2 (product_type, unique_id, device_name, total_bandwidth, _id , device_type, bandwidth_channel) VALUES
-(1, 123456, 'BananaPhone', 54321, 'WiFi Router', ['Smart', 'Wireless'], [1, 6, 11]),
-(2, 789012, 'ToasterBot', 98765, 'WiFi Extender', ['Powerful', 'Range','Wireless'], [2, 7, 12]),
-(3, 345678, 'SockPuppet', 23456, 'WiFi Camera', ['HD', 'Night Vision','Wireless'], [3, 8, 13]),
-(4, 901234, 'PogoStick', 67890, 'Smart Home Hub', ['Voice Control', 'Automation'], [4, 9, 14]),
-(5, 567890, 'BubbleWrap', 12345, 'WiFi Speaker', ['Portable', 'Bluetooth','Wireless','Voice Control'], [5, 10, 15]),
-(6, 234567, 'SquirrelCage', 76543, 'WiFi Thermostat', ['Programmable', 'Energy Saving','Wireless','Energy Monitoring'], [6, 11, 16]),
-(7, 890123, 'RubberDucky', 32145, 'WiFi Doorbell', ['HD Video', 'Two-Way Audio'], [7, 12, 17]),
-(8, 456789, 'CrazyStraw', 65432, 'WiFi Light Bulb', ['Color Changing', 'Dimmable'], [8, 13, 18]),
-(9, 123123, 'PotatoLauncher', 98765, 'WiFi Security Camera', ['Motion Detection', 'Cloud Storage'], [9, 14, 19]),
-(10, 321321, 'MarshmallowGun', 54321, 'Smart Plug', ['Voice Control', 'Energy Monitoring'], [10, 15, 20]),
-(11, 111111, 'ToothbrushBot', 22222, 'WiFi Scale', ['Body Composition', 'Syncs Data','Wireless'], [11, 16, 21]),
-(12, 222222, 'CheeseGrater', 33333, 'WiFi Vacuum Cleaner', ['Mapping', 'App Control','Wireless'], [12, 17, 22]),
-(13, 333333, 'FryingPan', 44444, 'WiFi Coffee Maker', ['Programmable', 'Brew Strength','Wireless'], [13, 18, 23]),
-(14, 444444, 'TeaInfuser', 55555, 'WiFi Blender', ['Variable Speed', 'Pulse Function','Wireless'], [14, 19, 24]),
-(15, 555555, 'PillowFortress', 66600, 'WiFi Smart Lock', ['Keyless Entry', 'Remote Access','Wireless'], [15, 20, 25]),
-(16, 908743, 'TangoMango', 543210, 'WiFi Router', ['Smart', 'Wireless'], [5, 7, 14]),
-(17, 567986, 'JumboMumbo', 12300, 'WiFi Speaker', ['Portable', 'Bluetooth','Wireless','Voice Control'], [9, 11, 15]),
-(18, 987421, 'ZippyFondue', 44354, 'Smart Plug', ['Voice Control', 'Energy Monitoring'], [4, 17, 20]),
-(19, 128885, 'NumbPancake', 98789, 'WiFi Security Camera', ['Motion Detection', 'Cloud Storage'], [8, 11, 20]),
-(20, 489112, 'WillowPunch', 666253, 'WiFi Smart Lock', ['Keyless Entry', 'Remote Access','Wireless'], [14, 15, 20]);
-```
-
-
-# Now running the same count query: select count(*) from data_modeling_2;
-
-# We see a result of: 15
-
-# Curious, we ingested 20 rows of data, where did 5 of them go??
-
-# Well since there are 2 wifi routers and 2 security camera product_types the values for those are now underneath the primary key for those types:
-
-# Running a simple select statement will show this new distribution:
-```sql
-select * from data_modeling_2:
-```
-
-# Uh oh, the bandwidth_channel and device_type have extra values from both records, but the device_name only has 1 value, what happened??
-
-# The device_name String is a mutex field while the other 2 are non-mutex set fields, meaning the record contains the most recent value for mutex fields
-
-# Running the device_type groupby:
-```sql
-select count(*), device_type from data_modeling_2 group by device_type;
-```
-
-# This still returns 15, which means there are still 15 different arrays to grouping without flattening.
-
-# So this new data model, returns correct results for one scenario but not for another in which I want to count each distinct row of original data
diff --git a/docs/concepts/concept-ingest-avro-schema.md b/docs/concepts/concept-ingest-avro-schema.md
deleted file mode 100644
index cbab7c276..000000000
--- a/docs/concepts/concept-ingest-avro-schema.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title: Ingesting from Avro schema registry
-layout: default
-parent: Data modeling overview
-grand_parent: Concepts
-nav-order: 5
----
-
-# Ingesting from Avro Schema Registry
-
-How the Ingester indexes data in FeatureBase can be controlled to some extent via the schema registry. Avro schemas allow arbitrary properties to be associated with any item to implement features like [logical types](https://avro.apache.org/docs/current/spec.html#Logical+Types).
-
-A "float" or "double" type field in an Avro schema will be ingested into FeatureBase as a decimal field. If the property "scale" is provided, and is an integer, the value will be multiplied by 10^scale before being ingested. FeatureBase also stores the scale internally, so decimal fields will scale their query parameters appropriately, and floating point numbers are accepted as query parameters. A type which uses the logical type "decimal" will also be ingested as a decimal provided that it is 8 bytes or less (64 bit).
-
-A "boolean" type field (or a union of boolean and null), will be ingested according to the "pack-bools" setting on the ingester. By default, boolean fields are packed into two "set" fields in FeatureBase which has a few benefits. It reduces fragmentation internally in FeatureBase, and allows one to perform "TopN" queries on all boolean fields together. The reason there are two fields is to distinguish between true, false, and null. Each row in the "bools" field represents whether the boolean value is true. Each row in the "bools-exists" field represents whether or not the value is null. So, a set bit in the "bools" field always implies the corresponding set bit in the "bools-exists" field, but the lack of a set bit in the "bools" field needs to check "bools-exists" to determine if the value is null or false.
-
-An "enum" type will be ingested into a FeatureBase mutex field by default. Unlike a set field, if a different value comes in for the same record, the existing value will automatically be cleared—that is, each record (FeatureBase column) can only have one value for a mutex field.
-
-A "string" type will be ingested into a FeatureBase set field by default. One can choose to use a mutex field instead by adding the property '"mutex": true' to the schema for that field.
-
-Currently, the ingester supports a limited subset of Avro types. The top level type must be a Record, and nested fields are not supported—meaning that fields must not be of type Record or Map. Unions are only supported if it is a union of a supported type and null. Arrays are supported as long as they contain strings, bytes, fixed or enum types.
-
-Field names must be valid FeatureBase field names, so they must be all lower case, start with a letter, contain only letters, numbers, or dashes, and be 64 characters or less. We're hoping to lift these restrictions in an upcoming release.
diff --git a/docs/concepts/concept-ingest-eg-large-dataset.md b/docs/concepts/concept-ingest-eg-large-dataset.md
deleted file mode 100644
index 5d07b3176..000000000
--- a/docs/concepts/concept-ingest-eg-large-dataset.md
+++ /dev/null
@@ -1,236 +0,0 @@
----
-title: A True Crime Story
-layout: default
-parent: Examples
-grand_parent: Concepts
-nav_order: 2
----
-
-# A True Crime Story...Well, A Story About Modeling True Crime Data
-
-Ok, not a crime story, but a data modeling story with true crime… Data! This doc describes the flow of how you, an individual in the data community, might think about modeling data in FeatureBase versus a traditional RDBMS. Many of the differences derive from the fact that FeatureBase is built entirely on bitmaps, which can be read about [here](https://www.featurebase.com/blog/bitmaps-making-real-time-analytics-real). The data referenced in this post is real crime data from Boston and can be referenced [here](https://www.kaggle.com/datasets/AnalyzeBoston/crimes-in-boston) to follow along.
-
-As a data professional, you are tasked with helping the city of Boston reduce crime by finding insights in past and current data. You have just gotten your hands on a flat file with a couple of years of intriguing crime data, so now what? You realize this data is way too big to analyze on your local/virtual workspace (not really… but let’s say it is), so the first thing is getting it into a database in order to analyze and query the data much more easily.. You’ve decided you want to try FeatureBase because of all the great things you’ve heard. The first thing you’ll likely do is `head` the file and look at the provided schema to get a sense of the columns and, most importantly, what the grain of the data is. This ideally leads you to a unique identifier to operate as the primary key (or equivalent) for your table. FeatureBase requires a primary key, which is usually denoted as `_id` in the data model.
-
-```csv
-INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
-I182070945,00619,Larceny,LARCENY ALL OTHERS,D14,808,,2018-09-02 13:00:00,2018,9,Sunday,13,Part One,LINCOLN ST,42.35779134,-71.13937053,"(42.35779134, -71.13937053)"
-I182070943,01402,Vandalism,VANDALISM,C11,347,,2018-08-21 00:00:00,2018,8,Tuesday,0,Part Two,HECLA ST,42.30682138,-71.06030035,"(42.30682138, -71.06030035)"
-```
-
-Quickly looking at the file, you identify “INCIDENT_NUMBER” as the perfect candidate for your key. After jotting down the other columns and their potential types, you also decide “Location” seems unnecessary given “Lat” and “Long” already exist. Now you are off to ingest the data! For most databases, including FeatureBase, this means [creating a table](/docs/sql-guide/statements/statement-table-create) and modeling the columns based on what you see in the file. You’ve come up with the following for FeatureBase:
-
-```sql
-CREATE TABLE IF NOT EXISTS crime (
-_id STRING,
-offense_code INT MAX 10000,
-offense_code_group STRING,
-offense_description STRING,
-district STRING,
-reporting_area STRING,
-shooting STRING,
-occurred_on_date TIMESTAMP,
-year INT MIN -1 MAX 10000,
-month INT MIN -1 MAX 12,
-day_of_week STRING,
-hour INT MIN -1 MAX 24,
-ucr_part STRING,
-street STRING,
-lat DECIMAL(8),
-long DECIMAL(8)
-) WITH COMMENT 'table containing Boston crime data';
-```
-
-
-FeatureBase’s use of bitmaps and bit slice indexing means you don’t have to worry about manually creating indexes on this table to improve city analysts' query performance. All that’s needed is the schema. This is followed by sending the data to be ingested. For some databases this is a drag and drop GUI, and in others, like FeatureBase, it’s through [SQL statements](/docs/sql-guide/statements/statement-insert-bulk). After you learn about `BULK INSERT` and some expected back and forth troubleshooting with syntax and the date, you've come up with the following SQL statement:
-
-```sql
-BULK INSERT INTO crime (
-_id,
-offense_code,
-offense_code_group,
-offense_description,
-district,
-reporting_area,
-shooting,
-occurred_on_date,
-year,
-month,
-day_of_week,
-hour,
-ucr_part,
-street,
-lat,
-long)
-MAP (
-0 STRING,
-1 INT,
-2 STRING,
-3 STRING,
-4 STRING,
-5 STRING,
-6 STRING,
-7 TIMESTAMP,
-8 INT,
-9 INT,
-10 STRING,
-11 INT,
-12 STRING,
-13 STRING,
-14 DECIMAL(8),
-15 DECIMAL(8)
-)
-FROM
-'https://featurebase-public-data.s3.us-east-2.amazonaws.com/crime.csv'
-WITH
- BATCHSIZE 100000
- FORMAT 'CSV'
- INPUT 'URL'
- HEADER_ROW;
-```
-
-After running the above, you now have a FeatureBase table with 282,517 records! Job well done! Nothing could’ve gone wrong, but because it’s not your first rodeo, you do some simple record count validation to make sure no data was lost. Lo and behold you notice the file had 319,074 records! What is this madness? Well, it’s one of the differences between FeatureBase and other databases. It appears you made a mistake thinking “INCIDENT_NUMBER” was unique. Some databases may have thrown errors here because they would have seen duplicate values attempting to be loaded. Others may have ingested all 319,074 records because the backend implementation doesn’t require (or generates) unique keys. FeatureBase maintains unique keys and treats all ingest operations as UPSERTs. So every time a repeat incident number was loaded, all of the values in your table were updated for that incident’s record. The update behavior of UPSERTs depends on the data type that is being updated:
-
-|Datatype Being Updated| Behavior| Example |
-| ------- | ------------ | ------------ |
-| STRING, ID, INT, DECIMAL, TIMESTAMP | Replace existing value | Existing FeatureBase Value:<br>app_status (STRING): Pending<br>New Value Sent:<br>Approved<br>New FeatureBase Value:<br>app_status (STRING): Approve
-| IDSET, STRINGSET | Add (not delete) new values | Existing FeatureBase Value:<br>app_status (STRINGSET): Pending<br>New Value Sent:<br>Approved<br>New FeatureBase Value:<br>app_status (STRINGSET): Pending, Approve
-
-Well now you are conflicted. On the one hand, you know this is the actual number of unique incidents, so counts on this table will reflect actuals (versus needing a count distinct with 319,074 records). This is nice because you know the city’s analysts won’t make mistakes and count the same crime multiple times. However, you have lost some of the definition of your data like offense codes and groups that have multiple values for a single incident. You can verify this by using a `grep I162097077 .csv` on the file for incident "I162097077", which will return 3 records:
-
-```csv
-I162097077,3125,Warrant Arrests,WARRANT ARREST,C11,367,NULL,2016-11-28T12:00:00Z,2016,11,Monday,12,Part Three,GIBSON ST,42.29755533,-71.0597091
-I162097077,735,Auto Theft Recovery,RECOVERED - MV RECOVERED IN BOSTON (STOLEN OUTSIDE BOSTON),C11,367,NULL,2016-11-28T12:00:00Z,2016,11,Monday,12,Other,GIBSON ST,42.29755533,-71.0597091
-I162097077,1300,Recovered Stolen Property,STOLEN PROPERTY - BUYING / RECEIVING / POSSESSING,C11,367,NULL,2016-11-28T12:00:00Z,2016,11,Monday,12,Part Two,GIBSON ST,42.29755533,-71.0597091
-```
-
-And running the following query on your new table, which will have 1 record with partial data (i.e. the value for offense_code_group is "Recovered Stolen Property" and "Warrant Arrests" and "Auto Theft Recovery" are missing):
-
-```sql
-select * from crime where _id = 'I162097077'
-```
-
-
-You now consider creating a new unique key for each record so there is no data loss, but you find a FeatureBase superpower, [IDSET data type](/docs/sql-guide/data-types/data-type-idset) and [STRINGSET data type](/docs/sql-guide/data-types/data-type-stringset). These datatypes give individual records the ability to store multiple values for a single column.
-
-First, you look into `IDSET` but find you don’t know what the `ID` type is. After looking into the [ID data type](/docs/sql-guide/data-types/data-type-id), you find it is for unsigned integers that are more meaningful to represent as discrete values. Looking at your data model, you’ve made a mistake assigning some columns like “OFFENSE_CODE'' as integers. These codes are discrete values that should be treated categorically, as they will be used in `GROUP BY` and `WHERE` queries and not aggregated on or used in range queries. Others, like “YEAR”, are appropriate because you might use range queries in addition to `GROUP BY` statements. Now understanding `ID`, you see `IDSET` can be used to store multiple `ID` values for a single column. This is exactly what columns like “OFFENSE_CODE'' need. Next, you see `STRINGSET` operates similarly and can be used to store multiple `STRING` values for a single column, such as “OFFENSE_CODE_GROUP”. This type would be appropriate for others like “STREET” if different values were populated with the data, which they are not today. You revisit your data model (updated types below) and now consider if this is a good move:
-
-```sql
-CREATE TABLE IF NOT EXISTS crime (
-_id STRING,
-offense_code IDSET,
-offense_code_group STRINGSET,
-offense_description STRINGSET,
-district STRING,
-reporting_area STRING,
-shooting STRING,
-occurred_on_date TIMESTAMP,
-year INT MIN -1 MAX 10000,
-month INT MIN -1 MAX 12,
-day_of_week STRING,
-hour INT MIN -1 MAX 24,
-ucr_part STRINGSET,
-street STRING,
-lat DECIMAL(8),
-long DECIMAL(8)
-) WITH COMMENT 'table containing Boston crime data using SETs';
-```
-
-
-
-With this model, you won’t lose any of your data’s definition and will still maintain the true count of 282,517 unique incidents. What’s more, you see the space savings compared to both implementing a new unique key in FeatureBase and using a traditional database. A new unique key would have meant storing 36,557 additional records, and while these would be stored as efficient bitmaps, they would further grow your data footprint and potentially have an impact over time. You’d also be storing the same “INCIDENT_NUMBER” multiple times in addition to the new keys for every record. A traditional database would have meant writing many records with duplicate values for all the columns that don’t change (all date/time columns ( i.e "OCCURRED_ON_DATE"), “REPORTING_AREA”, lat/long, et al):
-
-|PK(_id)| INCIDENT_NUMBER | OFFENSE_CODE| OFFENSE_CODE_GROUP | OCCURRED_ON_DATE |
-| ------- | ------------ | ------------ | ------------ | ------------ |
-| 1 | *I162097077* | 00735 | Auto Theft Recovery | *2016-11-28T12:00:00Z* |
-| 2 | *I162097077* | 01300 | Recovered Stolen Property | *2016-11-28T12:00:00Z* |
-| 3 | *I162097077* | 03125 | Warrant Arrests | *2016-11-28T12:00:00Z* |
-
-
-Your new data model is much more efficient and only needs an additional bit tracked for each additional value in the `IDSET` and `STRINGSET` type columns, so you feel good about this call! In fact it’d be a crime not to do this… Ok sorry for that. You drop your old table (`DROP TABLE crime;`), run the new DDL above, and reload it with an altered `BULK INSERT`:
-
-```sql
-BULK INSERT INTO crime (
-_id,
-offense_code,
-offense_code_group,
-offense_description,
-district,
-reporting_area,
-shooting,
-occurred_on_date,
-year,
-month,
-day_of_week,
-hour,
-ucr_part,
-street,
-lat,
-long)
-MAP (
-0 STRING,
-1 IDSET,
-2 STRINGSET,
-3 STRINGSET,
-4 STRING,
-5 STRING,
-6 STRING,
-7 TIMESTAMP,
-8 INT,
-9 INT,
-10 STRING,
-11 INT,
-12 STRINGSET,
-13 STRING,
-14 DECIMAL(8),
-15 DECIMAL(8)
-)
-FROM
-'https://featurebase-public-data.s3.us-east-2.amazonaws.com/crime.csv'
-WITH
- BATCHSIZE 100000
- FORMAT 'CSV'
- INPUT 'URL'
- HEADER_ROW;
-```
-
-Now you can run the same query as before (`select * from crime where _id = 'I162097077'`) to validate all of the data is present and stored efficiently within a single record:
-
-|INCIDENT_NUMBER (_id)| OFFENSE_CODE| OFFENSE_CODE_GROUP | OCCURRED_ON_DATE |
-| ------- | ------------ | ------------ | ------------ |
-| I162097077 | 00735,01300,00735 | Auto Theft Recovery, Warrant Arrests, Recovered Stolen Property | 2016-11-28T12:00:00Z |
-
-Now you start thinking about how Boston could improve their data over time. Today, everything in the data is rolled up to one timestamp, “OCCURRED_ON_DATE”, so there is no way to know when each of the unique offense_codes were added. However, you have the foresight to know the city would love to track crime much more granularly. It seems like incidents in real life evolve over time, so it would be great to have each incident’s attributes updated at a time you are generically calling “UPDATE_DATE” for now. An example might be a robbery that starts at a certain location, like the bank (street, lat, long, et al), but then a couple hours later is given a car crash offense code at a different location when the robbers are stopped by the police. You want to add this ability but don’t want to add superfluous records for each “UPDATE_DATE”. Luckily, you find FeatureBase has another trick up its sleeve, [time quantums](/docs/concepts/time-quantums). With time quantums, you are able to associate a time with each value in `IDSET` and `STRINGSET` type columns. In the robbery example, you could set a new value for “STREET” and associate the appropriate time this value occurred at with the “UPDATE_DATE”. A record with this data model is represented below, but it's important to note that the times cannot be extracted/returned with your query result set, only to filter by.
-
-|INCIDENT_NUMBER (_id)| OFFENSE_CODE| STREET |
-| ------- | ------------ | ------------ |
-| I162097077 | 00735 (2016-11-28T12:00:00Z)<br>01300 (2016-11-28T16:00:00Z)<br>00735 (2016-11-29T11:00:00Z) | GIBSON ST (2016-11-28T12:00:00Z)<br>BROOKS ST(2016-11-28T16:00:00Z)<br>CENTRAL AVE (2016-11-29T11:00:00Z) |
-
-Now the city can analyze the data even further, such as seeing how an incident progresses over time (i.e. what streets were visited between two times), without having to create a new record every time there is an update for the incident. This is really powerful because the city can now accurately run queries that give them answers to question like “what crimes were occurring on this street between time A and time B?” This, in combination with the smaller data footprint and many low-latency advantages FeatureBase brings, has you feeling pretty good about your proposed data model for Boston. What’s more, you feel much more confident about what you can do with FeatureBase for other data sources in the future.
-
-Interested in following along with this exploration of Boston crime data? [Start your FREE FeatureBase Cloud trial today!](https://cloud.featurebase.com/signup)
diff --git a/docs/concepts/concept_ingest_id.md b/docs/concepts/concept_ingest_id.md
deleted file mode 100644
index 626ad7ad3..000000000
--- a/docs/concepts/concept_ingest_id.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: Unique identifier
-layout: default
-parent: Data modeling overview
-grand_parent: Concepts
-nav_order: 1
----
-
-# Conceptualising the unique identifier
-
-When ingesting into FeatureBase, each record must be associated with a key.
-
-
-
-Ingesters support four ways to do this, three suitable for production workloads:
-
-- `primary-key-fields`,
-- `id-field`,
-- `external-generate`, to use the FeatureBase ID allocator, optionally including `offset-mode`,
-- `auto-generate`, suitable for testing.
-
-## Identifying the best ID field
-
-The `id-field` option should be considered when there is an existing field in the data which uniquely identifies each record and consists of contiguous positive integers. For example, the auto-incremented ID field from a relational database is usually perfect for this.
-
-In most other cases, the `primary-key-fields` option should be used. This uses one or more fields, converted to strings, then concatenated (using `|` as the delimiter), to create unique record IDs. When only a single field is used for this, it will *not* be indexed as a field in FeatureBase. When multiple source fields are used, each individual field will be indexed in FeatureBase, in addition to being used for the record ID.
-
-As an example, consider a data set of students across multiple schools, perhaps with a different CSV file for each school:
-
-| school | studentID | UUID | age | grade | ... |
-| --- | --- | --- | --- | --- | --- |
-| (string) | (int) | (string) | (int) | (int) | |
-| Anderson | 0 | 63a8 | 14 | 9 | |
-| Anderson | 1 | 98e9 | 16 | 11 | |
-| Anderson | 2 | 9ccb | 16 | 11 | |
-| Anderson | 3 | 7325 | 15 | 10 | |
-| Bowie | 0 | 6ed3 | 17 | 12 | |
-| Bowie | 1 | 62a5 | 16 | 11 | |
-| Bowie | 2 | bd6c | 15 | 10 | |
-| Bowie | 3 | 5651 | 16 | 10 | |
-
-The studentID column, unique within a single school, serves as an identifier. When ingesting a single file corresponding to a single school, an ingest option like `--id-field=studentID` might work well. This will result in an index with `studentID` as FeatureBase record IDs, and every *other* column potentially represented as a FeatureBase field, including `school`, `UUID`, `age`, and `grade`.
-
-To ingest multiple files without conflicting IDs, a different approach is required. When an appropriate identifier like a UUID is available, that can be used directly, with an option like `--primary-key-fields=UUID`. This will result in an index with `UUID` as FeatureBase record keys, so the index depends on key translation to convert UUID string values to integer record IDs. Every other column would potentially be represented as a FeatureBase field, including `school`, `studentID`, `age`, and `grade`.
-
-Sometimes, an appropriate unique identifier is not directly available, or perhaps a data set is designed to use a composite key as a unique identifier. For example, if the students data set did not include a UUID column. In this case, multiple values can be combined to produce a composite identifier that is unique. One option that would work well here is the pair (school, studentID), which would be specified as `--primary-key-fields=school,studentID`. This would result in an index with this composite key as FeatureBase record keys. The key for the first row in the data set would be "Anderson|0". Again, this index would depend on key translation. This index, in contrast to the previous, could include *every* column as a FeatureBase field, including both `school` and `studentID` as separate fields.
-
-The `auto-generate` option can create auto-incrementing integer IDs, when generating test data, or when ingesting from a CSV source, for example. This option is suitable for quick testing purposes, but does not support using multiple ingest processes or stopping and restarting ingest.
-
-Finally, setting `external-generate` in addition to `auto-generate` uses FeatureBase's ID generation feature. Additionally, `offset-mode` can be set for use with Kafka.
diff --git a/docs/concepts/concepts-ingest-workflow.md b/docs/concepts/concepts-ingest-workflow.md
deleted file mode 100644
index 5f00a4066..000000000
--- a/docs/concepts/concepts-ingest-workflow.md
+++ /dev/null
@@ -1,55 +0,0 @@
----
-title: Ingest workflow
-layout: default
-parent: Data modeling overview
-grand_parent: Concepts
-nav_order: 4
----
-
-# Conceptualising the Ingest workflow
-
-
-The FeatureBase Ingest Development Kit is a system for efficiently loading large amounts of data into a FeatureBase cluster.
-It provides services which convert other data formats to FeatureBase's Roaring data format and load it into FeatureBase.
-
-The ingester has three steps:
-1. Collect records from a data source.
-2. Translate records into FeatureBase's Roaring Bitmap format.
-3. Copy the converted data into FeatureBase.
-
-## 1. Collect records from a data source.
-
-This process operates in large "batches" of records.
-The entirety of a single batch is copied into FeatureBase at the same time.
-Large batches mean that the per-batch overhead is less significant.
-A batch is created once a specified number of records have been pulled.
-
-{: .note}
-When using the Kafka ingester, a smaller batch will be created if Kafka stops supplying records for at least a second.
-
-
-
-### 2. Translate records into FeatureBase's Roaring Bitmap format.
-
-During the first step, the records are accumulated in a mostly uncompressed format. In order to compress them, the ingester needs to acquire "Key IDs" for all keyed rows and columns. In the case of a string field, there is one ID for each string value which can be present in the field. For a string-keyed index, there is one ID for each row. If the specified row/column did not previously exist, FeatureBase will generate an ID in this step.
-
-The process of obtaining these Key IDs is referred to as translation in the ingester's logs:
-
-```text
-2020/07/20 14:14:47 translating batch of 10 took: 10.1172ms
-```
-
-Once all of the IDs have been mapped, the ingester converts the batch into roughly the format that FeatureBase will store it in.
-
-### 3. Copy the converted data into FeatureBase.
-
-The ingester acquires a transaction in order to ensure that no other application accesses an incompletely written index, and then copies all of the data into FeatureBase. This step is typically bottlenecked either by the network or the storage device backing the FeatureBase cluster.
-
-The process of copying this data into FeatureBase is referred to as "flushing" in the ingester's logs, and typically takes a very small amount of time.
-
-For example:
-
-```text
-2020/07/20 14:14:47 flushing batch of 10 to fragments took 84.2µs
-```
diff --git a/docs/concepts/old-size-featurebase-database.md b/docs/concepts/old-size-featurebase-database.md
deleted file mode 100644
index 309cfa2c8..000000000
--- a/docs/concepts/old-size-featurebase-database.md
+++ /dev/null
@@ -1,66 +0,0 @@
----
-title: Sizing database
-layout: default
-parent: Data modeling overview
-grand_parent: Concepts
-nav-order: 10
----
-
-# Sizing FeatureBase database
-
-## Determining Hardware Requirements
-
-Ingesters in FeatureBase are stateless and can be deployed in containers and easily scaled up and down. FeatureBase is stateful and has widely varying hardware requirements depending on the size of the data and query workload. FeatureBase can also be scaled up and down, but there's enough overhead in this process that you wouldn't want to be resizing it constantly in response to shifting demand. You may also need to adjust some operating system configuration features to take full advantage of larger systems.
-
-## Memory
-
-If possible, determine the rough dimensions of the data you'll be storing in FeatureBase. The most important factors are the number of records, number of fields, type of each field (as it will be indexed in FeatureBase), and the cardinality of each field (number of distinct values).
-
-FeatureBase breaks data into shards which are, by default, 2^20 (1,048,576) records. It is useful to figure out approximately how large each of your shards will be, and then use that to extrapolate memory requirements. The most accurate way to do this is to load a shard's worth of data into FeatureBase and measure its size on disk. Below is a table of some typical field configurations, and how much space they use, as a starting point for estimating hardware sizes. Please keep in mind that depending on data distribution, the actual size in your case might vary significantly from these numbers.
-
-Depending on your storage backend, memory usage and disk usage can both vary. In general, you want at least a bit more memory than the on-disk size of your data, and possibly as much as twice that. This memory may look like it's directly used by the FeatureBase engine, or it may just be kernel disk caches.
-
-The rough formula for calculating total cluster data storage (across all hosts) is
-
-```math
-(num_records/shard_width)*size_per_shard*2
-```
-
-For more detailed information on data size, see the [Data Modeling](/docs/concepts/overview-data-modeling) section.
-
-| Field Type | Cardinality | Size (per shard) |
-| - | - | - |
-| Int | 20 Million | 3.1 MB |
-| Int | 10 Billion | 4.3 MB |
-| Int | 256 | 1 MB |
-| Set/Bool/Mutex | 2 | 0.3 MB |
-| Set/Bool/Mutex (sparse) | 500 | 2.1 MB |
-| Set/Bool/Mutex (sparse) | 1000 | 2.2 MB |
-| Set (dense) | 10 | 1.3 MB |
-| Set (dense) | 100 | 13 MB |
-
-
-As a worked example, if you expected to have about 100 million records, with the set of fields above, the calculation would look like:
-
-`(100,000,000/1,048,576)*(3.1+4.3+1+0.3+2.1+2.2+1.3+13)*2 = 5207MB ~= 5GB` of disk storage across your whole cluster. If you were using a single node, you'd want at least 8-10GB of available memory for it.
-
-When you split a cluster into multiple nodes, each node will have some duplication and overhead. So, if you were using 5GB on a single node, and switched to 5 nodes, you should budget for at least 2GB of storage on each node.
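-
-The arithmetic above can be sketched in a few lines. The per-shard sizes come from the table, and the 1x-2x memory factor follows the guidance above; all of these are rough estimates, not guarantees:
-
-```python
-# Rough cluster-sizing sketch using the estimates from the table above.
-SHARD_WIDTH = 2**20                                      # 1,048,576 records per shard
-per_shard_mb = [3.1, 4.3, 1, 0.3, 2.1, 2.2, 1.3, 13]     # one entry per field
-num_records = 100_000_000
-
-shards = num_records / SHARD_WIDTH
-disk_mb = shards * sum(per_shard_mb) * 2                 # ~5207 MB (~5 GB) across the cluster
-memory_mb = (disk_mb, disk_mb * 2)                       # budget between 1x and 2x disk for memory
-print(f"disk ~{disk_mb:.0f} MB, memory ~{memory_mb[0]:.0f}-{memory_mb[1]:.0f} MB")
-```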
-
-## Disk
-
-For disk size requirements, refer to the [memory](#memory) section. Faster disks such as SSDs improve startup time and ingest performance. Read performance may be affected by disk speed, depending on your backend, but if you have enough memory, the kernel will usually keep everything in disk cache anyway.
-
-## CPU
-
-In general, adding more CPU cores to a FeatureBase cluster improves query latency and throughput. For a single query, FeatureBase fans the query out to all shards which have data pertinent to the query, and each shard can be processed concurrently by a different CPU core. Adding cores past the number of shards in the cluster will not improve single query performance, though it will help with query throughput in the case of concurrent query loads. The number of CPU cores to allocate depends on latency needs, query workload, and data size and structure. A reasonable starting point might be to allocate 1 core for every 10 shards. You should also aim to provide at least one more core for general overhead not specific to processing results from shards.
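-
-As a minimal sketch of that starting point (the one-core-per-ten-shards ratio is only a heuristic, not a guarantee):
-
-```python
-import math
-
-# Heuristic core count: 1 core per 10 shards, plus 1 for general overhead.
-SHARD_WIDTH = 2**20
-num_records = 100_000_000
-
-shards = math.ceil(num_records / SHARD_WIDTH)       # ~96 shards
-cores = math.ceil(shards / 10) + 1                  # ~11 cores as a starting point
-print(f"{shards} shards -> roughly {cores} cores")
-```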
-
-## Network
-
-While the network typically isn't a bottleneck, FeatureBase hosts should be on the same LAN to minimize latency. Use placement groups (in AWS) or similar functionality to ensure minimal latency between hosts.
-
-## Other Considerations
-
-* All FeatureBase hosts should be the same size. FeatureBase doesn't currently have the ability to shard data unevenly, so adding hosts of different sizes limits utilization to the size of the smallest host.
-* Typical deployments use fewer, larger hosts: 8+ cores and 16+ GB of RAM are common.
-* OS: a recent version of Linux.
-* Filesystem: Most options will work well. We have occasionally encountered [problems with file truncation on XFS](https://stackoverflow.com/questions/47077828/xfs-rhel7-3-cold-reboot-file-truncate), so we do not recommend it.
diff --git a/docs/concepts/overview-data-modeling.md b/docs/concepts/overview-data-modeling.md
deleted file mode 100644
index 710f9a511..000000000
--- a/docs/concepts/overview-data-modeling.md
+++ /dev/null
@@ -1,119 +0,0 @@
----
-title: Data modeling overview
-layout: default
-parent: Concepts
-has_children: true
-nav_order: 2
----
-
-# Data modeling overview
-{: .no_toc }
-
-{% include /concepts/summary-data-modeling.md %}
-
-{% include page-toc.md %}
-
-## Concepts
-
-When importing data into FeatureBase, you have a number of choices about how to represent that data.
-
-Choices about data representation mean trade-offs in both storage and runtime performance, and there are no best answers for all circumstances.
-
-This section offers guidance on likely ways to make these decisions, and a bit of theory describing what's happening under the hood to help you make better choices. If you're not sure, it's always a good idea to try things out and compare results.
-
-## Facts and Dimensions
-
-In a standard relational model, one often hears about "fact" tables vs "dimensions". Each record in a fact table typically represents an immutable event (e.g. someone clicked a link or made a purchase, a temperature reading was recorded, etc.). Dimensions, on the other hand, usually represent slower-changing "metadata". If your fact is that a user performed a certain action, one of your dimensions might be a "users" table that records things like date of birth, gender, and address. Recording this information along with every fact would lead to a huge amount of duplication, so it is typically split out.
-
-In FeatureBase, you can model things as you typically would in a relational database with facts and dimensions split apart, but FeatureBase has some unique capabilities that give you more options. Usually when you're doing queries that involve facts, you're not interested in the events themselves, but one of the dimensions that they affect. For example, you might want to know how many users visited a certain blog post as opposed to how many times that blog post was visited. They sound similar, but the first query is typically much more difficult because you're counting the distinct number of users rather than the number of events. In FeatureBase, you could add a "pages_visited" set type field directly to your users dimension and get the distinct functionality essentially for free. The power of the set field is that it can track multiple pages visited per user without additional overhead.
-
-But wait! There's more. What if you only wanted to get the set of users who visited a page within the past month? You'd have to go back to joining the facts with the dimension right? Nope. FeatureBase also has "time" fields which are just like set fields except you have the option to associate a coarse-grained timestamp with every user-page association (in fact you can have multiple timestamps associated with a single user-page pair). Currently the timestamps can be at yearly, monthly, daily, or hourly granularity, and FeatureBase lets you query across arbitrary time ranges.
-
-It takes up more space to store things like this, but if you have a workload that demands low latency for these types of queries it can be a very worthwhile tradeoff over storing the facts separately and joining across the dimensions at query time.
-
-## Fields
-
-Fields are used to segment rows within an index, for example to define different functional groups. A FeatureBase field might correspond to a single field in a relational table, where each row in a standard FeatureBase field represents a single possible value of the relational field. Similarly, an integer field could represent all possible integer values of a relational field.
-
-### Field Options
-
-<!-- this section is a placeholder, to provide minimal information about field options that are still exposed in the API, and linked from the http-api page -->
-
-### Ranked
-
-Ranked fields maintain a sorted cache of column counts by Row ID, yielding the rows with the most columns set. This cache facilitates the TopN query. The cache size defaults to 50,000 and can be set at field creation.
-
-
-
-### LRU
-
-The LRU cache maintains the most recently accessed Rows.
-
-
-
-### Time Quantums
-
-Setting a time quantum on a field creates extra views which allow ranged Row queries down to the time interval specified.
-
-* [Learn about IDSet and Time quantums](/docs/sql-guide/data-types/data-type-idset)
-* [Learn about STRINGSET and Time quantums](/docs/sql-guide/data-types/data-type-stringset)
-
-### TTL (Time To Live)
-
-TTL is an option for time-type fields. A time quantum is required for TTL to function.
-
-* [Learn about IDSet and TTL](/docs/sql-guide/data-types/data-type-idset)
-* [Learn about STRINGSET and TTL](/docs/sql-guide/data-types/data-type-stringset)
-
-### Numeric Types
-
-FeatureBase has three ways to represent numeric data: sets, mutexes, and integer fields. Each of these field types uses a set of bitmaps under the hood, where each bitmap represents a particular value for the field, and each position in a bitmap represents whether a record has that value.
-
-In a set field, each record can have any number of different values, and each value is logically independent. In general, sets are a good way to represent data where multiple traits or parameters are logically independent. A mutex is like a set, but each record can only have one value at a time; setting one value will clear the others.
-
-Int fields represent arbitrary values within a range, using multiple bitmaps to store binary digits of the values. Like a mutex, an int field has only one value per record at any given time.
-
-Even in the case where only one value is likely to be set for a given record, you may prefer set fields. If you always know the previous value, clearing that value directly will be more efficient than relying on the mutex logic to clear the other possible values. Integer fields support range queries, but any query will generally have to access all the bitmaps in the field since each one represents a binary digit. Set and mutex fields don't support range queries, but can query only the values they care about.
-
-
-### Integer Field Implementation
-
-In FeatureBase's current architecture, integer fields are implemented using bitplanes. The values in the field are decomposed into bits, and the corresponding bits from the integer field become bitmaps in storage. So, one of the bitmaps represents the lowest-order bit (value 1) of every record's value. An integer field also has existence and sign bits, and represents values around a given base value. Thus, the total number of rows used will be 2 + log2(N), where N is the distance from the base to the highest or lowest value. (The exact size might vary depending on how you set the field up: a range from 0 to 100,000 which never uses negative values has a sign bit which is never set, while a range from -50,000 to +50,000 with an offset of 50,000 covers the same range but needs one fewer row for data values.)
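-
-A simplified sketch of this bit-sliced layout (illustrative only; it ignores the base value, the existence and sign bitmaps, and the on-disk format):
-
-```python
-# Bit-sliced (bitplane) layout: bitmap i holds bit i of each record's value.
-values = {0: 5, 1: 3, 2: 12}            # record_id -> integer value
-num_planes = max(values.values()).bit_length()
-
-bitplanes = [set() for _ in range(num_planes)]
-for record, value in values.items():
-    for i in range(num_planes):
-        if value >> i & 1:
-            bitplanes[i].add(record)     # record has bit i set in its value
-
-# bitplanes[0] == {0, 1}: values 5 and 3 have the 1s bit set, 12 does not.
-print(bitplanes)                         # [{0, 1}, {1}, {0, 2}, {2}]
-```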
-
-The following table gives approximate estimated storage density for about a million records, assuming every record has values. A "weighted" distribution is one with significant variance, such as a power-law or Zipfian distribution, where some rows are heavily populated and others lightly populated.
-
-Storage requirements for data
-
-
-| | Integer | | | Mutex | | |
-| --- | --- | --- | --- | --- | --- | --- |
-| **Range, Distribution** | Rows | Storage/Row | Total Storage | Rows | Storage/Row | Total Storage |
-| 0-15, even | 4 | 128KiB* | 513KiB | 16 | 128KiB | 2040KiB |
-| 0-15, weighted | 4 | 60-128KiB* | 445KiB | 16 | 4KiB-128KiB | 638KiB |
-| 0-63, even | 6 | 128KiB* | 769KiB | 64 | 33KiB | 2064KiB |
-| 0-63, weighted | 6 | 14-128KiB | 506KiB | 64 | 0.5-128KiB | 687KiB |
-| 0-1023, even | 10 | 128KiB | 1282KiB | 1024 | 2KiB | 2304KiB |
-| 0-1023, weighted | 10 | 1-128KiB | 536KiB | 1024 | 0-128KiB | 743KiB |
-
-
-[*] Existence bitmaps are 352 bytes, and sign bitmaps are 0, in this data set; the table only shows sizes for the value bitmaps. In sparser data, the existence and sign bitmaps might be non-trivial.
-
-Integer fields with evenly distributed values will tend to have fairly high cardinality -- every value will probably set every bit in its range about half the time, so if you have values for most records, the individual bitmaps will tend to be fairly full, and will approach the maximum storage requirements, slightly over one bit per record per bitmap. With weighted values, the top bits may well have low enough cardinality to produce some space savings. The differences are much more significant with set/mutex type fields; most of the higher values in the 1024-value mutex field were empty (no file created on disk at all), and most of them were under 50 bytes.
-
-### Timestamp Field Implementation
-
-Timestamp fields are implemented internally the same way as integer fields and store the number of time units (e.g. seconds) since an epoch.
-By default, the `timeUnit` is in seconds (`s`) and the epoch is midnight, January 1, 1970 UTC. Other `timeUnit` values are `ms`, `us`, `ns`.
-Adjusting the `timeUnit` and epoch can limit the range of integer values and reduce the storage requirements and computation time when processing records.
-
-The following table gives approximate estimated storage density for about a million records, assuming every record has values.
-Storage requirements for timestamp data when using a "seconds" time unit
-
-| | Integer | | |
-| --- | --- | --- | --- |
-| **Range, Distribution** | Rows | Storage/Row | Total Storage |
-| 1 day | 17 | 128KiB* | 2176KiB |
-| 1 week | 20 | 128KiB* | 2560KiB |
-| 1 month | 22 | 128KiB* | 2816KiB |
-| 1 year | 25 | 128KiB* | 3200KiB |
-| 10 years | 29 | 128KiB* | 3712KiB |
-
-
-Bottom line: If you're storing timestamps at second granularity, you can expect it to use about 3.7MB per million records.
-At millisecond granularity, it would use 4.9MB per million records.
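-
-The rough arithmetic behind those numbers, assuming each value bitmap costs about 128KiB per million records as in the table above (existence and sign overhead is ignored here):
-
-```python
-import math
-
-def timestamp_storage_kib(range_seconds: float, units_per_second: float = 1) -> float:
-    """Rough per-million-record storage for a timestamp field.
-
-    Rows ~= bits needed to represent the range in the chosen time unit;
-    each value bitmap is assumed to cost ~128 KiB when most records have values.
-    """
-    rows = math.ceil(math.log2(range_seconds * units_per_second))
-    return rows * 128
-
-ten_years = 10 * 365 * 24 * 3600
-print(timestamp_storage_kib(ten_years))        # 3712 KiB (~3.7 MB) at second granularity
-print(timestamp_storage_kib(ten_years, 1000))  # 4992 KiB (~4.9 MB) at millisecond granularity
-```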
diff --git a/docs/concepts/time-quantums.md b/docs/concepts/time-quantums.md
deleted file mode 100644
index ff670a431..000000000
--- a/docs/concepts/time-quantums.md
+++ /dev/null
@@ -1,83 +0,0 @@
----
-title: Time Quantums
-layout: default
-parent: Data modeling overview
-grand_parent: Concepts
-nav_order: 2
----
-
-# Time Quantums - a `SETQ` constraint
-
-## Before you begin
-
-* [Learn about data modeling](/docs/concepts/overview-data-modeling)
-
-## What are time quantums?
-
-A time quantum is a feature for `IDSETQ` and `STRINGSETQ` type columns that allows you to associate a time (or multiple times) with each value in the column. Setting a time quantum creates views on the column that allow range queries down to the time granularity specified. You can think of a view as a rollup of your data based on the granularity of time you specify. If no time quantums are set, your data has one "standard" view by default.
-
-## When should you use time quantums?
-
-You should use time quantums when you want to associate a time with each value in `IDSET` and `STRINGSET` type columns, in addition to querying by that time.
-
-## When should you avoid time quantums?
-
-You should avoid time quantums if:
-
-* you don't have a time you want to associate with a value,
-* you aren't interested in deleting values over time to save space,
-* you are trying to count the number of distinct time quantums associated with a particular value, or
-* you are looking to pull out time values as opposed to filtering by them.
-
-## How do you use time quantums?
-
-When creating a column, you specify the granularity of time you want views created for. FeatureBase supports hour (`H`), day (`D`), month (`M`), or year (`Y`), or any combination of the four (in descending order with no gaps in time, i.e. `YMD` but not `YD`). Setting these allows for lower-latency queries depending on the period of time you are querying over, but at the cost of increased storage. For example, if you plan to have queries with a range of multiple months, `MD` is the best option, but if you will be querying over only a couple of days, `D` is preferred. Note you can set just `D` and still query over multiple months, but it will not be as fast as using `MD`.
-
-Once the column is created, a timestamp must be passed with each record during ingest; that timestamp is associated with all time quantum columns. Note this means you can only pass one time for all the time quantums in a record. For more information on configuring ingest, see the appropriate section in the "Data Ingestion" navigation.
-
-Querying using time quantums is only supported in [PQL Rows queries](/docs/pql-guide/pql-read-rows). You can pass a timestamp in the `to` and `from` arguments. In the example below, the query against the `customer` table pulls back the customer IDs and the stores they visited between `2018-08-31` and `2022-02-18`:
-
-```
-[customer]Extract(All(), Rows(stores_visited,from='2018-08-31', to='2022-02-18'))
-```
-
-You can associate multiple times with each value, so a value only has to exist in one view to be returned. A value that exists in multiple views is still returned, and counted, only once. You cannot return the underlying timestamps associated with each value.
-
-## What is happening when you use time quantums?
-
-Whenever a record with time quantums is ingested, a view is created for each level of granularity specified. This is essentially a copy of the column over a specific time range. If `YMDH` is specified and the time `2018-08-31T22:30:00Z` is ingested, a time view will exist for `2018`, `2018-08`, `2018-08-31`, and `2018-08-31T22`. This means data which has times for every hour for two days (say May 2nd and 3rd) in a column with `YMDH` time quantums configured will have 48+2+1+1+1 views (53 in total): 48 hourly views, 2 daily views, 1 monthly view, 1 yearly view, and the standard view.
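-
-As an illustration only (not the server's actual view-naming logic), the time views implied by a single `YMDH` timestamp can be thought of as its truncated time prefixes:
-
-```python
-from datetime import datetime
-
-# Illustrative: enumerate the time views a YMDH quantum implies for one timestamp.
-def views_for(ts: datetime, quantum: str = "YMDH") -> list[str]:
-    formats = {"Y": "%Y", "M": "%Y-%m", "D": "%Y-%m-%d", "H": "%Y-%m-%dT%H"}
-    return [ts.strftime(formats[g]) for g in quantum]
-
-print(views_for(datetime(2018, 8, 31, 22, 30)))
-# ['2018', '2018-08', '2018-08-31', '2018-08-31T22']
-```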
-
-
-
-
-
-## Further information
-
-* [Learn about Time To Live](/docs/concepts/time-to-live)
diff --git a/docs/concepts/time-to-live.md b/docs/concepts/time-to-live.md
deleted file mode 100644
index 6575d7c22..000000000
--- a/docs/concepts/time-to-live.md
+++ /dev/null
@@ -1,93 +0,0 @@
----
-title: Time To Live (TTL)
-layout: default
-parent: Data modeling overview
-grand_parent: Concepts
-nav_order: 3
----
-
-# Time to live - a `SETQ` constraint
-
-TTL stands for time to live. TTL allows you to delete time views. Views are only deleted when the end of the time range the view represents is older than the TTL. TTL is only an option for `IDSET` and `STRINGSET` columns with time quantums set.
-
-## Before you begin
-
-* [Learn about data modeling](/docs/concepts/overview-data-modeling)
-* [Learn about time quantums](/docs/concepts/time-quantums)
-
-## When should you use TTL?
-
-When you don't care about older views and want to reduce the growth of your data footprint over time.
-
-## When should you avoid TTL?
-
-When you want to keep every view across your full historical data, or when you need views removed at a precise, guaranteed time (TTL removal is eventually consistent; see below).
-
-## How do you use TTL?
-
-TTL holds the duration for the views created by FeatureBase based on:
-
-* the time quantum time and the current time
-* the times associated with the data in time quantum views
-
-Once the TTL duration expires, those views are deleted.
-
-## Time units
-
-Allowed time units for TTL are `h`, `m`, and `s`. A time unit is required. The default value is `0s`.
-
-
-
-Example:
-- "ttl":"1s" is equal to 1 second.
-- "ttl":"7200s" is equal to 720 seconds (2 hours).
-- "ttl":"72h" is equal to 72 hours.
-- "ttl":"6000second" will return error `error: unknown unit "second" in duration "6000second"`.
-
-If TTL's value is `0s` (default value), the views created based on time quantum will not be deleted.
-
-## TTL removal
-
-TTL removal is set to run when FeatureBase starts and every hour thereafter. This means view deletion is eventually consistent.
-
-TTL removal is, in general, not guaranteed to run at any particular time, and you should always use closed time ranges on your queries if you need to guarantee that results older than the TTL don't show up.
-
-For this reason, while you may specify times below an hour, it is recommended to use a TTL of one hour or more.
-
-## What is happening when you use TTL?
-
-A process runs periodically that looks at the views and the current time to see if they have exceeded the configured TTL. Each view may be deleted at a different time based on its granularity and how long it's been since the end of that view's time range.
-
-The rule is, if the end of the time range represented by the view is older than the TTL, it can be deleted.
-
-### Example of TTL
-
-A column with `YMD` time quantums has four views for 2022-09-02, and TTL is set to `720h` (30 days):
-* 2022
-* 2022-09
-* 2022-09-02
-* standard
-
-This means that the views are deleted as follows:
-* the 2022-09-02 view is cleared after 30 days (roughly on 2022-10-02),
-* the 2022-09 view is cleared roughly on October 30, 2022,
-* the 2022 view is deleted roughly on January 30, 2023.
-
-The standard view does not represent a time range, so TTL never deletes it.
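-
-A minimal sketch of that deletion rule, using the views from the example above (illustrative only; the real check runs inside FeatureBase's hourly removal process):
-
-```python
-from datetime import datetime, timedelta
-
-# A view is eligible for deletion when the end of the time range it
-# represents is older than the TTL.
-def view_expired(view_end: datetime, ttl: timedelta, now: datetime) -> bool:
-    return now - view_end > ttl
-
-ttl = timedelta(hours=720)  # 30 days
-now = datetime(2022, 10, 5)
-print(view_expired(datetime(2022, 9, 3), ttl, now))   # True: the 2022-09-02 view has expired
-print(view_expired(datetime(2022, 10, 1), ttl, now))  # False: the 2022-09 view still has time left
-```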
-
-
-## Create a new field using TTL
-
-```
-curl -XPOST http://localhost:10101/index/test_ttl_index/field/data_ttl -d'{ "options": {"type":"time", "timeQuantum":"YMDH", "ttl":"24h"}}'
-```
-
-## Update an existing field with TTL
-
-```
-curl -XPATCH http://localhost:10101/index/test_ttl_index/field/data_ttl -d'{ "option": "ttl", "value": "24h" }'
-```
-
-## Further information
-
-* [Large dataset import example](/docs/concepts/concept-ingest-eg-large-dataset)
diff --git a/docs/sql-guide/data-types/data-type-vector.md b/docs/sql-guide/data-types/data-type-vector.md
index e37477f03..dd3effc63 100644
--- a/docs/sql-guide/data-types/data-type-vector.md
+++ b/docs/sql-guide/data-types/data-type-vector.md
@@ -39,5 +39,5 @@ VECTOR({length})
## Further information
The following functions can be included in SELECT queries to measure VECTOR values:
-* [COSINE_DISTANCE function](/docs/sql-guide/functions/function-cosine-distance)
-* [EUCLIDEAN_DISTANCE function](/docs/sql-guide/functions/function-euclidean-distance)
+* [COSINE_DISTANCE function](/docs/sql-guide/functions/function-vector-distances)
+* [EUCLIDEAN_DISTANCE function](/docs/sql-guide/functions/function-vector-distances)
diff --git a/docs/sql-guide/expressions/expressions-home.md b/docs/sql-guide/expressions/expressions-home.md
index 3964b47d3..5baf37bed 100644
--- a/docs/sql-guide/expressions/expressions-home.md
+++ b/docs/sql-guide/expressions/expressions-home.md
@@ -17,7 +17,7 @@ nav_order: 5
## `identifier`
![expr](/assets/images/sql-guide/identifier.svg)
-{% include /concepts/standard-naming-obj.md %}
+{% include /faq/standard-naming-obj.md %}
## `expr`
diff --git a/docs/sql-guide/statements/statement-table-create.md b/docs/sql-guide/statements/statement-table-create.md
index c9940f39e..88b842eb0 100644
--- a/docs/sql-guide/statements/statement-table-create.md
+++ b/docs/sql-guide/statements/statement-table-create.md
@@ -49,7 +49,7 @@ CREATE TABLE
### Naming standards
-{% include /concepts/standard-naming-obj.md %}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-table.md %}
{% include /cloud-table/cloud-standard-naming-col.md %}
diff --git a/docs/sql-guide/statements/statement-tables-show.md b/docs/sql-guide/statements/statement-tables-show.md
index 7852826ca..913a2ef8f 100644
--- a/docs/sql-guide/statements/statement-tables-show.md
+++ b/docs/sql-guide/statements/statement-tables-show.md
@@ -49,7 +49,7 @@ SHOW TABLES [WITH SYSTEM];
### Naming standards
-{% include /concepts/standard-naming-obj.md %}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-table.md %}
{% include /cloud-table/cloud-standard-naming-col.md %}
diff --git a/docs/sql-guide/statements/statement-view-create.md b/docs/sql-guide/statements/statement-view-create.md
index bc79ec76e..fbae9d84d 100644
--- a/docs/sql-guide/statements/statement-view-create.md
+++ b/docs/sql-guide/statements/statement-view-create.md
@@ -35,7 +35,7 @@ CREATE VIEW view_name
### Naming standards
-{% include /concepts/standard-naming-obj.md %}
+{% include /faq/standard-naming-obj.md %}
{% include /cloud-table/cloud-standard-naming-table.md %}
{% include /cloud-table/cloud-standard-naming-col.md %}
diff --git a/help-on-help/style-guide/file-naming.md b/help-on-help/style-guide/file-naming.md
index 5e81f89d7..daef8d70a 100644
--- a/help-on-help/style-guide/file-naming.md
+++ b/help-on-help/style-guide/file-naming.md
@@ -6,7 +6,7 @@ Content files are added to `/docs` subfolders.
| Area of interest | Parent | Subfolders | Filenames | Example |
|---|---|---|---|---|
-| High level overviews | `/docs/concepts` | none | `/concepts/overview-.md` | `/concepts/overview-data-modeling.md` |
+| High level overviews | `/docs/cloud/cloud-faq/cloud-faq-home` | none | `/docs/cloud/cloud-faq/cloud-faq-` | `/docs/cloud/cloud-faq/cloud-faq-data-modeling` |
| Cloud product | `/docs/cloud` | `/cloud-` | `cloud--.md` | `/cloud/cloud-database/cloud-database-manage.md` |
| Community product | `/docs/community/` | `/com-` | `com--.md` | `/community/com-tables/com-tables-create.md` |
| SQL-guide (was SQL-preview) | `/docs/sql-guide` | `/docs/sql-guide/statements/statement-table-create.md`
`/docs/sql-guide/functions/functions-home.md` |