From dd020acd97b98ed530064f06b7b2a6582c1045e9 Mon Sep 17 00:00:00 2001 From: Sharon Lifshitz Date: Sun, 15 Sep 2019 18:08:01 +0300 Subject: [PATCH 1/3] [DOC] Frames GS MB doc review (#1) (v2.3.0 outputs) [IG-12272 IG-12092] --- getting-started/frames.ipynb | 788 +++++++++++++++++++++++------------ 1 file changed, 515 insertions(+), 273 deletions(-) diff --git a/getting-started/frames.ipynb b/getting-started/frames.ipynb index 71f03f7f..e0097fb7 100644 --- a/getting-started/frames.ipynb +++ b/getting-started/frames.ipynb @@ -4,28 +4,60 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Using Iguazio Frames Library for High-Performance Data Access \n", - "iguazio `v3io_frames` is a streaming oriented multi-model (generic) data API which allow high-speed data loading and storing
\n", - "frames currently support iguazio key/value, time-series, and streaming data models (called backends), additional backends will be added.\n", + "# Using the V3IO Frames Library for High-Performance Data Access \n", "\n", - "For detailed description of the Frames API go to https://github.com/v3io/frames\n", + "- [Overview](#frames-overview)\n", + "- [Initialization](#frames-init)\n", + "- [Working with NoSQL Tables (\"kv\" Backend)](#frames-kv)\n", + "- [Working with Time-Series Databases (\"tsdb\" Backend)](#frames-tsdb)\n", + "- [Working with Streams (\"stream\" Backend)](#frames-stream)\n", + "- [Cleanup](#frames-cleanup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Overview\n", "\n", - "to use frames you first create a `client` and provide it the session and credential details, the client is used to for 5 basic operations:\n", + "[V3IO Frames](https://github.com/v3io/frames) (**\"Frames\"**) is a multi-model open-source data-access library, developed by Iguazio, which provides a unified high-performance DataFrame API for loading, storing, and accessing data in the data store of the Iguazio Data Science Platform (**\"the platform\"**).\n", + "Frames currently supports the NoSQL (key/value), stream, and time-series (TSDB) data models via its \"kv\", \"stream\", and \"tsdb\" backends.\n", + "\n", + "To use Frames, you first need to import the **v3io_frames** library and create and initialize a client object (an instance of the `Client` class).
\n", + "The `Client` class features the following object methods for supporting basic data operations:\n", + "\n", + "- `create` — create a new NoSQL or TSDB table or a stream (\"the backend\").\n", + "- `delete` — delete the backend.\n", + "- `read` — read data from the backend (as a pandas DataFrame or DataFrame iterator).\n", + "- `write` — write one or more DataFrames to the backend.\n", + "- `execute` — execute a command on the backend. Each backend may support multiple commands.\n", + "\n", + "\n", + "For a detailed description of the Frames API, see the [Frames documentation](https://github.com/v3io/frames/blob/development/README.md).
\n", + "For more help and usage details, use the internal API help: append `?` to a method name in Jupyter Notebook, or print the method's `__doc__` attribute.
\n", + "For example, the following command returns information about the read operation for a client object named `client`:\n", "```\n", - " create - create a new time-series table or a stream \n", - " delete - delete the table or stream\n", - " read - read data from the backend (as pandas DataFrame or dataFrame iterator)\n", - " write - write one or more DataFrames into the backend\n", - " execute - execute a command on the backend, each backend may support multiple commands \n", - "``` \n", + "client.read?\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Initialization\n", "\n", - "Content:\n", - "- [Working with key/value and SQL data](kv)\n", - "- [Working with Time-series data](#tsdb)\n", - "- [Working with Streams](#stream)\n", + "To use V3IO Frames, first ensure that your platform tenant has a shared tenant-wide instance of the V3IO Frames service.\n", + "This can be done by a platform service administrator from the **Services** dashboard page.
\n", + "Then, import the required libraries and create a Frames client object (an instance of the `Client` class), as demonstrated in the following code, which creates a client object named `client`.\n", "\n", - "The following sections describe how to use frames, for more help and details use the internal documentation, e.g. run the following command\n", - "``` client.read?```\n" + "> **Note:**\n", + "> - The client constructor's `container` parameter is set to `\"users\"` for accessing data in the platform's \"users\" data container.\n", + "> - Because no authentication credentials are passed to the constructor, Frames will use the access token that's assigned to the `V3IO_ACCESS_KEY` environment variable.\n", + "> The platform's Jupyter Notebook service defines this variable automatically and initializes it to an access token for the running user of the service.\n", + "> You can pass different credentials by using the constructor's `token` parameter (platform access token) or `user` and `password` parameters (platform username and password)." 
] }, { @@ -37,23 +69,72 @@ "import pandas as pd\n", "import v3io_frames as v3f\n", "import os\n", - "client = v3f.Client('framesd:8081', container='users')" + "\n", + "# Create a Frames client\n", + "client = v3f.Client(\"framesd:8081\", container=\"users\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Working with NoSQL Tables (\"kv\" Backend)\n", + "\n", + "This section demonstrates how to use the `\"kv\"` Frames backend to write and read NoSQL data in the platform.\n", + "\n", + "- [Initialization](#frames-kv-init)\n", + "- [Load Data from Amazon S3](#frames-kv-load-data-s3)\n", + "- [Write to a NoSQL Table](#frames-kv-write)\n", + "- [Read from the Table Using an SQL Query](#frames-kv-read-sql-query)\n", + "- [Read from the Table Using the Frames API](#frames-kv-read-frames-api)\n", + " - [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)\n", + " - [Read Using a DataFrame Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)\n", + "- [Write Data Using an Update Expression](#frames-kv-write-update-expression)\n", + " - [Use the Write Method to Perform a Batch Update](#frames-kv-write-expression-batch-update)\n", + " - [Use the Execute Method's Update Command to Update a Single Item](#frames-kv-write-expression-single-item-update-w-execute-update-cmd)\n", + "- [Delete the NoSQL Table](#frames-kv-delete)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "## Working with key/value and SQL data\n", + "\n", + "### Initialization\n", "\n", - "### Load data from Amazon S3" + "Start out by defining table-path variables that will be used in the tutorial's code examples.
\n", + "The table path (`table`) is relative to the configured parent data container; see [Write to a NoSQL Table](#frames-kv-write)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, + "outputs": [], + "source": [ + "# Relative path to the NoSQL table within the parent platform data container\n", + "table = os.path.join(os.getenv(\"V3IO_USERNAME\"), \"examples/bank\")\n", + "\n", + "# Full path to the NoSQL table for SQL queries (platform Presto data-path syntax);\n", + "# use the same data container as used for the Frames client (\"users\")\n", + "sql_table_path = 'v3io.users.\"' + table + '\"'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Load Data from Amazon S3\n", + "\n", + "Read a file from an Amazon Simple Storage Service (S3) bucket into a pandas DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, "outputs": [ { "data": { @@ -216,15 +297,14 @@ "4 unknown 5 may 226 1 -1 0 unknown no " ] }, - "execution_count": 2, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# read S3 file into a data frame and show its data & metadata\n", - "tablename = os.path.join(os.getenv('V3IO_USERNAME')+'/examples/bank')\n", - "df = pd.read_csv('https://s3.amazonaws.com/iguazio-sample-data/bank.csv', sep=';')\n", + "# Read an AWS S3 file into a DataFrame and show its data and metadata\n", + "df = pd.read_csv(\"https://s3.amazonaws.com/iguazio-sample-data/bank.csv\", sep=\";\")\n", "df.head()" ] }, @@ -232,33 +312,42 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Write data frames into the database using a single command\n", - "data is streamed into the database via fast NoSQL APIs, note the backend is `kv`
\n", - "the input data can be a single dataframe or a dataframe iterator (for streaming)" + "\n", + "### Write to a NoSQL Table\n", + "\n", + "Use the `write` method of the Frames client with the `\"kv\"` backend to write the data that was read in the previous step to a NoSQL table.
\n", + "The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).\n", + "In the following example, the relative table path is set by using the `table` variable that was defined in the [\"kv\" backend initialization](#frames-kv-init) step.
\n", + "The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames — either as a DataFrame iterator or as a list of DataFrames." ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ - "out = client.write('kv', tablename, df)" + "out = client.write(\"kv\", table=table, dfs=df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Read from the Database with DB side SQL\n", - "offload data filtering, grouping, joins, etc to a scale-out high speed DB engine
\n", - "Note that we're using a V3IO_USERNAME as environment variable as therefore we need to define the string for the \"From\" section
\n", - "The from convention is select * from v3io..\"path\"" + "\n", + "### Read from the Table Using an SQL Query\n", + "\n", + "You can run SQL queries on your NoSQL table (using Presto) to offload data filtering, grouping, joins, etc. to a scale-out high-speed database engine.\n", + "\n", + "> **Note:** To query a table in a platform data container, the table path in the `from` section of the SQL query should consist of `v3io`, the container name, and the double-quoted path to the table within the container, separated by periods.\n", + "> See [Presto Data Paths](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/fundamentals/#data-paths-presto) in the platform documentation.\n", + "> In the following example, the path is set by using the `sql_table_path` variable that was defined in the [\"kv\" backend initialization](#frames-kv-init) step.\n", + "> Unless you changed the code, this variable translates to `v3io.users.` followed by the double-quoted value of the `table` variable; for example, `v3io.users.\"iguazio/examples/bank\"` for user \"iguazio\"." ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -295,68 +384,83 @@ " no\n", " secondary\n", " 0\n", - " yes\n", + " no\n", " unknown\n", - " 249\n", + " 219\n", " married\n", " no\n", - " 19317\n", - " aug\n", - " cellular\n", - " 1\n", - " yes\n", + " 26452\n", + " jul\n", + " telephone\n", + " 2\n", + " no\n", " retired\n", - " 4\n", - " 68\n", + " 15\n", + " 75\n", " -1\n", " \n", " \n", " no\n", " secondary\n", " 0\n", - " no\n", + " yes\n", " unknown\n", - " 219\n", + " 249\n", " married\n", " no\n", - " 26452\n", - " jul\n", - " telephone\n", - " 2\n", - " no\n", + " 19317\n", + " aug\n", + " cellular\n", + " 1\n", + " yes\n", " retired\n", - " 15\n", - " 75\n", + " 4\n", + " 68\n", " -1\n", " \n", "" ], "text/plain": [ - "[('no', 'secondary', 0, 'yes', 'unknown', 249, 'married', 'no', 19317, 'aug', 'cellular', 1, 'yes', 'retired', 4, 68, -1),\n", - " ('no', 'secondary', 0, 'no', 'unknown', 219, 'married', 'no', 26452, 'jul', 'telephone', 2, 
'no', 'retired', 15, 75, -1)]" + "[('no', 'secondary', 0, 'no', 'unknown', 219, 'married', 'no', 26452, 'jul', 'telephone', 2, 'no', 'retired', 15, 75, -1),\n", + " ('no', 'secondary', 0, 'yes', 'unknown', 249, 'married', 'no', 19317, 'aug', 'cellular', 1, 'yes', 'retired', 4, 68, -1)]" ] }, - "execution_count": 4, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "table_path = os.path.join('v3io.users.\"'+os.getenv('V3IO_USERNAME')+'/examples/bank\"')\n", - "%sql select * from $table_path where balance > 10000" + "%sql select * from $sql_table_path where balance > 10000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Read the data through frames API\n", - "the frames API returns a dataframe or a dataframe iterator (a stream)
" + "\n", + "### Read from the Table Using the Frames API\n", + "\n", + "Use the `read` method of the Frames client with the `\"kv\"` backend to read data from your NoSQL table.
\n", + "The `read` method can return a DataFrame or a DataFrame iterator (a stream), as demonstrated in the following examples.\n", + "\n", + "- [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)\n", + "- [Read Using a DataFrame Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Read Using a Single DataFrame\n", + "\n", + "The following example uses a single command to read data from the NoSQL table into a DataFrame." ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -380,26 +484,26 @@ " \n", " \n", " \n", - " housing\n", - " contact\n", - " education\n", - " loan\n", + " age\n", + " balance\n", " campaign\n", - " pdays\n", - " poutcome\n", + " contact\n", + " day\n", " default\n", - " balance\n", " duration\n", - " previous\n", + " education\n", + " housing\n", " job\n", + " loan\n", " marital\n", " month\n", - " day\n", - " age\n", + " pdays\n", + " poutcome\n", + " previous\n", " y\n", " \n", " \n", - " __name\n", + " index\n", " \n", " \n", " \n", @@ -422,22 +526,22 @@ " \n", " \n", " 75\n", - " no\n", - " telephone\n", - " secondary\n", - " no\n", + " 75.0\n", + " 26452.0\n", " 2.0\n", - " -1.0\n", - " unknown\n", + " telephone\n", + " 15.0\n", " no\n", - " 26452.0\n", " 219.0\n", - " 0.0\n", + " secondary\n", + " no\n", " retired\n", + " no\n", " married\n", " jul\n", - " 15.0\n", - " 75.0\n", + " -1.0\n", + " unknown\n", + " 0.0\n", " no\n", " \n", " \n", @@ -445,22 +549,22 @@ "" ], "text/plain": [ - " housing contact education loan campaign pdays poutcome default \\\n", - "__name \n", - "75 no telephone secondary no 2.0 -1.0 unknown no \n", + " age balance campaign contact day default duration education \\\n", + "index \n", + "75 75.0 26452.0 2.0 telephone 15.0 no 219.0 secondary \n", "\n", - " balance duration previous job marital month day age y \n", - "__name \n", - "75 26452.0 
219.0 0.0 retired married jul 15.0 75.0 no " + " housing job loan marital month pdays poutcome previous y \n", + "index \n", + "75 no retired no married jul -1.0 unknown 0.0 no " ] }, - "execution_count": 5, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "df = client.read(backend='kv', table=tablename, filter=\"balance>20000\")\n", + "df = client.read(backend=\"kv\", table=table, filter=\"balance > 20000\")\n", "df.head(8)" ] }, @@ -468,36 +572,36 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Read the data as a stream iterator\n", - "to use iterator and allow cuncurent data movement and processing add `iterator=True`, you will need to iterate over the returned value or use `concat`\n", - "iterators work with all backends (not just stream), they allow streaming when placed as an input to write functions which support iterators as input" + "\n", + "#### Read Using a DataFrame Iterator (Streaming)\n", + "\n", + "The following example uses a DataFrame iterator to stream data from the NoSQL table into multiple DataFrames and allow concurrent data movement and processing.
\n", + "The example sets the `iterator` parameter to `True` to receive a DataFrame iterator (instead of the default single DataFrame), and then iterates the DataFrames in the returned iterator; you can also use `concat` instead of iterating the DataFrames.\n", + "\n", + "> **Note:** Iterators work with all Frames backends and can be used as input to write functions that support this, such as the `write` method of the Frames client." ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - " balance campaign marital default loan contact y age \\\n", - "__name \n", - "75 26452.0 2.0 married no no telephone no 75.0 \n", + " age balance campaign contact day default duration education \\\n", + "index \n", + "75 75.0 26452.0 2.0 telephone 15.0 no 219.0 secondary \n", "\n", - " duration previous day housing pdays education job poutcome \\\n", - "__name \n", - "75 219.0 0.0 15.0 no -1.0 secondary retired unknown \n", - "\n", - " month \n", - "__name \n", - "75 jul \n" + " housing job loan marital month pdays poutcome previous y \n", + "index \n", + "75 no retired no married jul -1.0 unknown 0.0 no \n" ] } ], "source": [ - "dfs = client.read(backend='kv', table=tablename, filter=\"balance>20000\", iterator=True)\n", + "dfs = client.read(backend=\"kv\", table=table, filter=\"balance > 20000\", iterator=True)\n", "for df in dfs:\n", " print(df.head())" ] @@ -506,108 +610,139 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Batch updates with expression\n", - "in many cases we want to update specific column values or update a column using an expression (e.g. counter = counter + x)
\n", - "when using the key/value backend it can run an expression against each of the rows (specified in the index), and use the dataframe columns as parameters
\n", - "columns are specified using `{}`, e.g. specifing `expression=\"packets=packets+{pkt};bytes=bytes+{bytes};last_update={mytime}\"` will add the data in `pkt` and `bytes` column from the input dataframe to the `packets` and `bytes` columns in the row and set the `last_update` field to `mytime`. the rows are selected based on the input dataframe index" + "\n", + "### Write Data Using an Update Expression\n", + "\n", + "In many cases, it's useful to update specific attributes (columns) by using an update expression (for example, `counter = counter + 1`).\n", + "The `write` method and the `update` command of the `execute` method of the Frames client support an optional `expression` parameter for the `\"kv\"` backend, which can be set to a [platform update expression](https://www.iguazio.com/docs/reference/latest-release/expressions/update-expression/).\n", + "The difference is that `write` applies the expression to all the DataFrame items (rows) while `update` applies the expression only to a single item, as explained in the following examples.\n", + "\n", + "In Frames update expressions, attributes (columns) in the written DataFrame are embedded within curly braces (`{ATTRIBUTE}`); attributes in the target table are specified simply by their names (`ATTRIBUTE`), as with all platform expressions.\n", + "For example, `expression=\"packets=packets+{pkt}; bytes=bytes+{bytes}; last_update={mytime}\"` updates the values of the `packets` and `bytes` attributes in the table item by adding to their current values the values of the `pkt` and `bytes` DataFrame columns, and sets the value of the `last_update` attribute in the table item to the value of the `mytime` DataFrame column (creating the attribute if it doesn't already exist in the table item).\n", + "\n", + "> **Note:**\n", + "> - When setting the expression parameter, Frames doesn't update the table schema (unlike in standard writes).\n", + "> - Both the `write` method and the `update` command of the `execute` 
method also support an optional `condition` parameter for the `\"kv\"` backend.
\n", + "> This parameter can be set to a [platform condition expression](https://www.iguazio.com/docs/reference/latest-release/expressions/condition-expression/) to perform a conditional update; that is, to update or create items only if specific conditions are met.\n", + "> Note that when the condition expression references a non-existing attribute, the condition evaluates to `false`.\n", + "\n", + "- [Use the Write Method to Perform a Batch Update](#frames-kv-write-expression-batch-update)\n", + "- [Use the Execute Method's Update Command to Update a Single Item](#frames-kv-write-expression-single-item-update-w-execute-update-cmd)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "#### Use the Write Method to Perform a Batch Update\n", + "\n", + "The `write` method applies the update expression of the `expression` parameter to all items in the DataFrame (\"batch\" update); i.e., all table items (rows) whose primary-key attribute (index-column) values match those of the DataFrame items are updated, and items that don't exist in the table are created." 
] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# example: creating a new column which reflect the delta between the old `balance` column and the one provided in df (should result in 0 since df didnt change)\n", - "out = client.write('kv', tablename, df, expression='balance_delta=balance-{balance}')" + "# Add a new \"balance_delta\" attribute (column) to all table items (rows) and set its value to the difference (delta) between the\n", + "# current value of the \"balance\" attribute in the table and the value provided for this attribute in the DataFrame.\n", + "# Because the value of \"balance\" in the DataFrame wasn't modified since it was written to the table, the attribute value that is written to the table (for all items) should be 0.\n", + "out = client.write(\"kv\", table, df, expression=\"balance_delta = balance - {balance}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Making a single row update using execute command\n", - "The use of `condition` is optional and allow to implement safe/conditional transactions " + "\n", + "#### Use the Execute Method's Update Command to Update a Single Item\n", + "\n", + "The `update` command of the `execute` method updates or creates a single item whose primary-key attribute (index-column) value is specified in the command's `key` parameter, as demonstrated in the following example.\n", + "The example also uses the optional `condition` parameter to perform the update only if the specified condition is met." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - "Empty DataFrame\n", - "Columns: []\n", - "Index: []" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "client.execute('kv',tablename,'update', args={'key':'44', 'expression': 'age=44', 'condition':'balance>0'})" + "# Conditionally update the table item whose primary-key attribute (index-column) value is 44 (`key`) and\n", + "# set its \"age\" attribute to 44, provided the value of the item's \"balance\" attribute is greater than 0.\n", + "client.execute(\"kv\", table, \"update\", args={\"key\": \"44\", \"expression\": \"age=44\", \"condition\": \"balance > 0\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Delete the table\n", - "note: in kv (NoSQL) tabels there is no need to create a table before using it" + "\n", + "### Delete the NoSQL Table\n", + "\n", + "Use the `delete` method of the Frames client with the `\"kv\"` backend to delete the NoSQL table that was used in the previous steps." 
] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "client.delete('kv',table=tablename)" + "# Delete the `table` NoSQL table\n", + "client.delete(\"kv\", table)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "## Working with time-series data" + "\n", + "## Working with Time-Series Databases (\"tsdb\" Backend)\n", + "\n", + "This section demonstrates how to use the `\"tsdb\"` Frames backend to create a time-series database (TSDB) table in the platform, ingest data into the table, and read from the table (i.e., submit TSDB queries).\n", + "\n", + "- [Initialization](#frames-tsdb-init)\n", + "- [Create a TSDB Table](#frames-tsdb-create)\n", + "- [Write to the TSDB Table](#frames-tsdb-write)\n", + "- [Read from the TSDB Table](#frames-tsdb-read)\n", + "- [Delete the TSDB Table](#frames-tsdb-delete)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Note that the tsdb table example will be created under the root of the \"users\" container" + "\n", + "### Initialization\n", + "\n", + "Start out by defining a TSDB table-path variable that will be used in the tutorial's code examples.
\n", + "The table path (`tsdb_table`) is relative to the configured parent data container; see [Create a TSDB Table](#frames-tsdb-create)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# Relative path to the TSDB table within the parent platform data container\n", + "tsdb_table = os.path.join(os.getenv(\"V3IO_USERNAME\"), \"examples/tsdb_tab\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Create a TSDB Table\n", + "\n", + "Use the `create` method of the Frames client with the `\"tsdb\"` backend to create a new TSDB table.
\n", + "The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).\n", + "In the following example, the relative table path is set by using the `tsdb_table` variable that was defined in the [\"tsdb\" backend initialization](#frames-tsdb-init) step.
\n", + "You can optionally use the `attrs` parameter to provide additional arguments.\n", + "For example, you can set the `rate` argument to the TSDB’s metric-samples ingestion rate (`\"[0-9]+/[smh]\"`; for example, `1/s`); the rate should be calculated according to the slowest expected ingestion rate." ] }, { @@ -616,8 +751,33 @@ "metadata": {}, "outputs": [], "source": [ - "# create a time series table, rate specifies the typical ingestion rate (e.g. one sample per minute)\n", - "client.create(backend='tsdb', table='tsdb_tab',attrs={'rate':'1/m'})" + "# Create a new TSDB table; ingestion rate = one sample per minute (\"1/m\")\n", + "client.create(backend=\"tsdb\", table=tsdb_table, attrs={\"rate\": \"1/m\"})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Write to the TSDB Table\n", + "\n", + "Use the `write` method of the Frames client with the `\"tsdb\"` backend to ingest data from a pandas DataFrame into your TSDB table.
\n", + "The primary-key attribute of platform TSDB tables (i.e., the DataFrame index column) must hold the sample time of the data (displayed as `time` in read outputs).
\n", + "In addition, TSDB table items (rows) can optionally have sub-index columns (attributes) that are called labels.\n", + "You can add labels to TSDB table items in one of two ways; you can also combine these methods:\n", + "\n", + "- Use the `labels` dictionary parameter of the `write` method to add labels to all the written metric-sample table items (DataFrame rows) — `{\"