diff --git a/docs/content/getting-started/data-out.md b/docs/content/getting-started/data-out.md
new file mode 100644
index 000000000000..9a3de7657ce3
--- /dev/null
+++ b/docs/content/getting-started/data-out.md
@@ -0,0 +1,16 @@
+---
+title: Get data out of Rerun
+order: 450
+---
+
+At its core, Rerun is a database. The viewer includes the [dataframe view](../reference/types/views/dataframe_view) to explore data in tabular form, and the SDK includes an API to export the data from a recording as dataframes. These features can be used, for example, to analyze the data and log the results back to the original recording.
+
+In this three-part guide, we explore such a workflow by implementing an "open jaw detector" on top of our [face tracking example](https://rerun.io/examples/video-image/face_tracking). This process is split into three steps:
+
+1. [Explore a recording with the dataframe view](data-out/explore-as-dataframe)
+2. [Export the dataframe](data-out/export-dataframe)
+3. [Analyze the data and log the results](data-out/analyze-and-log)
+
+Note: this guide uses the popular [Pandas](https://pandas.pydata.org) dataframe package. The same concepts apply equally to alternative dataframe packages such as [Polars](https://pola.rs).
+
+If you just want to see the final result, jump to the [complete script](data-out/analyze-and-log.md#complete-script) at the end of the third section.
diff --git a/docs/content/getting-started/data-out/analyze-and-log.md b/docs/content/getting-started/data-out/analyze-and-log.md
new file mode 100644
index 000000000000..1e5c178d31dd
--- /dev/null
+++ b/docs/content/getting-started/data-out/analyze-and-log.md
@@ -0,0 +1,89 @@
+---
+title: Analyze the data and log the results
+order: 3
+---
+
+
+
+In the previous sections, we explored our data and exported it to a Pandas dataframe. In this section, we will analyze the data to extract a "jaw open state" signal and log it back to the viewer.
+
+
+
+## Analyze the data
+
+We already identified that thresholding the `jawOpen` signal at 0.15 is all we need to produce a binary "jaw open state" signal.
+
+In the [previous section](export-dataframe.md#inspect-the-dataframe), we prepared a flat, floating point column with the signal of interest called `"jawOpen"`. Let's add a boolean column to our Pandas dataframe to hold our jaw open state:
+
+```python
+df["jawOpenState"] = df["jawOpen"] > 0.15
+```
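+
+As a side note on the comparison above, here is a minimal sketch on toy data showing how missing values behave. The values and the `faceTracked` column are illustrative assumptions, not part of the guide's recording. Comparing NaN against the threshold yields `False`, so frames without a tracked face are reported as "closed" unless they are flagged separately.
+
+```python
+import numpy as np
+import pandas as pd
+
+# Toy stand-in for the "jawOpen" column; NaN marks frames with no tracked face.
+df = pd.DataFrame({"jawOpen": [0.03, 0.55, np.nan, 0.10]})
+
+# Comparisons against NaN evaluate to False, so untracked frames read as "closed".
+df["jawOpenState"] = df["jawOpen"] > 0.15
+
+# Hypothetical extra column recording whether a measurement exists at all.
+df["faceTracked"] = df["jawOpen"].notna()
+
+print(df)
+```
+
+If untracked frames should be left out rather than labeled as closed, such a flag could be used to mask them before logging.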
+
+
+## Log the result back to the viewer
+
+The first step is to initialize the logging SDK targeting the same recording we just analyzed.
+This requires matching both the application ID and recording ID precisely.
+By using the same identifiers, we're appending new data to an existing recording.
+If the recording is currently open in the viewer (and it's listening for new connections), this approach enables us to seamlessly add the new data to the ongoing session.
+
+```python
+rr.init(
+ recording.application_id(),
+ recording_id=recording.recording_id(),
+)
+rr.connect()
+```
+
+_Note_: When automating data analysis, it is typically preferable to log the results to a distinct RRD file next to the source RRD (using `rr.save()`). In such a situation, it is also valid to use the same app ID and recording ID. This allows opening both the source and result RRDs in the viewer, which will display data from both files under the same recording.
+
+We will log our jaw open state data in two forms:
+1. As a standalone `Scalar` component, to hold the raw data.
+2. As a `Text` component on the existing bounding box entity, such that we obtain a textual representation of the state in the visualization.
+
+Here is how to log the data as a scalar:
+
+```python
+rr.send_columns(
+ "/jaw_open_state",
+ times=[rr.TimeSequenceColumn("frame_nr", df["frame_nr"])],
+ components=[
+ rr.components.ScalarBatch(df["jawOpenState"]),
+ ],
+)
+```
+
+We use the [`rr.send_columns()`](../../howto/send_columns.md) API to efficiently send the entire column of data in a single batch.
+
+Next, let's log the same data as a `Text` component:
+
+```python
+target_entity = "/video/detector/faces/0/bbox"
+rr.log_components(target_entity, [rr.components.ShowLabels(True)], static=True)
+rr.send_columns(
+ target_entity,
+ times=[rr.TimeSequenceColumn("frame_nr", df["frame_nr"])],
+ components=[
+ rr.components.TextBatch(np.where(df["jawOpenState"], "OPEN", "CLOSE")),
+ ],
+)
+```
+
+Here we first log the [`ShowLabels`](../../reference/types/components/show_labels.md) component as static to enable the display of the label. Then, we use `rr.send_columns()` again to send an entire batch of text labels. We use [`np.where()`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) to produce a label matching the state for each timestamp.
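+
+To make the label construction concrete, here is a small sketch of the `np.where()` call in isolation. It assumes a toy boolean series in place of the real `jawOpenState` column.
+
+```python
+import numpy as np
+import pandas as pd
+
+# Toy stand-in for df["jawOpenState"] (illustrative values only).
+state = pd.Series([False, True, True, False])
+
+# Pick "OPEN" where the state is True and "CLOSE" elsewhere, one label per row.
+labels = np.where(state, "OPEN", "CLOSE")
+
+print(labels)
+```
+
+The result is a NumPy array of strings with the same length as the input, which is why it lines up with the `frame_nr` column passed as the time index.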
+
+### Final result
+
+With some adjustments to the viewer blueprint, we obtain the following result:
+
+
+
+The OPEN/CLOSE label is displayed along the bounding box on the 2D view, and the `/jaw_open_state` signal is visible in both the timeseries and dataframe views.
+
+
+### Complete script
+
+Here is the complete script used by this guide to load data, analyze it, and log the result back:
+
+snippet: tutorials/data_out
diff --git a/docs/content/getting-started/data-out/explore-as-dataframe.md b/docs/content/getting-started/data-out/explore-as-dataframe.md
new file mode 100644
index 000000000000..21ed0aed09c8
--- /dev/null
+++ b/docs/content/getting-started/data-out/explore-as-dataframe.md
@@ -0,0 +1,72 @@
+---
+title: Explore a recording with the dataframe view
+order: 1
+---
+
+
+
+
+In this first part of the guide, we run the [face tracking example](https://rerun.io/examples/video-image/face_tracking) and explore the data in the viewer.
+
+## Create a recording
+
+The first step is to create a recording in the viewer using the face tracking example. Check the [face tracking installation instructions](https://rerun.io/examples/video-image/face_tracking#run-the-code) for more information on how to run this example.
+
+Here is such a recording:
+
+
+
+A person's face is visible and being tracked. Their jaws occasionally open and close. In the middle of the recording, the face is also temporarily hidden and no longer tracked.
+
+
+## Explore the data
+
+Amongst other things, the [MediaPipe Face Landmark](https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker) package used by the face tracking example outputs so-called blendshape signals, which provide information on various aspects of the facial expression. These signals are logged under the `/blendshapes` root entity by the face tracking example.
+
+One signal, `jawOpen` (logged under the `/blendshapes/0/jawOpen` entity as a [`Scalar`](../../reference/types/components/scalar.md) component), is of particular interest for our purpose. Let's inspect it further using a timeseries view:
+
+
+
+
+This signal indeed seems to jump from approximately 0.0 to 0.5 whenever the jaws are open. We also notice a discontinuity in the middle of the recording. This is due to the blendshapes being [`Clear`](../../reference/types/archetypes/clear.md)ed when no face is detected.
+
+Let's create a dataframe view to further inspect the data:
+
+
+
+Here is how this view is configured:
+- Its content is set to `/blendshapes/0/jawOpen`. As a result, the table only contains columns pertaining to that entity (along with any timeline(s)). For this entity, a single column exists in the table, corresponding to the entity's single component (a `Scalar`).
+- The `frame_nr` timeline is used as index for the table. This means that the table will contain one row for each distinct value of `frame_nr` for which data is available.
+- The rows can further be filtered by time range. In this case, we keep the default "infinite" boundaries, so no filtering is applied.
+- The dataframe view has other advanced features which we are not using here, including filtering rows based on the existence of data for a given column, or filling empty cells with latest-at data.
+
+
+
+Now, let's look at the actual data as represented in the above screenshot. At around frame #140, the jaws are open, and, accordingly, the `jawOpen` signal has values around 0.55. Shortly after, they close again and the signal decreases to below 0.1. Then, the signal becomes empty. This happens in rows corresponding to the period of time when the face cannot be tracked and all the signals are cleared.
+
+
+## Next steps
+
+Our exploration of the data in the viewer so far provided us with two important pieces of information useful to implement the jaw open detector.
+
+First, we identified that the `Scalar` value logged to `/blendshapes/0/jawOpen` carries the relevant data. In particular, thresholding this signal with a value of 0.15 should provide us with a closed/opened jaw state binary indicator.
+
+Then, we explored the numerical data in a dataframe view. Importantly, the way we configured this view for our needs tells us how to query the recording from code to obtain the same output.
+
+
+
+From there, our next step is to query the recording and extract the data as a Pandas dataframe in Python. This is covered in the [next section](export-dataframe.md) of this guide.
diff --git a/docs/content/getting-started/data-out/export-dataframe.md b/docs/content/getting-started/data-out/export-dataframe.md
new file mode 100644
index 000000000000..c9d9599167d5
--- /dev/null
+++ b/docs/content/getting-started/data-out/export-dataframe.md
@@ -0,0 +1,204 @@
+---
+title: Export the dataframe
+order: 2
+---
+
+
+In the [previous section](explore-as-dataframe.md), we explored some face tracking data using the dataframe view. In this section, we will see how we can use the dataframe API of the Rerun SDK to export the same data into a [Pandas](https://pandas.pydata.org) dataframe to further inspect and process it.
+
+## Load the recording
+
+The dataframe SDK loads data from an .RRD file.
+The first step is thus to save the recording as RRD, which can be done from the Rerun menu:
+
+
+
+We can then load the recording in a Python script as follows:
+
+```python
+import rerun as rr
+import numpy as np # We'll need this later.
+
+# load the recording
+recording = rr.dataframe.load_recording("face_tracking.rrd")
+```
+
+
+## Query the data
+
+Once we have loaded a recording, we can query it to extract some data. Here is how it is done:
+
+```python
+# query the recording into a pandas dataframe
+view = recording.view(
+ index="frame_nr",
+ contents="/blendshapes/0/jawOpen"
+)
+table = view.select().read_all()
+```
+
+A lot is happening here. Let's go step by step:
+1. We first create a _view_ into the recording. The view specifies which index column we want to use (in this case the `"frame_nr"` timeline), and which other content we want to consider (here, only the `/blendshapes/0/jawOpen` entity). The view defines a subset of all the data contained in the recording where each row has a unique value for the index, and columns are filtered based on the value(s) provided as `contents` argument.
+2. A view can then be queried. Here we use the simplest possible form of querying by calling `select()`. No filtering is applied, and all view columns are selected. The result thus corresponds to the entire view.
+3. The object returned by `select()` is a [`pyarrow.RecordBatchReader`](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html). This is essentially an iterator that returns the stream of [`pyarrow.RecordBatch`](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html#pyarrow-recordbatch)es containing the query data.
+4. Finally, we use the [`pyarrow.RecordBatchReader.read_all()`](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader.read_all) function to read all record batches as a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table).
+
+**Note**: queries can be further narrowed by filtering rows and/or selecting a subset of the view columns. See the reference documentation for more information.
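+
+The reader-to-table step above can be tried without a recording. The following sketch builds a `pyarrow.RecordBatchReader` by hand from two toy batches (the column names and values are illustrative assumptions, not actual query output) and reads it into a single table, much like `select().read_all()` does for a view.
+
+```python
+import pyarrow as pa
+
+# Hypothetical schema mimicking an index column plus a list-typed component column.
+schema = pa.schema([("frame_nr", pa.int64()), ("jawOpen", pa.list_(pa.float64()))])
+
+# Two toy record batches standing in for the stream returned by a query.
+batches = [
+    pa.RecordBatch.from_pydict({"frame_nr": [0, 1], "jawOpen": [[0.03], [0.04]]}, schema=schema),
+    pa.RecordBatch.from_pydict({"frame_nr": [2], "jawOpen": [[]]}, schema=schema),
+]
+
+# A RecordBatchReader iterates over the batches; read_all() gathers them into one Table.
+reader = pa.RecordBatchReader.from_batches(schema, batches)
+table = reader.read_all()
+
+print(table.num_rows)
+```
+
+Note that a reader is a one-shot stream: once it has been consumed by `read_all()`, a new reader is needed to read the data again.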
+
+
+
+Let's have a look at the resulting table:
+
+```python
+print(table)
+```
+
+Here is the result:
+```
+pyarrow.Table
+frame_nr: int64
+frame_time: timestamp[ns]
+log_tick: int64
+log_time: timestamp[ns]
+/blendshapes/0/jawOpen:Scalar: list<item: double>
+ child 0, item: double
+----
+frame_nr: [[0],[1],...,[412],[413]]
+frame_time: [[1970-01-01 00:00:00.000000000],[1970-01-01 00:00:00.040000000],...,[1970-01-01 00:00:16.480000000],[1970-01-01 00:00:16.520000000]]
+log_tick: [[34],[92],...,[22077],[22135]]
+log_time: [[2024-10-13 08:26:46.819571000],[2024-10-13 08:26:46.866358000],...,[2024-10-13 08:27:01.722971000],[2024-10-13 08:27:01.757358000]]
+/blendshapes/0/jawOpen:Scalar: [[[0.03306490555405617]],[[0.03812221810221672]],...,[[0.06996039301156998]],[[0.07366073131561279]]]
+```
+
+Again, this is a [PyArrow](https://arrow.apache.org/docs/python/index.html) table which contains the result of our query. Further exploring Arrow structures is beyond the scope of this guide. Yet, it is a reminder that Rerun natively stores and returns data in Arrow format. As such, it efficiently interoperates with other Arrow-native and/or compatible tools such as [Polars](https://pola.rs) or [DuckDB](https://duckdb.org).
+
+
+## Create a Pandas dataframe
+
+Before exploring the data further, let's convert the table to a Pandas dataframe:
+
+```python
+df = table.to_pandas()
+```
+
+Alternatively, the dataframe can be created directly, without using the intermediate PyArrow table:
+
+```python
+df = view.select().read_pandas()
+```
+
+
+## Inspect the dataframe
+
+Let's have a first look at this dataframe:
+
+```python
+print(df)
+```
+
+Here is the result:
+
+
+
+```
+ frame_nr frame_time log_tick log_time /blendshapes/0/jawOpen:Scalar
+0 0 1970-01-01 00:00:00.000 34 2024-10-13 08:26:46.819571 [0.03306490555405617]
+1 1 1970-01-01 00:00:00.040 92 2024-10-13 08:26:46.866358 [0.03812221810221672]
+2 2 1970-01-01 00:00:00.080 150 2024-10-13 08:26:46.899699 [0.027743922546505928]
+3 3 1970-01-01 00:00:00.120 208 2024-10-13 08:26:46.934704 [0.024137917906045914]
+4 4 1970-01-01 00:00:00.160 266 2024-10-13 08:26:46.967762 [0.022867577150464058]
+.. ... ... ... ... ...
+409 409 1970-01-01 00:00:16.360 21903 2024-10-13 08:27:01.619732 [0.07283800840377808]
+410 410 1970-01-01 00:00:16.400 21961 2024-10-13 08:27:01.656455 [0.07037288695573807]
+411 411 1970-01-01 00:00:16.440 22019 2024-10-13 08:27:01.689784 [0.07556036114692688]
+412 412 1970-01-01 00:00:16.480 22077 2024-10-13 08:27:01.722971 [0.06996039301156998]
+413 413 1970-01-01 00:00:16.520 22135 2024-10-13 08:27:01.757358 [0.07366073131561279]
+
+[414 rows x 5 columns]
+```
+
+
+
+We can make several observations from this output.
+
+- The first four columns are timeline columns. These are the various timelines the data is logged to in this recording.
+- The last column is named `/blendshapes/0/jawOpen:Scalar`. This is what we call a _component column_, and it corresponds to the [Scalar](../../reference/types/components/scalar.md) component logged to the `/blendshapes/0/jawOpen` entity.
+- Each row in the `/blendshapes/0/jawOpen:Scalar` column consists of a _list_ of (typically one) scalar.
+
+This last point may come as a surprise but is a consequence of Rerun's data model, where components are always stored as arrays. This makes it possible, for example, to log an entire point cloud using the [`Points3D`](../../reference/types/archetypes/points3d.md) archetype under a single entity and at a single timestamp.
+
+Let's explore this further, recalling that, in our recording, no face was detected at around frame #170:
+
+```python
+print(df["/blendshapes/0/jawOpen:Scalar"][160:180])
+```
+
+Here is the result:
+
+```
+160 [0.0397215373814106]
+161 [0.037685077637434006]
+162 [0.0402931347489357]
+163 [0.04329492896795273]
+164 [0.0394592322409153]
+165 [0.020853394642472267]
+166 []
+167 []
+168 []
+169 []
+170 []
+171 []
+172 []
+173 []
+174 []
+175 []
+176 []
+177 []
+178 []
+179 []
+Name: /blendshapes/0/jawOpen:Scalar, dtype: object
+```
+
+We note that the data contains empty lists when no face is detected. When the blendshapes entities are [`Clear`](../../reference/types/archetypes/clear.md)ed, this happens for the corresponding timestamps and all further timestamps until a new value is logged.
+
+While this data representation is in general useful, a flat floating point representation with NaN for missing values is typically more convenient for scalar data. This is achieved using the [`explode()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html) method:
+
+```python
+df["jawOpen"] = df["/blendshapes/0/jawOpen:Scalar"].explode().astype(float)
+print(df["jawOpen"][160:180])
+```
+Here is the result:
+```
+160 0.039722
+161 0.037685
+162 0.040293
+163 0.043295
+164 0.039459
+165 0.020853
+166 NaN
+167 NaN
+168 NaN
+169 NaN
+170 NaN
+171 NaN
+172 NaN
+173 NaN
+174 NaN
+175 NaN
+176 NaN
+177 NaN
+178 NaN
+179 NaN
+Name: jawOpen, dtype: float64
+```
+
+This confirms that the newly created `"jawOpen"` column now contains regular, 64-bit float numbers, and missing values are represented by NaNs.
+
+_Note_: should you want to filter out the NaNs, you may use the [`dropna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) method.
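+
+The flattening step can be reproduced on toy data, as in the sketch below. It assumes, like the guide's recording, that each row holds at most one scalar. The values are illustrative. The last lines hint at why that assumption matters: rows holding several values are expanded into several rows sharing the same index label.
+
+```python
+import pandas as pd
+
+# Toy component column: one-element lists, then empty lists where nothing was tracked.
+raw = pd.Series([[0.04], [0.02], [], []], name="/blendshapes/0/jawOpen:Scalar")
+
+# explode() unwraps each list; empty lists become NaN. astype(float) fixes the dtype.
+flat = raw.explode().astype(float)
+
+# dropna() keeps only the rows that hold an actual measurement.
+tracked_only = flat.dropna()
+
+# With several values per row, explode() repeats the index label for that row.
+multi = pd.Series([[1.0, 2.0], [3.0]]).explode()
+
+print(flat)
+print(multi.index.tolist())
+```
+
+Because of the repeated index labels, assigning such an exploded column back to the original dataframe would no longer be a simple one-to-one operation.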
+
+## Next steps
+
+With this, we are ready to analyze the data and log the result back to the Rerun viewer, which is covered in the [next section](analyze-and-log.md) of this guide.
diff --git a/docs/content/getting-started/troubleshooting.md b/docs/content/getting-started/troubleshooting.md
index 38194ae097f7..e78bed9360e0 100644
--- a/docs/content/getting-started/troubleshooting.md
+++ b/docs/content/getting-started/troubleshooting.md
@@ -1,6 +1,6 @@
---
title: Troubleshooting
-order: 600
+order: 800
---
You can set `RUST_LOG=debug` before running to get some verbose logging output.
diff --git a/docs/content/howto/dataframe-api.md b/docs/content/howto/dataframe-api.md
index 6a5d766e549e..3a01ab4d3c4a 100644
--- a/docs/content/howto/dataframe-api.md
+++ b/docs/content/howto/dataframe-api.md
@@ -23,8 +23,8 @@ Although RRD files generally contain a single recording, they may occasionally c
For such RRD, the `load_archive()` function can be used:
-
+
```python
import rerun as rr
@@ -35,6 +35,7 @@ print(f"The archive contains {archive.num_recordings()} recordings.")
for recording in archive.all_recordings():
...
```
+
The overall content of the recording can be inspected using the `schema()` method:
@@ -45,7 +46,6 @@ schema.index_columns() # list of all index columns (timelines)
schema.component_columns() # list of all component columns
```
-
### Creating a view
The first step for getting data out of a recording is to create a view, which requires specifying an index column and what content to include.
@@ -84,7 +84,7 @@ A view has several APIs to further filter the rows it will return.
-**Filtering by time range**
+#### Filtering by time range
Rows may be filtered to keep only a given range of values from its index column:
@@ -94,13 +94,14 @@ view = view.filter_range_sequence(0, 10)
```
This API exists for both temporal and sequence timeline, and for various units:
+
- `view.filter_range_sequence(start_frame, end_frame)` (takes `int` arguments)
- `view.filter_range_seconds(stat_second, end_second)` (takes `float` arguments)
- `view.filter_range_nanos(start_nano, end_nano)` (takes `int` arguments)
(all ranges are including both start and end values)
-**Filtering by index value**
+#### Filtering by index value
Rows may be filtered to keep only those whose index corresponds to a specific set of value:
@@ -112,8 +113,7 @@ Note that a precise match is required.
Since Rerun internally stores times as `int64`, this method is only available for integer arguments (nanos or sequence number).
Floating point seconds would risk false mismatch due to numerical conversion.
-
-**Filtering by column not null**
+#### Filtering by column not null
Rows where a specific column has null values may be filtered out using the `filter_is_not_null()` method. When using this method, only rows for which a logging event exist for the provided column are returned.
@@ -137,7 +137,6 @@ For this reason, a floating point version of this method is not provided for thi
Note that this feature is typically used in conjunction with `fill_latest_at()` (see next paragraph) to enable arbitrary resampling of the original data.
-
### Filling empty values with latest-at data
By default, the rows returned by the view may be sparse and contain values only for the columns where a logging event actually occurred at the corresponding index value.
@@ -151,7 +150,6 @@ view = view.fill_latest_at()
Once the view is fully set up (possibly using the filtering features previously described), its content can be read using the `select()` method. This method optionally allows specifying which subset of columns should be produced:
-
```python
# select all columns
record_batches = view.select()
@@ -169,7 +167,6 @@ The `select()` method returns a [`pyarrow.RecordBatchReader`](https://arrow.apac
For the rest of this page, we explore how these `RecordBatch`es can be ingested in some of the popular data science packages.
-
## Load data to a PyArrow `Table`
The `RecordBatchReader` provides a [`read_all()`](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader.read_all) method which directly produces a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table):
@@ -183,12 +180,10 @@ view = recording.view(index="frame_nr", contents="/**")
table = view.select().read_all()
```
-
## Load data to a Pandas dataframe
The `RecordBatchReader` provides a [`read_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html#pyarrow.RecordBatchReader.read_pandas) method which returns a [Pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html):
-
```python
import rerun as rr
@@ -212,7 +207,6 @@ view = recording.view(index="frame_nr", contents="/**")
df = pl.from_arrow(view.select().read_all())
```
-
## Load data to a DuckDB relation
A [DuckDB](https://duckdb.org) relation can be created directly using the `pyarrow.RecordBatchReader` returned by `select()`:
diff --git a/docs/snippets/all/tutorials/data_out.py b/docs/snippets/all/tutorials/data_out.py
new file mode 100644
index 000000000000..c0f6e214c077
--- /dev/null
+++ b/docs/snippets/all/tutorials/data_out.py
@@ -0,0 +1,50 @@
+from __future__ import annotations
+
+import numpy as np
+import rerun as rr
+
+# ----------------------------------------------------------------------------------------------
+# Load and prepare the data
+
+# load the recording
+recording = rr.dataframe.load_recording("face_tracking.rrd")
+
+# query the recording into a pandas dataframe
+record_batches = recording.view(index="frame_nr", contents="/blendshapes/0/jawOpen").select()
+df = record_batches.read_pandas()
+
+# convert the "jawOpen" column to a flat list of floats
+df["jawOpen"] = df["/blendshapes/0/jawOpen:Scalar"].explode().astype(float)
+
+# ----------------------------------------------------------------------------------------------
+# Analyze the data
+
+# compute the mouth state
+df["jawOpenState"] = df["jawOpen"] > 0.15
+
+# ----------------------------------------------------------------------------------------------
+# Log the data back to the viewer
+
+# Connect to the viewer
+rr.init(recording.application_id(), recording_id=recording.recording_id())
+rr.connect()
+
+# log the jaw open state signal as a scalar
+rr.send_columns(
+ "/jaw_open_state",
+ times=[rr.TimeSequenceColumn("frame_nr", df["frame_nr"])],
+ components=[
+ rr.components.ScalarBatch(df["jawOpenState"]),
+ ],
+)
+
+# log the state as a `Text` label on the face bounding box entity
+target_entity = "/video/detector/faces/0/bbox"
+rr.log_components(target_entity, [rr.components.ShowLabels(True)], static=True)
+rr.send_columns(
+ target_entity,
+ times=[rr.TimeSequenceColumn("frame_nr", df["frame_nr"])],
+ components=[
+ rr.components.TextBatch(np.where(df["jawOpenState"], "OPEN", "CLOSE")),
+ ],
+)
diff --git a/docs/snippets/snippets.toml b/docs/snippets/snippets.toml
index 6271b774320e..968eec0a987a 100644
--- a/docs/snippets/snippets.toml
+++ b/docs/snippets/snippets.toml
@@ -81,6 +81,9 @@ views = [
"cpp", # Not implemented
"rust", # Not implemented
]
+"tutorials/data_out" = [
+ "py", # Requires context (an RRD file to be exported by the user)
+]
# These entries will run but their results won't be compared to the baseline.
#