Merge pull request #65 from lincc-frameworks/add_quickstart
init quickstart
dougbrn authored May 8, 2024
2 parents 3d65f46 + 01c9419 commit 5a74ad5
Showing 2 changed files with 235 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/gettingstarted.rst
@@ -9,4 +9,5 @@ we encourage you to open an issue on the
:maxdepth: 1

Installing nested-pandas <gettingstarted/installation>
Contribution Guide <gettingstarted/contributing>
Contribution Guide <gettingstarted/contributing>
Quickstart Guide <gettingstarted/quickstart>
233 changes: 233 additions & 0 deletions docs/gettingstarted/quickstart.ipynb
@@ -0,0 +1,233 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quickstart"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a valid Python environment, nested-pandas and its dependencies are easy to install using the `pip` package manager. The following command can be used to install it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %pip install nested-pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"nested-pandas is tailored to the efficient analysis of nested datasets. Let's load a toy dataset to show how it works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nested_pandas.datasets import generate_data\n",
"\n",
"# generate_data creates some toy data\n",
"nf = generate_data(10, 100) # 10 rows, 100 nested rows per row\n",
"nf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above dataframe is a `NestedFrame`, which extends the Pandas `DataFrame` to support columns with nested information. In this example, the top-level dataframe has 10 rows and 2 ordinary columns, \"a\" and \"b\". The \"nested\" column contains a dataframe in each row. We can inspect the contents of the \"nested\" column using Pandas API tooling like `loc`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nf.loc[0][\"nested\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see that within the \"nested\" column there are `NestedFrame` objects with their own data. In this case we have 3 columns (\"t\", \"flux\", and \"band\"). Alternatively, we could inspect the available columns using some custom properties of the `NestedFrame`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Shows which columns have nested data\n",
"nf.nested_columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Provides a dictionary of \"base\" (top-level) and nested column labels\n",
"nf.all_columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"nested-pandas extends the Pandas API, meaning any operation you could do in Pandas is available within nested-pandas. In addition, nested-pandas provides functionality and tooling to better support working with nested datasets. For example, let's look at `query`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Normal queries work as expected, rejecting rows from the dataframe that don't meet the criteria\n",
"nf.query(\"a > 0.2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above query is native Pandas. With nested-pandas, however, we can use hierarchical column names to extend `query` to nested layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Applies the query to \"nested\", filtering based on \"t > 17.0\"\n",
"nf_g = nf.query(\"nested.t > 17.0\")\n",
"nf_g"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This query does not affect the rows of the top-level dataframe, but rather applies the query to the \"nested\" dataframes. If we look at one of them, we can see the effect of the query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# All nested rows with t <= 17.0 have been removed\n",
"nf_g.loc[0][\"nested\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A limited set of functions has been extended in this way so far, with the aim of fully supporting this hierarchical access wherever it applies in the Pandas API."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we'll end with the flexible `reduce` function. `reduce` works similarly to Pandas' `apply`, but flattens (reduces) the inputs from nested layers into arrays before passing them to the given function. For example, let's find the mean flux for each dataframe in \"nested\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Use the hierarchical column name to select the nested flux column;\n",
"# each row's flux values are passed to np.mean as an array\n",
"nf.reduce(np.mean, \"nested.flux\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This can be used to apply any custom function you need for your analysis. To illustrate the point further, let's define a custom function that simply returns its inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def show_inputs(*args):\n",
"    return args"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Passing some columns through `reduce`, we can see how it sends inputs to a given function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nf_inputs = nf.reduce(show_inputs, \"a\", \"nested.band\")\n",
"nf_inputs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nf_inputs.loc[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
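
The behaviour the quickstart notebook describes (top-level `query`, nested `query` via hierarchical names such as `nested.t`, and `reduce`) can be pictured with a small pure-Python model. This is only a conceptual sketch and not how nested-pandas is implemented: the helper names `query_top`, `query_nested`, and `reduce_rows`, and the toy `rows` data, are hypothetical and exist purely for illustration.

```python
from statistics import mean

# Hypothetical toy data: each top-level row has scalar columns "a" and "b"
# plus a "nested" list of sub-rows with "t", "flux", and "band".
rows = [
    {"a": 0.1, "b": 1.0, "nested": [
        {"t": 5.0, "flux": 10.0, "band": "g"},
        {"t": 18.0, "flux": 20.0, "band": "r"},
    ]},
    {"a": 0.7, "b": 2.0, "nested": [
        {"t": 19.0, "flux": 30.0, "band": "g"},
        {"t": 20.0, "flux": 50.0, "band": "r"},
    ]},
]


def query_top(rows, predicate):
    """Top-level query: drop whole rows that fail the predicate."""
    return [row for row in rows if predicate(row)]


def query_nested(rows, predicate):
    """Nested query: keep every top-level row, filter only its sub-rows."""
    return [
        {**row, "nested": [sub for sub in row["nested"] if predicate(sub)]}
        for row in rows
    ]


def reduce_rows(rows, func, *columns):
    """Call func once per top-level row.

    Columns written as "nested.<field>" are passed as a list of that
    field's values; other columns are passed as scalars.
    """
    results = []
    for row in rows:
        args = []
        for col in columns:
            if col.startswith("nested."):
                field = col.split(".", 1)[1]
                args.append([sub[field] for sub in row["nested"]])
            else:
                args.append(row[col])
        results.append(func(*args))
    return results


top = query_top(rows, lambda r: r["a"] > 0.2)        # like nf.query("a > 0.2")
late = query_nested(rows, lambda s: s["t"] > 17.0)   # like nf.query("nested.t > 17.0")
mean_flux = reduce_rows(rows, mean, "nested.flux")   # like nf.reduce(np.mean, "nested.flux")
inputs = reduce_rows(rows, lambda *args: args, "a", "nested.band")
```

In this model the top-level query changes the number of rows, while the nested query leaves the row count alone and only shortens each row's nested list, which mirrors the distinction the notebook draws between `nf.query("a > 0.2")` and `nf.query("nested.t > 17.0")`.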
