Merge pull request #65 from lincc-frameworks/add_quickstart
init quickstart
Showing 2 changed files with 235 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,233 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Quickstart" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"With a valid Python environment, nested-pandas and its dependencies are easy to install using the `pip` package manager. The following command installs it:" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# % pip install nested-pandas" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"nested-pandas is tailored to the efficient analysis of nested datasets. Let's load a toy dataset to show how it works." | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from nested_pandas.datasets import generate_data\n", | ||
"\n", | ||
"# generate_data creates some toy data\n", | ||
"nf = generate_data(10, 100) # 10 rows, 100 nested rows per row\n", | ||
"nf" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The above dataframe is a `NestedFrame`, which extends the capabilities of the Pandas `DataFrame` to support columns with nested information. In this example, the top-level dataframe has 10 rows and 2 typical columns, \"a\" and \"b\". The \"nested\" column contains a dataframe in each row. We can inspect the contents of the \"nested\" column using Pandas API tooling like `loc`." | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"nf.loc[0][\"nested\"]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here we see that within the \"nested\" column there are `NestedFrame` objects with their own data. In this case we have 3 columns (\"t\", \"flux\", and \"band\"). Alternatively, we could inspect the available columns using some custom properties of the `NestedFrame`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Shows which columns have nested data\n", | ||
"nf.nested_columns" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Provides a dictionary of \"base\" (top-level) and nested column labels\n", | ||
"nf.all_columns" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"nested-pandas extends the Pandas API, meaning any operation you could do in Pandas is available within nested-pandas. However, nested-pandas has additional functionality and tooling to better support working with nested datasets. For example, let's look at `query`:" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Normal queries work as expected, rejecting rows from the dataframe that don't meet the criteria\n", | ||
"nf.query(\"a > 0.2\")" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The above query is native Pandas; however, with nested-pandas we can use hierarchical column names to extend `query` to nested layers." | |
] | ||
}, | ||
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Applies the query to \"nested\", keeping only nested rows where \"t > 17.0\"\n", | |
"nf_g = nf.query(\"nested.t > 17.0\")\n", | ||
"nf_g" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This query does not affect the rows of the top-level dataframe, but rather applies the query to the \"nested\" dataframes. If we look at one of them, we can see the effect of the query." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# All t <= 17.0 have been removed\n", | ||
"nf_g.loc[0][\"nested\"]" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A limited set of functions has been extended in this way so far, with the aim of fully supporting this hierarchical access wherever it applies in the Pandas API." | |
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to Pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"\n", | ||
"# use hierarchical column names to access the flux column\n", | ||
"# passed as an array to np.mean\n", | ||
"nf.reduce(np.mean, \"nested.flux\")" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This can be used to apply any custom function you need for your analysis. To illustrate that point further, let's define a custom function that simply returns its inputs." | |
] | ||
}, | ||
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def show_inputs(*args):\n", | |
"    return args" | |
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Passing some columns through `reduce` shows how it sends inputs to a given function." | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"nf_inputs = nf.reduce(show_inputs, \"a\", \"nested.band\")\n", | ||
"nf_inputs" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"nf_inputs.loc[0]" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.11" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
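
As a rough mental model for the `query` and `reduce` cells in the notebook above, here is a plain-Python sketch that does not use nested-pandas at all. The data layout (a list of dicts, each with a "nested" dict of lists) and the helper names `filter_nested` and `reduce_rows` are hypothetical, chosen only to mirror the behavior the notebook text describes: a nested query trims nested rows without dropping top-level rows, and `reduce` calls the function once per top-level row, passing nested columns as array-like inputs. It is not how the library is implemented.

```python
from statistics import mean

# Hypothetical stand-in for a NestedFrame: one dict per top-level row,
# with a "nested" entry holding column-oriented nested data.
rows = [
    {"a": 0.1, "b": 1.0,
     "nested": {"t": [5.0, 18.0, 19.5], "flux": [10.0, 20.0, 30.0], "band": ["g", "r", "r"]}},
    {"a": 0.7, "b": 2.0,
     "nested": {"t": [16.0, 17.5], "flux": [4.0, 8.0], "band": ["g", "r"]}},
]


def filter_nested(rows, column, predicate):
    """Keep only the nested rows whose `column` value satisfies `predicate`.

    Top-level rows are never dropped, matching the notebook's description
    of a query on a hierarchical column name such as "nested.t".
    """
    result = []
    for row in rows:
        nested = row["nested"]
        keep = [i for i, value in enumerate(nested[column]) if predicate(value)]
        trimmed = {name: [values[i] for i in keep] for name, values in nested.items()}
        result.append({**row, "nested": trimmed})
    return result


def reduce_rows(rows, func, *columns):
    """Call `func` once per top-level row with the requested columns.

    Names prefixed with "nested." are passed as lists (the flattened nested
    column); all other names are passed as scalars from the top-level row.
    """
    results = []
    for row in rows:
        args = []
        for name in columns:
            if name.startswith("nested."):
                args.append(row["nested"][name.split(".", 1)[1]])
            else:
                args.append(row[name])
        results.append(func(*args))
    return results


def show_inputs(*args):
    return args


filtered = filter_nested(rows, "t", lambda t: t > 17.0)    # like nf.query("nested.t > 17.0")
mean_flux = reduce_rows(rows, mean, "nested.flux")         # like nf.reduce(np.mean, "nested.flux")
inputs = reduce_rows(rows, show_inputs, "a", "nested.band")  # like nf.reduce(show_inputs, "a", "nested.band")
```

In this sketch, `inputs[0]` is a tuple holding the scalar value of "a" followed by the list of "band" values for the first row, which is the shape of input the notebook's `show_inputs` example is meant to reveal.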