Merge branch 'main' into single_lc
wilsonbb committed Mar 29, 2024
2 parents c9b477a + 3321b58 commit b2a11e9
Showing 11 changed files with 122 additions and 86 deletions.
2 changes: 1 addition & 1 deletion docs/examples/rrlyr-period.ipynb
Original file line number Diff line number Diff line change
@@ -42,7 +42,7 @@
"outputs": [],
"source": [
"# Load SDSS Stripe 82 RR Lyrae catalog\n",
"ens = Ensemble(client=False).from_dataset(\"s82_rrlyrae\")"
"ens = Ensemble(client=False).from_dataset(\"s82_rrlyrae\", sorted=True)"
]
},
{
26 changes: 18 additions & 8 deletions docs/gettingstarted/quickstart.ipynb
@@ -12,12 +12,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The latest release of TAPE is installable via pip, using the following command:\n",
"\n",
"```\n",
"pip install lf-tape\n",
"```\n",
"\n",
"The latest release of TAPE is installable via pip, using the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install lf-tape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more detailed installation instructions, see the [Installation Guide](installation.html)."
]
},
@@ -38,7 +48,7 @@
"from tape import Ensemble\n",
"\n",
"ens = Ensemble() # Initialize a TAPE Ensemble\n",
"ens.from_dataset(\"s82_qso\")"
"ens.from_dataset(\"s82_qso\", sorted=True)"
]
},
{
@@ -200,7 +210,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.14"
},
"vscode": {
"interpreter": {
50 changes: 8 additions & 42 deletions docs/index.rst
@@ -11,59 +11,25 @@ TAPE offers a complete ecosystem for loading, filtering, and analyzing
timeseries data. TAPE is built to enable users to run provided and user-defined
analysis functions at scale in a parallelized and/or distributed manner.

Over the survey lifetime of the [LSST](https://www.lsst.org/about), on order
~billionsof objects will have multiband lightcurves available, and TAPE has
Over the survey lifetime of the `LSST <https://www.lsst.org/about>`_, on order
of ~billions of objects will have multiband lightcurves available, and TAPE has
been built as a framework with the goal of making analysis of LSST-scale
data accessible.

TAPE is built on top of `Dask <https://www.dask.org/>`_, and leverages
its "lazy evaluation" to only load data and run computations when needed.

Start with the Getting Started section to learn the basics of installation and
How to Use This Guide
==============================================

Begin with the `Getting Started <https://tape.readthedocs.io/en/latest/gettingstarted.html>`_ guide to learn the basics of installation and
walk through a simple example of using TAPE.

The Tutorials section showcases the fundamental features of TAPE.
The `Tutorials <https://tape.readthedocs.io/en/latest/tutorials.html>`_ section showcases the fundamental features of TAPE.

API-level information about TAPE is viewable in the
API Reference section.



Dev Guide - Getting Started
---------------------------

Before installing any dependencies or writing code, it's a great idea to create a
virtual environment. LINCC-Frameworks engineers primarily use `conda` to manage virtual
environments. If you have conda installed locally, you can run the following to
create and activate a new environment.

.. code-block:: console
>> conda create env -n <env_name> python=3.11
>> conda activate <env_name>
Once you have created a new environment, you can install this project for local
development using the following commands:

.. code-block:: console
>> pip install -e .'[dev]'
>> pre-commit install
>> conda install pandoc
Notes:
`API Reference <https://tape.readthedocs.io/en/latest/autoapi/index.html>`_ section.

1) The single quotes around ``'[dev]'`` may not be required for your operating system.
2) ``pre-commit install`` will initialize pre-commit for this local repository, so
that a set of tests will be run prior to completing a local commit. For more
information, see the Python Project Template documentation on
`pre-commit <https://lincc-ppt.readthedocs.io/en/latest/practices/precommit.html>`_.
3) Installing ``pandoc`` allows you to verify that automatic rendering of Jupyter notebooks
into documentation for ReadTheDocs works as expected. For more information, see
the Python Project Template documentation on
`Sphinx and Python Notebooks <https://lincc-ppt.readthedocs.io/en/latest/practices/sphinx.html#python-notebooks>`_.


.. toctree::
1 change: 1 addition & 0 deletions docs/tutorials/batch_showcase.ipynb
@@ -88,6 +88,7 @@
"ens.from_source_dict(\n",
" source_dict,\n",
" column_mapper=ColumnMapper(id_col=\"id\", time_col=\"mjd\", flux_col=\"flux\", err_col=\"err\", band_col=\"band\"),\n",
" sorted=True,\n",
")"
]
},
55 changes: 48 additions & 7 deletions docs/tutorials/common_data_operations.ipynb
@@ -37,7 +37,7 @@
"\n",
"ens = Ensemble()\n",
"\n",
"ens.from_dataset(\"s82_rrlyrae\", sort=True)"
"ens.from_dataset(\"s82_rrlyrae\", sorted=True)"
]
},
{
@@ -141,7 +141,10 @@
"source": [
"### Access using a known ID\n",
"\n",
"If you'd like to access a particular lightcurve given an ID, you can use the `to_timeseries()` function. This allows you to supply a given object ID, and returns a `TimeSeries` object (see [working_with_the_timeseries](working_with_the_timeseries.ipynb))."
"If you'd like to access a particular lightcurve given an ID, you can use the `to_timeseries()` function. This allows you to supply a given object ID, and returns a `TimeSeries` object (see [working_with_the_timeseries](working_with_the_timeseries.ipynb)).\n",
"\n",
"> **_Note:_**\n",
"that this loads data from all available bands."
]
},
{
@@ -249,9 +252,9 @@
"metadata": {},
"outputs": [],
"source": [
"ens.calc_nobs(by_band=True)\n",
"ens.calc_nobs(by_band=True, temporary=False)\n",
"\n",
"ens.object[[\"nobs_u\", \"nobs_g\", \"nobs_r\", \"nobs_i\", \"nobs_z\", \"nobs_total\"]].head(5)"
"ens.object.head(5)[[\"nobs_u\", \"nobs_g\", \"nobs_r\", \"nobs_i\", \"nobs_z\", \"nobs_total\"]]"
]
},
{
@@ -464,8 +467,8 @@
"metadata": {},
"outputs": [],
"source": [
"ens.source.repartition(partition_size=\"100MB\").update_ensemble() # 100MBs is generally recommended\n",
"ens.source # In this case, we have a small set of data that easily fits into one partition"
"ens.source.repartition(partition_size=\"100MB\") # 100MBs is generally recommended\n",
"# In this case, we have a small set of data that easily fits into one partition"
]
},
{
@@ -492,6 +495,28 @@
"print(\"Number of post-sampled objects: \", len(subset_ens.object))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reproducible results, you can also specify a random seed via the `random_state` parameter. By re-using the same seed in your `random_state`, you can ensure that a given `Ensemble` will always be sampled the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"subset_ens = ens.sample(\n",
" frac=0.2, # select a ~fifth of the objects\n",
" random_state=53783594, # set a random seed for reproducibility\n",
")\n",
"\n",
"print(\"Number of pre-sampled objects: \", len(ens.object))\n",
"print(\"Number of post-sampled objects: \", len(subset_ens.object))"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -523,6 +548,15 @@
"In some situations, you may find yourself running a given workflow many times. Due to the nature of lazy-computation, this will involve repeated execution of data I/O, pre-processing steps, initial analysis, etc. In these situations, it may be effective to instead save the ensemble state to disk after completion of these initial processing steps. To accomplish this, we can use the `Ensemble.save_ensemble()` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ens.object.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -551,6 +585,13 @@
"new_ens = Ensemble()\n",
"new_ens.from_ensemble(\"./ensemble\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -569,7 +610,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.14"
},
"vscode": {
"interpreter": {
12 changes: 3 additions & 9 deletions docs/tutorials/structure_function_showcase.ipynb
@@ -158,13 +158,6 @@
"in a TAPE `ensemble`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -249,6 +242,7 @@
"ens.from_source_dict(\n",
" {\"id_ens\": id_ens, \"t_ens\": t_ens, \"y_ens\": y_ens, \"yerr_ens\": yerr_ens, \"filter_ens\": filter_ens},\n",
" column_mapper=manual_colmap,\n",
" sorted=True,\n",
")"
]
},
@@ -564,11 +558,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.9"
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
"hash": "83afbb17b435d9bf8b0d0042367da76f26510da1c5781f0ff6e6c518eab621ec"
}
}
},
14 changes: 11 additions & 3 deletions docs/tutorials/tape_datasets.ipynb
@@ -46,7 +46,7 @@
")\n",
"\n",
"# Read in data from a parquet file that contains source (timeseries) data\n",
"ens.from_parquet(source_file=f\"{rel_path}/source/test_source.parquet\", column_mapper=col_map)\n",
"ens.from_parquet(source_file=f\"{rel_path}/source/test_source.parquet\", column_mapper=col_map, sorted=True)\n",
"\n",
"ens.source.head(5) # View the first 5 entries of the source table"
]
@@ -80,6 +80,7 @@
" source_file=f\"{rel_path}/source/test_source.parquet\",\n",
" object_file=f\"{rel_path}/object/test_object.parquet\",\n",
" column_mapper=col_map,\n",
" sorted=True,\n",
")\n",
"\n",
"ens.object.head(5) # View the first 5 entries of the object table"
@@ -147,7 +148,7 @@
"metadata": {},
"outputs": [],
"source": [
"ens.from_dataset(\"s82_rrlyrae\") # Let's grab the Stripe 82 RR Lyrae\n",
"ens.from_dataset(\"s82_rrlyrae\", sorted=True) # Let's grab the Stripe 82 RR Lyrae\n",
"\n",
"ens.object.head(5)"
]
@@ -209,10 +210,17 @@
"source": [
"colmap = ColumnMapper(id_col=\"id\", time_col=\"time\", flux_col=\"flux\", err_col=\"error\", band_col=\"band\")\n",
"ens = Ensemble()\n",
"ens.from_source_dict(source_dict, column_mapper=colmap)\n",
"ens.from_source_dict(source_dict, column_mapper=colmap, sorted=True)\n",
"\n",
"ens.info()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
6 changes: 3 additions & 3 deletions docs/tutorials/using_ray_with_the_ensemble.ipynb
@@ -80,7 +80,7 @@
"metadata": {},
"outputs": [],
"source": [
"ens.from_dataset(\"s82_qso\")\n",
"ens.from_dataset(\"s82_qso\", sorted=True)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(\n",
" calc_sf2, use_map=False\n",
@@ -117,7 +117,7 @@
"%%time\n",
"\n",
"ens = Ensemble(client=False) # Do not use a client\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens.from_dataset(\"s82_qso\", sorted=True)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
@@ -151,7 +151,7 @@
"%%time\n",
"\n",
"ens = Ensemble()\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens.from_dataset(\"s82_qso\", sorted=True)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False).compute()"
]
8 changes: 4 additions & 4 deletions docs/tutorials/working_with_the_ensemble.ipynb
@@ -26,7 +26,7 @@
"source": [
"from tape.ensemble import Ensemble\n",
"\n",
"ens = Ensemble().from_dataset(\"s82_rrlyrae\", sort=True)"
"ens = Ensemble().from_dataset(\"s82_rrlyrae\", sorted=True)"
]
},
{
@@ -197,7 +197,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Storing and Accessing Result Frames\n",
"## Storing and Accessing Result Frames\n",
"The `Ensemble` provides a powerful batching interface, `Ensemble.batch()`, to perform analysis functions in parallel across your lightcurves.\n",
"\n",
"For the below example, we use the included suite of analysis functions to apply `tape.analysis.calc_stetson_J` on our dataset. (For more info on `Ensemble.batch()`, including providing your own custom functions, see the [Ensemble Batch Showcase](https://tape.readthedocs.io/en/latest/tutorials/batch_showcase.html#) )"
@@ -319,7 +319,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Keeping the Object and Source Tables in Sync\n",
"## Keeping the Object and Source Tables in Sync\n",
"\n",
"The `TAPE` `Ensemble` attempts to lazily \"sync\" the Object and Source tables such that:\n",
"\n",
@@ -397,7 +397,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.11"
},
"vscode": {
"interpreter": {