Commit 64a8710
Merge branch 'main' into new_head
wilsonbb committed Mar 28, 2024
2 parents 0370fd3 + 3321b58 commit 64a8710
Showing 21 changed files with 485 additions and 305 deletions.
2 changes: 1 addition & 1 deletion docs/examples/rrlyr-period.ipynb
@@ -42,7 +42,7 @@
"outputs": [],
"source": [
"# Load SDSS Stripe 82 RR Lyrae catalog\n",
"ens = Ensemble(client=False).from_dataset(\"s82_rrlyrae\")"
"ens = Ensemble(client=False).from_dataset(\"s82_rrlyrae\", sorted=True)"
]
},
{
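The `sorted=True` flag added to these loaders follows Dask's convention: it asserts that the input is already ordered by the id/index column, letting Dask set the index without a cross-partition shuffle. Whether TAPE forwards the flag exactly this way is an assumption here; the sketch below shows the underlying Dask behavior, with illustrative column names.

```python
import pandas as pd
import dask.dataframe as dd

# Toy source table, already ordered by object id (columns are illustrative)
pdf = pd.DataFrame({
    "id": [10, 10, 11, 11, 12],
    "mjd": [59000.1, 59001.2, 59000.3, 59002.4, 59001.5],
    "flux": [1.2, 1.3, 0.7, 0.8, 2.1],
})
ddf = dd.from_pandas(pdf, npartitions=2)

# sorted=True promises Dask that "id" is already in order, so setting the
# index skips the expensive shuffle and yields known partition divisions.
ddf = ddf.set_index("id", sorted=True)
print(ddf.divisions)
```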
26 changes: 18 additions & 8 deletions docs/gettingstarted/quickstart.ipynb
@@ -12,12 +12,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The latest release of TAPE is installable via pip, using the following command:\n",
"\n",
"```\n",
"pip install lf-tape\n",
"```\n",
"\n",
"The latest release of TAPE is installable via pip, using the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install lf-tape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more detailed installation instructions, see the [Installation Guide](installation.html)."
]
},
@@ -38,7 +48,7 @@
"from tape import Ensemble\n",
"\n",
"ens = Ensemble() # Initialize a TAPE Ensemble\n",
"ens.from_dataset(\"s82_qso\")"
"ens.from_dataset(\"s82_qso\", sorted=True)"
]
},
{
@@ -200,7 +210,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.14"
},
"vscode": {
"interpreter": {
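Condensed, the updated quickstart runs as the following sketch (assuming the `lf-tape` package and the bundled `s82_qso` dataset are available in your environment):

```python
# In a notebook cell: %pip install lf-tape
from tape import Ensemble

ens = Ensemble()                          # Initialize a TAPE Ensemble
ens.from_dataset("s82_qso", sorted=True)  # Load the Stripe 82 QSO sample
```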
50 changes: 8 additions & 42 deletions docs/index.rst
@@ -11,59 +11,25 @@ TAPE offers a complete ecosystem for loading, filtering, and analyzing
timeseries data. TAPE is built to enable users to run provided and user-defined
analysis functions at scale in a parallelized and/or distributed manner.

Over the survey lifetime of the [LSST](https://www.lsst.org/about), on order
~billionsof objects will have multiband lightcurves available, and TAPE has
Over the survey lifetime of the `LSST <https://www.lsst.org/about>`_, on order
of ~billions of objects will have multiband lightcurves available, and TAPE has
been built as a framework with the goal of making analysis of LSST-scale
data accessible.

TAPE is built on top of `Dask <https://www.dask.org/>`_, and leverages
its "lazy evaluation" to only load data and run computations when needed.

Start with the Getting Started section to learn the basics of installation and
How to Use This Guide
==============================================

Begin with the `Getting Started <https://tape.readthedocs.io/en/latest/gettingstarted.html>`_ guide to learn the basics of installation and
walk through a simple example of using TAPE.

The Tutorials section showcases the fundamental features of TAPE.
The `Tutorials <https://tape.readthedocs.io/en/latest/tutorials.html>`_ section showcases the fundamental features of TAPE.

API-level information about TAPE is viewable in the
API Reference section.



Dev Guide - Getting Started
---------------------------

Before installing any dependencies or writing code, it's a great idea to create a
virtual environment. LINCC-Frameworks engineers primarily use `conda` to manage virtual
environments. If you have conda installed locally, you can run the following to
create and activate a new environment.

.. code-block:: console
>> conda create env -n <env_name> python=3.11
>> conda activate <env_name>
Once you have created a new environment, you can install this project for local
development using the following commands:

.. code-block:: console
>> pip install -e .'[dev]'
>> pre-commit install
>> conda install pandoc
Notes:
`API Reference <https://tape.readthedocs.io/en/latest/autoapi/index.html>`_ section.

1) The single quotes around ``'[dev]'`` may not be required for your operating system.
2) ``pre-commit install`` will initialize pre-commit for this local repository, so
that a set of tests will be run prior to completing a local commit. For more
information, see the Python Project Template documentation on
`pre-commit <https://lincc-ppt.readthedocs.io/en/latest/practices/precommit.html>`_.
3) Installing ``pandoc`` allows you to verify that automatic rendering of Jupyter notebooks
into documentation for ReadTheDocs works as expected. For more information, see
the Python Project Template documentation on
`Sphinx and Python Notebooks <https://lincc-ppt.readthedocs.io/en/latest/practices/sphinx.html#python-notebooks>`_.


.. toctree::
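The lazy evaluation the page credits to Dask works as follows: building a computation only records a task graph, and no data is read or computed until a result is explicitly requested. A minimal illustration in plain Dask, independent of TAPE (the file name is hypothetical):

```python
import dask.dataframe as dd

# Nothing is loaded yet; this only records tasks in the graph.
df = dd.read_parquet("source_catalog.parquet")  # hypothetical file
mean_flux = df.groupby("band")["flux"].mean()   # still lazy

# I/O and computation happen only now, parallelized across partitions.
print(mean_flux.compute())
```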
7 changes: 4 additions & 3 deletions docs/tutorials/batch_showcase.ipynb
@@ -67,6 +67,7 @@
"ens.from_source_dict(\n",
" source_dict,\n",
" column_mapper=ColumnMapper(id_col=\"id\", time_col=\"mjd\", flux_col=\"flux\", err_col=\"err\", band_col=\"band\"),\n",
" sorted=True,\n",
")"
]
},
@@ -391,10 +392,10 @@
"metadata": {},
"outputs": [],
"source": [
"# Overwrite the _meta property\n",
"# Update the metadata\n",
"\n",
"res1_noindex = res1.reset_index()\n",
"res1_noindex._meta = real_meta_from_dataframe\n",
"res1_noindex = res1_noindex.map_partitions(TapeFrame, meta=real_meta_from_dataframe)\n",
"res1_noindex"
]
},
Expand Down Expand Up @@ -584,7 +585,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.11"
},
"vscode": {
"interpreter": {
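The batch-showcase change above replaces a direct write to the private `_meta` attribute with `map_partitions(..., meta=...)`, the supported way to hand Dask a new schema: each partition is re-wrapped by the given constructor, and the explicit `meta` becomes the collection's metadata. A rough sketch of the pattern, with a plain `pd.DataFrame` standing in for TAPE's `TapeFrame`:

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"id": [1, 2], "stat": [0.5, 0.9]}), npartitions=1)

# An empty frame describing the schema the result should have.
expected_meta = pd.DataFrame({"id": pd.Series(dtype="int64"),
                              "stat": pd.Series(dtype="float64")})

# Apply the constructor to every partition and register the metadata,
# rather than overwriting the private ddf._meta attribute in place.
ddf = ddf.map_partitions(pd.DataFrame, meta=expected_meta)
```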
36 changes: 29 additions & 7 deletions docs/tutorials/binning_slowly_changing_sources.ipynb
@@ -36,6 +36,7 @@
" flux_col=\"psFlux\",\n",
" err_col=\"psFluxErr\",\n",
" band_col=\"filterName\",\n",
" sorted=True,\n",
")"
]
},
@@ -118,6 +119,19 @@
"metadata": {},
"outputs": [],
"source": [
"ens = Ensemble() # initialize an ensemble object\n",
"\n",
"# Read in data from a parquet file\n",
"ens.from_parquet(\n",
" \"../../tests/tape_tests/data/source/test_source.parquet\",\n",
" id_col=\"ps1_objid\",\n",
" time_col=\"midPointTai\",\n",
" flux_col=\"psFlux\",\n",
" err_col=\"psFluxErr\",\n",
" band_col=\"filterName\",\n",
" sorted=True,\n",
")\n",
"\n",
"ens.bin_sources(time_window=28.0, offset=0.0, custom_aggr={\"midPointTai\": \"min\"})\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
@@ -147,7 +161,7 @@
" \"band\": [\"g\", \"g\", \"g\", \"g\", \"g\", \"g\"],\n",
"}\n",
"cmap = ColumnMapper(id_col=\"id\", time_col=\"midPointTai\", flux_col=\"flux\", err_col=\"err\", band_col=\"band\")\n",
"ens.from_source_dict(rows, column_mapper=cmap)\n",
"ens.from_source_dict(rows, column_mapper=cmap, sorted=True)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
@@ -175,7 +189,7 @@
" \"band\": [\"g\", \"g\", \"g\", \"g\", \"g\", \"g\"],\n",
"}\n",
"cmap = ColumnMapper(id_col=\"id\", time_col=\"midPointTai\", flux_col=\"flux\", err_col=\"err\", band_col=\"band\")\n",
"ens.from_source_dict(rows, column_mapper=cmap)\n",
"ens.from_source_dict(rows, column_mapper=cmap, sorted=True)\n",
"ens.bin_sources(time_window=1.0, offset=0.0)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
@@ -205,7 +219,7 @@
" \"band\": [\"g\", \"g\", \"g\", \"g\", \"g\", \"g\"],\n",
"}\n",
"cmap = ColumnMapper(id_col=\"id\", time_col=\"midPointTai\", flux_col=\"flux\", err_col=\"err\", band_col=\"band\")\n",
"ens.from_source_dict(rows, column_mapper=cmap)\n",
"ens.from_source_dict(rows, column_mapper=cmap, sorted=True)\n",
"ens.bin_sources(time_window=1.0, offset=0.5)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
@@ -243,6 +257,7 @@
" flux_col=\"psFlux\",\n",
" err_col=\"psFluxErr\",\n",
" band_col=\"filterName\",\n",
" sorted=True,\n",
")\n",
"suggested_offset = ens.find_day_gap_offset()\n",
"print(f\"Suggested offset is {suggested_offset}\")\n",
@@ -255,19 +270,26 @@
" \"band\": [\"g\", \"g\", \"g\", \"g\", \"g\", \"g\"],\n",
"}\n",
"cmap = ColumnMapper(id_col=\"id\", time_col=\"midPointTai\", flux_col=\"flux\", err_col=\"err\", band_col=\"band\")\n",
"ens.from_source_dict(rows, column_mapper=cmap)\n",
"ens.from_source_dict(rows, column_mapper=cmap, sorted=True)\n",
"ens.bin_sources(time_window=1.0, offset=0.5)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "py310",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -281,11 +303,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.11"
},
"vscode": {
"interpreter": {
"hash": "08968836a6367873274ed1d5e98a07391f42fc3a62bd5aba54afbd7b11ba8673"
"hash": "83afbb17b435d9bf8b0d0042367da76f26510da1c5781f0ff6e6c518eab621ec"
}
}
},
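Conceptually, `bin_sources(time_window=..., offset=...)` aggregates observations whose times land in the same fixed-width window, with `offset` shifting the bin edges (which is what `find_day_gap_offset` suggests a value for). A sketch of the bin assignment only — an illustration of the idea, not TAPE's internal implementation:

```python
import numpy as np

times = np.array([0.2, 0.4, 1.1, 1.6, 2.3, 2.4])  # times in days (MJD-like)
window, offset = 1.0, 0.5

# Observations sharing a bin index fall in the same half-open interval
# [offset + k*window, offset + (k+1)*window).
bin_index = np.floor((times - offset) / window).astype(int)
print(bin_index)  # [-1 -1  0  1  1  1]
```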
55 changes: 48 additions & 7 deletions docs/tutorials/common_data_operations.ipynb
@@ -37,7 +37,7 @@
"\n",
"ens = Ensemble()\n",
"\n",
"ens.from_dataset(\"s82_rrlyrae\", sort=True)"
"ens.from_dataset(\"s82_rrlyrae\", sorted=True)"
]
},
{
@@ -141,7 +141,10 @@
"source": [
"### Access using a known ID\n",
"\n",
"If you'd like to access a particular lightcurve given an ID, you can use the `to_timeseries()` function. This allows you to supply a given object ID, and returns a `TimeSeries` object (see [working_with_the_timeseries](working_with_the_timeseries.ipynb))."
"If you'd like to access a particular lightcurve given an ID, you can use the `to_timeseries()` function. This allows you to supply a given object ID, and returns a `TimeSeries` object (see [working_with_the_timeseries](working_with_the_timeseries.ipynb)).\n",
"\n",
"> **_Note:_**\n",
"that this loads data from all available bands."
]
},
{
@@ -249,9 +252,9 @@
"metadata": {},
"outputs": [],
"source": [
"ens.calc_nobs(by_band=True)\n",
"ens.calc_nobs(by_band=True, temporary=False)\n",
"\n",
"ens.object[[\"nobs_u\", \"nobs_g\", \"nobs_r\", \"nobs_i\", \"nobs_z\", \"nobs_total\"]].head(5)"
"ens.object.head(5)[[\"nobs_u\", \"nobs_g\", \"nobs_r\", \"nobs_i\", \"nobs_z\", \"nobs_total\"]]"
]
},
{
@@ -464,8 +467,8 @@
"metadata": {},
"outputs": [],
"source": [
"ens.source.repartition(partition_size=\"100MB\").update_ensemble() # 100MBs is generally recommended\n",
"ens.source # In this case, we have a small set of data that easily fits into one partition"
"ens.source.repartition(partition_size=\"100MB\") # 100MBs is generally recommended\n",
"# In this case, we have a small set of data that easily fits into one partition"
]
},
{
@@ -492,6 +495,28 @@
"print(\"Number of post-sampled objects: \", len(subset_ens.object))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reproducible results, you can also specify a random seed via the `random_state` parameter. By re-using the same seed in your `random_state`, you can ensure that a given `Ensemble` will always be sampled the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"subset_ens = ens.sample(\n",
" frac=0.2, # select a ~fifth of the objects\n",
" random_state=53783594, # set a random seed for reproducibility\n",
")\n",
"\n",
"print(\"Number of pre-sampled objects: \", len(ens.object))\n",
"print(\"Number of post-sampled objects: \", len(subset_ens.object))"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -523,6 +548,15 @@
"In some situations, you may find yourself running a given workflow many times. Due to the nature of lazy-computation, this will involve repeated execution of data I/O, pre-processing steps, initial analysis, etc. In these situations, it may be effective to instead save the ensemble state to disk after completion of these initial processing steps. To accomplish this, we can use the `Ensemble.save_ensemble()` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ens.object.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -551,6 +585,13 @@
"new_ens = Ensemble()\n",
"new_ens.from_ensemble(\"./ensemble\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -569,7 +610,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.14"
},
"vscode": {
"interpreter": {
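The new sampling cells lean on the usual `random_state` contract: re-using a seed reproduces the draw exactly. That behavior comes from the pandas/Dask sampling machinery underneath, sketched here on a plain DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"objid": range(100)})

# The same random_state always selects the identical subset, which is
# what makes Ensemble.sample(..., random_state=...) reproducible.
a = df.sample(frac=0.2, random_state=53783594)
b = df.sample(frac=0.2, random_state=53783594)
assert a.equals(b)
```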