Add EnsembleFrame Support to Tape (#308)
* A minimal Dask Dataframe subclass for the Ensemble

* Addressed comments, added test fixture.

* Make convert_flux_to_mag part of the EnsembleFrame

* Ensembles can now track a group of labeled frames

* Preserve EnsembleFrame metadata after assign()

* Parquet support for frame subclasses checkpoint

* Reverting changes to tests

* Adds test for objsor_from_parquet

* Addressed comments

* Removed adding column via apply

* Fix comment typo

* Fix EnsembleFrame.set_index

* Add update_ensemble() and Use EnsembleFrames (#252)

* Adds EnsembleFrame.update_ensemble()

* Use EnsembleFrames throughout the Ensemble

* Update ensemble test

* Extends update_ensemble test cases

* Unpin sphinx to address docs build fail

* Fix minor test error

* Remove debug line

* Propagate EnsembleFrame._is_dirty (#264)

* EnsembleFrames should propagate is_dirty

* Test that a frame's dirty status propagates

* Update doc strings

* Address review comment

* Have update_frame mark frames as dirty (#267)

* Remove calls to set_dirty in ensemble (#269)

* Update refactor (#274)

* Add ensemble loader functions for dataframes

* Updated unit tests

* Lint fixes

* Always update column mapping

* Addressed review comments

* Ensure object frame is indexed

* adds a dask_on_ray tutorial

* add performance comp; add use_map comment

---------

Co-authored-by: Doug Branton <[email protected]>

* Merge main into tape_ensemble_refactor (#277)

* Add ensemble loader functions for dataframes

* Updated unit tests

* Lint fixes

* Always update column mapping

* Addressed review comments

* Ensure object frame is indexed

* adds a dask_on_ray tutorial

* add performance comp; add use_map comment

* sync with map_partitions

* sync with map_partitions

* sync with map_partitions

* sync with map_partitions

* coalesce with map_partitions

* use dataframes instead of series

* add descriptive comments

* implement suggestions

* Update TAPE README.md

Update the project description for TAPE to better reflect the current state and goals of the project.

* Set object table index for from_dask_dataframe

* add zero_point as float input

* add ensemble default cols

* S82 RRLyr notebook

* Move rrlyr nb to examples

* Update requirements.txt to unpin sphinx

* Update pyproject.toml to unpin sphinx

* add calc_nobs

* add calc_nobs

* add calc_nobs

* reduce scope of sync_tables

* address divisions issue

* add temporary cols test

* improve coverage

* add temporary kwarg to assign

* add temporary kwarg to assign

* drop divisions

* drop brackets

* fix bug in sync

* Issue 199: Added static Ensemble read constructors to tape namespace (#256)

* Added static read constructors to tape namespace
* Removed @staticmethod as python 3.9 didn't like it
* Reformatted via black
* Changed read_dask_dataframe to call from_ method
* Collapsed create dask client args to single arg
* Fixed dask_client parameter
* reformatted via black
* Added missing unit test
* Resolved code review comments from PR 256

* Fixed failing unit test

Removed reference to Ensemble._nobs_band_cols field

* fix bug in sync

---------

Co-authored-by: Doug Branton <[email protected]>
Co-authored-by: Konstantin Malanchev <[email protected]>
Co-authored-by: Olivia R. Lynn <[email protected]>
Co-authored-by: Chris Wenneman <[email protected]>

* Fix EnsembleFrame.set_dirty and map_partitions metadata propagation (#280)

* Fix _Frame.set_dirty

* Update propagating data in map_partitions

* Fix typo

* Ensemble.update_frame no longer infers if a frame is dirty by checking if row count changed (#281)

* Mark frames dirty without len() call

* Move calls to set_dirty to EnsembleFrame

* Support storing batch results for custom meta (#285)

* Add meta handling for batch

* Add unit tests for custom meta

* Remove unit test sanity check, fix warning output

* Provide default labels for result frames.

* Update Remaining TAPE Documentation Notebooks for the Refactor (#298)

* Remove ._source and ._object

* Update notebooks for refactor

* Fix find-replace error

* Update Docs for TAPE EnsembleFrame Refactor (#290)

* Initial commit for notebooks with refactor API

* Removed _object and _source references

* Added sync tables example

* Address comment

* Addressed frame renaming

* Update docs/tutorials/working_with_the_ensemble.ipynb

Co-authored-by: Konstantin Malanchev <[email protected]>

* Addressed comments

* Clear output

---------

Co-authored-by: Konstantin Malanchev <[email protected]>

* Allow EnsembleFrame.compute to Trigger Object-Source Table Syncing (#295)

* Allow EnsembleFrame.compute to sync tables

* Fixed docstring

* Add Explicit Metadata Propagation for EnsembleFrame joins (#301)

* Support propagating frame metadata in joins

* Update doc strings and test

* Update test

* Merge Main into Ensemble Refactor Branch (#304)

* check divisions, enable lazy syncs

* check divisions, enable lazy syncs

* initial tests

* add tests; calc_nobs preserve divisions

* batch with divisions

* cleanup

* fix sf2 tests

* add sync_tables check

* cleanup

* fix calc_nobs reset_index issue

* per table warnings; index comments

* add map_partitions mode for calc_nobs when divisions are known

* build metadata

* build metadata

* add multi partition test

* add version file to init

* add small test

* Fix table syncing to use inner joins. (#303)

* Fix table syncing to use inner joins.

* fix lint error

* Update test

---------

Co-authored-by: Doug Branton <[email protected]>

* Revert "Merge Main into Ensemble Refactor Branch (#304)"

This reverts commit 5c847e1.

* Fix linting

* Remove unsupported type annotations

* Fix merge error

* Use client=False in test_analysis

* Remove '_object' and '_source' fields

* Fix linting errors

* Address review comments, add tests

---------

Co-authored-by: Doug Branton <[email protected]>
Co-authored-by: Konstantin Malanchev <[email protected]>
Co-authored-by: Olivia R. Lynn <[email protected]>
Co-authored-by: Chris Wenneman <[email protected]>
5 people authored Dec 8, 2023
1 parent 626904c commit 1e8abff
Showing 13 changed files with 2,902 additions and 381 deletions.
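The notebook diffs in this commit replace the private `ens._source` / `ens._object` attributes and `ens.head("source", 5)`-style calls with public `source` / `object` frame accessors, and several commit titles describe frames being marked dirty via `update_frame`. A minimal toy sketch of that accessor pattern, assuming a dict of labeled frames and a dirty flag (this is not TAPE's actual implementation; all names here are hypothetical):

```python
# Toy sketch of property-based frame accessors replacing private attributes,
# as in the ens._source -> ens.source rename in the diffs. Hypothetical code.

class ToyFrame:
    def __init__(self, data):
        self.data = data
        self.is_dirty = False  # whether the frame needs an object-source sync


class ToyEnsemble:
    def __init__(self, source, obj):
        # The ensemble tracks a group of labeled frames.
        self.frames = {"source": ToyFrame(source), "object": ToyFrame(obj)}

    def update_frame(self, label, frame):
        # Assigning a new frame marks it dirty so a later sync can run.
        frame.is_dirty = True
        self.frames[label] = frame

    @property
    def source(self):
        return self.frames["source"]

    @source.setter
    def source(self, frame):
        self.update_frame("source", frame)


ens = ToyEnsemble(source={"t": [1, 2]}, obj={"id": [0]})
ens.source = ToyFrame({"t": [1, 2, 3]})  # setter routes through update_frame
print(ens.source.is_dirty)  # → True
```

A property setter like this lets the ensemble intercept assignments, which is one way the "Have update_frame mark frames as dirty" change above could avoid scattered `set_dirty` calls.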
16 changes: 8 additions & 8 deletions docs/tutorials/binning_slowly_changing_sources.ipynb
@@ -60,7 +60,7 @@
"outputs": [],
"source": [
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -90,7 +90,7 @@
"source": [
"ens.bin_sources(time_window=7.0, offset=0.0)\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -120,7 +120,7 @@
"source": [
"ens.bin_sources(time_window=28.0, offset=0.0, custom_aggr={\"midPointTai\": \"min\"})\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -150,7 +150,7 @@
"ens.from_source_dict(rows, column_mapper=cmap)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -179,7 +179,7 @@
"ens.bin_sources(time_window=1.0, offset=0.0)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -209,7 +209,7 @@
"ens.bin_sources(time_window=1.0, offset=0.5)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -259,7 +259,7 @@
"ens.bin_sources(time_window=1.0, offset=0.5)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -290,7 +290,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
4 changes: 2 additions & 2 deletions docs/tutorials/scaling_to_large_data.ipynb
@@ -216,7 +216,7 @@
"\n",
"print(\"number of lightcurve results in mapres: \", len(mapres))\n",
"print(\"number of lightcurve results in groupres: \", len(groupres))\n",
"print(\"True number of lightcurves in the dataset:\", len(np.unique(ens._source.index)))"
"print(\"True number of lightcurves in the dataset:\", len(np.unique(ens.source.index)))"
]
},
{
@@ -263,7 +263,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
4 changes: 2 additions & 2 deletions docs/tutorials/structure_function_showcase.ipynb
@@ -267,7 +267,7 @@
"metadata": {},
"outputs": [],
"source": [
"ens.head(\"object\", 5) \n"
"ens.object.head(5) \n"
]
},
{
@@ -276,7 +276,7 @@
"metadata": {},
"outputs": [],
"source": [
"ens.head(\"source\", 5) "
"ens.source.head(5) "
]
},
{
8 changes: 4 additions & 4 deletions docs/tutorials/tape_datasets.ipynb
@@ -52,7 +52,7 @@
" column_mapper=col_map\n",
" )\n",
"\n",
"ens.head(\"source\") # View the first 5 entries of the source table"
"ens.source.head(5) # View the first 5 entries of the source table"
]
},
{
@@ -93,7 +93,7 @@
" column_mapper=col_map\n",
" )\n",
"\n",
"ens.head(\"object\") # View the first 5 entries of the object table"
"ens.object.head(5) # View the first 5 entries of the object table"
]
},
{
@@ -168,7 +168,7 @@
"source": [
"ens.from_dataset(\"s82_rrlyrae\") # Let's grab the Stripe 82 RR Lyrae\n",
"\n",
"ens.head(\"object\", 5)"
"ens.object.head(5)"
]
},
{
@@ -270,7 +270,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
8 changes: 4 additions & 4 deletions docs/tutorials/using_ray_with_the_ensemble.ipynb
@@ -81,7 +81,7 @@
"outputs": [],
"source": [
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False) # use_map is false as we repartition naively, splitting per-object sources across partitions"
]
},
@@ -116,7 +116,7 @@
"\n",
"ens=Ensemble(client=False) # Do not use a client\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
},
@@ -150,7 +150,7 @@
"\n",
"ens = Ensemble()\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
}
@@ -171,7 +171,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
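The commit titles above mention that `EnsembleFrame.compute` can trigger object-source table syncing, and that the sync was fixed to use inner joins. A toy sketch of what an inner-join sync between an object table and a source table could look like, using plain dicts and sets in place of Dask dataframes (hypothetical code, not TAPE's implementation):

```python
# Toy sketch of inner-join object-source syncing, suggested by the
# "Fix table syncing to use inner joins" commit. All names are hypothetical.

def inner_sync(object_ids, source_rows):
    """Keep only sources whose object exists and objects that still have sources."""
    kept_sources = [row for row in source_rows if row["id"] in object_ids]
    kept_ids = object_ids & {row["id"] for row in kept_sources}
    return kept_ids, kept_sources


object_ids = {1, 2, 3}      # stand-in for the object table index
source_rows = [             # stand-in for per-observation source rows
    {"id": 1, "flux": 0.5},
    {"id": 2, "flux": 0.7},
    {"id": 4, "flux": 0.1},  # orphan: its object was dropped earlier
]

ids, rows = inner_sync(object_ids, source_rows)
print(ids)  # → {1, 2}: object 3 has no sources, source for object 4 is dropped
```

An inner join discards both orphaned sources and source-less objects in one pass, which is why a compute-time sync can leave the two tables mutually consistent.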
