Add EnsembleFrame Support to Tape (#308)
* A minimal Dask Dataframe subclass for the Ensemble

* Addressed comments, added test fixture.

* Make convert_flux_to_mag part of the EnsembleFrame

* Ensembles can now track a group of labeled frames

* Preserve EnsembleFrame metadata after assign()

* Parquet support for frame subclasses checkpoint

* Reverting changes to tests

* Adds test for objsor_from_parquet

* Addressed comments

* Removed adding column via apply

* Fix comment typo

* Fix EnsembleFrame.set_index

* Add update_ensemble() and Use EnsembleFrames (#252)

* Adds EnsembleFrame.update_ensemble()

* Use EnsembleFrames throughout the Ensemble

* Update ensemble test

* Extends update_ensemble test cases

* Unpin sphinx to address docs build fail

* Fix minor test error

* Remove debug line

* Propagate EnsembleFrame._is_dirty (#264)

* EnsembleFrames should propagate is_dirty

* Test that a frame's dirty status propagates

* Update doc strings

* Address review comment

* Have update_frame mark frames as dirty (#267)

* Remove calls to set_dirty in ensemble (#269)

* Update refactor (#274)

* Add ensemble loader functions for dataframes

* Updated unit tests

* Lint fixes

* Always update column mapping

* Addressed review comments

* Ensure object frame is indexed

* adds a dask_on_ray tutorial

* add performance comp; add use_map comment

---------

Co-authored-by: Doug Branton <[email protected]>

* Merge main into tape_ensemble_refactor (#277)

* Add ensemble loader functions for dataframes

* Updated unit tests

* Lint fixes

* Always update column mapping

* Addressed review comments

* Ensure object frame is indexed

* adds a dask_on_ray tutorial

* add performance comp; add use_map comment

* sync with map_partitions

* sync with map_partitions

* sync with map_partitions

* sync with map_partitions

* coalesce with map_partitions

* use dataframes instead of series

* add descriptive comments

* implement suggestions

* Update TAPE README.md

Update the project description for TAPE to better reflect the current state and goals of the project.

* Set object table index for from_dask_dataframe

* add zero_point as float input

* add ensemble default cols

* S82 RRLyr notebook

* Move rrlyr nb to examples

* Update requirements.txt to unpin sphinx

* Update pyproject.toml to unpin sphinx

* add calc_nobs

* add calc_nobs

* add calc_nobs

* reduce scope of sync_tables

* address divisions issue

* add temporary cols test

* improve coverage

* add temporary kwarg to assign

* add temporary kwarg to assign

* drop divisions

* drop brackets

* fix bug in sync

* Issue 199: Added static Ensemble read constructors to tape namespace (#256)

* Added static read constructors to tape namespace
* Removed @staticmethod as python 3.9 didn't like it
* Reformatted via black
* Changed read_dask_dataframe to call from_ method
* Collapsed create dask client args to single arg
* Fixed dask_client parameter
* reformatted via black
* Added missing unit test
* Resolved code review comments from PR 256

* Fixed failing unit test

Removed reference to Ensemble._nobs_band_cols field

* fix bug in sync

---------

Co-authored-by: Doug Branton <[email protected]>
Co-authored-by: Konstantin Malanchev <[email protected]>
Co-authored-by: Olivia R. Lynn <[email protected]>
Co-authored-by: Chris Wenneman <[email protected]>

* Fix EnsembleFrame.set_dirty and map_partitions metadata propagation (#280)

* Fix _Frame.set_dirty

* Update propagating data in map_partitions

* Fix typo

* Ensemble.update_frame no longer infers if a frame is dirty by checking if row count changed (#281)

* Mark frames dirty without len() call

* Move calls to set_dirty to EnsembleFrame

* Support storing batch results for custom meta (#285)

* Add meta handling for batch

* Add unit tests for custom meta

* Remove unit test sanity check, fix warning output

* Provide default labels for result frames.

* Update Remaining TAPE Documentation Notebooks for the Refactor (#298)

* Remove ._source and ._object

* Update notebooks for refactor

* Fix find-replace error

* Update Docs for TAPE EnsembleFrame Refactor (#290)

* Initial commit for notebooks with refactor API

* Removed _object and _source references

* Added sync tables example

* Address comment

* Addressed frame renaming

* Update docs/tutorials/working_with_the_ensemble.ipynb

Co-authored-by: Konstantin Malanchev <[email protected]>

* Addressed comments

* Clear output

---------

Co-authored-by: Konstantin Malanchev <[email protected]>

* Allow EnsembleFrame.compute to Trigger Object-Source Table Syncing (#295)

* Allow EnsembleFrame.compute to sync tables

* Fixed docstring

* Add Explicit Metadata Propagation for EnsembleFrame joins (#301)

* Support propagating frame metadata in joins

* Update doc strings and test

* Update test

* Merge Main into Ensemble Refactor Branch (#304)

* check divisions, enable lazy syncs

* check divisions, enable lazy syncs

* initial tests

* add tests; calc_nobs preserve divisions

* batch with divisions

* cleanup

* fix sf2 tests

* add sync_tables check

* cleanup

* fix calc_nobs reset_index issue

* per table warnings; index comments

* add map_partitions mode for calc_nobs when divisions are known

* build metadata

* build metadata

* add multi partition test

* add version file to init

* add small test

* Fix table syncing to use inner joins. (#303)

* Fix table syncing to use inner joins.

* fix lint error

* Update test

---------

Co-authored-by: Doug Branton <[email protected]>

* Revert "Merge Main into Ensemble Refactor Branch (#304)"

This reverts commit 5c847e1.

* Fix linting

* Remove unsupported type annotations

* Fix merge error

* Use client=False in test_analysis

* Remove '_object' and '_source' fields

* Fix linting errors

* Address review comments, add tests

---------

Co-authored-by: Doug Branton <[email protected]>
Co-authored-by: Konstantin Malanchev <[email protected]>
Co-authored-by: Olivia R. Lynn <[email protected]>
Co-authored-by: Chris Wenneman <[email protected]>
5 people authored Dec 8, 2023
1 parent 626904c commit 1e8abff
Showing 13 changed files with 2,902 additions and 381 deletions.
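The notebook diffs in this commit replace the private `ens._source` / `ens._object` attributes and `ens.head("source", 5)`-style calls with public `source` / `object` frame accessors, and several commit titles describe frames being marked dirty via `update_frame`. A minimal toy sketch of that accessor pattern, assuming a dict of labeled frames and a dirty flag (this is not TAPE's actual implementation; all names here are hypothetical):

```python
# Toy sketch of property-based frame accessors replacing private attributes,
# as in the ens._source -> ens.source rename in the diffs. Hypothetical code.

class ToyFrame:
    def __init__(self, data):
        self.data = data
        self.is_dirty = False  # whether the frame needs an object-source sync


class ToyEnsemble:
    def __init__(self, source, obj):
        # The ensemble tracks a group of labeled frames.
        self.frames = {"source": ToyFrame(source), "object": ToyFrame(obj)}

    def update_frame(self, label, frame):
        # Assigning a new frame marks it dirty so a later sync can run.
        frame.is_dirty = True
        self.frames[label] = frame

    @property
    def source(self):
        return self.frames["source"]

    @source.setter
    def source(self, frame):
        self.update_frame("source", frame)


ens = ToyEnsemble(source={"t": [1, 2]}, obj={"id": [0]})
ens.source = ToyFrame({"t": [1, 2, 3]})  # setter routes through update_frame
print(ens.source.is_dirty)  # → True
```

A property setter like this lets the ensemble intercept assignments, which is one way the "Have update_frame mark frames as dirty" change above could avoid scattered `set_dirty` calls.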
16 changes: 8 additions & 8 deletions docs/tutorials/binning_slowly_changing_sources.ipynb
@@ -60,7 +60,7 @@
"outputs": [],
"source": [
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -90,7 +90,7 @@
"source": [
"ens.bin_sources(time_window=7.0, offset=0.0)\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -120,7 +120,7 @@
"source": [
"ens.bin_sources(time_window=28.0, offset=0.0, custom_aggr={\"midPointTai\": \"min\"})\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 500)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -150,7 +150,7 @@
"ens.from_source_dict(rows, column_mapper=cmap)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -179,7 +179,7 @@
"ens.bin_sources(time_window=1.0, offset=0.0)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -209,7 +209,7 @@
"ens.bin_sources(time_window=1.0, offset=0.5)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -259,7 +259,7 @@
"ens.bin_sources(time_window=1.0, offset=0.5)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.hist(ens._source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.hist(ens.source[\"midPointTai\"].compute().tolist(), 60)\n",
"ax.set_xlabel(\"Time (MJD)\")\n",
"ax.set_ylabel(\"Source Count\")"
]
@@ -290,7 +290,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
4 changes: 2 additions & 2 deletions docs/tutorials/scaling_to_large_data.ipynb
@@ -216,7 +216,7 @@
"\n",
"print(\"number of lightcurve results in mapres: \", len(mapres))\n",
"print(\"number of lightcurve results in groupres: \", len(groupres))\n",
"print(\"True number of lightcurves in the dataset:\", len(np.unique(ens._source.index)))"
"print(\"True number of lightcurves in the dataset:\", len(np.unique(ens.source.index)))"
]
},
{
@@ -263,7 +263,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
4 changes: 2 additions & 2 deletions docs/tutorials/structure_function_showcase.ipynb
@@ -267,7 +267,7 @@
"metadata": {},
"outputs": [],
"source": [
"ens.head(\"object\", 5) \n"
"ens.object.head(5) \n"
]
},
{
@@ -276,7 +276,7 @@
"metadata": {},
"outputs": [],
"source": [
"ens.head(\"source\", 5) "
"ens.source.head(5) "
]
},
{
8 changes: 4 additions & 4 deletions docs/tutorials/tape_datasets.ipynb
@@ -52,7 +52,7 @@
" column_mapper=col_map\n",
" )\n",
"\n",
"ens.head(\"source\") # View the first 5 entries of the source table"
"ens.source.head(5) # View the first 5 entries of the source table"
]
},
{
@@ -93,7 +93,7 @@
" column_mapper=col_map\n",
" )\n",
"\n",
"ens.head(\"object\") # View the first 5 entries of the object table"
"ens.object.head(5) # View the first 5 entries of the object table"
]
},
{
@@ -168,7 +168,7 @@
"source": [
"ens.from_dataset(\"s82_rrlyrae\") # Let's grab the Stripe 82 RR Lyrae\n",
"\n",
"ens.head(\"object\", 5)"
"ens.object.head(5)"
]
},
{
@@ -270,7 +270,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
8 changes: 4 additions & 4 deletions docs/tutorials/using_ray_with_the_ensemble.ipynb
@@ -81,7 +81,7 @@
"outputs": [],
"source": [
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False) # use_map is false as we repartition naively, splitting per-object sources across partitions"
]
},
@@ -116,7 +116,7 @@
"\n",
"ens=Ensemble(client=False) # Do not use a client\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
},
@@ -150,7 +150,7 @@
"\n",
"ens = Ensemble()\n",
"ens.from_dataset(\"s82_qso\")\n",
"ens._source = ens._source.repartition(npartitions=10)\n",
"ens.source = ens.source.repartition(npartitions=10)\n",
"ens.batch(calc_sf2, use_map=False)"
]
}
@@ -171,7 +171,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.13"
},
"vscode": {
"interpreter": {
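The commit titles above mention that `EnsembleFrame.compute` can trigger object-source table syncing, and that the sync was fixed to use inner joins. A toy sketch of what an inner-join sync between an object table and a source table could look like, using plain dicts and sets in place of Dask dataframes (hypothetical code, not TAPE's implementation):

```python
# Toy sketch of inner-join object-source syncing, suggested by the
# "Fix table syncing to use inner joins" commit. All names are hypothetical.

def inner_sync(object_ids, source_rows):
    """Keep only sources whose object exists and objects that still have sources."""
    kept_sources = [row for row in source_rows if row["id"] in object_ids]
    kept_ids = object_ids & {row["id"] for row in kept_sources}
    return kept_ids, kept_sources


object_ids = {1, 2, 3}      # stand-in for the object table index
source_rows = [             # stand-in for per-observation source rows
    {"id": 1, "flux": 0.5},
    {"id": 2, "flux": 0.7},
    {"id": 4, "flux": 0.1},  # orphan: its object was dropped earlier
]

ids, rows = inner_sync(object_ids, source_rows)
print(ids)  # → {1, 2}: object 3 has no sources, source for object 4 is dropped
```

An inner join discards both orphaned sources and source-less objects in one pass, which is why a compute-time sync can leave the two tables mutually consistent.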
