Commit
pipeline.py main script docstring adapted; stage base class docstring updated (no cake vs. pi distinction anymore, lacking effect of debug_mode, missing parameters error_method and profile, and minor) and cleaned up some references to removed objects in comments; minor fixes in stages/README.md; expanded introductory text on PISA (stage) modes in pisa_examples/pisa_modes.ipynb due to importance (#780)

Co-authored-by: Thomas Ehrhardt <[email protected]>
thehrh and Thomas Ehrhardt authored Jul 26, 2024
1 parent 15eed85 commit 39abb5c
Showing 4 changed files with 41 additions and 27 deletions.
7 changes: 6 additions & 1 deletion pisa/core/pipeline.py
@@ -684,7 +684,12 @@ def parse_args():
 
 
 def main(return_outputs=False):
-    """Run unit tests if `pipeline.py` is called as a script."""
+    """Main; call as script with `return_outputs=False` or interactively with
+    `return_outputs=True`
+
+    FIXME: This is broken in various ways (easiest fix:
+    pipeline.get_outputs() has no idx parameter anymore)
+    """
     from pisa.utils.plotter import Plotter
 
     args = parse_args()
30 changes: 17 additions & 13 deletions pisa/core/stage.py
@@ -28,7 +28,7 @@
 
 
 class Stage():
     """
-    PISA stage base class. Should be used to implement PISA Pi stages
+    PISA stage base class.
 
     Specialization should be done via subclasses.
@@ -45,17 +45,24 @@ class Stage():
     debug_mode : None, bool, or string
         If None, False, or empty string, the stage runs normally.
-        Otherwise, the stage runs in debug mode. This disables caching (forcing
-        recomputation of any nominal transforms, transforms, and outputs).
+        Otherwise, the stage runs in debug mode. This disables caching
+        (TODO: where or how?).
+        Services that subclass from the `Stage` class can then implement
+        further custom behavior when this mode is set by reading the value of
+        the `self.debug_mode` attribute.
+    error_method : None or string (not enforced)
+        An option to define one or more dedicated error calculation methods
+        for the stage transforms or outputs
     calc_mode : pisa.core.binning.MultiDimBinning, str, or None
-        Specify in what to do the calculation
+        Specify the default data representation for `setup()` and `compute()`
     apply_mode : pisa.core.binning.MultiDimBinning, str, or None
-        Specify in what to do the application
+        Specify the default data representation for `apply()`
+    profile : bool
+        If True, perform timings for the setup, compute, and apply functions.
     """
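The parameters described in the docstring above can be sketched with a minimal stand-in. Note this is an illustrative sketch only: the `Stage` stub, the `my_service` name, and the `run()`/`compute()` behavior shown here are assumptions for demonstration, not PISA's actual implementation.

```python
import time


class Stage:
    """Minimal illustrative stand-in for pisa.core.stage.Stage."""

    def __init__(self, debug_mode=None, error_method=None,
                 calc_mode=None, apply_mode=None, profile=False):
        # None, False, or "" all mean "run normally"
        self.debug_mode = debug_mode
        self.error_method = error_method
        self.calc_mode = calc_mode
        self.apply_mode = apply_mode
        self.profile = profile

    def run(self):
        if self.profile:
            # when profiling is enabled, time the computation
            start = time.perf_counter()
            result = self.compute()
            self.compute_time = time.perf_counter() - start
            return result
        return self.compute()


class my_service(Stage):
    """Hypothetical service customizing behavior via `self.debug_mode`."""

    def compute(self):
        result = 42  # placeholder for the real calculation
        if self.debug_mode:  # truthy: True or any non-empty string
            # e.g. keep intermediate results around for inspection
            self.debug_info = {"result": result}
        return result


stage = my_service(debug_mode="keep_intermediate", profile=True)
stage.run()
```

A service left with `debug_mode=None` runs normally and skips the extra bookkeeping, which mirrors the truthiness convention described above.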

@@ -77,7 +84,7 @@ def __init__(
         module_path = self.__module__.split(".")
 
         self.stage_name = module_path[-2]
-        """Name of the stage (e.g. flux, osc, aeff, reco, pid, etc."""
+        """Name of the stage (flux, osc, aeff, reco, pid, etc.)"""
 
         self.service_name = module_path[-1]
         """Name of the specific service implementing the stage."""
@@ -88,8 +95,6 @@ def __init__(
 
         self._source_code_hash = None
 
-        """Last-computed outputs; None if no outputs have been computed yet."""
-
         self._attrs_to_hash = set([])
         """Attributes of the stage that are to be included in its hash value"""
@@ -250,15 +255,14 @@ def hash(self):
     def __hash__(self):
         return self.hash
 
-
     def include_attrs_for_hashes(self, attrs):
-        """Include a class attribute or attributes to be included when
-        computing hashes (for all that apply: nominal transforms, transforms,
-        and/or outputs).
+        """Include a class attribute or attributes in the hash
+        computation.
 
         This is a convenience that allows some customization of hashing (and
         hence caching) behavior without having to override the hash-computation
-        methods (`_derive_nominal_transforms_hash`, `_derive_transforms_hash`,
-        and `_derive_outputs_hash`).
+        method.
 
         Parameters
         ----------
11 changes: 7 additions & 4 deletions pisa/stages/README.md
@@ -40,7 +40,7 @@ class mystage(Stage):
         self.foo = something_else
 ```
 
-The constructor arguments are passed in via the satage config file, which in this case would need to look something like:
+The constructor arguments are passed in via the stage config file, which in this case would need to look something like:

 ```ini
 [stage_dir.mystage]

@@ -54,7 +54,7 @@
 params.a = 13.
 params.b = 27.3 +/- 3.2
 ```
 
-The `std_kwargs` can only contain `data, params, debug_mode, error_mode, calc_mode, apply_mode, profile`, of which `data` and `params` will be autmoatically populated.
+The `std_kwargs` can only contain `data, params, expected_params, debug_mode, error_mode, calc_mode, apply_mode, profile`, of which `data` and `params` will be automatically populated.
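A hedged sketch of that constructor contract follows; the `Stage` base class here is a stub standing in for PISA's, and only the allowed-keys check and forwarding pattern are illustrated, not how PISA actually populates `data` and `params`.

```python
class Stage:
    """Stub standing in for pisa.core.stage.Stage: accepts only the
    standard keyword arguments listed in this README."""

    STD_KWARGS = {"data", "params", "expected_params", "debug_mode",
                  "error_mode", "calc_mode", "apply_mode", "profile"}

    def __init__(self, **std_kwargs):
        unknown = set(std_kwargs) - self.STD_KWARGS
        if unknown:
            raise TypeError(f"std_kwargs may not contain: {sorted(unknown)}")
        # in PISA, `data` and `params` would be populated automatically;
        # here we simply store whatever was passed (defaulting to None)
        for name in self.STD_KWARGS:
            setattr(self, name, std_kwargs.get(name))


class mystage(Stage):
    def __init__(self, something, something_else, **std_kwargs):
        # service-specific constructor arguments are consumed here...
        self.foo = something_else
        # ...while the standard ones are forwarded to the base class
        super().__init__(**std_kwargs)


stage = mystage(something=1, something_else=2, calc_mode="events")
```

Passing any keyword outside the allowed set raises an error in this sketch, which makes the "can only contain" rule above concrete.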


### Methods
@@ -80,14 +80,17 @@ def apply_function(self):

## Directory Listing

* `absorption/` - A stage for neutrino flux absorption in the Earth.
* `aeff/` - All stages relating to effective area transforms.
* `combine/` - A stage for combining maps together and applying appropriate scaling factors.
* `background/` - A stage for modifying some nominal (background) MC muon flux due to systematics.
* `data/` - All stages relating to the handling of data.
* `discr_sys/` - All stages relating to the handling of discrete systematics.
* `flux/` - All stages relating to the atmospheric neutrino flux.
* `likelihood/` - A stage that pre-computes some quantities needed for the "generalized likelihood"
* `osc/` - All stages relating to neutrino oscillations.
* `pid/` - All stages relating to particle identification.
* `reco/` - All stages relating to applying reconstruction kernels.
* `unfold/` - All stages relating to the unfolding of parameters from data.
* `utils/` - All "utility" stages (not representing physics effects).
* `xsec/` - All stages relating to cross sections.
* `GLOBALS.md` - File describing globally available variables within PISA; needs a significant overhaul (TODO).
* `__init__.py` - File that makes the `stages` directory behave as a Python module.
20 changes: 11 additions & 9 deletions pisa_examples/pisa_modes.ipynb
@@ -4,13 +4,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# PISA modes\n",
+    "# PISA stage modes\n",
     "\n",
-    "Stages in PSIA usually have a `calc_mode` and an `apply_mode`, which specify in what representation of the data operations are performed.\n",
-    "Often calculations can be faster wehn peformed on grids, but we need to be careful in order to ensure that we are not introducing large errors.\n",
-    "Also note that you can change the modes on runtime, but after doing so need to `setup()` the stage or pipleine again.\n",
+    "Every PISA [stage](https://github.com/icecube/pisa/blob/master/pisa/core/stage.py) of a [pipeline](https://github.com/icecube/pisa/blob/master/pisa/core/pipeline.py) has a `calc_mode` and an `apply_mode`. Both instance attributes specify the \"representation\" in which generic [data](https://github.com/icecube/pisa/blob/master/pisa/core/container.py) (e.g., neutrino MC events) is processed through the pipeline. Often calculations can be faster when performed on grids, but we need to be careful in order to ensure that we are not introducing large errors.\n",
     "\n",
-    "If the output of a stage is different than whjat, for example, the next stage needs those to have as inputs, they are automatically translated by PISA. So you can mix and match, but be aware that translations will introduce computational cost and hence may slow things down."
+    "More specifically, `calc_mode` by default defines the representation during the `setup()` and `compute()` steps, and `apply_mode` that during the `apply()` step; see the [stages readme](https://github.com/icecube/pisa/blob/master/pisa/stages/README.md). The latter two steps are executed successively whenever a given stage instance is `run()`, during the pipeline output calculation. In this way, complex event-by-event calculations (e.g., oscillation probabilities) can be executed during the `compute()` step, which also has a basic caching mechanism to avoid redundant calculations. The `apply()` step typically performs simple transformations of the data (using results of a preceding `compute()` step or not) in the representation determined by `apply_mode`. Take a look at different stage implementations (\"services\") and example pipeline configuration files to get a better feel for the concept.\n",
+    "\n",
+    "Note that you can change the modes at runtime, but after doing so you need to `setup()` the stage or pipeline again (exercise: can you find a service which only defines its `setup()` step, but neither `compute()` nor `apply()`?).\n",
+    "\n",
+    "If the output representation of a stage is different than what, for example, the next stage needs to have as input, the output is automatically translated by PISA (translation between data representations). So you can mix and match, but be aware that translations will introduce computational cost and hence may slow things down."
   ]
  },
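The point about re-running `setup()` after changing a mode can be illustrated with a toy stage. The `ToyStage` class, its attribute names, and the error behavior below are assumptions made for this sketch, not PISA's actual API.

```python
class ToyStage:
    """Toy model of the convention: compute() is only valid for the
    representation that setup() last prepared."""

    def __init__(self, calc_mode):
        self.calc_mode = calc_mode
        self._set_up_for = None  # representation setup() prepared

    def setup(self):
        # (re)build internal containers for the current representation
        self._set_up_for = self.calc_mode

    def compute(self):
        if self._set_up_for != self.calc_mode:
            raise RuntimeError("calc_mode changed; call setup() again")
        return f"computed in {self.calc_mode!r}"


stage = ToyStage(calc_mode="events")
stage.setup()
stage.compute()

stage.calc_mode = "true_binning"  # switch representation at runtime
# stage.compute() would now raise until stage.setup() is called again
```

Calling `setup()` once more after the mode change makes `compute()` valid again, mirroring the rule stated in the notebook text.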
{
@@ -36,10 +38,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We will configure our neutrino pipleine in 3 different ways:\n",
+    "We will configure our neutrino pipeline in 3 different ways:\n",
     "* The standard form with *some* calculation on grids\n",
-    "* All calculations on an eent-by-event basis (the most correct, but by far slowest way)\n",
-    "* All calcultions on grids (Usually faster for large smples)"
+    "* All calculations on an event-by-event basis (the most correct, but by far slowest way)\n",
+    "* All calculations on grids (usually faster for large event samples)"
]
},
{
@@ -326,7 +328,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-    "We can see that in this confioguration we probanbly have fine enough grids, such that differences are at the sub-percent level. This may or may not be acceptable for the specific analysis you want to do."
+    "We can see that in this configuration we probably have fine enough grids, such that differences are at the sub-percent level. This may or may not be acceptable for the specific analysis you want to do."
]
},
{
