We're excited to bring you a new release, with new engines, CuPy support and other improvements.
An alternative CUDA implementation based on CuPy has been implemented, providing the same features as the PyCUDA-based engines.
It can be imported using

```python
import ptypy
ptypy.load_gpu_engines('cupy')
```

which will load engines such as `DM_cupy`, `RAAR_cupy`, `ML_cupy`, `EPIE_cupy` and `SDR_cupy`.
- New WASP algorithm including GPU acceleration, available as a custom engine by importing the module from `ptypy.custom` (thanks to Timothy Poon)
- Experimental implementation of the ThreePIE (multislice) algorithm, available as a custom engine by importing the module `ptypy.custom.threepIE` and using the engine as `ThreePIE` (thanks to Yiran Lu and Maik Kahnt); a minimal loading sketch follows this list
- We provide templates for both algorithms; we are working on additional documentation
- Added Euclidean noise model to core ML engine (thanks to Jari Fowkes)
- New saving mode `"used_params"` that will save the parameters used during reconstruction into the output `.ptyr` file
- Introducing core functions `copy_state` and `restore_data`, which allow for more efficient parameter sweeps
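As referenced above, here is a minimal sketch of how the multislice engine can be made available. The `engine00` entry and the bare `Param` tree are placeholders; all other engine parameters are omitted:

```python
import ptypy
import ptypy.utils as u

# Importing the custom module registers the engine under the name "ThreePIE"
from ptypy.custom import threepIE

p = u.Param()
p.engines = u.Param()
p.engines.engine00 = u.Param()
p.engines.engine00.name = "ThreePIE"
```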
Removed NCCL support from the PyCUDA engines to avoid a CuPy dependency there. The new CuPy engines have been implemented with NCCL support.
Patch release.
- Bug fix in the NumPy FFT propagator: enforcing C-contiguous arrays
- You can now choose the CPU FFT type with the parameter `p.scans.<scan_00>.ffttype={'scipy','numpy','fftw'}`, as illustrated below
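For example (a sketch; `scan_00` is a placeholder for your scan's name):

```python
import ptypy.utils as u

p = u.Param()
p.scans = u.Param()
p.scans.scan_00 = u.Param()
# Select the FFTW backend for the CPU FFT; 'scipy' and 'numpy' are the alternatives
p.scans.scan_00.ffttype = 'fftw'
```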
This release is focused on improving the usability of PtyPy in Jupyter notebooks in preparation for the PtyPy workshop held at the Diamond Light Source in January 2023. The workshop features extensive interactive tutorials delivered using Jupyter notebooks.
We added the following features:

- convenience functions to read parameters from JSON/YAML files (`ptypy.utils.param_from_json` and `ptypy.utils.param_from_yaml`)
- plotting utilities `ptypy.utils.plot_client.figure_from_ptycho` and `ptypy.utils.plot_client.figure_from_ptyr` that can be useful in Jupyter notebooks
- non-threaded interactive plotting for Jupyter notebooks using `p.io.autoplot.threaded=False`, as shown in the sketch below
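A short sketch combining these utilities in a notebook. The YAML file name is illustrative, and we assume here that `figure_from_ptycho` accepts the `Ptycho` instance:

```python
import ptypy
import ptypy.utils as u
from ptypy.utils.plot_client import figure_from_ptycho

# Read a full parameter tree from a YAML file
p = u.param_from_yaml("recon_params.yaml")

# Render autoplot figures inline instead of using the plotting thread
p.io.autoplot.threaded = False

# Run the reconstruction, then plot from the Ptycho instance
P = ptypy.core.Ptycho(p, level=5)
fig = figure_from_ptycho(P)
```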
Public release of ptypy! After having to use a private repository for many years, we finally have a license agreement that allows PtyPy to go public.
In accordance with PEP 621, we are moving away from `setup.py` for the main build of PtyPy and adopting the new community standard of building packages with `pyproject.toml`.
We're excited to bring you a new release, with new engines, GPU accelerations and many smaller improvements.
- generalised projectional engine with derived engines DM, RAAR
- generalised stochastic engine with derived engines EPIE, SDR
Engines that are based on global projections now all derive from a generalized base engine that is able to express most common projection algorithms with 4 scalar parameters. DM and RAAR are two such derived classes. Similarly, algorithms based on a stochastic sequence of local projections (SDR, EPIE) now inherit from a common base engine.
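For example, the derived engines are selected by name in the `engines.*` sub-tree like any other engine (a sketch with a placeholder `engine00` entry and an illustrative iteration count):

```python
import ptypy.utils as u

p = u.Param()
p.engines = u.Param()
p.engines.engine00 = u.Param()
p.engines.engine00.name = "RAAR"  # or "DM"; both derive from the projectional base
p.engines.engine00.numiter = 200  # illustrative
```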
- GPU acceleration for all major engines: DM, ML, EPIE, SDR, RAAR
- accelerated engines need to be imported explicitly with

```python
import ptypy
ptypy.load_gpu_engines('cuda')
```
We accelerated three engine types (projectional, stochastic and ML) using the PyCUDA and Reikna libraries and a whole collection of custom kernels.
All GPU engines leverage a "streaming" model, which means that the primary locations of all objects are in host (CPU) memory. Diffraction data arrays, and all other arrays that scale linearly with the number of shifts/positions, are segmented into blocks (of frames). The idea is that these blocks are moved on and off the device (GPU) during engine iteration if the GPU does not have enough memory to store all blocks. The number of frames per block can be adjusted with the new top-level `frames_per_block` parameter. This parameter has little influence for smaller problem sizes, but needs to be adjusted if your GPU has too little memory to fit even a single block.
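For example (a sketch; the value is illustrative, not a recommendation):

```python
import ptypy.utils as u

p = u.Param()
# Each block holds at most this many diffraction frames; decrease the value
# if a single block does not fit into your GPU's memory.
p.frames_per_block = 1000
```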
Each engine iteration cycles through all blocks; DM even needs to cycle once per projection. We therefore recommend making the block size small enough that at least a couple of blocks fit on the GPU, in order to hide the latency of data transfers. For best performance, we employ a mirror scheme in which each cycle reverses the block order, reducing the host-to-device copies (and vice versa) to the absolute minimum.
GPU engines work in parallel, with each MPI rank taking one GPU. For sending data between ranks, PtyPy will in most cases perform a host copy first, or it will use whatever the underlying MPI implementation does for CUDA-aware MPI (only tested with OpenMPI). Unfortunately, this mapping of one rank per GPU will leave CPU cores idle if there are more cores on the system than GPUs.
Within a node, PtyPy can use NCCL (requires a CuPy install and setting `PTYPY_USE_NCCL=1`) for passing data between ranks/GPUs.
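If you prefer not to set the variable in the shell, it can also be set from Python (a sketch, assuming the variable is read when the engines are loaded):

```python
import os

# Request NCCL-based communication between ranks/GPUs (requires CuPy)
os.environ["PTYPY_USE_NCCL"] = "1"

import ptypy
ptypy.load_gpu_engines('cuda')
```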
Most derived PtyScan classes (all those in the `/experiment` folder) now need
to be imported explicitly. We took this step to separate the user space
more clearly from the base package and to avoid dependency creep from
user-introduced code. At the beginning of your script, you now
need to import your module explicitly or use one of the helper
functions.
```python
import ptypy

# Load a single PtyScan module by name (the empty string is a placeholder),
# or load all available modules at once:
ptypy.load_ptyscan_module(module='')
ptypy.load_all_ptyscan_modules()
```
Any PtyScan-derived class in these modules that is decorated with the `ptypy.experiment.register()` function will now be included in the parameter tree and selectable by name.
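For instance, a hypothetical user-defined loader could be registered like this (the class name is illustrative; the `PtyScan` base class lives in `ptypy.core.data`):

```python
from ptypy.core.data import PtyScan
from ptypy.experiment import register

@register()
class MyBeamlineScan(PtyScan):
    """Hypothetical loader; once registered, it becomes selectable
    by name in the parameter tree like any built-in PtyScan subclass."""
    pass
```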
If you prefer the old way of importing ptypy "fully loaded", just use

```python
import ptypy
ptypy.load_all()
```
which attempts to load all optional PtyScan classes and all engines.
- Code for `utils.parallel.bcast_dict` and `gather_dict` has been simplified and should be backwards compatible.
- The `fourier_power_bound`, which was previously calculated internally from the `fourier_relax_factor`, can now be set explicitly, and we recommend doing so from now on. The recommended value for the `fourier_power_bound` is 0.25 for Poisson statistics (see the supplementary of this paper); see the snippet after this list.
- Position correction now supports an alternate search scheme, i.e. searching along a fixed grid. This scheme is more accurate than a stochastic search, and the overhead incurred by this brute-force search is acceptable for GPU engines.
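As referenced above, setting the bound explicitly is a one-liner (a sketch; `engine00` is a placeholder entry in your parameter tree `p`):

```python
# Recommended value for Poisson statistics
p.engines.engine00.fourier_power_bound = 0.25
```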
- We switched to a pip install within a conda environment as the main supported way of installation
Our next efforts will go into:

- Automatic adjustment of the block sizes
- Improved scaling behavior across multiple nodes and high frame counts
- Better support for live processing (on a continuous detector data stream)
- More tests
- Branch cleaning
Thanks to the efforts at the Diamond Light Source that made this update possible.
- Aaron Parsons
- Bjoern Enders
- Benedikt Daurer
- Joerg Lotze
After quite some work we announce ptypy 0.4. Apart from including all the fixes and improvements from 0.3.0 to 0.3.1, it includes two bigger changes:
- Ptypy has now been converted to Python 3 and will be Python 3 only in the future. The Python 2 version will not be actively maintained anymore; we will keep a branch for it for a while, but we don't expect to put in many fixes and certainly not any new features. Teamwork by Julio, Alex, Bjoern and Aaron. Please note: all branches that haven't been converted to Python 3 by the end of 2019 will most likely be removed during 2020. Please rebase your effort on version 0.4. If you need help rebasing your efforts, please let us know soon.
- Position correction is now supported in most engines. It has been implemented by Wilhelm Eschen following the annealing approach introduced by A.M. Maiden et al. (Ultramicroscopy, Volume 120, 2012, Pages 64-72). Bjoern, Benedikt and Aaron helped refine and test it.
The next release will focus on scalability for HPC applications and GPU acceleration.
We are happy to announce that ptypy 0.3 is now out. If you have been using ptypy 0.2 (from the master branch), the transition should be smooth but far from automatic - see below. The essence of this new release is:

- a redesign of ptypy's internal structure, especially the introduction of an extendable `ScanModel`, which should make new ideas and new algorithms easier to implement (a big collective effort involving A. Björling, A. Parsons, B. Enders and P. Thibault),
- support for 3D Bragg ptychography, which uses the new `ScanModel` structure (all thanks to A. Björling),
- extensive testing of most components of the code, and Travis CI integration (huge work by A. Parsons and important contributions by S. Chalkidis),
- more dimensions for `Storage` classes, reduced memory footprint and reduced object count, as `Views` are now slotted and don't hold other objects (B. Enders and A. Björling), and
- the introduction of the `descriptor` submodule, which manages the whole parameter tree, including validation, defaults, and documentation (collective effort led by B. Enders and P. Thibault).
The streamlining of the input parameters means that all reconstruction scripts for version 0.2 will now fail. We had no choice.
The changes were required in order to solve the following problems:
- Parameter definitions, documentation and defaults were in different locations, and therefore hard to track and maintain.
- The meaning of a branch set to `None` was ambiguous.
- Basic experiment geometry (some distances, radiation energy, etc.) could be specified at two different locations.
- In general, the standards were not clear.
The solution to all these problems came with the `descriptor` submodule. For a user, what matters most is that `ptypy.defaults_tree` now contains the description of the full set of parameters known to ptypy. Here's a short example of how `defaults_tree` is used internally, and how you can use it in your scripts or on the command line to inspect ptypy's parameter structure.
```python
import ptypy

# Extract one branch
desc_DM_simple = ptypy.defaults_tree['engine.DM_simple']

# Print out the description of all sub-parameters
print(desc_DM_simple.to_string())
```
```
[numiter]
lowlim = 1
help = Total number of iterations
default = 123
type = int

[numiter_contiguous]
lowlim = 1
help = Number of iterations without interruption
default = 1
doc = The engine will not return control to the caller until this number of iterations is completed (not processing server requests, I/O operations, ...).
type = int

[probe_support]
lowlim = 0.0
help = Valid probe area as fraction of the probe frame
default = 0.7
doc = Defines a circular area centered on the probe frame, in which the probe is allowed to be nonzero.
type = float

[name]
help =
default = DM_simple
doc =
type = str

[alpha]
lowlim = 0.0
help = Difference map parameter
default = 1
type = float

[overlap_converge_factor]
lowlim = 0.0
help = Threshold for interruption of the inner overlap loop
default = 0.05
doc = The inner overlap loop refines the probe and the object simultaneously. This loop is escaped as soon as the overall change in probe, relative to the first iteration, is less than this value.
type = float

[overlap_max_iterations]
lowlim = 1
help = Maximum of iterations for the overlap constraint inner loop
default = 10
type = int
```
```python
# Generate defaults
p = desc_DM_simple.make_default(depth=1)

# Validate
# (try with ptypy.utils.verbose.set_level(5) to get detailed DEBUG output)
desc_DM_simple.validate(p)

# Here's what happens if a parameter is wrong:
p.numiter = 'a'
desc_DM_simple.validate(p)
```
```
ERROR root - numiter type INVALID
(...)
```
- All `scans` sub-entries have a `name`. This name is one of the `ScanModel` classes, for now only `Vanilla`, `Full`, and `Bragg3dModel`. Most users will want to use `Full`. Others will come as we implement engines that require fundamental changes in the `pod` creation.
- Data preparation: the sub-tree `recipe` does not exist anymore, and all parameters associated with a `PtyScan` subclass are specified directly in the `scan.???.data` sub-tree. The `geometry` sub-tree is also gone, with all relevant parameters now also in the `scan.???.data` sub-tree.
- There is no longer an `.engine` sub-tree. It used to be present to change the default parameters of specific engines (or all of them, using `engine.common`) before engines are declared in the `engines.*` sub-tree. We have found that this duplication is a source of confusion. Now the single place where engine parameters are set is the `engines.*` sub-tree.
- A sub-tree cannot be set to `None`. To deactivate a feature associated with a sub-tree, one has to set `active=False`. For instance, `io.autoplot = None` is not valid anymore, and `io.autoplot.active = False` has to be used instead, as shown in the snippet below.
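For example (assuming `p` is your parameter tree):

```python
# Old (0.2) style, no longer valid:
# p.io.autoplot = None

# New (0.3) style, deactivate the feature instead:
p.io.autoplot.active = False
```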
- Option to use pyFFTW (thanks to L. Bloch, ESRF)
- Scalability tests (thanks to C. Kewish, Australian Synchrotron)
- A first draft Jupyter-based plot client (B. Daurer, now National University of Singapore)
- Bug fixes and tests (many people)
The next release will focus on optimisation and speed. We will also soon switch to python 3.