Skip to content

Latest commit

 

History

History
228 lines (206 loc) · 12.5 KB

README_python.md

File metadata and controls

228 lines (206 loc) · 12.5 KB

Python support is now included.

Building the perfstubs library will now also build perfstubs.so, which is a Python C extension that uses the new low-overhead profiling support in Python 3.12+, see https://docs.python.org/3/library/sys.monitoring.html. There is a python module, pstubs.py in the perfstubs <install-prefix>/lib directory. Legacy support for Python versions 3.8-3.11 is also included, using the sys.setprofile() support.

You have the option of installing PerfStubs as a pip3 module, installing it standalone, or incorporating it into your regular build. Here are the instructions for each:

Pip3 module

git clone https://github.com/UO-OACISS/perfstubs.git
cd perfstubs
pip3 install .

Once PerfStubs is installed, you just use it as any other Python module:

python3 -m pstubs ./myprogram.py

To use it with a performance measurement tool, like TAU or APEX:

# Running with tau_exec
tau_exec -T serial python3 -m pstubs ./myprogram.py
# Running with tau_exec and mpirun/srun/mpiexec, etc.
mpirun -n 1024 tau_exec -T mpi python3 -m pstubs ./myprogram.py
# Running with apex_exec
apex_exec python3 -m pstubs ./myprogram.py
# Running with apex_exec using command line argument for Python
apex_exec --apex:python ./myprogram.py
# Running with apex_exec using command line argument for Python and mpirun
mpirun -n 1024 apex_exec --apex:python ./myprogram.py

Standalone build

git clone --branch python-3.12 https://github.com/UO-OACISS/perfstubs.git
cd perfstubs
cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=`pwd`/install
cmake --build build --parallel --target install
export PYTHONPATH=`pwd`/install/lib:$PYTHONPATH
export PYTHONPATH=`pwd`/install/lib64:$PYTHONPATH  # might be necessary, too - depending on your platform

To try it with a working TAU installation:

tau_exec -T pthread,serial python3 -m pstubs ./examples/firstprime_3.12.py
pprof -a

shoud give:

khuck@Kevins-MacBook-Air perfstubs % tau_exec -T pthread,serial python3 -m pstubs ./examples/firstprime_3.12.py
profiling:  ./examples/firstprime_3.12.py
The first prime number after 10000000 is 10000019
Success!
Done
khuck@Kevins-MacBook-Air perfstubs % pprof
Reading Profile files in profile.*

NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0          172          177           1           1     177738 .TAU application
  2.9        0.812            5           1           3       5238 run
  2.5        0.061            4           1           1       4362 firstPrimeAfter
  2.4        0.812            4           1           3       4301 _find_and_load
  1.8        0.048            3           1           3       3184 _find_and_load_unlocked
  1.7        0.109            3           1           9       3100 _find_spec
  1.6        0.508            2          10          82        282 find_spec
  1.0        0.052            1           2           3        867 _get_spec
  0.9        0.139            1           9          16        180 _path_importer_cache
  0.7        0.038            1           1           3       1259 _load_unlocked
  0.7        0.034            1           1           2       1183 module_from_spec
  0.6        0.024            1           1           2       1059 create_module
  0.6            1            1           3           0        349 _call_with_frames_removed
  0.4        0.198        0.652          10           7         65 __init__
  0.2        0.064        0.395           5           3         79 __enter__
  0.2        0.024        0.365           1           1        365 _path_hooks
  0.1        0.038        0.266           1           2        266 path_hook_for_FileFinder
  0.1        0.107        0.232           1           4        232 acquire
  0.1        0.168        0.168          10           0         17 _path_stat
  0.1        0.147        0.147          32           0          5 _path_join
  0.1        0.138        0.138           1           0        138 _fill_cache
  0.1        0.134        0.134          34           0          4 _verbose_message
  0.1        0.103        0.134           2           3         67 _path_abspath
  0.1        0.054         0.09           1           3         90 _init_module_attrs
  0.0        0.031        0.065           1           2         65 setdefault
  0.0        0.042         0.06           1           2         60 spec_from_file_location
  0.0        0.019        0.053           2           2         26 _path_is_mode_type
  0.0        0.035        0.047           5           1          9 __exit__
  0.0         0.02        0.045           1           1         45 _path_isdir
  0.0         0.01        0.038           1           1         38 _path_isfile
  0.0        0.019        0.035           1           2         35 exec_module
  0.0        0.034        0.034           7           0          5 _relax_case
  0.0        0.026        0.034           1           1         34 _get_module_lock
  0.0        0.031        0.031           1           0         31 remove
  0.0        0.016        0.025           1           1         25 cached
  0.0        0.017        0.017           2           0          8 _path_isabs
  0.0        0.015        0.015           1           0         15 decode
  0.0        0.015        0.015           3           0          5 <genexpr>
  0.0        0.012        0.012           1           0         12 release
  0.0        0.009        0.009           1           0          9 _get_cached
  0.0        0.008        0.008           1           0          8 cb
  0.0        0.008        0.008           1           0          8 __new__
  0.0        0.006        0.006           1           0          6 parent
  0.0        0.005        0.005           1           0          5 has_location

To try it with a working APEX installation:

apex_exec --apex:csv python3 -m pstubs ./examples/firstprime_3.12.py
apex-summary.py

should give:

khuck@Kevins-MacBook-Air perfstubs % apex_exec python3 -m pstubs ./examples/firstprime_3.12.py
Enabling memory tracking!
          (            )
   (      )\ )      ( /(
   )\    (()/( (    )\())
((((_)(   /(_)))\  ((_)\
 )\ _ )\ (_)) ((_) __((_)
 (_)_\(_)| _ \| __|\ \/ /
  / _ \  |  _/| _|  >  <
 /_/ \_\ |_|  |___|/_/\_\
APEX Version: v2.6.5-2876220-develop
Built on: 11:07:57 Feb 22 2024 (Release)
C++ Language Standard version : 201703
Clang Compiler version : Apple LLVM 15.0.0 (clang-1500.1.0.2.5)
Configured features: Pthread, OTF2

profiling:  ./examples/firstprime_3.12.py
The first prime number after 10000000 is 10000019
Success!
Done

Start Date/Time: 06/06/2024 15:15:27
Elapsed time: 0.060933 seconds
Total processes detected: 1
HW Threads detected on rank 0: 0
Worker Threads observed on rank 0: 1
Available CPU time on rank 0: 0 seconds
Available CPU time on all ranks: 0 seconds

CPU Timers                                           : #calls|   mean |   total
--------------------------------------------------------------------------------
                                           APEX MAIN :      1     0.06     0.06
                  firstPrimeAfter [{<string>} {7,0}] :      1     0.00     0.00
_find_and_load [{<frozen importlib._bootstrap>} {13… :      1     0.00     0.00
_find_and_load_unlocked [{<frozen importlib._bootst… :      1     0.00     0.00
_find_spec [{<frozen importlib._bootstrap>} {1240,0… :      1     0.00     0.00
_load_unlocked [{<frozen importlib._bootstrap>} {91… :      1     0.00     0.00
module_from_spec [{<frozen importlib._bootstrap>} {… :      1     0.00     0.00
find_spec [{<frozen importlib._bootstrap_external>}… :      1     0.00     0.00
_path_importer_cache [{<frozen importlib._bootstrap… :      8     0.00     0.00
_get_spec [{<frozen importlib._bootstrap_external>}… :      1     0.00     0.00
create_module [{<frozen importlib._bootstrap_extern… :      1     0.00     0.00
_call_with_frames_removed [{<frozen importlib._boot… :      3     0.00     0.00
find_spec [{<frozen importlib._bootstrap_external>}… :      6     0.00     0.00
_path_hooks [{<frozen importlib._bootstrap_external… :      1     0.00     0.00
              __init__ [{<frozen zipimport>} {64,0}] :      1     0.00     0.00
path_hook_for_FileFinder [{<frozen importlib._boots… :      1     0.00     0.00
 __enter__ [{<frozen importlib._bootstrap>} {416,0}] :      1     0.00     0.00
_path_stat [{<frozen importlib._bootstrap_external>… :      9     0.00     0.00
__init__ [{<frozen importlib._bootstrap_external>} … :      1     0.00     0.00
_path_join [{<frozen importlib._bootstrap_external>… :     27     0.00     0.00
_fill_cache [{<frozen importlib._bootstrap_external… :      1     0.00     0.00
_verbose_message [{<frozen importlib._bootstrap>} {… :     29     0.00     0.00
_path_abspath [{<frozen importlib._bootstrap_extern… :      2     0.00     0.00
   acquire [{<frozen importlib._bootstrap>} {304,0}] :      1     0.00     0.00
_get_spec [{<frozen importlib._bootstrap_external>}… :      1     0.00     0.00
_init_module_attrs [{<frozen importlib._bootstrap>}… :      1     0.00     0.00
                __init__ [{<frozen codecs>} {309,0}] :      1     0.00     0.00
_path_is_mode_type [{<frozen importlib._bootstrap_e… :      2     0.00     0.00
 __enter__ [{<frozen importlib._bootstrap>} {162,0}] :      1     0.00     0.00
_path_isdir [{<frozen importlib._bootstrap_external… :      1     0.00     0.00
spec_from_file_location [{<frozen importlib._bootst… :      1     0.00     0.00
_path_isfile [{<frozen importlib._bootstrap_externa… :      1     0.00     0.00
setdefault [{<frozen importlib._bootstrap>} {124,0}] :      1     0.00     0.00
exec_module [{<frozen importlib._bootstrap_external… :      1     0.00     0.00
_get_module_lock [{<frozen importlib._bootstrap>} {… :      1     0.00     0.00
_relax_case [{<frozen importlib._bootstrap_external… :      6     0.00     0.00
                  decode [{<frozen codecs>} {319,0}] :      1     0.00     0.00
  __exit__ [{<frozen importlib._bootstrap>} {420,0}] :      1     0.00     0.00
    cached [{<frozen importlib._bootstrap>} {632,0}] :      1     0.00     0.00
find_spec [{<frozen importlib._bootstrap>} {1128,0}] :      1     0.00     0.00
                __init__ [{<frozen codecs>} {260,0}] :      1     0.00     0.00
<genexpr> [{<frozen importlib._bootstrap_external>}… :      3     0.00     0.00
__enter__ [{<frozen importlib._bootstrap>} {1222,0}] :      3     0.00     0.00
 __exit__ [{<frozen importlib._bootstrap>} {1226,0}] :      3     0.00     0.00
   release [{<frozen importlib._bootstrap>} {372,0}] :      1     0.00     0.00
_path_isabs [{<frozen importlib._bootstrap_external… :      2     0.00     0.00
 find_spec [{<frozen importlib._bootstrap>} {982,0}] :      1     0.00     0.00
  __init__ [{<frozen importlib._bootstrap>} {232,0}] :      1     0.00     0.00
_get_cached [{<frozen importlib._bootstrap_external… :      1     0.00     0.00
    __new__ [{<frozen importlib._bootstrap>} {74,0}] :      1     0.00     0.00
        cb [{<frozen importlib._bootstrap>} {445,0}] :      1     0.00     0.00
     remove [{<frozen importlib._bootstrap>} {82,0}] :      1     0.00     0.00
  __init__ [{<frozen importlib._bootstrap>} {412,0}] :      1     0.00     0.00
   __init__ [{<frozen importlib._bootstrap>} {79,0}] :      1     0.00     0.00
    parent [{<frozen importlib._bootstrap>} {645,0}] :      1     0.00     0.00
  __init__ [{<frozen importlib._bootstrap>} {599,0}] :      1     0.00     0.00
  __exit__ [{<frozen importlib._bootstrap>} {173,0}] :      1     0.00     0.00
__init__ [{<frozen importlib._bootstrap_external>} … :      1     0.00     0.00
has_location [{<frozen importlib._bootstrap>} {653,… :      1     0.00     0.00
  __init__ [{<frozen importlib._bootstrap>} {158,0}] :      1     0.00     0.00
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
                                        Total timers : 149

Selective Measurement

To include a selective measurement configuration file, set the PERFSTUBS_PYTHON_FILTER_FILENAME environment variable to a configuration file. An example JSON filter file is in pstubs/python_event_filter.json.