This repository contains examples of how to profile Python applications using the `cProfile`, `snakeviz`, `line_profiler`, and `memory_profiler` modules.
To reproduce the same environment, I suggest using Conda as your package manager. If you have it installed, you can create the environment from `environment.yml` with

```
conda env create -f environment.yml
```

and activate it with

```
conda activate profiling
```
Time-based profiling allows you to see how much time your application spends in each one of its components.
We use the `cProfile` module to profile an entire Python script. In each example folder, you will find a `time/app-overview` folder that contains the relevant code, along with a `profile.sh` script that runs the Python code with `cProfile` enabled. This script generates a file, `example.prof`, that contains the profiling data.
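For reference, here is a minimal sketch of the kind of invocation such a script wraps; the `work` function below is a made-up stand-in for the example code:

```python
import cProfile

def work():
    """A stand-in workload; the real code lives in time/app-overview."""
    return sum(i * i for i in range(1_000_000))

# Equivalent to what a profile.sh wrapper typically runs:
#   python -m cProfile -o example.prof app.py
cProfile.run("work()", "example.prof")
```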
Even though it is possible to get statistics directly from `cProfile`, a great way to visualize the profiling results is with `snakeviz`. It's very easy to use: for each example, you will find a `visualize.sh` script that, when run, will launch `snakeviz` in a browser tab. Below is how a typical result looks:
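If you prefer to stay in the terminal, the statistics mentioned above can also be read directly with the standard-library `pstats` module; a minimal sketch, assuming `example.prof` has already been generated:

```python
import pstats

# Load the data written by cProfile and print the ten most expensive
# entries, sorted by cumulative time.
stats = pstats.Stats("example.prof")
stats.sort_stats("cumulative").print_stats(10)
```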
Once you have spotted which functions, methods, or routines consume most of the time in your application, you may want to dig deeper to see exactly which instructions under each of them are the hot ones. For each example, in `time/line-by-line`, we use `line_profiler` for that, which requires decorating the target function with `@profile`. The `profile.sh` script calls the relevant binary (`kernprof`) to generate the profiling data, which can then be visualized with the `visualize.sh` script. A typical output is:
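As a concrete illustration, here is a minimal sketch of a function set up for `line_profiler`; the function is made up rather than taken from the repository. Note that `profile` is injected as a builtin by `kernprof` at run time, so this file only works when launched through it, e.g. `kernprof -l -v example.py`:

```python
@profile  # provided by kernprof; running plain Python raises NameError
def build_squares(n):
    squares = []
    for i in range(n):        # per-line hit counts and timings land here
        squares.append(i * i)
    return squares

if __name__ == "__main__":
    build_squares(1_000_000)
```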
Understanding your Python application in terms of time is definitely an important step, but to better characterize your application's workload, we also need to understand how it uses memory.
We use the `memory_profiler` module to get an overview of how much memory a Python script is using as a function of time. For each example, the `memory/app-overview` folder contains the code to be profiled and a `profile.sh` script that uses the relevant binary (`mprof`) to generate the profiling data, which can be visualized using the `visualize.sh` script. A typical output is:
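To give a sense of what `mprof` records, `memory_profiler` also exposes its sampling machinery as a Python API; a minimal sketch, where `allocate` is a made-up workload rather than one of the repository's examples:

```python
from memory_profiler import memory_usage

def allocate():
    data = [0] * 10_000_000  # a large list, roughly 80 MB on CPython
    return sum(data)

# Sample the process's memory every 0.1 s while allocate() runs,
# which is essentially what mprof does before plotting.
samples = memory_usage((allocate, (), {}), interval=0.1)
print(f"peak: {max(samples):.1f} MiB")
```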
We can also target individual functions with the `@profile` decorator. `memory_profiler` will then show the amount of memory that the process associated with the Python interpreter is using as your code evolves, line by line. For each example, under `memory/line-by-line`, the `profile.sh` script runs the profiler and shows the results. A typical output is:
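For illustration, here is a minimal sketch of a decorated function, again made up rather than taken from the repository. Unlike with `kernprof`, the decorator can be imported explicitly, so the script also runs as plain Python and prints the line-by-line table when the function returns:

```python
from memory_profiler import profile

@profile
def grow_and_shrink():
    big = [0] * 10_000_000  # shows up as a memory increment
    small = big[:100]       # keep only a tiny slice
    del big                 # shows up as a decrement
    return small

if __name__ == "__main__":
    grow_and_shrink()
```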
Profiling Jupyter notebooks directly involves jumping through some hoops. The simplest alternative is to copy the content of your cells into a Python script. It is possible to get the same effect with the `nbconvert` module:

```
jupyter nbconvert <YourNB>.ipynb --to script
```

which will generate a `<YourNB>.py` script. Sometimes it looks quite ugly, though.