Measuring things #127
Replies: 2 comments
-
opcode time analysis
-
A few other things which might be worth collecting.

Another item in this direction worth investigating is, for a given operation, what "Mach 1" is, i.e. the fastest you can go without breaking any rules. Hotspots are not necessarily a problem if they correspond to code which is well optimised and already getting close to ideal performance. A good example here is dictionaries and strings, which I remember benchmarking several years back.

On the interpreter side a lot is known about how fast one can make an interpreter go short of JITting. Darek Mihocka has some excellent articles on the subject (mostly within the context of emulating CPU architectures, but in many regards this is more challenging than Python bytecode, since CPU instructions typically do far less real work than Python opcodes). A fun test here is to get the interpreter to rattle through a few billion NOPs and count how many CPU cycles it burns. Combined with how many opcodes are needed to execute a 'real' benchmark, this can give a weak upper bound on performance. If this upper bound is not sufficient, it indicates that one may want to give the interpreter a prod.

For opcodes some care is needed when interpreting results due to operator overloading and arbitrary precision integers. Spending a lot of time adding numbers is a problem, unless you're in a benchmark which does arbitrary precision calculations, where such a result is expected. The same goes for subscript operations.
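As a rough sketch of the NOP-counting idea (my own illustration, not from the comment above): time a loop whose body does nothing, as a stand-in for a stream of cheap opcodes, and convert the result to cycles using a hand-supplied clock frequency. The opcode count per iteration and the CPU frequency are assumptions to be checked against the printed disassembly and your machine.

```python
# Rough sketch: estimate the interpreter's best-case cost per opcode by timing
# a loop whose body does nothing, then dividing elapsed time by an assumed
# opcode count per iteration. ops_per_iter and cpu_ghz are hand-supplied
# assumptions, not measured values.
import dis
import time

def empty_loop(n):
    for _ in range(n):
        pass

def estimate(n=50_000_000, ops_per_iter=3, cpu_ghz=3.0):
    dis.dis(empty_loop)          # eyeball how many opcodes one iteration costs
    t0 = time.perf_counter()
    empty_loop(n)
    elapsed = time.perf_counter() - t0
    ns_per_op = elapsed * 1e9 / (n * ops_per_iter)
    print(f"~{ns_per_op:.2f} ns per opcode, "
          f"~{ns_per_op * cpu_ghz:.1f} cycles per opcode at {cpu_ghz} GHz")

if __name__ == "__main__":
    estimate()
```

Combined with a dynamic opcode count for a real benchmark, the resulting cycles-per-opcode figure gives the kind of weak upper bound described above.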
-
This is not really a new issue, more of a rant. Everybody loves to work on a cool new feature. But too often we don't know how to design something because we don't have enough data. Examples: Where does startup time go? How much overhead does bytecode instruction dispatch cost? Which are the hottest C functions in our code base? What are the most common opcode pairs?
The result is that often we design purely based on intuition, or based on old data or hearsay (e.g. a table of opcode pair frequencies published by Instagram five years ago), or based on some proxy (e.g. static opcode frequency instead of dynamic opcode frequency).
When we do collect data we may use hackish tooling (maybe a few lines of sed/grep/etc. pipelines that are not written down anywhere except in our shell history) that cannot be reproduced by others or collected systematically over time.
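One way to make such a pipeline reproducible is to commit it as a small script. As a hedged illustration (not one of the existing scripts in faster-cpython/tools), here is what counting the static opcode pairs mentioned above could look like; dynamic frequencies would instead need interpreter instrumentation or tracing.

```python
# Minimal sketch: count *static* opcode pairs in a source file, recursing
# into nested code objects (function bodies, comprehensions, classes).
import collections
import dis
import sys
import types

def iter_code_objects(code):
    """Yield a code object and all code objects nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

def static_opcode_pairs(path):
    with open(path, encoding="utf-8") as f:
        top = compile(f.read(), path, "exec")
    pairs = collections.Counter()
    for code in iter_code_objects(top):
        ops = [ins.opname for ins in dis.get_instructions(code)]
        pairs.update(zip(ops, ops[1:]))
    return pairs

if __name__ == "__main__":
    for (a, b), n in static_opcode_pairs(sys.argv[1]).most_common(20):
        print(f"{n:6d}  {a:<25s} -> {b}")
```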
So I think we should formulate our needs for data and then design and build some tooling to collect that data. A (relatively) good example is speed.python.org and PyPerformance -- this solves two data needs, comparing benchmark performance over time and across Python versions, with the ability in the UI to drill down on individual benchmarks or alternative configurations.
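For the "across versions" comparison specifically, pyperformance stores results in pyperf's JSON format, so a minimal comparison script can lean on pyperf's Python API. A sketch, assuming pyperf is installed and that the two files were produced by `pyperformance run -o <file>`; the calls relied on here are `BenchmarkSuite.load`, `get_benchmarks`, `get_benchmark`, and `Benchmark.mean`:

```python
# Hedged sketch, not an official tool: compare mean timings between two
# pyperformance result files using pyperf's Python API.
import sys
import pyperf

def compare(base_path, new_path):
    base = pyperf.BenchmarkSuite.load(base_path)
    new = pyperf.BenchmarkSuite.load(new_path)
    for bench in base.get_benchmarks():
        name = bench.get_name()
        try:
            other = new.get_benchmark(name)
        except KeyError:
            continue  # benchmark missing from the new run
        print(f"{name:30s} {bench.mean():8.4f}s -> {other.mean():8.4f}s "
              f"({bench.mean() / other.mean():.2f}x)")

if __name__ == "__main__":
    compare(sys.argv[1], sys.argv[2])
```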
We have a few other tools (mostly related to counting opcodes in various ways) collected in https://github.com/faster-cpython/tools/, but we also have huge blind spots (we don't know anything about the timing of individual opcodes or the distribution of types), and we have no infrastructure for collecting and publishing various types of profiling data in a repeatable way (a la speed.python.org). For example, we have a flamegraph of where startup time goes (on Linux), but it would be nice to be able to produce a similar graph at the press of a button now that we have implemented freezing and deep-freezing.
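On the "distribution of types" gap, one pure-Python stand-in (all names below are mine, for illustration only; real data would come from interpreter-level instrumentation) is to hook sys.setprofile and record the types of positional arguments at each Python-level call while a workload runs:

```python
# Hedged sketch: record the types of positional arguments seen at each
# Python-level call. This observes call sites, not individual opcodes.
import collections
import sys

TYPE_COUNTS = collections.Counter()

def _profile(frame, event, arg):
    if event == "call":
        code = frame.f_code
        for name in code.co_varnames[:code.co_argcount]:
            if name in frame.f_locals:
                key = (code.co_name, name, type(frame.f_locals[name]).__name__)
                TYPE_COUNTS[key] += 1

def record_types(func, *args, **kwargs):
    sys.setprofile(_profile)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)

def report(top=20):
    for (func, param, tp), count in TYPE_COUNTS.most_common(top):
        print(f"{count:8d}  {func}({param}: {tp})")
```

Something like `record_types(workload)` followed by `report()` would print the most common (function, parameter, type) triples seen during the run.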