Skip to content

Commit

Permalink
HPCC-30766 Add perftrace option for roxie wu mode
Browse files Browse the repository at this point in the history
Signed-off-by: Richard Chapman <[email protected]>
  • Loading branch information
richardkchapman committed Nov 8, 2023
1 parent f86e540 commit 5948a91
Show file tree
Hide file tree
Showing 2 changed files with 54 additions and 0 deletions.
26 changes: 26 additions & 0 deletions devdoc/roxie.md
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,32 @@ Should the scope of the blacklist be different? Possible scopes are:
Options 2 and 4 above would allow all aspects of the blacklisting behaviour to be specified by options on the SOAPCALL. We could control whether or not the
blacklister is to be used at all via a SOAPCALL option with any of the above...

perftrace options
=================

The HPCC Platform includes a rudimentary performance tracing feature using periodic stack capture to generate flame graphs. Roxie supports this in 3 ways:

1. If expert/@profileStartup is set in roxie config, a flame graph is generated for operations during Roxie startup phase.
2. If @perf is set on an incoming query, a flame graph is generated for the lifetime of that query's execution, and returned along with the query results
3. If expert/perftrace is set in roxie config, one-shot roxie queries (e.g. eclagent mode) generate a flame graph (currently just to a text file).

The perf trace operates as follows:

1. A child process is launched that runs the doperf script. This samples the current stack(s) every 0.2s (configurable) to a series of text files.
2. When tracing is done, these text files are "folded" via a perl script that notes every unique stack and how many times it was seen, one line per unique stack
3. This folded stack list is filtered to suppress some stacks that are not very interesting
4. The filtered folded stack list is passed to another perl script that generates an svg file.

The basic info captured at step 1 (or maybe 2) could also be analysed to give other insights, such as:

1. A list of "time in function, time in children of function".
2. An expanded list of callers to __GI___lll_lock_wait and __GI___lll_lock_wake, to help spot contended critsecs.

Unfortunately some info present in the original stack text files is lost in the folded summary - in particular related to the TID
that the stack is on. Can we spot lifetimes of threads and/or should we treat "stacks" on different threads as different? Thread pools might render this difficult though.
There is an option in stack-collapse-elfutils.pl to include the TID when considering whether stacks match, so
perhaps we should just (optionally) use that.

Some notes on LocalAgent mode
=============================

Expand Down
28 changes: 28 additions & 0 deletions roxie/ccd/ccdmain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1433,6 +1433,10 @@ int CCD_API roxie_main(int argc, const char *argv[], const char * defaultYaml)
if (runOnce)
{
// Avoid delaying the release of packages or queries - otherwise stand-alone queries can take a while to terminate
PerfTracer oneShotTracer;
bool traceOneShot = topology->getPropBool("expert/@perftrace", false); // MORE - check wu options too?
if (traceOneShot)
oneShotTracer.start();
agentQueryReleaseDelaySeconds = 0;
if (wuid)
{
Expand Down Expand Up @@ -1476,6 +1480,30 @@ int CCD_API roxie_main(int argc, const char *argv[], const char * defaultYaml)
E->Release();
}
}
if (traceOneShot)
{
oneShotTracer.stop();
const char *fname = topology->queryProp("expert/@perftraceFileName");
if (!fname)
fname = "perftrace.svg";
Owned<IFile> iFile = createIFile(fname);
try
{
Owned<IFileIO> iFileIO = iFile->open(IFOcreate);
if (iFileIO)
{
StringBuffer &svg = oneShotTracer.queryResult();
iFileIO->write(0, svg.length(), svg.str());
DBGLOG("Flame graph for query written to %s", fname);
}
}
catch (IException *E)
{
VStringBuffer msg("Failed to write flame graph to %s", fname);
EXCLOG(E, msg);
::Release(E);
}
}
}
else
{
Expand Down

0 comments on commit 5948a91

Please sign in to comment.