-
Notifications
You must be signed in to change notification settings - Fork 52
/
Copy pathpublist.yml
533 lines (513 loc) · 30.7 KB
/
publist.yml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
- title: Cling–the new interactive interpreter for ROOT 6
author: V Vassilev, Ph Canal, A Naumann, P Russo
abstract: |
Cling is an interactive C++ interpreter, built on top of Clang and LLVM
compiler infrastructure. Like its predecessor Cint, Cling realizes the
read-print-evaluate-loop concept, in order to leverage rapid application
development. Implemented as a small extension to LLVM and Clang, the
interpreter reuses their strengths such as the praised concise and expressive
compiler diagnostics. We show how to match the interpreter concept to the
compiler library and generalize common set of requirements for building up
an interactive interpreter. We reason the design and implementation
decisions as solution to the challenge of implementing interpreter behaviour
as an extension of the compiler library. We present the new features, eg
how C++11 will come to Cling and how CINT-specific extensions are being
adopted. We clarify the state of integration in the ROOT framework and the
induced change set. We explain how ROOT ...
cites: '19'
cites_id: '17462954415562379042'
eprint: https://iopscience.iop.org/article/10.1088/1742-6596/396/5/052071/pdf
number: '5'
pages: '052071'
url: https://iopscience.iop.org/article/10.1088/1742-6596/396/5/052071/meta
volume: '396'
year: '2012'
- title: Clad—automatic differentiation using Clang and LLVM
author: V Vassilev, M Vassilev, A Penev, L Moneta, V Ilieva
abstract: |
Differentiation is ubiquitous in high energy physics, for instance in
minimization algorithms and statistical analysis, in detector alignment and
calibration, and in theory. Automatic differentiation (AD) avoids well-known
limitations in round-offs and speed, which symbolic and numerical
differentiation suffer from, by transforming the source code of functions. We
will present how AD can be used to compute the gradient of multi-variate
functions and functor objects. We will explain approaches to implement an AD
tool. We will show how LLVM, Clang and Cling (ROOT's C++ 11 interpreter)
simplifies creation of such a tool. We describe how the tool could be
integrated within any framework. We will demonstrate a simple proof-of-concept
prototype, called Clad, which is able to generate n-th order derivatives of
C++ functions and other language constructs. We also demonstrate how Clad can
offload laborious computations ...
cites: '5'
cites_id: '3614309831712318570'
eprint: https://iopscience.iop.org/article/10.1088/1742-6596/608/1/012055/pdf
journal: 'Journal of Physics: Conference Series'
number: '1'
pages: '012055'
publisher: IOP Publishing
url: https://iopscience.iop.org/article/10.1088/1742-6596/608/1/012055/meta
volume: '608'
year: '2015'
- title: Optimizing ROOT’s Performance Using C++ Modules
author: V Vassilev
abstract: |
ROOT comes with a C++ compliant interpreter cling. Cling needs to understand
the content of the libraries in order to interact with them. Exposing the
full shared library descriptors to the interpreter at runtime translates
into increased memory footprint. ROOT’s exploratory programming concepts
allow implicit and explicit runtime shared library loading. It requires the
interpreter to load the library descriptor. Re-parsing of descriptors’
content has a noticeable effect on the runtime performance. Present
state-of-art lazy parsing technique brings the runtime performance to
reasonable levels but proves to be fragile and can introduce correctness
issues. An elegant solution is to load information from the descriptor lazily
and in a non-recursive way.The LLVM community advances its C++ Modules
technology providing an io-efficient, on-disk representation capable to
reduce build times and peak memory usage. The feature ...
cites: '4'
cites_id: '860139365460275907'
eprint: https://iopscience.iop.org/article/10.1088/1742-6596/898/7/072023/pdf
journal: 'Journal of Physics: Conference Series'
number: '10.1088'
pages: 1742-6596
url: https://iopscience.iop.org/article/10.1088/1742-6596/898/7/072023/pdf
volume: '898'
year: '2016'
- title: Optimizing Frameworks’ Performance Using C++ Modules Aware ROOT
author: Y Takahashi, V Vassilev, O Shadura, R Isemann
abstract: |
ROOT is a data analysis framework broadly used in and outside of High Energy
Physics (HEP). Since HEP software frameworks always strive for performance
improvements, ROOT was extended with experimental support of runtime C++
Modules. C++ Modules are designed to improve the performance of C++ code
parsing. C++ Modules offers a promising way to improve ROOT’s runtime
performance by saving the C++ header parsing time which happens during ROOT
runtime. This paper presents the results and challenges of integrating C++
Modules into ROOT.
cites: '2'
cites_id: '7331821473567233508'
eprint: https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_02011.pdf
journal: EPJ Web of Conferences
pages: '02011'
publisher: EDP Sciences
url: https://www.epj-conferences.org/articles/epjconf/abs/2019/19/epjconf_chep2018_02011/epjconf_chep2018_02011.html
volume: '214'
year: '2019'
- title: GPU Accelerated Automatic Differentiation With Clad
author: Ioana Ifrim, Vassil Vassilev, David J Lange
abstract: |
Automatic Differentiation (AD) is instrumental for science and industry.
It is a tool to evaluate the derivative of a function specified through a
computer program. The range of AD application domain spans from Machine
Learning to Robotics to High Energy Physics. Computing gradients with
the help of AD is guaranteed to be more precise than the numerical
alternative and have a low, constant factor more arithmetical operations
compared to the original function. Moreover, AD applications to domain
problems typically are computationally bound. They are often limited by
the computational requirements of high-dimensional parameters and thus can
benefit from parallel implementations on graphics processing units (GPUs).
Clad aims to enable differential analysis for C/C++ and CUDA and is a
compiler-assisted AD tool available both as a compiler extension and in ROOT.
Moreover, Clad works as a plugin extending the Clang compiler; as a plugin
extending the interactive interpreter Cling; and as a Jupyter kernel
extension based on xeus-cling. We demonstrate the advantages of parallel
gradient computations on GPUs with Clad. We explain how to bring forth a new
layer of optimization and a proportional speed up by extending Clad to
support CUDA. The gradients of well-behaved C++ functions can be
automatically executed on a GPU. The library can be easily integrated into
existing frameworks or used interactively. Furthermore, we demonstrate the
achieved application performance improvements, including (~10x) in ROOT
histogram fitting and corresponding performance gains from offloading to GPUs.
cites: '0'
eprint: https://arxiv.org/abs/2203.06139
journal: arXiv preprint arXiv:2203.06139
url: https://arxiv.org/abs/2203.06139
year: '2022'
- title: Automatic Differentiation in ROOT
author: Vassil Vassilev, Aleksandr Efremov, Oksana Shadura
abstract: |
In mathematics and computer algebra, automatic differentiation (AD) is
a set of techniques to evaluate the derivative of a function specified by a
computer program. AD exploits the fact that every computer program, no matter
how complicated, executes a sequence of elementary arithmetic operations
(addition, subtraction, multiplication, division, etc.), elementary functions
(exp, log, sin, cos, etc.) and control flow statements. AD takes source code
of a function as input and produces source code of the derived function. By
applying the chain rule repeatedly to these operations, derivatives of
arbitrary order can be computed automatically, accurately to working
precision, and using at most a small constant factor more arithmetic
operations than the original program.This paper presents AD techniques
available in ROOT, supported by Cling, to produce derivatives of arbitrary
C/C++ functions through implementing source code transformation and
employing the chain rule of differential calculus in both forward mode and
reverse mode. We explain its current integration for gradient computation in
TFormula. We demonstrate the correctness and performance improvements in
ROOT's fitting algorithms.
cites: '1'
cites_id: '5151320058278720217'
eprint: https://arxiv.org/pdf/2004.04435
journal: arXiv preprint arXiv:2004.04435
url: https://arxiv.org/abs/2004.04435
year: '2020'
- title: 'arXiv: Software Challenges For HL-LHC Data Analysis'
author: Kim Albertsson Brann, Sitong An, Vassil Vassilev, Enric Tejedor
Saavedra, Jakob Blomer, Lorenzo Moneta, Axel Naumann, Enrico Guiraud,
Alja Mrak Tadel, Pere Mato Vila, Vincenzo Eduardo Padulano, Stephan
Hageboeck, Matevz Tadel, Stefan Wunsch, Bertrand Bellenot, Oksana
Shadura, Guilherme Amadio, Fons Rademakers, Massimiliano Galli, Xavier
Valls Pla, Sergey Linev, Philippe Canal, Olivier Couet
abstract: |
The high energy physics community is discussing where investment is needed
to prepare software for the HL-LHC and its unprecedented challenges. The ROOT
project is one of the central software players in high energy physics since
decades. From its experience and expectations, the ROOT team has distilled a
comprehensive set of areas that should see research and development in the
context of data analysis software, for making best use of HL-LHC's physics
potential. This work shows what these areas could be, why the ROOT team
believes investing in them is needed, which gains are expected, and where
related work is ongoing. It can serve as an indication for future research
proposals and cooperations.
cites: '0'
eprint: https://cds.cern.ch/record/2715455/files/2004.07675.pdf
number: 'arXiv: 2004.07675'
url: https://cds.cern.ch/record/2715455
year: '2020'
- title: C++ Modules in ROOT and Beyond
author: Vassil Vassilev, David Lange, Malik Shahzad Muzzafar, Mircho Rodozov,
Oksana Shadura, Alexander Penev
abstract: |
C++ Modules come in C++ 20 to fix the long-standing build scalability
problems in the language. They provide an io-efficient, on-disk
representation capable to reduce build times and peak memory usage. ROOT
employs the C++ modules technology further in the ROOT dictionary system to
improve its performance and reduce the memory footprint.ROOT with C++ Modules
was released as a technology preview in fall 2018, after intensive
development during the last few years. The current state is ready for
production, however, there is still room for performance optimizations. In
this talk, we show the roadmap for making the technology default in ROOT.
We demonstrate a global module indexing optimization which allows reducing
the memory footprint dramatically for many workflows. We will report user
feedback on the migration to ROOT with C++ Modules.
cites: '0'
eprint: https://arxiv.org/pdf/2004.06507
journal: arXiv preprint arXiv:2004.06507
url: https://arxiv.org/abs/2004.06507
year: '2020'
- title: Migrating large codebases to C++ Modules
author: Yuka Takahashi, Oksana Shadura, Vassil Vassilev
abstract: |
ROOT has several features which interact with libraries and require implicit
header inclusion. This can be triggered by reading or writing data on disk,
or user actions at the prompt. Often, the headers are immutable, and
reparsing is redundant. C++ Modules are designed to minimize the reparsing
of the same header content by providing an efficient on-disk representation
of C++ Code. ROOT has released a C++ Modules-aware technology preview, which
intends to become the default for the ROOT 6.20 release.
cites: '0'
eprint: https://iopscience.iop.org/article/10.1088/1742-6596/1525/1/012051/pdf
journal: 'Journal of Physics: Conference Series'
number: '1'
pages: '012051'
publisher: IOP Publishing
url: https://iopscience.iop.org/article/10.1088/1742-6596/1525/1/012051/meta
volume: '1525'
year: '2020'
- title: IDD – A Platform Enabling Differential Debugging
author: Martin Vassilev, Vassil Vassilev, Alexander Penev
abstract: |
Debugging is a very time consuming task which is not well supported by
existing tools. The existing methods do not provide tools enabling optimal
developers' productivity when debugging regressions in complex systems. In
this paper we describe a possible solution aiding differential debugging.
The differential debugging technique performs analysis of the regressed
system and identifying the cause of the unexpected behavior by comparing to
a previous version of the same system. The prototype, idd, inspects two
versions of the executable – a baseline and a regressed version. The
interactive debugging session runs side by side both executables and allows
to examine and to compare various internal states. The architecture can work
with multiple information sources comparing data from different tools. We
also show how idd can detect performance regressions using information from
third-party performance facilities. We illustrate how in practice we can
quickly discover regressions in large systems such as the clang compiler.
cites: '0'
eprint: https://content.sciendo.com/downloadpdf/journals/cait/20/1/article-p53.xml
journal: Cybernetics and Information Technologies
number: '1'
pages: 53-67
publisher: Sciendo
url: https://content.sciendo.com/view/journals/cait/20/1/article-p53.xml
volume: '20'
year: '2020'
- title: Relaxing the one definition rule in interpreted C++
author: Javier López-Gómez, Javier Fernández, David del Rio Astorga, Vassil
Vassilev, Axel Naumann, J Daniel García
abstract: |
Most implementations of the C++ programming language generate binary
executable code. However, interpreted execution of C++ sources has its own
use cases as the Cling interpreter from CERN's ROOT project has shown. Some
limitations are derived from the ODR (One Definition Rule) that rules out
multiple definitions of entities within a single translation unit (TU). ODR
is there to ensure uniform view of a given C++ entity across translation
units. Ensuring uniform view of C++ entities helps when producing ABI
compatible binaries. Interpreting C++ presumes a single ever-growing
translation unit that define away some of the ODR use-cases. Therefore, it
may well be desirable to relax the ODR and, consequently, to support the
ability of developers to override any existing definition for a given
declaration. This approach is especially well-suited for iterative
prototyping. In this paper, we extend Cling, a Clang/LLVM ...
cites: '0'
eprint: https://dl.acm.org/doi/pdf/10.1145/3377555.3377901
pages: 212-222
url: https://dl.acm.org/doi/abs/10.1145/3377555.3377901
year: '2020'
- title: 'P2072R0: Differentiable programming for C++'
author: Marco Foco, Max Rietmann, Vassil Vassilev, Michael Wong, Dmitry
Duka, Vinod Grover
abstract: |
Derivatives are vital a wide variety of computing applications, including
numerical optimization, solution of nonlinear equations, sensitivity
analysis, and nonlinear inverse problems. Virtually every process could be
described with a mathematical function. A mathematical function can be
thought of as an association between elements from different sets.
Derivatives can track how a varying quantity depends on another quantity,
for example how the position of a planet as the time varies. Derivatives and
gradients allows us to explore the properties of a function and thus the
described process as a whole. Also, gradients are an essential component in
gradient-based optimization methods, that have become more and more important
in recent years, in particular with its application training of (Deep) Neural
Networks. Derivatives can be computed numerically, but unfortunately the
finite differences methods are problematic due to the approximation operated
and the finite precision of floating point values used, and so the
implementation of the method faces precision and round off problems which
can affect the overall precision of the computation. This problem becomes
worse with higher order derivatives.
cites: '0'
eprint: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2072r0.pdf
url: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2072r0.pdf
year: '2020'
- title: Continuous Performance Benchmarking Framework for ROOT
author: Oksana Shadura, Vassil Vassilev, Brian Paul Bockelman
abstract: |
Foundational software libraries such as ROOT are under intense pressure to
avoid software regression, including performance regressions. Continuous
performance benchmarking, as a part of continuous integration and other code
quality testing, is an industry best-practice to understand how the performance
of a software product evolves. We present a framework, built from industry best
practices and tools, to help to understand ROOT code performance and monitor
the efficiency of the code for several processor architectures. It additionally
allows historical performance measurements for ROOT I/O, vectorization and
parallelization sub-systems.
cites: '0'
eprint: https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_05003.pdf
journal: EPJ Web of Conferences
pages: '05003'
publisher: EDP Sciences
url: https://www.epj-conferences.org/articles/epjconf/abs/2019/19/epjconf_chep2018_05003/epjconf_chep2018_05003.html
volume: '214'
year: '2019'
- title: Extending ROOT through Modules
author: Oksana Shadura, Brian Paul Bockelman, Vassil Vassilev
abstract: |
The ROOT software framework is foundational for the HEP ecosystem, providing
multiple capabilities such as I/O, a C++ interpreter, GUI, and math
libraries. It uses object-oriented concepts and build-time components to
layer between them. We believe that a new layering formalism will benefit
the ROOT user community. We present the modularization strategy for ROOT
which aims to build upon the existing source components, making available
the dependencies and other metadata outside of the build system, and allow
post-install additions on top of existing installation as well as in the ROOT
runtime environment. Components can be grouped into packages and made
available from repositories in order to provide a post-install step of
missing packages. This feature implements a mechanism for the more
comprehensive software ecosystem and makes it available even from a minimal
ROOT installation. As part of ...
cites: '0'
eprint: https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_05011.pdf
journal: EPJ Web of Conferences
pages: '05011'
publisher: EDP Sciences
url: https://www.epj-conferences.org/articles/epjconf/abs/2019/19/epjconf_chep2018_05011/epjconf_chep2018_05011.html
volume: '214'
year: '2019'
- title: Native Language Integrated Queries with CppLINQ in C++
author: V Vassilev
abstract: |
Programming language evolution brought to us the domain-specific languages
(DSL). They proved to be very useful for expressing specific concepts,
turning into a vital ingredient even for general-purpose frameworks.
Supporting declarative DSLs (such as SQL) in imperative languages (such as
C++) can happen in the manner of language integrated query (LINQ).
cites: '0'
eprint: https://iopscience.iop.org/article/10.1088/1742-6596/608/1/012030/pdf
journal: 'Journal of Physics: Conference Series'
number: '1'
pages: '012030'
publisher: IOP Publishing
url: https://iopscience.iop.org/article/10.1088/1742-6596/608/1/012030/meta
volume: '608'
year: '2015'
- title: clad - Automatic Differentiation with Clang
author: Violeta Ilieva, Vassil Vassilev
abstract: |
Differentiation is the process of finding a derivative, which measures
how a function changes as its input changes. Derivatives can be evaluated
with machine precision accuracy through a method called automatic
differentiation. Unlike other methods for differentiation, including
numerical and symbolic, automatic differentiation yields exact derivatives
even of complicated functions at relatively low processing and storage costs.
cites: '0'
eprint: http://www.llvm.org/devmtg/2013-11/slides/Vassilev-Poster.pdf
url: http://www.llvm.org/devmtg/2013-11/slides/Vassilev-Poster.pdf
year: '2013'
- title: Automatic Differentiation of Binned Likelihoods With Roofit and Clad
author: Garima Singh, Jonas Rembser, Lorenzo Moneta, David Lange, Vassil Vassilev
abstract: |
RooFit is a toolkit for statistical modeling and fitting used by most experiments
in particle physics. Just as data sets from next-generation experiments grow,
processing requirements for physics analysis become more computationally demanding,
necessitating performance optimizations for RooFit. One possibility to speed-up
minimization and add stability is the use of Automatic Differentiation (AD). Unlike
for numerical differentiation, the computation cost scales linearly with the number
of parameters, making AD particularly appealing for statistical models with many
parameters. In this paper, we report on one possible way to implement AD in RooFit.
Our approach is to add a facility to generate C++ code for a full RooFit model
automatically. Unlike the original RooFit model, this generated code is free of virtual
function calls and other RooFit-specific overhead. In particular, this code is then
used to produce the gradient automatically with Clad. Clad is a source transformation
AD tool implemented as a plugin to the clang compiler, which automatically generates
the derivative code for input C++ functions. We show results demonstrating the
improvements observed when applying this code generation strategy to HistFactory and
other commonly used RooFit models.
cites: '0'
eprint: https://arxiv.org/abs/2304.02650
url: https://arxiv.org/abs/2304.02650
year: '2023'
- title: Efficient and Accurate Automatic Python Bindings with cppyy & Cling
author: Baidyanath Kundu, Vassil Vassilev, Wim Lavrijsen
abstract: |
The simplicity of Python and the power of C++ force stark choices on a scientific software
stack. There have been multiple developments to mitigate language boundaries by implementing
language bindings, but the impedance mismatch between the static nature of C++ and the
dynamic one of Python hinders their implementation; examples include the use of user-defined
Python types with templated C++ and advanced memory management.
The development of the C++ interpreter Cling has changed the way we can think of language
bindings as it provides an incremental compilation infrastructure available at runtime.
That is, Python can interrogate C++ on demand, and bindings can be lazily constructed at
runtime. This automatic binding provision requires no direct support from library authors
and offers better performance than alternative solutions, such as PyBind11. ROOT pioneered
this approach with PyROOT, which was later enhanced with its successor, cppyy. However,
until now, cppyy relied on the reflection layer of ROOT, which is limited in terms of
provided features and performance.
This paper presents the next step for language interoperability with cppyy, enabling
research into uniform cross-language execution environments and boosting optimization
opportunities across language boundaries. We illustrate the use of advanced C++ in
Numba-accelerated Python through cppyy. We outline a path forward for re-engineering
parts of cppyy to use upstream LLVM components to improve performance and sustainability.
We demonstrate cppyy purely based on a C++ reflection library, InterOp, which offers
interoperability primitives based on Cling and Clang-Repl.
cites: '0'
eprint: https://arxiv.org/abs/2304.02712
url: https://arxiv.org/abs/2304.02712
year: '2023'
- title: Fast And Automatic Floating Point Error Analysis With CHEF-FP
author: Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev,
David J. Lange, Vassil Vassilev
abstract: |
As we reach the limit of Moore's Law, researchers are exploring different
paradigms to achieve unprecedented performance. Approximate Computing (AC),
which relies on the ability of applications to tolerate some error in the
results to trade-off accuracy for performance, has shown significant promise.
Despite the success of AC in domains such as Machine Learning, its
acceptance in High-Performance Computing (HPC) is limited due to its
stringent requirement of accuracy. We need tools and techniques to identify
regions of the code that are amenable to approximations and their impact on
the application output quality so as to guide developers to employ selective
approximation. To this end, we propose CHEF-FP, a flexible, scalable, and
easy-to-use source-code transformation tool based on Automatic
Differentiation (AD) for analysing approximation errors in HPC applications.
CHEF-FP uses Clad, an efficient AD tool built as a plugin to the Clang
compiler and based on the LLVM compiler infrastructure, as a backend
and utilizes its AD abilities to evaluate approximation errors in C++
code. CHEF-FP works at the source level by injecting error estimation
code into the generated adjoints. This enables the error-estimation code
to undergo compiler optimizations resulting in improved analysis time and
reduced memory usage. We also provide theoretical and architectural
augmentations to source code transformation-based AD tools to perform FP
error analysis. In this paper, we primarily focus on analyzing errors
introduced by mixed-precision AC techniques, the most popular approximate
technique in HPC. We also show the applicability of our tool in estimating
other kinds of errors by evaluating our tool on codes that use approximate
functions. Moreover, we demonstrate the speedups achieved by CHEF-FP during
analysis time as compared to the existing state-of-the-art tool as a result
of its ability to generate and insert approximation error estimate code
directly into the derivative source. The generated code also becomes a
candidate for better compiler optimizations contributing to lesser
runtime performance overhead.
cites: '0'
eprint: https://arxiv.org/abs/2304.06441
journal: '2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)'
pages: 1018-1028
publisher: IEEE
url: https://ieeexplore.ieee.org/document/10177445
link: /publications/fast-and-automatic-floating-point-error-analysis-with-chef-fp
volume: '608'
year: '2023'
- title: Performance Portable Gradient Computations Using Source Transformation
author: Kim Liegeois, Brian Kelley, Eric Phipps, Sivasankaran Rajamanickam and
Vassil Vassilev
abstract: |
Derivative computation is a key component of optimization, sensitivity
analysis, uncertainty quantification, and nonlinear solvers. Automatic
differentiation (AD) is a powerful technique for evaluating such
derivatives, and in recent years, has been integrated into programming
environments such as Jax, PyTorch, and TensorFlow to support derivative
computations needed for training of machine learning models, resulting in
widespread use of these technologies. The C++ language has become the de
facto standard for scientific computing due to numerous factors, yet
language complexity has made the adoption of AD technologies for C++
difficult, hampering the incorporation of powerful differentiable
programming approaches into C++ scientific simulations. This is exacerbated
by the increasing emergence of architectures such as GPUs, which have
limited memory capabilities and require massive thread-level
concurrency. Portable scientific codes rely on domain specific programming
models such as Kokkos making AD for such codes even more complex.<br />
In this paper, we will investigate source transformation-based automatic
differentiation using Clad to automatically generate portable and efficient
gradient computations of Kokkos-based code. We discuss the modifications of
Clad required to differentiate Kokkos abstractions. We will illustrate the
feasibility of our proposed strategy by comparing the wall-clock time of the
generated gradient code with the wall-clock time of the input function on
different cutting edge GPU architectures such as NVIDIA H100, AMD MI250x,
and Intel Ponte Vecchio GPU. For these three architectures and for the
considered example, evaluating up to 10,000 entries of the gradient only
took up to 2.17 times the wall-clock time of evaluating the input function.
cites: '0'
eprint: 8th International Conference on Algorithmic Differentiation
url: https://www.autodiff.org/ad24/
year: '2024'
- title: Optimization Using Pathwise Algorithmic Derivatives of Electromagnetic
Shower Simulations
author: Max Aehle, Mihaly Novak, Vassil Vassilev, Nicolas R. Gauger,
Lukas Heinrich, Michael Kagan and David Lange
abstract: |
Among the well-known methods to approximate derivatives of expectancies
computed by Monte-Carlo simulations, averages of pathwise derivatives are
often the easiest one to apply. Computing them via algorithmic
differentiation typically does not require major manual analysis and
rewriting of the code, even for very complex programs like simulations of
particle-detector interactions in high-energy physics. However, the pathwise
derivative estimator can be biased if there are discontinuities in the
program, which may diminish its value for applications.<br />
This work integrates algorithmic differentiation into the electromagnetic
shower simulation code HepEmShow based on G4HepEm, allowing us to study how
well pathwise derivatives approximate derivatives of energy depositions in a
sampling calorimeter with respect to parameters of the beam and geometry. We
found that when multiple scattering is disabled in the simulation, means of
pathwise derivatives converge quickly to their expected values, and these
are close to the actual derivatives of the energy deposition. Additionally,
we demonstrate the applicability of this novel gradient estimator for
stochastic gradient-based optimization in a model example.
cites: '0'
eprint: https://arxiv.org/pdf/2405.07944
url: https://arxiv.org/pdf/2405.07944
year: '2024'