diff --git a/content/pubs/_index.md b/content/pubs/_index.md
new file mode 100644
index 0000000..f1731ba
--- /dev/null
+++ b/content/pubs/_index.md
@@ -0,0 +1,12 @@
+---
+title: 'LLVM Related Publications'
+description: ""
+toc: true
+tags: []
+draft: false
+---
+
+Here are some of the publications that use or build on LLVM.
+
+
+{{/* Publication list: entries are grouped under a heading per year.  The data
+     source (.Params.publications), the search link, and the surrounding HTML
+     are illustrative assumptions; the template directives themselves are as
+     recovered. */}}
+{{ $currentYear := 0 }}
+{{ range .Params.publications }}
+  {{- $titleContent := .title | plainify -}}
+  {{- $id := anchorize $titleContent -}}
+  {{ if ne .year $currentYear }}
+    <h2>{{ .year }}</h2>
+    {{ $currentYear = .year }}
+  {{ end }}
+
+  <p id="{{ $id }}">
+    <b>{{ .title }}</b><br>
+    {{ .author }}<br>
+    {{ if .url }}
+      <a href="{{ .url }}">[Link]</a>
+    {{ else }}
+      <a href="https://www.google.com/search?q={{ $titleContent }}">[Search]</a>
+    {{ end }}
+
+    {{ if .published }}
+      {{ .published }},
+    {{ end }}
+    {{ if .location }}{{ .location }},{{ end }}
+    {{ if or .month .year }}
+      {{ if .month }}
+        {{ index (slice "Jan." "Feb." "Mar." "Apr." "May" "June" "July" "Aug." "Sep." "Oct." "Nov." "Dec.") (sub (int .month) 1) }}
+      {{ end }}
+      {{ if .year }} {{ .year }}.{{ end }}
+    {{ end }}
+
+    {{ if .award }}
+      <b>{{ .award }}.</b>
+    {{ end }}
+  </p>
+{{ end }}
+This paper presents an analysis technique and a novel program transformation +that can enable powerful optimizations for entire linked data structures. The +fully automatic transformation converts ordinary programs to use pool (aka +region) allocation for heap-based data structures. The transformation relies on +an efficient link-time interprocedural analysis to identify disjoint data +structures in the program, to check whether these data structures are accessed +in a type-safe manner, and to construct a Disjoint Data Structure Graph that +describes the connectivity pattern within such structures. We present +preliminary experimental results showing that the data structure analysis and +pool allocation are effective for a set of pointer intensive programs in the +Olden benchmark suite. To illustrate the optimizations that can be enabled by +these techniques, we describe a novel pointer compression transformation and +briefly discuss several other optimization possibilities for linked data +structures. ++ +
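To make the transformation concrete, here is a hand-written sketch of the before/after shape of pool-allocated code. It is illustrative only, not compiler output: the toy `Pool` type and `pool_alloc` helper stand in for the runtime library described in the paper (which also supports freeing individual objects back to their pool).

```c
#include <stdlib.h>

struct Node { int value; struct Node *next; };

/* Before: every node is malloc'd individually, so nodes that belong to
 * different lists end up interleaved across the heap. */
struct Node *push(struct Node *head, int v) {
    struct Node *n = malloc(sizeof *n);
    n->value = v;
    n->next = head;
    return n;
}

/* After the transformation (sketch): each disjoint list gets its own pool and
 * its allocation sites are rewritten to draw from it.  This toy bump-pointer
 * pool is a stand-in for the real runtime library. */
typedef struct { char *mem; size_t used, cap; } Pool;

static void *pool_alloc(Pool *p, size_t n) {
    void *r = p->mem + p->used;              /* no capacity check, for brevity */
    p->used += n;
    return r;
}

struct Node *push_pooled(Pool *p, struct Node *head, int v) {
    struct Node *n = pool_alloc(p, sizeof *n);   /* this list's nodes stay contiguous */
    n->value = v;
    n->next = head;
    return n;
}

int main(void) {
    Pool p = { malloc(1 << 16), 0, 1 << 16 };
    struct Node *list = NULL;
    for (int i = 0; i < 10; i++)
        list = push_pooled(&p, list, i);
    free(p.mem);                             /* releasing the pool frees the whole list */
    return 0;
}
```

Because every node of the list now comes from one pool, the nodes end up close together in memory, which is also what enables follow-on optimizations such as the pointer compression discussed in the abstract.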
+ "Automatic Pool Allocation for Disjoint Data Structures", + Chris Lattner & Vikram Adve,+ +
+ ACM SIGPLAN Workshop on Memory + System Performance (MSP), Berlin, Germany, June 2002.
+
+@InProceedings{LattnerAdve:MSP02, + Author = "{Chris Lattner and Vikram Adve}", + Title = "{Automatic Pool Allocation for Disjoint Data Structures}", + Booktitle = "{Proc. ACM SIGPLAN Workshop on Memory System Performance}", + Address = "{Berlin, Germany}", + Month = {Jun}, + Year = {2002} +} ++ + + diff --git a/static/pubs/2002-06-AutomaticPoolAllocation.pdf b/static/pubs/2002-06-AutomaticPoolAllocation.pdf new file mode 100644 index 0000000..8a24642 Binary files /dev/null and b/static/pubs/2002-06-AutomaticPoolAllocation.pdf differ diff --git a/static/pubs/2002-06-AutomaticPoolAllocation.ppt b/static/pubs/2002-06-AutomaticPoolAllocation.ppt new file mode 100644 index 0000000..4c0f080 Binary files /dev/null and b/static/pubs/2002-06-AutomaticPoolAllocation.ppt differ diff --git a/static/pubs/2002-06-AutomaticPoolAllocation.ps b/static/pubs/2002-06-AutomaticPoolAllocation.ps new file mode 100644 index 0000000..38046ad Binary files /dev/null and b/static/pubs/2002-06-AutomaticPoolAllocation.ps differ diff --git a/static/pubs/2002-08-08-CASES02-ControlC.html b/static/pubs/2002-08-08-CASES02-ControlC.html new file mode 100644 index 0000000..f31205d --- /dev/null +++ b/static/pubs/2002-08-08-CASES02-ControlC.html @@ -0,0 +1,78 @@ + + + + + +
+This paper considers the problem of providing safe programming +support and enabling secure online software upgrades for control +software in real-time control systems. +In such systems, offline techniques for ensuring code safety are +greatly preferable to online techniques. +We propose a language called Control-C that is essentially a subset +of C, but with key restrictions designed to ensure that memory safety +of code can be verified entirely by static checking, +under certain system assumptions. +The language permits pointer-based data structures, restricted +dynamic memory allocation, and restricted array operations, +without requiring any runtime checks on memory operations and +without garbage collection. +The language restrictions have been chosen based on an understanding +of both compiler technology and the needs of real-time control systems. +The paper describes the language design and a +compiler implementation for Control-C. We use control codes +from three different experimental control systems to evaluate the +suitability of the language for these codes, the effort required +to port them to Control-C, and the effectiveness of the compiler +in detecting a wide range of potential security violations for +one of the systems. ++ +
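As a flavor of what "no runtime checks" means in practice, here is a small illustration (not the actual Control-C rules): when array sizes and loop ranges are compile-time constants, every access can be proven in bounds statically, so no checks need to be inserted.

```c
/* Illustrative only -- not the actual Control-C restrictions. */
#define N 64                  /* array size known at compile time */

static double gain[N];

/* The loop bound is a constant and i stays in [0, N), so static analysis can
 * prove every access to gain[i] is in bounds; no run-time check is needed. */
void scale(double factor) {
    for (int i = 0; i < N; i++)
        gain[i] *= factor;
}
```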
+ "Ensuring Code Safety Without Runtime Checks for Real-Time Control Systems", + Sumant Kowshik, Dinakar Dhurjati & Vikram Adve,+ +
+ CASES + 2002, Grenoble, France, Oct 2002.
+
+  @inproceedings{KDA:LCTES03,
+    Author    = {Sumant Kowshik and Dinakar Dhurjati and Vikram Adve},
+    Title     = "{Ensuring Code Safety Without Runtime Checks for Real-Time Control Systems}",
+    Booktitle = "{Proc. Int'l Conf. on Compilers, Architecture and Synthesis for Embedded Systems, 2002}",
+    Address   = {Grenoble, France},
+    Month     = {Oct},
+    Year      = {2002},
+    URL       = {http://llvm.cs.uiuc.edu/pubs/2003-08-08-CASES02-ControlC.html}
+  }
diff --git a/static/pubs/2002-08-08-CASES02-ControlC.pdf b/static/pubs/2002-08-08-CASES02-ControlC.pdf
new file mode 100644
index 0000000..b94599f
Binary files /dev/null and b/static/pubs/2002-08-08-CASES02-ControlC.pdf differ
diff --git a/static/pubs/2002-08-08-CASES02-ControlC.ps b/static/pubs/2002-08-08-CASES02-ControlC.ps
new file mode 100644
index 0000000..52c465c
Binary files /dev/null and b/static/pubs/2002-08-08-CASES02-ControlC.ps differ
diff --git a/static/pubs/2002-08-09-LLVMCompilationStrategy.html b/static/pubs/2002-08-09-LLVMCompilationStrategy.html
new file mode 100644
index 0000000..2405b86
--- /dev/null
+++ b/static/pubs/2002-08-09-LLVMCompilationStrategy.html
@@ -0,0 +1,81 @@
+This document introduces the LLVM compiler infrastructure and instruction set, a +simple approach that enables sophisticated code transformations at link time, +runtime, and in the field. It is a pragmatic approach to compilation, +interfering with programmers and tools as little as possible, while still +retaining extensive high-level information from source-level compilers for later +stages of an application's lifetime. We describe the LLVM instruction set, +the design of the LLVM system, and some of its key components. ++ +
+ "The LLVM Instruction Set and Compilation Strategy", + Chris Lattner & Vikram Adve+ +
+ Technical Report #UIUCDCS-R-2002-2292, Computer Science Dept., Univ. of + Illinois, Aug. 2002. +
++ ++ Since this document was published, one significant change has been + made to LLVM: the GCC C front-end described in the document has been + completely rewritten from scratch. The new C front-end is based on the + mainline GCC CVS tree (what will become GCC 3.4), and expands type-safe LLVM + code from the GCC AST representation, instead of from the untyped GCC RTL + representation. +
++ This change dramatically improved the quality of code generated and the + stability of the system as a whole. +
+
+@TechReport{LattnerAdve:LLVM:ISCS, + Author = "{Chris Lattner and Vikram Adve}", + Title = "{The LLVM Instruction Set and Compilation Strategy}", + Institution = "{CS Dept., Univ. of Illinois at Urbana-Champaign}", + Number = {UIUCDCS-R-2002-2292}, + Type = {Tech. Report}, + Month = {Aug}, + Year = {2002} +} ++ + + diff --git a/static/pubs/2002-08-09-LLVMCompilationStrategy.pdf b/static/pubs/2002-08-09-LLVMCompilationStrategy.pdf new file mode 100644 index 0000000..eee751f Binary files /dev/null and b/static/pubs/2002-08-09-LLVMCompilationStrategy.pdf differ diff --git a/static/pubs/2002-08-09-LLVMCompilationStrategy.ps b/static/pubs/2002-08-09-LLVMCompilationStrategy.ps new file mode 100644 index 0000000..4a6d8ee Binary files /dev/null and b/static/pubs/2002-08-09-LLVMCompilationStrategy.ps differ diff --git a/static/pubs/2002-08-11-CASES02-ControlC.ppt b/static/pubs/2002-08-11-CASES02-ControlC.ppt new file mode 100644 index 0000000..e6e627c Binary files /dev/null and b/static/pubs/2002-08-11-CASES02-ControlC.ppt differ diff --git a/static/pubs/2002-11-15-PLDI-Submission.pdf b/static/pubs/2002-11-15-PLDI-Submission.pdf new file mode 100644 index 0000000..533dd88 Binary files /dev/null and b/static/pubs/2002-11-15-PLDI-Submission.pdf differ diff --git a/static/pubs/2002-12-LattnerMSThesis-book.pdf b/static/pubs/2002-12-LattnerMSThesis-book.pdf new file mode 100644 index 0000000..977983d Binary files /dev/null and b/static/pubs/2002-12-LattnerMSThesis-book.pdf differ diff --git a/static/pubs/2002-12-LattnerMSThesis-book.ps b/static/pubs/2002-12-LattnerMSThesis-book.ps new file mode 100644 index 0000000..26c215d Binary files /dev/null and b/static/pubs/2002-12-LattnerMSThesis-book.ps differ diff --git a/static/pubs/2002-12-LattnerMSThesis.html b/static/pubs/2002-12-LattnerMSThesis.html new file mode 100644 index 0000000..38017d7 --- /dev/null +++ b/static/pubs/2002-12-LattnerMSThesis.html @@ -0,0 +1,99 @@ + + + + + +
++ ++Modern programming languages and software engineering principles are causing +increasing problems for compiler systems. Traditional approaches, which use +a simple compile-link-execute model, are unable to provide adequate application +performance under the demands of the new conditions. Traditional approaches to +interprocedural and profile-driven compilation can provide the application +performance needed, but require infeasible amounts of compilation time to build +the application.
+ ++This thesis presents LLVM, a design and implementation of a compiler +infrastructure which supports a unique multi-stage optimization system. +This system is designed to support extensive interprocedural and +profile-driven optimizations, while being efficient enough for use in +commercial compiler systems.
+ ++The LLVM virtual instruction set is the glue that holds the system together. It +is a low-level representation, but with high-level type information. +This provides the benefits of a low-level representation (compact +representation, wide variety of available transformations, etc.) as well as +providing high-level information to support aggressive interprocedural +optimizations at link- and post-link time. In particular, this system is +designed to support optimization in the field, both at run-time and during +otherwise unused idle time on the machine.
+ ++This thesis also describes an implementation of this compiler design, the LLVM +compiler infrastructure, proving that the design is feasible. The LLVM +compiler infrastructure is a maturing and efficient system, which we show is a +good host for a variety of research. More information about LLVM can be found +on its web site at: http://llvm.cs.uiuc.edu/
+
+This thesis supersedes an older technical report.
+ ++ "LLVM: An Infrastructure for Multi-Stage Optimization", Chris Lattner.+ +
+ Masters Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, Dec. 2002. +
+ The "book form" is useful if you plan to print this out. Print the file out + double sided, fold it in half, and staple it in the middle of the page. It + dramatically reduces the number of pages of paper used. +
++ @MastersThesis{Lattner:MSThesis02, + author = {Chris Lattner}, + title = "{LLVM: An Infrastructure for Multi-Stage Optimization}", + school = "{Computer Science Dept., University of Illinois at Urbana-Champaign}", + year = {2002}, + address = {Urbana, IL}, + month = {Dec}, + note = {{\em See {\tt http://llvm.cs.uiuc.edu}.}} + } ++ + + diff --git a/static/pubs/2002-12-LattnerMSThesis.pdf b/static/pubs/2002-12-LattnerMSThesis.pdf new file mode 100644 index 0000000..99cf3c1 Binary files /dev/null and b/static/pubs/2002-12-LattnerMSThesis.pdf differ diff --git a/static/pubs/2002-12-LattnerMSThesis.ps b/static/pubs/2002-12-LattnerMSThesis.ps new file mode 100644 index 0000000..c01e2d4 Binary files /dev/null and b/static/pubs/2002-12-LattnerMSThesis.ps differ diff --git a/static/pubs/2003-04-29-DataStructureAnalysisTR.html b/static/pubs/2003-04-29-DataStructureAnalysisTR.html new file mode 100644 index 0000000..0560ba6 --- /dev/null +++ b/static/pubs/2003-04-29-DataStructureAnalysisTR.html @@ -0,0 +1,70 @@ + + + + + +
+This paper presents an efficient context-sensitive heap analysis algorithm called Data Structure Analysis designed to enable analyses and transformations on entire disjoint recursive data structures. The analysis has several challenging properties needed to enable such transformations: context-sensitivity with cloning (essential for proving disjointness), field-sensitivity, and the use of an explicit heap model rather than just alias information. It is also applicable to arbitrary C programs. To our knowledge no prior work provides all these properties and is efficient and scalable enough for large programs. Measurements for 29 programs show that the algorithm is extremely fast, space-efficient, and scales almost linearly across three orders of magnitude of code size.
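A tiny example of the property the analysis is designed to prove (hand-written; the analysis itself builds graphs over the compiler's IR, not C source): two lists built through the same allocation site can still be recognized as disjoint structures, thanks to context sensitivity with cloning.

```c
#include <stdlib.h>

struct Node { int key; struct Node *next; };

static struct Node *prepend(struct Node *head, int key) {
    struct Node *n = malloc(sizeof *n);   /* single allocation site for both lists */
    n->key = key;
    n->next = head;
    return n;
}

int main(void) {
    /* A context-sensitive analysis with cloning can prove that the nodes
     * reachable from `evens` and those reachable from `odds` never alias, so
     * each list is a disjoint structure that can be transformed on its own. */
    struct Node *evens = NULL, *odds = NULL;
    for (int i = 0; i < 100; i++) {
        if (i % 2 == 0) evens = prepend(evens, i);
        else            odds  = prepend(odds, i);
    }
    return evens != odds;   /* keep both lists live */
}
```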
+ "Data Structure Analysis: An Efficient Context-Sensitive Heap Analysis", + Chris Lattner & Vikram Adve+ +
+ Technical Report #UIUCDCS-R-2003-2340, Computer Science Dept., Univ. of + Illinois, Apr. 2003. +
+This document was updated on 15 November 2003 to reflect improvements to the +algorithm, and to be more clear and precise. ++ +
+ @TechReport{LattnerAdve:DSA, + Author = {Chris Lattner and Vikram Adve}, + Title = "{Data Structure Analysis: An Efficient Context-Sensitive Heap Analysis}", + Institution = "{Computer Science Dept., Univ. of Illinois at Urbana-Champaign}", + Number = {UIUCDCS-R-2003-2340}, + Type = {Tech. Report}, + Month = {Apr}, + Year = {2003} + } ++ + + diff --git a/static/pubs/2003-04-29-DataStructureAnalysisTR.pdf b/static/pubs/2003-04-29-DataStructureAnalysisTR.pdf new file mode 100644 index 0000000..61c85ed Binary files /dev/null and b/static/pubs/2003-04-29-DataStructureAnalysisTR.pdf differ diff --git a/static/pubs/2003-04-29-DataStructureAnalysisTR.ps b/static/pubs/2003-04-29-DataStructureAnalysisTR.ps new file mode 100644 index 0000000..21747da Binary files /dev/null and b/static/pubs/2003-04-29-DataStructureAnalysisTR.ps differ diff --git a/static/pubs/2003-05-01-GCCSummit2003.html b/static/pubs/2003-05-01-GCCSummit2003.html new file mode 100644 index 0000000..99039a3 --- /dev/null +++ b/static/pubs/2003-05-01-GCCSummit2003.html @@ -0,0 +1,74 @@ + + + + + +
+This paper presents a design and implementation of a whole-program +interprocedural optimizer built in the GCC framework. Through the introduction +of a new language-independent intermediate representation, we extend the current +GCC architecture to include a powerful mid-level optimizer and add link-time +interprocedural analysis and optimization capabilities. This intermediate +representation is an SSA-based, low-level, strongly-typed, representation which +is designed to support both efficient global optimizations and high-level +analyses. Because most of the program is available at link-time, aggressive +``whole-program'' optimizations and analyses are possible, improving the time +and space requirements of compiled programs. The final proposed organization of +GCC retains the important features which make it successful today, requires +almost no modification to either the front- or back-ends of GCC, and is +completely compatible with user makefiles. ++ +
+ "Architecture For a Next-Generation GCC", Chris Lattner & + Vikram Adve,+ +
+ First Annual GCC Developers' + Summit, Ottawa, Canada, May 2003.
+
+ @InProceedings{LattnerAdve:GCCSummit03, + Author = {Chris Lattner and Vikram Adve}, + Title = "{Architecture for a Next-Generation GCC}", + Booktitle = "{Proc. First Annual GCC Developers' Summit}", + Address = {Ottawa, Canada}, + Month = {May}, + Year = {2003}, + URL = {http://llvm.cs.uiuc.edu/pubs/2003-05-01-GCCSummit2003.html} + } ++ + + diff --git a/static/pubs/2003-05-01-GCCSummit2003.pdf b/static/pubs/2003-05-01-GCCSummit2003.pdf new file mode 100644 index 0000000..2728ff7 Binary files /dev/null and b/static/pubs/2003-05-01-GCCSummit2003.pdf differ diff --git a/static/pubs/2003-05-01-GCCSummit2003.ps b/static/pubs/2003-05-01-GCCSummit2003.ps new file mode 100644 index 0000000..5971d86 Binary files /dev/null and b/static/pubs/2003-05-01-GCCSummit2003.ps differ diff --git a/static/pubs/2003-05-01-GCCSummit2003pres.pdf b/static/pubs/2003-05-01-GCCSummit2003pres.pdf new file mode 100644 index 0000000..adde2ca Binary files /dev/null and b/static/pubs/2003-05-01-GCCSummit2003pres.pdf differ diff --git a/static/pubs/2003-05-01-GCCSummit2003pres.ppt b/static/pubs/2003-05-01-GCCSummit2003pres.ppt new file mode 100644 index 0000000..3885655 Binary files /dev/null and b/static/pubs/2003-05-01-GCCSummit2003pres.ppt differ diff --git a/static/pubs/2003-05-05-LCTES03-CodeSafety.html b/static/pubs/2003-05-05-LCTES03-CodeSafety.html new file mode 100644 index 0000000..3b00bae --- /dev/null +++ b/static/pubs/2003-05-05-LCTES03-CodeSafety.html @@ -0,0 +1,77 @@ + + + + + +
+Traditional approaches to enforcing memory safety of programs rely heavily on +runtime checks of memory accesses and on garbage collection, both of which are +unattractive for embedded applications. The long-term goal of our work is to +enable 100% static enforcement of memory safety for embedded programs through +advanced compiler techniques and minimal semantic restrictions on programs. The +key result of this paper is a compiler technique that ensures memory safety of +dynamically allocated memory without programmer annotations, runtime checks, +or garbage collection, and works for a large subclass of type-safe C +programs. The technique is based on a fully automatic pool allocation (i.e., +region-inference) algorithm for C programs we developed previously, and it +ensures safety of dynamically allocated memory while retaining explicit +deallocation of individual objects within regions (to avoid garbage collection). +For a diverse set of embedded C programs (and using a previous technique to +avoid null pointer checks), we show that we are able to statically ensure the +safety of pointer and dynamic memory usage in all these programs. We +also describe some improvements over our previous work in static checking of +array accesses. Overall, we achieve 100% static enforcement of memory safety +without new language syntax for a significant subclass of embedded C programs, +and the subclass is much broader if array bounds checks are ignored. ++ +
+ "Memory Safety Without Runtime Checks or Garbage Collection", Dinakar + Dhurjati, Sumant Kowshik, Vikram Adve & Chris Lattner,+ +
+ LCTES 2003, San + Diego, CA, June 2003.
+
+  @InProceedings{DKAL:LCTES03,
+    Author    = {Dinakar Dhurjati and Sumant Kowshik and Vikram Adve and Chris Lattner},
+    Title     = "{Memory Safety Without Runtime Checks or Garbage Collection}",
+    Booktitle = "{Proc. Languages, Compilers and Tools for Embedded Systems 2003}",
+    Address   = {San Diego, CA},
+    Month     = {June},
+    Year      = {2003},
+    URL       = {http://llvm.cs.uiuc.edu/pubs/2003-05-05-LCTES03-CodeSafety.html}
+  }
+Cross-procedure tracing implies finding frequently executing paths in a program that span one or more procedures. A runtime optimizer detects such paths and optimizes them to improve overall program performance. Even though several instrumentation techniques exist for tracing programs offline, most have high overheads which makes them unsuitable for runtime optimization. In this work we present a lightweight technique to detect cross-procedure traces at runtime.
+ Lightweight, Cross-Procedure Tracing for Runtime Optimization, Anand Shukla.+ +
+ Masters Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, July 2003. +
+ @MastersThesis{Shukla:MSThesis03, + author = {Anand Shukla}, + title = "{Lightweight, Cross-Procedure Tracing for Runtime Optimization}", + school = "{Computer Science Dept., University of Illinois at Urbana-Champaign}", + year = {2003}, + address = {Urbana, IL}, + month = {July}, + note = {{\em See {\tt http://llvm.cs.uiuc.edu}.}} + } ++ + + diff --git a/static/pubs/2003-07-18-ShuklaMSThesis.pdf b/static/pubs/2003-07-18-ShuklaMSThesis.pdf new file mode 100644 index 0000000..7a47928 Binary files /dev/null and b/static/pubs/2003-07-18-ShuklaMSThesis.pdf differ diff --git a/static/pubs/2003-07-18-ShuklaMSThesis.ps b/static/pubs/2003-07-18-ShuklaMSThesis.ps new file mode 100644 index 0000000..a863dcd Binary files /dev/null and b/static/pubs/2003-07-18-ShuklaMSThesis.ps differ diff --git a/static/pubs/2003-07-18-StanleyMSThesis.html b/static/pubs/2003-07-18-StanleyMSThesis.html new file mode 100644 index 0000000..a784098 --- /dev/null +++ b/static/pubs/2003-07-18-StanleyMSThesis.html @@ -0,0 +1,73 @@ + + + + + +
+Modern software development practices lack portable, precise and powerful mechanisms for describing performance properties of application code. Traditional approaches rely almost solely on performance instrumentation libraries, which have significant drawbacks in certain types (e.g., adaptive) of applications, present the end user with integration challenges and complex APIs, and often pose portability problems of their own. This thesis proposes a small set of C-like language extensions that facilitate the treatment of performance properties as intrinsic properties of application code. The proposed language extensions allow the application developer to encode performance expectations, gather and aggregate various types of performance information, and more, all at the language level. Furthermore, this thesis demonstrates many novel compiler implementation techniques that make the presented approach possible with an arbitrary (third-party) compiler, and that minimize performance perturbation by enabling compiler optimizations that are commonly inhibited by traditional approaches. This thesis describes the fundamental contribution of language-level performance properties, the language extensions themselves, the implementation of the compilation and runtime system, together with a standard library of widely-used metrics, and demonstrates the role that the extensions and compilation system can play in describing the performance-oriented aspects of both a production-quality raytracing application and a long-running adaptive server code.
+ "Language Extensions for Performance-Oriented Programming", Joel Stanley.+ +
+ + Masters Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, July 2003. +
+ @MastersThesis{Stanley:MSThesis03, + author = {Joel Stanley}, + title = "{Language Extensions for Performance-Oriented Programming}", + school = "{Computer Science Dept., University of Illinois at Urbana-Champaign}", + year = {2003}, + address = {Urbana, IL}, + month = {July}, + note = {{\em See {\tt http://llvm.cs.uiuc.edu}.}} + } ++ + + diff --git a/static/pubs/2003-07-18-StanleyMSThesis.pdf b/static/pubs/2003-07-18-StanleyMSThesis.pdf new file mode 100644 index 0000000..25216c9 Binary files /dev/null and b/static/pubs/2003-07-18-StanleyMSThesis.pdf differ diff --git a/static/pubs/2003-07-18-StanleyMSThesis.ps b/static/pubs/2003-07-18-StanleyMSThesis.ps new file mode 100644 index 0000000..4e7fee8 Binary files /dev/null and b/static/pubs/2003-07-18-StanleyMSThesis.ps differ diff --git a/static/pubs/2003-09-30-LifelongOptimizationTR.html b/static/pubs/2003-09-30-LifelongOptimizationTR.html new file mode 100644 index 0000000..306f2ac --- /dev/null +++ b/static/pubs/2003-09-30-LifelongOptimizationTR.html @@ -0,0 +1,79 @@ + + + + + +
+This paper describes LLVM (Low Level Virtual Machine), a compiler framework +designed to support transparent, lifelong program analysis and +transformation for arbitrary programs, by providing high-level information +to compiler transformations at compile-time, link-time, run-time, and offline. +LLVM defines a common, low-level code representation in Static Single Assignment +(SSA) form, with several novel features: a simple, language-independent +type-system that exposes the primitives commonly used to implement high-level +language features; an instruction for typed address arithmetic; and a simple +mechanism that can be used to implement the exception handling features of +high-level languages (and setjmp/longjmp in C) uniformly and +efficiently. The LLVM compiler framework and code representation together +provide a combination of key capabilities that are important for practical, +lifelong analysis and transformation of programs. To our knowledge, no existing +compilation approach provides all these capabilities. We describe the design of +the LLVM representation and compiler framework, and evaluate the design in three +ways: (a) the size and effectiveness of the representation, including the type +information it provides; (b) compiler performance for several interprocedural +problems; and (c) illustrative examples of the benefits LLVM provides for +several challenging compiler problems. ++ +
+ "LLVM: A Compilation Framework for Lifelong Program Analysis & + Transformation", Chris Lattner & Vikram Adve+ +
+ Technical Report #UIUCDCS-R-2003-2380, Computer Science Dept., Univ. of + Illinois, Sep. 2003. +
This paper is an early version of the paper published in CGO'04, and is + superseded by it.
+ ++ @TechReport{LattnerAdve:LifeLong, + Author = {Chris Lattner and Vikram Adve}, + Title = "{LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation}", + Institution = "{Computer Science Dept., Univ. of Illinois at Urbana-Champaign}", + Number = {UIUCDCS-R-2003-2380}, + Type = {Tech. Report}, + Month = {Sep}, + Year = {2003} + } ++ + + diff --git a/static/pubs/2003-09-30-LifelongOptimizationTR.pdf b/static/pubs/2003-09-30-LifelongOptimizationTR.pdf new file mode 100644 index 0000000..45f6f3b Binary files /dev/null and b/static/pubs/2003-09-30-LifelongOptimizationTR.pdf differ diff --git a/static/pubs/2003-09-30-LifelongOptimizationTR.ps b/static/pubs/2003-09-30-LifelongOptimizationTR.ps new file mode 100644 index 0000000..58c8548 Binary files /dev/null and b/static/pubs/2003-09-30-LifelongOptimizationTR.ps differ diff --git a/static/pubs/2003-10-01-LLVA.html b/static/pubs/2003-10-01-LLVA.html new file mode 100644 index 0000000..6d7e7b2 --- /dev/null +++ b/static/pubs/2003-10-01-LLVA.html @@ -0,0 +1,76 @@ + + + + + +
+A virtual instruction set architecture (V-ISA) implemented via a +processor-specific software translation layer can provide great flexibility to +processor designers. Recent examples such as Crusoe and DAISY, however, have +used existing hardware instruction sets as virtual ISAs, which complicates +translation and optimization. In fact, there has been little research on +specific designs for a virtual ISA for processors. This paper proposes a novel +virtual ISA (LLVA) and a translation strategy for implementing it on arbitrary +hardware. The instruction set is typed, uses an infinite virtual register set +in Static Single Assignment form, and provides explicit control-flow and +dataflow information, and yet uses low-level operations closely matched to +traditional hardware. It includes novel mechanisms to allow more flexible +optimization of native code, including a flexible exception model and minor +constraints on self-modifying code. We propose a translation strategy that +enables offline translation and transparent offline caching of native code and +profile information, while remaining completely OS-independent. It also +supports optimizations directly on the representation at install-time, runtime, +and offline between executions. We show experimentally that the virtual ISA is +compact, it is closely matched to ordinary hardware instruction sets, and +permits very fast code generation, yet has enough high-level information to +permit sophisticated program analyses and optimizations. ++ +
+ "LLVA: A Low-level Virtual Instruction Set Architecture", Vikram Adve, Chris + Lattner, Michael Brukman, Anand Shukla, and Brian Gaeke.+ +
+ Proceedings of the 36th annual ACM/IEEE international symposium on + Microarchitecture (MICRO-36), San Diego, California, Dec. 2003. +
+ @InProceedings{ALBSG:MICRO36, + author = {Vikram Adve and Chris Lattner and Michael Brukman and Anand Shukla and Brian Gaeke}, + title = "{LLVA: A Low-level Virtual Instruction Set Architecture}", + booktitle = "{Proceedings of the 36th annual ACM/IEEE international symposium on Microarchitecture (MICRO-36)}", + address = {San Diego, California}, + month = {Dec}, + year = {2003} + } ++ + + diff --git a/static/pubs/2003-10-01-LLVA.pdf b/static/pubs/2003-10-01-LLVA.pdf new file mode 100644 index 0000000..bcccbc4 Binary files /dev/null and b/static/pubs/2003-10-01-LLVA.pdf differ diff --git a/static/pubs/2003-10-01-LLVA.ppt b/static/pubs/2003-10-01-LLVA.ppt new file mode 100644 index 0000000..448f4c2 Binary files /dev/null and b/static/pubs/2003-10-01-LLVA.ppt differ diff --git a/static/pubs/2003-10-01-LLVA.ps b/static/pubs/2003-10-01-LLVA.ps new file mode 100644 index 0000000..126823e Binary files /dev/null and b/static/pubs/2003-10-01-LLVA.ps differ diff --git a/static/pubs/2003-11-15-DataStructureAnalysisTR.pdf b/static/pubs/2003-11-15-DataStructureAnalysisTR.pdf new file mode 100644 index 0000000..a7a98a5 Binary files /dev/null and b/static/pubs/2003-11-15-DataStructureAnalysisTR.pdf differ diff --git a/static/pubs/2003-11-15-DataStructureAnalysisTR.ps b/static/pubs/2003-11-15-DataStructureAnalysisTR.ps new file mode 100644 index 0000000..211567f Binary files /dev/null and b/static/pubs/2003-11-15-DataStructureAnalysisTR.ps differ diff --git a/static/pubs/2004-01-30-CGO-LLVM.html b/static/pubs/2004-01-30-CGO-LLVM.html new file mode 100644 index 0000000..da57f6d --- /dev/null +++ b/static/pubs/2004-01-30-CGO-LLVM.html @@ -0,0 +1,74 @@ + + + + + +
+This paper describes LLVM (Low Level Virtual Machine), +a compiler framework designed to support transparent, +lifelong program analysis and transformation for arbitrary programs, +by providing high-level information to compiler transformations +at compile-time, link-time, run-time, and in idle time between runs. +LLVM defines a +common, low-level code representation in Static Single Assignment (SSA) +form, with several novel features: +a simple, language-independent type-system that exposes the +primitives commonly used to implement high-level language features; +an instruction for typed address arithmetic; +and a simple mechanism that can be used to implement the exception handling +features of high-level languages (and setjmp/longjmp in C) uniformly and +efficiently. The LLVM compiler framework and code representation together +provide a combination of key capabilities that are important for practical, +lifelong analysis and transformation of programs. To our knowledge, +no existing compilation approach provides all these capabilities. +We describe the design of the LLVM representation and compiler framework, +and evaluate the design in three ways: +(a) the size and effectiveness of the representation, including the type +information it provides; +(b) compiler performance for several interprocedural problems; and +(c) illustrative examples of the benefits LLVM provides for several +challenging compiler problems. ++ +
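As a small illustration of the "typed address arithmetic" point, consider a C field-address computation; the LLVM instruction shown in the comment is approximate (written in later, typed-pointer syntax) and is only meant to convey that offsets are expressed in terms of types rather than raw byte arithmetic.

```c
/* What "typed address arithmetic" buys: field addresses are computed from the
 * type, not from hard-coded byte offsets, so the IR stays analyzable. */
struct Pair { int a; double b; };

double *field_addr(struct Pair *p) {
    return &p->b;
    /* Corresponds roughly to a single LLVM instruction (later syntax):
     *   %addr = getelementptr %struct.Pair, %struct.Pair* %p, i32 0, i32 1
     * The target-specific byte offset of `b` never appears in the IR. */
}
```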
Note this paper supersedes the earlier tech report.
+ ++ "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation", Chris Lattner and Vikram Adve.+ +
+ Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), Palo Alto, California, Mar. 2004. +
+ @InProceedings{LLVM:CGO04, + author = {Chris Lattner and Vikram Adve}, + title = "{LLVM: A Compilation Framework for Lifelong Program Analysis \& Transformation}", + booktitle = "{Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04)}", + address = {Palo Alto, California}, + month = {Mar}, + year = {2004} + } ++ + + diff --git a/static/pubs/2004-01-30-CGO-LLVM.pdf b/static/pubs/2004-01-30-CGO-LLVM.pdf new file mode 100644 index 0000000..1979c5a Binary files /dev/null and b/static/pubs/2004-01-30-CGO-LLVM.pdf differ diff --git a/static/pubs/2004-01-30-CGO-LLVM.ps b/static/pubs/2004-01-30-CGO-LLVM.ps new file mode 100644 index 0000000..f74e1d7 Binary files /dev/null and b/static/pubs/2004-01-30-CGO-LLVM.ps differ diff --git a/static/pubs/2004-03-22-CGO-LLVM-Presentation.ppt b/static/pubs/2004-03-22-CGO-LLVM-Presentation.ppt new file mode 100644 index 0000000..1bc2f1c Binary files /dev/null and b/static/pubs/2004-03-22-CGO-LLVM-Presentation.ppt differ diff --git a/static/pubs/2004-03-ICDCS-Adaptions.html b/static/pubs/2004-03-ICDCS-Adaptions.html new file mode 100644 index 0000000..aee9110 --- /dev/null +++ b/static/pubs/2004-03-ICDCS-Adaptions.html @@ -0,0 +1,56 @@ + + + + + +
+Distributed applications may use sophisticated run-time +adaptation strategies to meet their performance +or quality-of-service goals. Coordinating an adaptation +that involves multiple processes can require complex +communication or synchronization, in addition to communication +in the base application. We propose conceptually +simple high-level directives and a sophisticated runtime algorithm +for coordinating adaptation automatically and +transparently in distributed applications. The coordination +directives specify when to adapt, in terms of the relative +computational progress of each relevant process. The coordination +algorithm relies on simple compiler transformations +to track the progress of the processes, and performs +the adaptive changes locally and asynchronously at each +process. Measurements of the runtime overhead of the automatic +coordination algorithm for two adaptive applications +(a parallel PDE solver and a distributed video tracking +code) show that the overhead is less than 1% of execution +time for both these codes, even with relatively frequent +adaptations, and does not grow significantly with the number +of coordinating processes. ++ +
+ "Coordinating Adaptations in Distributed Systems", Brian Ensink and Vikram Adve.+ +
+ Proceedings of the 24th International Conference on Distributed Computing Systems + (ICDCS 2004), Tokyo, Japan, March 2004 +
+ ++ +The Master/Slave Speculative Parallelization paradigm relies on the use of a highly optimized +and mostly correct version of a sequential program, called distilled code, for breaking inter-task +dependences. We describe the design and implementation of an optimization framework that can +create such distilled code within the context of an MSSP simulator. Our optimizer processes pieces +of Alpha machine code called traces and is designed to optimize code while adhering to certain +restrictions and requirements imposed by the MSSP paradigm. We describe the specific places +where our optimizer is different from that in a compiler and explain how the optimized traces are +deployed in the simulator.
+ +
+ "A Task Optimization Framework for MSSP", Anders Alexandersson.+ +
+ Masters Thesis, Computer Science Dept., University of Illinois at Urbana-Champaign, May 2004 +
+ ++ +The Low-Level Virtual Machine (LLVM) is a collection of libraries and tools that make it easy to build compilers, optimizers, Just-In-Time code generators, and many other compiler-related programs. LLVM uses a single, language-independent virtual instruction set both as an offline code representation (to communicate code between compiler phases and to run-time systems) and as the compiler internal representation (to analyze and transform programs). This persistent code representation allows a common set of sophisticated compiler techniques to be applied at compile-time, link-time, install-time, run-time, or "idle-time" (between program runs).
+
The strengths of the LLVM infrastructure are its extremely simple design (which makes it easy to understand and use), source-language independence, powerful mid-level optimizer, automated compiler debugging support, extensibility, and its stability and reliability. LLVM is currently being used to host a wide variety of academic research projects and commercial projects. LLVM includes C and C++ front-ends (based on GCC 3.4), a front-end for a Forth-like language (Stacker), a young scheme front-end, and Java support is in development. LLVM can generate code for X86, SparcV9, PowerPC, or it can emit C code.
+This tutorial describes the LLVM virtual instruction set and the high-level design of the LLVM compiler system. To illustrate the ideas in the LLVM IR, we use a running example (by-reference to by-value argument promotion) to illustrate several important API's in the LLVM system. Next, we describe some of the key tools provided by LLVM, and mention several projects that are natural targets for the LLVM system.
+
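The running example mentioned above - by-reference to by-value argument promotion - looks roughly like this when sketched at the C level (the tutorial itself performs it on LLVM IR through the pass APIs):

```c
/* Before: the callee takes its argument by reference but only reads it. */
static int lookup(const int *key) {        /* pointer argument, loads *key */
    return *key * 2;
}

/* After argument promotion (sketch): the internal function is rewritten to
 * take the value itself, and each call site loads the value first.  This
 * removes a level of indirection and often exposes further optimizations
 * such as keeping the caller's variable in a register. */
static int lookup_promoted(int key) {
    return key * 2;
}

int caller(void) {
    int k = 21;
    int before = lookup(&k);               /* original call, through a pointer */
    int after  = lookup_promoted(k);       /* call after promotion             */
    return before == after;
}
```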
+ "The LLVM Compiler Framework and Infrastructure Tutorial", Chris Lattner and Vikram Adve.+ +
+ LCPC'04 Mini Workshop on Compiler Research + Infrastructures, West Lafayette, Indiana, Sep. 2004. +
+@InProceedings{LattnerAdve:tutorial, + author={Chris Lattner and Vikram Adve}, + title="{The LLVM Compiler Framework and Infrastructure Tutorial}", + month={Sep}, + year={2004}, + address={West Lafayette, Indiana}, + booktitle="{LCPC'04 Mini Workshop on Compiler Research Infrastructures}" +} ++ + + + diff --git a/static/pubs/2004-09-22-LCPCLLVMTutorial.pdf b/static/pubs/2004-09-22-LCPCLLVMTutorial.pdf new file mode 100644 index 0000000..2a24fe4 Binary files /dev/null and b/static/pubs/2004-09-22-LCPCLLVMTutorial.pdf differ diff --git a/static/pubs/2004-09-22-LCPCLLVMTutorial.ppt b/static/pubs/2004-09-22-LCPCLLVMTutorial.ppt new file mode 100644 index 0000000..919310c Binary files /dev/null and b/static/pubs/2004-09-22-LCPCLLVMTutorial.ppt differ diff --git a/static/pubs/2004-Spring-AlexanderssonMSThesis.html b/static/pubs/2004-Spring-AlexanderssonMSThesis.html new file mode 100644 index 0000000..c939fff --- /dev/null +++ b/static/pubs/2004-Spring-AlexanderssonMSThesis.html @@ -0,0 +1,43 @@ + + + + + +
+Dynamic programming languages are not generally precompiled, but are interpreted at run-time. This approach has some serious drawbacks, e.g. complex deployment, human-readable source code that does not protect the developers' intellectual property, and no ability to do optimizations at compile-time or run-time.
+In this paper we study the possibility of precompiling the Ruby language, a dynamic object-oriented language, into Low Level Virtual Machine (LLVM) code for execution by the LLVM run-time, a compiler framework for lifelong optimization of an application. The result of the project is a Ruby compiler prototype, describing the infrastructure and overall design principles needed to map the highly dynamic properties of the Ruby language onto the low-level static constructs of the LLVM language.
+The LLVM framework supports different hardware platforms, and by using LLVM as the compilation target, those portability benefits are gained.
+
+ "RubyComp - A Ruby-to-LLVM Compiler Prototype", Anders Alexandersson.+ +
+ Masters Thesis, Division of Computer Science at the Department of + Informatics and Mathematics, University of Trollhättan/Uddevalla, Sweden +
Traditional approaches to enforcing memory +safety of programs rely heavily on run-time checks of memory accesses and +on garbage collection, both of which are unattractive for embedded +applications. The goal of our work is to develop advanced compiler +techniques for enforcing memory safety with minimal run-time overheads. In +this paper, we describe a set of compiler techniques that, together with +minor semantic restrictions on C programs and no new syntax, ensure memory +safety and provide most of the error-detection capabilities of type-safe +languages, without using garbage collection, and with no run-time software +checks, (on systems with standard hardware support for memory management). +The language permits arbitrary pointer-based data structures, explicit +deallocation of dynamically allocated memory, and restricted array +operations. One of the key results of this paper is a compiler technique +that ensures that dereferencing dangling pointers to freed memory does not +violate memory safety, without annotations, run-time checks, or garbage +collection, and works for arbitrary type-safe C programs. Furthermore, we +present a new interprocedural analysis for static array bounds checking +under certain assumptions. For a diverse set of embedded C programs, we +show that we are able to ensure memory safety of pointer and dynamic +memory usage in all these programs with no run-time software checks (on +systems with standard hardware memory protection), requiring only minor +restructuring to conform to simple type restrictions. Static array bounds +checking fails for roughly half the programs we study due to complex array +references, and these are the only cases where explicit run-time software +checks would be needed under our language and system assumptions. ++ +
+ "Memory Safety Without Garbage Collection for Embedded +Applications", Dinakar Dhurjati, Sumant Kowshik, Vikram Adve and +Chris Lattner.+ +
+In ACM Transactions in Embedded Computing Systems (TECS), February 2005. +
+@article{DKAL:TECS05,
+  author    = {Dinakar Dhurjati and Sumant Kowshik and Vikram Adve and Chris Lattner},
+  title     = {Memory safety without garbage collection for embedded applications},
+  journal   = {Trans. on Embedded Computing Sys.},
+  volume    = {4},
+  number    = {1},
+  year      = {2005},
+  issn      = {1539-9087},
+  pages     = {73--111},
+  URL       = {http://llvm.cs.uiuc.edu/pubs/2005-02-TECS-SAFECode.html},
+  publisher = {ACM Press},
+  address   = {New York, NY, USA},
+}
+Current implementations of software providing dynamic aspect functionality in operating system (OS) kernels are quite restricted in the possible joinpoint types for native code they are able to support. Most of the projects implementing advice for native code use basic technologies adopted from instrumentation methods, which can provide before, after and around joinpoints for functions. More elaborate joinpoints, however, are not available, since support for monitoring native code execution in current CPUs is very restricted without extensive extensions of the compiler toolchain. To realize improved ways of aspect activation in OS kernels, we present an architecture that provides an efficient low-level virtual machine running on top of a microkernel system in cooperation with an aspect deployment service to provide novel ways of aspect activation in kernel environments.
+
+ "Using a Low-Level Virtual Machine to Improve Dynamic Aspect Support in + Operating System Kernels"+ +
+ By Michael Engel and Bernd Freisleben.
+ Proceedings of the 4th AOSD Workshop on Aspects, Components, and Patterns + for Infrastructure Software (ACP4IS), March 14-18, Chicago, 2005 +
++ +Providing high performance for pointer-intensive programs on modern +architectures is an increasingly difficult problem for compilers. +Pointer-intensive programs are often bound by memory latency and cache +performance, but traditional approaches to these problems usually fail: +Pointer-intensive programs are often highly-irregular and the compiler has +little control over the layout of heap allocated objects.
+ +This thesis presents a new class of techniques named ``Macroscopic Data +Structure Analyses and Optimizations'', which is a new approach to the problem +of analyzing and optimizing pointer-intensive programs. Instead of analyzing +individual load/store operations or structure definitions, this approach +identifies, analyzes, and transforms entire memory structures as a unit. The +foundation of the approach is an analysis named Data Structure Analysis and a +transformation named Automatic Pool Allocation. Data Structure Analysis is a +context-sensitive pointer analysis which identifies data structures on the heap +and their important properties (such as type safety). Automatic Pool Allocation +uses the results of Data Structure Analysis to segregate dynamically allocated +objects on the heap, giving control over the layout of the data structure in +memory to the compiler.
+ +Based on these two foundation techniques, this thesis describes several +performance improving optimizations for pointer-intensive programs. First, +Automatic Pool Allocation itself provides important locality improvements for +the program. Once the program is pool allocated, several pool-specific +optimizations can be performed to reduce inter-object padding and pool overhead. +Second, we describe an aggressive technique, Automatic Pointer Compression, +which reduces the size of pointers on 64-bit targets to 32-bits or less, +increasing effective cache capacity and memory bandwidth for pointer-intensive +programs.
+ +This thesis describes the approach, analysis, and transformation of programs +with macroscopic techniques, and evaluates the net performance impact of the +transformations. Finally, it describes a large class of potential applications +for the work in fields such as heap safety and reliability, program +understanding, distributed computing, and static garbage collection.
+
+ "Macroscopic Data Structure Analysis and Optimization", Chris Lattner.+ +
+ Ph.D. Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, May 2005. +
The "book form" is useful if you plan to print this out. Print the file out +double sided, fold it in half, and staple it in the middle of the page. It +dramatically reduces the number of pages of paper used.
+ ++ @PhdThesis{Lattner:PHD, + author = {Chris Lattner}, + title = "{Macroscopic Data Structure Analysis and Optimization}", + school = "{Computer Science Dept., University of Illinois at Urbana-Champaign}", + year = {2005}, + address = {Urbana, IL}, + month = {May}, + note = {{\em See {\tt http://llvm.cs.uiuc.edu}.}} + } ++ +
This thesis is also available from the UIUC Computer Science Department as +tech report #UIUCDCS-R-2005-2536 or the UIUC Department of Engineering +#UILU-ENG-2005-1728.
diff --git a/static/pubs/2005-05-04-LattnerPHDThesis.pdf b/static/pubs/2005-05-04-LattnerPHDThesis.pdf
new file mode 100644
index 0000000..46f794c
Binary files /dev/null and b/static/pubs/2005-05-04-LattnerPHDThesis.pdf differ
diff --git a/static/pubs/2005-05-04-LattnerPHDThesis.ps b/static/pubs/2005-05-04-LattnerPHDThesis.ps
new file mode 100644
index 0000000..d702a68
Binary files /dev/null and b/static/pubs/2005-05-04-LattnerPHDThesis.ps differ
diff --git a/static/pubs/2005-05-21-PLDI-PoolAlloc.html b/static/pubs/2005-05-21-PLDI-PoolAlloc.html
new file mode 100644
index 0000000..28f6346
--- /dev/null
+++ b/static/pubs/2005-05-21-PLDI-PoolAlloc.html
@@ -0,0 +1,102 @@
+This paper describes Automatic Pool Allocation, a transformation framework that segregates distinct instances of heap-based data structures into separate memory pools and allows heuristics to be used to partially control the internal layout of those data structures. The primary goal of this work is performance improvement, not automatic memory management, and the paper makes several new contributions. The key contribution is a new compiler algorithm for partitioning heap objects in imperative programs based on a context-sensitive pointer analysis, including a novel strategy for correct handling of indirect (and potentially unsafe) function calls. The transformation does not require type safe programs and works for the full generality of C and C++. Second, the paper describes several optimizations that exploit data structure partitioning to further improve program performance. Third, the paper evaluates how memory hierarchy behavior and overall program performance are impacted by the new transformations. Using a number of benchmarks and a few applications, we find that compilation times are extremely low, and overall running times for heap intensive programs speed up by 10-25% in many cases, about 2x in two cases, and more than 10x in two small benchmarks. Overall, we believe this work provides a new framework for optimizing pointer intensive programs by segregating and controlling the layout of heap-based data structures.
+ "Automatic Pool Allocation: Improving Performance by Controlling Data + Structure Layout in the Heap"+ +
+ Chris Lattner and Vikram Adve.
+ Proc. of the 2005 ACM SIGPLAN Conference on Programming Language + Design and Implementation (PLDI'05), Chicago, Illinois, Jun, 2005. +
Awarded PLDI 2005 Best Paper Award
+ +Note, animations do not work in PDF version. Please use PPT version if + possible.
+A more recent and expanded version of this work is available in Chris Lattner's Ph.D. Thesis. +
+  @InProceedings{PoolAlloc:PLDI05,
+    author    = {Chris Lattner and Vikram Adve},
+    title     = "{Automatic Pool Allocation: Improving Performance by Controlling Data Structure Layout in the Heap}",
+    booktitle = "{Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'05)}",
+    address   = {Chicago, Illinois},
+    month     = {June},
+    year      = {2005}
+  }
+64-bit address spaces are increasingly important for modern applications, but they come at a price: pointers use twice as much memory, reducing the effective cache capacity and memory bandwidth of the system (compared to 32-bit address spaces). This paper presents a sophisticated, automatic transformation that shrinks pointers from 64-bits to 32-bits (and potentially less). The approach is macroscopic, i.e., it operates on an entire logical data structure in the program at a time. It allows an individual data structure instance or even a subset thereof to grow up to 2^32 bytes in size. Furthermore, the transformation can choose to compress some data structures in a program but not others (e.g. if only part of the program is type-safe). We also describe (but have not implemented) a dynamic version of the technique that can transparently expand the pointers in an individual data structure if it exceeds the 4GB limit. For a collection of pointer-intensive benchmarks, we show that the transformation improves performance substantially (20% to 2x) for several of these benchmarks, and also reduces peak heap size by a similar factor.
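The basic idea can be sketched in C: once a structure's nodes live in a single pool, a 64-bit `next` pointer can be replaced by a 32-bit index relative to the pool base. The code below is a hand-written illustration of that layout change, not the compiler transformation itself.

```c
#include <stdint.h>

/* Before: a 64-bit next pointer makes each node 16 bytes on a 64-bit target. */
struct Node64 { int32_t value; struct Node64 *next; };

/* After compression (sketch): the next field becomes a 32-bit index into the
 * node's pool, halving the node size. */
struct Node32 { int32_t value; uint32_t next; };     /* index, not pointer */

#define NIL 0xFFFFFFFFu

static struct Node32 *deref(struct Node32 *pool, uint32_t idx) {
    return &pool[idx];                                /* pool base + scaled index */
}

static int64_t sum(struct Node32 *pool, uint32_t head) {
    int64_t s = 0;
    for (uint32_t i = head; i != NIL; i = deref(pool, i)->next)
        s += deref(pool, i)->value;
    return s;
}

int main(void) {
    struct Node32 pool[3] = { {1, 1}, {2, 2}, {3, NIL} };
    return (int)sum(pool, 0);                         /* 1 + 2 + 3 = 6 */
}
```

Halving the node size is what recovers cache capacity and memory bandwidth on 64-bit targets, which is where the reported speedups come from.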
+ "Transparent Pointer Compression for Linked Data Structures"+ +
+ Chris Lattner and Vikram Adve.
+ Proceedings of the ACM Workshop on Memory System Performance (MSP'05), + Chicago, Illinois, June, 2005. +
Note, animations do not work in PDF version. Please use PPT version if + possible.
+The full context for this work is available in Chris Lattner's Ph.D. Thesis. +
+  @InProceedings{PointerComp:MSP05,
+    author    = {Chris Lattner and Vikram Adve},
+    title     = "{Transparent Pointer Compression for Linked Data Structures}",
+    booktitle = "{Proceedings of the ACM Workshop on Memory System Performance (MSP'05)}",
+    address   = {Chicago, Illinois},
+    month     = {June},
+    year      = {2005}
+  }
+This thesis details the implementation of Swing Modulo Scheduling, a Software Pipelining technique that is both effective and efficient in terms of compile time and generated code. Software Pipelining aims to expose Instruction Level Parallelism in loops, which tends to help scientific and graphical applications.
+ +Modulo Scheduling is a category of algorithms that attempt to overlap +iterations of single basic block loops and schedule instructions based +upon a priority (derived from a set of heuristics). The approach used +by Swing Modulo Scheduling is designed to achieve a highly optimized +schedule, keeping register pressure low, and does both in a reasonable +amount of compile time.
+One drawback of Swing Modulo Scheduling (and all Modulo Scheduling algorithms) is that it misses opportunities for further Instruction Level Parallelism by only handling single basic block loops. This thesis details extensions to the Swing Modulo Scheduling algorithm to handle multiple basic block loops in the form of a superblock. A superblock is a group of basic blocks that have a single entry and multiple exits. Extending Swing Modulo Scheduling to support these types of loops increases the number of loops Swing Modulo Scheduling can be applied to. In addition, it allows Modulo Scheduling to be performed on hot paths (also single entry, multiple exit), found with profile information, to be optimized later offline or at runtime.
+ +Our implementation of Swing Modulo Scheduling and extensions to the +algorithm for superblock loops were evaluated and found to be both +effective and efficient. For the original algorithm, benchmarks were +transformed to have performance gains of 10-33%, while the extended +algorithm increased benchmark performance from 7-22%.
+
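For readers unfamiliar with software pipelining, the overlap of iterations can be sketched at the source level; the real transformation operates on machine instructions, subject to resource and recurrence constraints, which is what the modulo scheduling machinery handles.

```c
/* Original loop: each iteration does stage 1 (multiply) and then stage 2
 * (add + store), so stages of different iterations never overlap. */
void scale_add(int *a, const int *b, int c, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] * c + 1;
}

/* Hand-pipelined sketch: the kernel executes stage 2 of iteration i-1
 * together with stage 1 of iteration i, exposing more parallelism. */
void scale_add_pipelined(int *a, const int *b, int c, int n) {
    if (n <= 0) return;
    int t = b[0] * c;                 /* prologue: stage 1 of iteration 0   */
    for (int i = 1; i < n; i++) {
        a[i - 1] = t + 1;             /* kernel: stage 2 of iteration i-1 ... */
        t = b[i] * c;                 /* ... overlapped with stage 1 of i    */
    }
    a[n - 1] = t + 1;                 /* epilogue: finish the last iteration */
}
```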
+ An Implementation of Swing Modulo Scheduling with Extensions for Superblocks, Tanya M. Lattner.+ +
+ Masters Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, June 2005. +
+ The "book form" is useful if you plan to print this out. Print the file out + double sided, fold it in half, and staple it in the middle of the page. It + dramatically reduces the number of pages of paper used. +
+ ++ @MastersThesis{Lattner:MSThesis05, + author = {Tanya M. Lattner}, + title = "{An Implementation of Swing Modulo Scheduling with Extensions for Superblocks}", + school = "{Computer Science Dept., University of Illinois at Urbana-Champaign}", + year = {2005}, + address = {Urbana, IL}, + month = {June}, + note = {{\em See {\tt http://llvm.cs.uiuc.edu}.}} + } ++ + + diff --git a/static/pubs/2005-06-17-LattnerMSThesis.pdf b/static/pubs/2005-06-17-LattnerMSThesis.pdf new file mode 100644 index 0000000..9f902f4 Binary files /dev/null and b/static/pubs/2005-06-17-LattnerMSThesis.pdf differ diff --git a/static/pubs/2005-06-17-LattnerMSThesis.ps b/static/pubs/2005-06-17-LattnerMSThesis.ps new file mode 100644 index 0000000..8ba3e3c Binary files /dev/null and b/static/pubs/2005-06-17-LattnerMSThesis.ps differ diff --git a/static/pubs/2005-07-IDEAS-PerfEstimation.html b/static/pubs/2005-07-IDEAS-PerfEstimation.html new file mode 100644 index 0000000..0dce27d --- /dev/null +++ b/static/pubs/2005-07-IDEAS-PerfEstimation.html @@ -0,0 +1,41 @@ + + + + + +
+Performance estimation of a processor is important for selecting the right processor for an application. Poorly chosen processors can either underperform badly or overperform but at high cost. Most previous work on performance estimation is based on generating the development tools, i.e., compilers, assemblers, etc., from a processor description file and then additionally generating an instruction set simulator to get the performance. In this work we present a simpler strategy for performance estimation. We propose an estimation technique based on the intermediate format of an application. The estimation process does not require the generation of all the development tools as in the prevalent methods. As a result, our method is not only cheaper but also faster.
+ "Practical Techniques for Performance Estimation of Processors", Abhijit Ray, Thambipillai Srikanthan and Wu Jigang.+ +
+ Proceedings of the 9th International Database Engineering & Application Symposium (IDEAS'05), July 2005. +
+ "Profile-directed If-Conversion in Superscalar Microprocessors"+ +
+ Eric Zimmerman
+ Masters Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, July 2005. +
+As both programs and machines are becoming more complex, writing high-performance code is an increasingly difficult task. In order to bridge the gap between compiled code and peak performance, resorting to domain- or architecture-specific libraries has become compulsory. However, when and where to use a library function must be specified by the programmer. This partition between library and user code is not questioned by the compiler, although it has a great impact on performance. We propose in this paper a new method that helps the user find, in their application, all code fragments that can be replaced with library calls. The same technique can be used to change or fuse multiple calls into more efficient ones. Results of detecting BLAS 1 and 2 alternatives in SPEC are presented.
+
+ "Deciding Where to Call Performance Libraries"+ +
+ By C. Alias and D. Barthou
+ Proceedings of the International IEEE Euro-Par Conference, August, 2005
+
+ ++ +The lack of virtual memory protection is a serious source of unreliability in many embedded systems. Without the segment-level +protection it provides, these systems are subject to memory access +violations, stemming from programmer error, whose results can be +dangerous and catastrophic in safety-critical systems. The traditional method of testing embedded software before its deployment +is an insufficient means of detecting and debugging all software +errors, and the reliance on this practice is a severe gamble when +the reliable performance of the embedded device is critical. Additionally, the use of safe languages and programming semantic restrictions as prevention mechanisms is often infeasible when considering the adoptability and compatibility of these languages since +most embedded applications are written in C and C++.
+ +This work improves system reliability by providing a completely +automatic software technique for guaranteeing segment protection +for embedded systems lacking virtual memory. This is done by +inserting optimized run-time checks before memory accesses that +detect segmentation violations in cases in which there would otherwise be no error, enabling remedial action before system failure +or corruption. This feature is invaluable for safety-critical embedded systems. Other advantages of our method include its low overhead, lack of any programming language or semantic restrictions, +and ease of implementation. Our compile-time analysis, known as +intended segment analysis, is a uniquely structured analysis that allows for the realization of optimizations used to reduce the number +of required run-time checks and foster our technique into a truly +viable solution for providing segment protection in embedded systems lacking virtual memory.
+Our experimental results show that these optimizations are effective at reducing the performance overheads associated with providing software segment protection to low, and in many cases, negligible levels. For the eight evaluated embedded benchmarks, the +average increase in run-time is 0.72%, the average increase in energy consumption is 0.44%, and the average increase in code size +is 3.60%.
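As a minimal, hypothetical sketch of the kind of compiler-inserted run-time check described above (not the authors' intended segment analysis or optimized checks), the following C++ program verifies before a store that the target address lies inside the segment the access is intended for, and aborts instead of silently corrupting memory. Segment layout and names are made up for the example.

// Hypothetical sketch: a run-time segment check inserted before a store.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct Segment {
  std::uintptr_t begin;
  std::uintptr_t end; // one past the last valid byte
};

// A check of this form would be inserted before each memory access whose
// intended segment was determined at compile time; it aborts (remedial
// action) instead of allowing a silent out-of-segment write.
void segment_check(const Segment &seg, const void *addr, std::size_t size) {
  std::uintptr_t p = reinterpret_cast<std::uintptr_t>(addr);
  if (p < seg.begin || p + size > seg.end) {
    std::fprintf(stderr, "segment violation at %p\n", addr);
    std::abort();
  }
}

int main() {
  static int data[16];
  Segment data_segment{reinterpret_cast<std::uintptr_t>(data),
                       reinterpret_cast<std::uintptr_t>(data + 16)};

  int i = 3; // imagine this index arrives from program input
  segment_check(data_segment, &data[i], sizeof(int)); // inserted check
  data[i] = 42;                                        // original store
  std::printf("data[%d] = %d\n", i, data[i]);
  return 0;
}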
+
+ "Segment Protection for Embedded Systems Using Run-time Checks"+ +
+ By Matthew Simpson, Bhuvan Middha and Rajeev Barua.
+ Proceedings of the ACM International Conference on Compilers,
+ Architecture, and Synthesis for Embedded Systems (CASES),
+ San Francisco, CA, September 25-27, 2005
+
+Software testing and retesting occurs continuously during the software development lifecycle to detect errors as early as possible and
+to ensure that changes to existing software do not break the software. Test suites, once developed, are reused and updated frequently
+as the software evolves. As a result, some test cases in the test suite
+may become redundant as the software is modified over time, since
+the requirements covered by them are also covered by other test
+cases. Due to the resource and time constraints for re-executing
+large test suites, it is important to develop techniques to minimize
+available test suites by removing redundant test cases. In general,
+the test suite minimization problem is NP-complete. In this paper,
+we present a new greedy heuristic algorithm for selecting a minimal
+subset of a test suite T that covers all the requirements covered by
+T. We show how our algorithm was inspired by the concept analysis framework. We conducted experiments to measure the extent of
+test suite reduction obtained by our algorithm and prior heuristics
+for test suite minimization. In our experiments, our algorithm always selected a test suite of the same or smaller size than that selected
+by prior heuristics, and had comparable time performance.
+
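For orientation, the classical greedy baseline that such work is usually compared against (not the authors' concept-analysis-inspired algorithm) can be sketched in a few lines of C++: repeatedly pick the test case that covers the most still-uncovered requirements. Test and requirement data below are hypothetical.

// Hypothetical sketch of the classical greedy test-suite minimization
// baseline: pick the test covering the most uncovered requirements until
// every requirement is covered.
#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

int main() {
  // Each test case is described by the set of requirement IDs it covers.
  std::vector<std::set<int>> tests = {
      {1, 2, 3}, {2, 4}, {3, 4, 5}, {5}, {1, 5}};

  std::set<int> uncovered;
  for (const auto &t : tests)
    uncovered.insert(t.begin(), t.end());

  std::vector<std::size_t> selected;
  while (!uncovered.empty()) {
    std::size_t best = 0, best_gain = 0;
    for (std::size_t i = 0; i < tests.size(); ++i) {
      std::size_t gain = 0;
      for (int req : tests[i])
        if (uncovered.count(req))
          ++gain;
      if (gain > best_gain) {
        best_gain = gain;
        best = i;
      }
    }
    if (best_gain == 0)
      break; // no test adds coverage; should not happen for this data
    selected.push_back(best);
    for (int req : tests[best])
      uncovered.erase(req);
  }

  std::cout << "Selected tests:";
  for (std::size_t i : selected)
    std::cout << ' ' << i;
  std::cout << '\n';
  return 0;
}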
+ "A Concept Analysis Inspired Greedy Algorithm for Test Suite Minimization"+ +
+By Sriraman Tallam and Neelam Gupta
+ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and +Engineering (PASTE 2005), Lisbon, Portugal, September 5-6, 2005. +
+
+
+Techniques for global register allocation via graph coloring have
+been extensively studied and widely implemented in compiler frameworks. This
+paper examines a particular variant - the Callahan-Koblenz allocator - and
+compares it to the Chaitin-Briggs graph coloring register allocator. Both
+algorithms were published in the 1990s, yet the academic literature does not
+contain an assessment of the Callahan-Koblenz allocator. This paper evaluates
+and contrasts the allocation decisions made by both algorithms. In particular,
+we focus on two key differences between the allocators:
+Spill code: The Callahan-Koblenz allocator attempts to minimize the effect of
+spill code by using program structure to guide allocation and spill code
+placement. We evaluate the impact of this strategy on allocated code.
+Copy elimination: Effective register-to-register copy removal is important for
+producing good code. The allocators use different techniques to eliminate these
+copies. We compare the mechanisms and provide insights into the relative
+performance of the contrasting techniques.
+The Callahan-Koblenz allocator may potentially insert extra branches as part
+of the allocation process. We also measure the performance overhead due to
+these branches.
+
+
+ "Revisiting Graph Coloring Register Allocation: A Study of the Chaitin-Briggs + and Callahan-Koblenz Algorithms"+ +
+ By Keith Cooper, Anshuman Dasgupta, and Jason Eckhardt.
+ Proceedings of the Workshop on Languages and Compilers for Parallel + Computing (LCPC'05), Hawthorne, NY, October 20-22, 2005 +
+
+
+Static analysis of programs in weakly typed languages such as C and C++ is
+generally not sound because of possible memory errors due to dangling
+pointer references, uninitialized pointers, and array bounds overflow.
+Optimizing compilers can produce unpredictable results when such errors
+occur, but this is quite undesirable for many tools that aim to analyze
+security and reliability properties with guarantees of soundness. We
+describe a relatively simple compilation strategy for standard C programs
+that guarantees sound semantics for an aggressive interprocedural pointer
+analysis (or simpler ones), a call graph, and type information for a
+subset of memory. These provide the foundation for sophisticated static
+analyses to be applied to such programs with a guarantee of soundness. Our
+work builds on a previously published transformation called Automatic Pool
+Allocation to ensure that hard-to-detect memory errors (dangling pointer
+references and certain array bounds errors) cannot invalidate the call
+graph, points-to information or type information. The key insight behind
+our approach is that pool allocation can be used to create a run-time
+partitioning of memory that matches the compile-time memory partitioning
+in a points-to graph, and efficient checks can be used to isolate the
+run-time partitions. Furthermore, we show that the sound analysis
+information enables static checking techniques that reliably eliminate
+many run-time checks. We formalize our approach as a new type system with
+the necessary run-time checks in operational semantics and prove the
+correctness of our approach for a subset of C. Our approach requires no
+source code changes, allows memory to be managed explicitly, and does not
+use meta-data on pointers or individual tag bits for memory. Using several
+benchmarks and system codes, we show experimentally that the run-time
+overheads are low (less than 10% in nearly all cases and 30% in the worst
+case we have seen). We also show the effectiveness of reliable static
+analyses for eliminating run-time checks.
+
+
+
+ "Enforcing Alias Analysis for Weakly Typed Languages"+ +
+ By Dinakar Dhurjati, Sumant Kowshik, and Vikram Adve.
+ Technical Report #UIUCDCS-R-2005-2657, Computer Science Dept., Univ. of + Illinois, Nov. 2005 +
+@techreport{dhurjati05enforcing, + author = {Dinakar Dhurjati and Sumant Kowshik and Vikram Adve}, + title = "{Enforcing Alias Analysis for Weakly Typed Languages}", + institution = {Computer Science Dept., Univ. of Illinois}, + year = {2005}, + month = {Nov}, + number = "{\#UIUCDCS-R-2005-2657}", + url = {http://llvm.org/pubs/2005-11-SAFECodeTR.html} +} ++ + + diff --git a/static/pubs/2005-11-SAFECodeTR.pdf b/static/pubs/2005-11-SAFECodeTR.pdf new file mode 100644 index 0000000..cd460a9 Binary files /dev/null and b/static/pubs/2005-11-SAFECodeTR.pdf differ diff --git a/static/pubs/2005-TR-DSAEvaluation.html b/static/pubs/2005-TR-DSAEvaluation.html new file mode 100644 index 0000000..54cf257 --- /dev/null +++ b/static/pubs/2005-TR-DSAEvaluation.html @@ -0,0 +1,74 @@ + + + + + +
+This report describes a set of experiments to evaluate qualitatively the +effectiveness of Data Structure Analysis (DSA) in identifying properties of a +program's data structures. We manually inspected several benchmarks to +identify linked data structures and their properties, and compared these +against the results produced by DSA. The properties we considered are those +that were the primary goals of DSA: +distinguishing different kinds of data structures, +distinct instances of a particular kind, +type information for objects within an LDS, +and information about the lifetime of such objects (particularly, those +local to a function rather than global). +We define a set of metrics for the DS graphs computed by DSA that we use to +summarize our results concisely for each benchmark. +The results of the study are summarized in the last section. ++ + + +
+ @TechReport{DSAEvaluation:TR05, + Author = {Patrick Meredith and Balpreet Pankaj and Swarup Sahoo and + Chris Lattner and Vikram Adve}, + Title = "How Successful Is Data Structure Analysis in Isolating and +Analyzing Linked Data Structures?", + Institution= {Computer Science Dept., + Univ. of Illinois at Urbana-Champaign}, + Number = {UIUCDCS-R-2005-2658}, + Type = {Tech. Report}, + Month = {Nov}, + Year = {2005} +} ++ + +
+
+
+
+Modern network processors (NPs) typically resemble a highly-multithreaded multiprocessor-on-a-chip,
+supporting a wide variety of mechanisms for on-chip storage and inter-task communication.
+NP applications are themselves composed of many threads that share memory and other resources,
+and synchronize and communicate frequently. In contrast, studies of new NP architectures and
+features are often performed by benchmarking a simulation model of the new NP using independent
+kernel programs that neither communicate nor share memory. In this paper we present a NP
+simulation infrastructure that (i) uses realistic NP applications that are multithreaded, share memory,
+synchronize, and communicate; and (ii) automatically maps these applications to a variety of NP
+architectures and features. We use our infrastructure to evaluate threading and scaling, on-chip
+storage and communication, and to suggest future techniques for automated compilation for NPs.
+
+
+ "Towards a Compilation Infrastructure for Network Processors"+ +
+ Martin Labrecque
+ Masters Thesis, Department of Electrical and Computer Engineering, University of Toronto, January, 2006. +
+@MASTERSTHESIS{np_thesis06, + author = {Martin Labrecque}, + title = {Towards a Compilation Infrastructure for Network Processors}, + school = {University of Toronto}, + year = {2006}, +}+ + + diff --git a/static/pubs/2006-01-LabrecqueMSThesis.pdf b/static/pubs/2006-01-LabrecqueMSThesis.pdf new file mode 100644 index 0000000..f6e1cbb Binary files /dev/null and b/static/pubs/2006-01-LabrecqueMSThesis.pdf differ diff --git a/static/pubs/2006-04-04-CGO-GraphColoring.html b/static/pubs/2006-04-04-CGO-GraphColoring.html new file mode 100644 index 0000000..d8ad058 --- /dev/null +++ b/static/pubs/2006-04-04-CGO-GraphColoring.html @@ -0,0 +1,47 @@ + + + + + +
+Just-in-time compilers are invoked during application
+execution and therefore need to ensure fast compilation
+times. Consequently, runtime compiler designers are averse
+to implementing compile-time intensive optimization algorithms. Instead, they tend to select faster but less effective
+transformations. In this paper, we explore this trade-off for
+an important optimization - global register allocation. We
+present a graph-coloring register allocator that has been
+redesigned for runtime compilation. Compared to Chaitin-Briggs [7], a standard graph-coloring technique, the reformulated algorithm requires considerably less allocation
+time and produces allocations that are only marginally
+worse than those of Chaitin-Briggs. Our experimental results indicate that the allocator performs better than the
+linear-scan and Chaitin-Briggs allocators on most benchmarks in a runtime compilation environment. By increasing
+allocation efficiency and preserving optimization quality,
+the presented algorithm increases the suitability and profitability of a graph-coloring register allocation strategy for
+a runtime compiler.
+
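For readers unfamiliar with the baseline, the simplify/select core shared by Chaitin-style graph-coloring allocators (not the paper's reformulated algorithm, which restructures this work for compile speed) can be sketched as follows; the interference graph and register count are invented for the example, and spilling, coalescing, and cost heuristics are omitted.

// Hypothetical sketch of the Chaitin-style simplify/select phases.
#include <cstdio>
#include <vector>

int main() {
  const int K = 3; // number of physical registers (made up for the example)
  // Tiny interference graph over four virtual registers (adjacency lists).
  std::vector<std::vector<int>> adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};
  const int n = static_cast<int>(adj.size());

  // Simplify: repeatedly remove nodes with fewer than K live neighbours.
  std::vector<bool> removed(n, false);
  std::vector<int> stack;
  for (bool changed = true; changed;) {
    changed = false;
    for (int v = 0; v < n; ++v) {
      if (removed[v]) continue;
      int degree = 0;
      for (int w : adj[v])
        if (!removed[w]) ++degree;
      if (degree < K) {
        removed[v] = true;
        stack.push_back(v);
        changed = true;
      }
    }
  }

  // Select: pop nodes and give each the lowest colour unused by its neighbours.
  std::vector<int> colour(n, -1);
  for (int i = static_cast<int>(stack.size()) - 1; i >= 0; --i) {
    int v = stack[i];
    std::vector<bool> used(K, false);
    for (int w : adj[v])
      if (colour[w] >= 0) used[colour[w]] = true;
    for (int c = 0; c < K; ++c)
      if (!used[c]) { colour[v] = c; break; }
  }

  for (int v = 0; v < n; ++v) {
    if (colour[v] < 0)
      std::printf("vreg%d -> spill\n", v);
    else
      std::printf("vreg%d -> r%d\n", v, colour[v]);
  }
  return 0;
}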
+ "Tailoring Graph-coloring Register Allocation For Runtime Compilation", Keith D. Cooper and Anshuman Dasgupta.+ +
+ Proceedings of the 2006 International Symposium on Code Generation and Optimization (CGO'06), New York, New York, 2006. +
+This invited talk gives a high-level overview of the LLVM Project, its +capabilities, features, progress, and the direction it is taking. It is +aimed at a GCC-centric audience, specifically to follow up a presentation +on the GCC Link-Time-Optimization proposal at the 2006 Gelato Itanium +Conference and Expo (ICE). ++ +
+ "Introduction to the LLVM Compiler Infrastructure", Chris Lattner,+ +
+ 2006 Itanium Conference and Expo, San Jose, California, April 2006.
+
+
+Static analysis of programs in weakly typed languages such as C and C++ is
+generally not sound because of possible memory errors due to dangling pointer
+references, uninitialized pointers, and array bounds overflow. We describe a
+compilation strategy for standard C programs that guarantees that aggressive
+interprocedural pointer analysis (or less precise ones), a call graph, and type
+information for a subset of memory, are never invalidated by any possible
+memory errors. We formalize our approach as a new type system with the
+necessary run-time checks in operational semantics and prove the correctness of
+our approach for a subset of C. Our semantics provide the foundation for other
+sophisticated static analyses to be applied to C programs with a guarantee of
+soundness. Our work builds on a previously published transformation called
+Automatic Pool Allocation to ensure that hard-to-detect memory errors (dangling
+pointer references and certain array bounds errors) cannot invalidate the call
+graph, points-to information or type information. The key insight behind our
+approach is that pool allocation can be used to create a run-time partitioning
+of memory that matches the compile-time memory partitioning in a points-to
+graph, and efficient checks can be used to isolate the run-time partitions.
+Furthermore, we show that the sound analysis information enables static
+checking techniques that eliminate many run-time checks. Our approach requires
+no source code changes, allows memory to be managed explicitly, and does not use
+meta-data on pointers or individual tag bits for memory. Using several
+benchmarks and system codes, we show experimentally that the run-time
+overheads are low (less than 10% in nearly all cases and 30% in the worst case
+we have seen). We also show the effectiveness of static analyses in eliminating
+run-time checks.
+
+@inproceedings{1133999, + author = {Dinakar Dhurjati and Sumant Kowshik and Vikram Adve}, + title = {SAFECode: enforcing alias analysis for weakly typed languages}, + booktitle = {PLDI '06: Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation}, + year = {2006}, + isbn = {1-59593-320-4}, + pages = {144--157}, + location = {Ottawa, Ontario, Canada}, + doi = {http://doi.acm.org/10.1145/1133981.1133999}, + publisher = {ACM}, + address = {New York, NY, USA}, +} ++ + +
+The problem of enforcing correct usage of array and pointer references in C
+and C++ programs remains unsolved. The approach proposed by Jones and Kelly
+(extended by Ruwase and Lam) is the only one we know of that does not require
+significant manual changes to programs, but it has extremely high overheads of
+5x-6x and 11x-12x in the two versions. In this paper, we describe a
+collection of techniques that dramatically reduce the overhead of this
+approach, by exploiting a fine-grain partitioning of memory called Automatic
+Pool Allocation. Together, these techniques bring the average overhead of the
+checks down to only 12% for a set of benchmarks (but 69% for one case).
+We show that the memory partitioning is key to bringing down this overhead.
+We also show that our technique successfully detects all buffer overrun
+violations in a test suite modeling reported violations in some important
+real-world programs.
+
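The referent-object idea underlying the Jones-Kelly family of checkers can be illustrated with a short, hypothetical C++ sketch: record the start and size of every allocation, and verify that a pointer derived from a base pointer still falls inside the base's object. The "pool" below is simply a map standing in for the per-pool metadata that the paper's Automatic Pool Allocation partitioning would make much cheaper to search; it is not the authors' implementation.

// Hypothetical sketch of referent-object bounds checking over one "pool".
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>

struct Pool {
  // start address -> size of each object allocated from this pool
  std::map<const char *, std::size_t> objects;
};

void *pool_alloc(Pool &p, std::size_t n) {
  char *mem = static_cast<char *>(std::malloc(n));
  p.objects[mem] = n;
  return mem;
}

// Check that a pointer derived from `base` still points into base's object.
void bounds_check(const Pool &p, const void *base, const void *derived) {
  auto it = p.objects.upper_bound(static_cast<const char *>(base));
  if (it == p.objects.begin()) {
    std::fprintf(stderr, "unknown referent\n");
    std::abort();
  }
  --it; // the object whose start address is <= base
  const char *start = it->first;
  const char *end = start + it->second;
  const char *d = static_cast<const char *>(derived);
  if (d < start || d >= end) {
    std::fprintf(stderr, "bounds violation\n");
    std::abort();
  }
}

int main() {
  Pool p;
  int *a = static_cast<int *>(pool_alloc(p, 8 * sizeof(int)));
  bounds_check(p, a, a + 3);  // in bounds: passes
  std::printf("in-bounds access ok\n");
  bounds_check(p, a, a + 12); // out of bounds: aborts
  return 0;
}

Because each pool holds only the objects of one points-to partition, the lookup searches far fewer objects than a whole-program table would, which is one intuition behind the overhead reduction reported above.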
+ "Backwards-Compatible Array Bounds Checking for C + with Very Low Overhead", Dinakar Dhurjati and Vikram Adve.+ +
+ Proceedings of the 28th International Conference on Software Engineering (ICSE '06), Shanghai, China, 2006. +
+@inproceedings{da06icse,
+   author = {Dinakar Dhurjati and Vikram Adve},
+   title = "{Backwards-Compatible Array Bounds Checking for C with Very Low Overhead}",
+   booktitle = "{Proceedings of the 2006 International Conference on Software Engineering (ICSE'06)}",
+   address = {Shanghai, China},
+   month = {May},
+   year = {2006},
+   url = {http://llvm.org/pubs/2006-05-24-SAFECode-BoundsCheck.html}
+}
+
+
+
diff --git a/static/pubs/2006-05-24-SAFECode-BoundsCheck.pdf b/static/pubs/2006-05-24-SAFECode-BoundsCheck.pdf
new file mode 100644
index 0000000..7f39ab1
Binary files /dev/null and b/static/pubs/2006-05-24-SAFECode-BoundsCheck.pdf differ
diff --git a/static/pubs/2006-06-07-LewyckyChecker.html b/static/pubs/2006-06-07-LewyckyChecker.html
new file mode 100644
index 0000000..5e4c7b6
--- /dev/null
+++ b/static/pubs/2006-06-07-LewyckyChecker.html
@@ -0,0 +1,56 @@
+
+
+
+
+
++ +Automated software analysis is the process of testing program source +code against a set of conditions. These may be as simple as verifying +the coding standards, or as complicated as new languages which are +formally verifiable by a theorem solver.
+
+Checker is able to find two small classes of errors: one is memory
+faults, the other non-deterministic behaviour. Lacking interprocedural
+analysis, Checker cannot be applied to real-world software.
+
+
  Checker: a Static Program Checker, Nicholas Lewycky.
+ Undergraduate Thesis, Computer Science Dept., Ryerson University, June 2006. +
+ @MastersThesis{Lewycky:Checker06, + author = {Nicholas Lewycky}, + title = "{Checker: a Static Program Checker}", + school = "{Computer Science Dept., Ryerson University}", + year = {2006}, + address = {Toronto, ON}, + month = {June}, + note = {{\em See {\tt http://wagon.no-ip.org/checker}.}} + } ++ + + diff --git a/static/pubs/2006-06-07-LewyckyChecker.pdf b/static/pubs/2006-06-07-LewyckyChecker.pdf new file mode 100644 index 0000000..76f07b6 Binary files /dev/null and b/static/pubs/2006-06-07-LewyckyChecker.pdf differ diff --git a/static/pubs/2006-06-07-LewyckyChecker.ps b/static/pubs/2006-06-07-LewyckyChecker.ps new file mode 100644 index 0000000..7c9ba3c Binary files /dev/null and b/static/pubs/2006-06-07-LewyckyChecker.ps differ diff --git a/static/pubs/2006-06-12-PLDI-SAFECode.html b/static/pubs/2006-06-12-PLDI-SAFECode.html new file mode 100644 index 0000000..5607389 --- /dev/null +++ b/static/pubs/2006-06-12-PLDI-SAFECode.html @@ -0,0 +1,83 @@ + + + + + +
+
+Static analysis of programs in weakly typed languages such as C and C++ is
+generally not sound because of possible memory errors due to dangling pointer
+references, uninitialized pointers, and array bounds overflow. We describe a
+compilation strategy for standard C programs that guarantees that aggressive
+interprocedural pointer analysis (or less precise ones), a call graph, and type
+information for a subset of memory, are never invalidated by any possible
+memory errors. We formalize our approach as a new type system with the
+necessary run-time checks in operational semantics and prove the correctness of
+our approach for a subset of C. Our semantics provide the foundation for other
+sophisticated static analyses to be applied to C programs with a guarantee of
+soundness. Our work builds on a previously published transformation called
+Automatic Pool Allocation to ensure that hard-to-detect memory errors (dangling
+pointer references and certain array bounds errors) cannot invalidate the call
+graph, points-to information or type information. The key insight behind our
+approach is that pool allocation can be used to create a run-time partitioning
+of memory that matches the compile-time memory partitioning in a points-to
+graph, and efficient checks can be used to isolate the run-time partitions.
+Furthermore, we show that the sound analysis information enables static
+checking techniques that eliminate many run-time checks. Our approach requires
+no source code changes, allows memory to be managed explicitly, and does not use
+meta-data on pointers or individual tag bits for memory. Using several
+benchmarks and system codes, we show experimentally that the run-time
+overheads are low (less than 10% in nearly all cases and 30% in the worst case
+we have seen). We also show the effectiveness of static analyses in eliminating
+run-time checks.
+
+@inproceedings{1133999, + author = {Dinakar Dhurjati and Sumant Kowshik and Vikram Adve}, + title = {SAFECode: enforcing alias analysis for weakly typed languages}, + booktitle = {PLDI '06: Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation}, + year = {2006}, + isbn = {1-59593-320-4}, + pages = {144--157}, + location = {Ottawa, Ontario, Canada}, + doi = {http://doi.acm.org/10.1145/1133981.1133999}, + publisher = {ACM}, + address = {New York, NY, USA}, +} ++ + +
+We present Vector LLVA, a virtual instruction set architecture (V-ISA) +that exposes extensive static information about vector parallelism +while avoiding the use of hardware-specific parameters. We provide +both arbitrary-length vectors (for targets that allow vectors of +arbitrary length, or where the target length is not known) and +fixed-length vectors (for targets that have a fixed vector length, +such as subword SIMD extensions), together with a rich set of +operations on both vector types. We have implemented translators that +compile (1) Vector LLVA written with arbitrary-length vectors to the +Motorola RSVP architecture and (2) Vector LLVA written with +fixed-length vectors to both AltiVec and Intel SSE2. Our +translator-generated code achieves speedups competitive with +handwritten native code versions of several benchmarks on all three +architectures. These experiments show that our V-ISA design captures +vector parallelism for two quite different classes of architectures +and provides virtual object code portability within the class of +subword SIMD architectures. ++ +
+ "Vector LLVA: A Virtual Vector Instruction Set for Media Processing", Robert L. Bocchino Jr. and Vikram S. Adve.+ +
+ Proceedings of the Second International Conference on Virtual Execution Environments (VEE '06), Ottawa, Canada, 2006. +
+In this paper, we propose and evaluate a virtual instruction set interface +between an operating system (OS) kernel and a general purpose processor +architecture. This interface is a set of operations added to a previously +proposed virtual instruction set architecture called LLVA (Low Level Virtual +Architecture) and can be implemented on existing processor hardware. The +long-term benefits of such an interface include +richer OS-information for hardware, +greater flexibility in evolving hardware, +and +sophisticated analysis and optimization capabilities for kernel code, +including +across the application-kernel boundary transformations. +Our interface design (LLVA-OS) contains +several novel features for machine-independence and performance, including +efficient saving and restoring of (hidden) native state, +mechanisms for error recovery, and several primitive +abstractions that expose semantic information to the underlying translator and +hardware. We describe the design, a prototype implementation of LLVA-OS +on x86, and our experience porting the Linux 2.4.22 kernel to LLVA-OS. +We perform a performance evaluation of this kernel, identifying and +explaining the root causes of key sources of virtualization overhead. ++ +
+ "A Virtual Instruction Set Interface for Operating System Kernels", John Criswell, Brent Monroe, and Vikram Adve.+ +
+ Workshop on the Interaction between Operating Systems and Computer Architecture (WIOSCA '06), Boston, Massachusetts, 2006. +
+With the rapid increase of complexity in System-on-a-Chip (SoC) design, +the electronic design automation (EDA) +community is moving from RTL (Register Transfer Level) +synthesis to behavioral-level and system-level synthesis. The +needs of system-level verification and software/hardware co- +design also prefer behavior-level executable specifications, such +as C or SystemC. In this paper we present the platform-based +synthesis system, named xPilot, being developed at UCLA. The +first objective of xPilot is to provide novel behavioral synthesis +capability for automatically generating efficient RTL code from +a C or SystemC description for a given system platform and +optimizing the logic, interconnects, performance, and power +simultaneously. The second objective of xPilot is to provide a +platform-based system-level synthesis capability, including both +synthesis for application-specific configurable processors and +heterogeneous multi-core systems. Preliminary experiments on +FPGAs demonstrate the efficacy of our approach on a wide range +of applications and its value in exploring various design tradeoffs. + ++ +
+"Platform-Based Behavior-Level and System-Level Synthesis"+ +
+J. Cong, Y. Fan, G. Han, W. Jiang, and Z. Zhang
+Proceedings of IEEE International SOC Conference, pp. 199-202, Austin, Texas, Sept. 2006. +
+
+Random access memory (RAM) is tightly-constrained in many
+embedded systems. This is especially true for the least expensive, lowest-power embedded systems, such as sensor network
+nodes and portable consumer electronics. The most widely-used sensor network nodes have only 4-10 KB of RAM and
+do not contain memory management units (MMUs). It is very
+difficult to implement increasingly complex applications under
+such tight memory constraints. Nonetheless, price and power
+consumption constraints make it unlikely that increases in RAM
+in these systems will keep pace with the requirements of applications.
+
+
+We propose the use of automated compile-time and run-time
+techniques to increase the amount of usable memory in MMU-less embedded systems. The proposed techniques do not
+increase hardware cost, and are designed to require few or no
+changes to existing applications. We have developed a fast compression algorithm well suited to this application, as well as run-time library routines and compiler transformations to control
+and optimize the automatic migration of application data between compressed and uncompressed memory regions. These
+techniques were experimentally evaluated on Crossbow TelosB
+sensor network nodes running a number of data collection and
+signal processing applications. The results indicate that available memory can be increased by up to 50% with less than 10%
+performance degradation for most benchmarks.
+ "Automated Compile-Time and Run-Time Techniques to Increase Usable Memory in MMU-Less Embedded Systems"+ +
+L. Bai, L. Yang, and R. P. Dick
+Proc. Int. Conf. Compilers, Architecture & Synthesis for Embedded Systems, +pp. 125-135, Oct. 2006.
+
+The PyPy project seeks to prove both on a research and a practical level the feasibility of constructing a virtual machine (VM) for a dynamic language in a dynamic language - in this case, Python. The aim is to translate (i.e. compile) the VM to arbitrary target environments, ranging in level from C/Posix to Smalltalk/Squeak via Java and CLI/.NET, while still being of reasonable efficiency within these environments.A key tool to achieve this goal is the systematic reuse of the Python language as a system programming language at various levels of our architecture and translation process. For each level, we design a corresponding type system and apply a generic type inference engine - for example, the garbage collector is written in a style that manipulates simulated pointer and address objects, and when translated to C these operations become C-level pointer and address instructions. + ++ +
+ "PyPy's Approach to Virtual Machine Construction" ++
+ Armin Rigo and Samuele Pedroni. +
+ +Dynamic Languages Symposium (DLS'06) +, Portland, Oregon, October 2006. +
+@inproceedings{1176753, + author = {Rigo, Armin and Pedroni, Samuele}, + title = {PyPy's approach to virtual machine construction}, + booktitle = {OOPSLA '06: Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications}, + year = {2006}, + isbn = {1-59593-491-X}, + pages = {944--953}, + location = {Portland, Oregon, USA}, + doi = {http://doi.acm.org/10.1145/1176617.1176753}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Modern network processors (NPs) are highly multithreaded chip multiprocessors (CMPs), supporting +a wide variety of mechanisms for on-chip storage and inter-task communication. Real network processor +applications are hard to program and must be tailored to fit the resources of the underlying NP, motivating +an automated approach to mapping multithreaded applications to NPs. In this paper we propose and evaluate +compiler-based automated task and data management techniques to scale the throughput of network +processing task graphs onto NPs. We evaluate these techniques using a NP simulation infrastructure based +on realistic NP applications, and present an approach to discovering performance bottlenecks. Finally we +demonstrate how our techniques enhance throughput-scaling for NPs. ++ +
+ "Scaling Task Graphs for Network Processors"+ +
+ Martin Labrecque and J. Gregory Steffan
+ IFIP International Conference on Network and Parallel Computing, Tokyo, + Japan, October, 2006.
+
+@INPROCEEDINGS{scaling06, + author = {Martin Labrecque and J. Gregory Steffan}, + title = {Scaling Task Graphs for Network Processors}, + booktitle = {IFIP International Conference on Network and Parallel Computing}, + year = {2006}, + address = {Tokyo, Japan}, + month = {October}, +} ++ + + diff --git a/static/pubs/2006-10-ICNPC-ScalingTaskGraphs.pdf b/static/pubs/2006-10-ICNPC-ScalingTaskGraphs.pdf new file mode 100644 index 0000000..038d299 Binary files /dev/null and b/static/pubs/2006-10-ICNPC-ScalingTaskGraphs.pdf differ diff --git a/static/pubs/2006-DSN-DanglingPointers.html b/static/pubs/2006-DSN-DanglingPointers.html new file mode 100644 index 0000000..3f55506 --- /dev/null +++ b/static/pubs/2006-DSN-DanglingPointers.html @@ -0,0 +1,82 @@ + + + + + +
+In this paper, we propose a novel technique to detect all
+dangling pointer uses at run-time that is efficient enough
+for production use in server codes. One idea (previously
+used by Electric Fence, PageHeap) is to use a new virtual
+page for each allocation of the program and rely on page
+protection mechanisms to check dangling pointer accesses.
+This naive approach has two limitations that make it
+impractical to use in production software: increased physical
+memory usage and increased address space usage. We
+propose two key improvements that alleviate both these
+problems. First, we use a new virtual page for each allocation
+of the program but map it to the same physical page as the
+original allocator. This allows using nearly identical physical
+memory as the original program while still retaining the
+dangling pointer detection capability. We also show how to
+implement this idea without requiring any changes to the
+underlying memory allocator. Our second idea alleviates
+the problem of virtual address space exhaustion by using
+a previously developed compiler transformation called
+Automatic Pool Allocation to reuse many virtual pages. The
+transformation partitions the memory of the program based
+on their lifetimes and allows us to reuse virtual pages when
+portions of memory become inaccessible. Experimentally
+we find that the run-time overhead for five unix servers is
+less than 4%, for other unix utilities less than 15%. However,
+in case of allocation intensive benchmarks, we find our
+overheads are much worse (up to 11x slowdown).
+
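The core trick of mapping a fresh virtual page onto the same physical page as the allocator's memory can be demonstrated with a small, hypothetical POSIX/Linux sketch (this is an illustration under stated assumptions, not the paper's system): two mappings of one shared-memory page act as the allocator's view and the per-allocation alias, and revoking only the alias on free makes any dangling use fault without consuming extra physical memory. Error handling is omitted.

// Hypothetical sketch: alias one physical page at two virtual addresses.
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
  long page = sysconf(_SC_PAGESIZE);
  int fd = shm_open("/dangling-demo", O_CREAT | O_RDWR, 0600);
  shm_unlink("/dangling-demo"); // the fd keeps the object alive
  ftruncate(fd, page);

  // "Allocator" view and the per-allocation alias handed to the program.
  char *backing = (char *)mmap(nullptr, page, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
  char *object = (char *)mmap(nullptr, page, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);

  std::strcpy(object, "hello");               // program writes via its alias
  std::printf("backing sees: %s\n", backing); // same physical memory

  // On free(), only the alias is revoked; the backing page stays reusable.
  mprotect(object, page, PROT_NONE);
  // A dangling use of `object` would now fault (left commented out):
  // object[0] = 'x';

  munmap(object, page);
  munmap(backing, page);
  close(fd);
  return 0;
}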
+@inproceedings{1135707, + author = {Dinakar Dhurjati and Vikram Adve}, + title = {Efficiently Detecting All Dangling Pointer Uses in Production Servers}, + booktitle = {DSN '06: Proceedings of the International Conference on Dependable Systems and Networks}, + year = {2006}, + isbn = {0-7695-2607-1}, + pages = {269--280}, + doi = {http://dx.doi.org/10.1109/DSN.2006.31}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, +} ++ + +
+Clearly, the traditional one-size-fits-all approach to security and reliability is no longer sufficient or acceptable from the user perspective. A potentially much more cost-effective and precise approach is to customize the mechanisms for detecting security attacks and execution errors using knowledge about the expected or allowed program behavior. ++ +
+ "Toward Application-Aware Security and Reliability" ++
+ Ravishankar K. Iyer, Zbigniew Kalbarczyk, Karthik Pattabiraman, William Healey, Wen-Mei W. Hwu, Peter Klemperer, and Reza Farivar. +
+ +IEEE Security and Privacy +, January 2007. +
+@article{1262662, + author = {Iyer, Ravishankar K. and Kalbarczyk, Zbigniew and Pattabiraman, Karthik and Healey, William and Hwu, Wen-Mei W. and Klemperer, Peter and Farivar, Reza}, + title = {Toward Application-Aware Security and Reliability}, + journal = {IEEE Security and Privacy}, + volume = {5}, + number = {1}, + year = {2007}, + issn = {1540-7993}, + pages = {57--62}, + doi = {http://dx.doi.org/10.1109/MSP.2007.23}, + publisher = {IEEE Educational Activities Department}, + address = {Piscataway, NJ, USA}, + } ++ + +
++ ++LLVM (http://llvm.org) is a suite of carefully designed open source +libraries which implement compiler components (like language +front-ends, code generators, aggressive optimizers, Just-In-Time +compiler support, debug support, link-time optimization, etc). The +goal of the LLVM project is to build these components in a way that +allows them to be combined together to create familiar tools (like a C +compiler), interesting new tools (like an OpenGL JIT compiler) and +many other things we haven't thought of yet. Because LLVM is under +continuous development, clients of these components naturally benefit +from improvements in the libraries.
+ +This talk gives an overview of LLVM's architecture, design and +philosophy, and gives a high-level overview of the various components +that are available. It then describes implementation details and +design points of some example clients—LLVM's GCC-based +C/C++/Objective-C compiler, the OpenGL stack in Mac OS/X Leopard, and +scripting language compilers—describing some of the novel +capabilities that LLVM contributes to these projects.
+ +
+ "The LLVM Compiler System", Chris Lattner,+ +
+ 2007 Bossa Conference on Open Source, Mobile Internet and Multimedia, Recife, Brazil, March 2007.
+
+Unlocking the potential of field-programmable gate arrays requires compilers that translate algorithmic high-level language code into hardware circuits. The Trident open source compiler translates C code to a hardware circuit description, providing designers with extreme flexibility in prototyping reconfigurable supercomputers. ++ +
+ "Trident: From High-Level Language to Hardware Circuitry" ++
+ Justin L. Tripp, Maya B. Gokhale, and Kristopher D. Peterson. +
+ +IEEE Computer +, March 2007. +
+@article{1251714, + author = {Tripp, Justin L. and Gokhale, Maya B. and Peterson, Kristopher D.}, + title = {Trident: From High-Level Language to Hardware Circuitry}, + journal = {Computer}, + volume = {40}, + number = {3}, + year = {2007}, + issn = {0018-9162}, + pages = {28--37}, + doi = {http://dx.doi.org/10.1109/MC.2007.107}, + publisher = {IEEE Computer Society Press}, + address = {Los Alamitos, CA, USA}, + } ++ + +
+Over the last couple of years, various idioms used in the 15 MLOC C code base of ASML, the world's biggest lithography machine manufacturer, have been unmasked as crosscutting concerns. However, finding a scalable aspect-based implementation for them has not succeeded thus far, prohibiting sufficient separation of concerns and introducing possibly dangerous programming mistakes. This paper proposes a concise aspect-based implementation in Aspicere2 for ASML's exception handling idiom, based on prior work on join point properties, annotations and type parameters, to which we add the new concept of (local) continuation join points. Our solution takes care of the error value propagation mechanism (which includes aborting the main success scenario), logging, resource cleanup, and allows for local overrides of the default aspect-based recovery. The highly idiomatic nature of the problem in tandem with the aforementioned concepts renders our aspects very robust and tolerant to future base code evolution.
+
+ "An Aspect for Idiom-based Exception Handling (using local continuation join points, join point properties, annotations and type parameters)"+ +
+ Bram Adams and Kris De Schutter.
+ Proc. of the 5th Software-Engineering Properties of Languages and Aspect Technologies Workshop (SPLAT), AOSD 2007, Vancouver, Canada, March, 2007. +
+@INPROCEEDINGS{Adams07, + author = {Bram Adams and Kris De Schutter}, + title = {An Aspect for Idiom-based Exception Handling (using local continuation join points, join point properties, annotations and type parameters)}, + booktitle = {Proceedings of the 5th Software-Engineering Properties of Languages and Aspect Technologies Workshop (SPLAT), AOSD}, + year = {2007}, + address = {Vancouver, Canada}, +} ++ + +
++ ++When developing or deploying large applications, one would like to have more insights +into what an application is doing at runtime. Frequently it is required to change defective +parts of an application as fast as possible. For instance one may wish to replace a certain +function call in a program with another function call whenever a specified condition +holds. This master thesis aims at building the change framework, a system for dynamic +program instrumentation and analysis. This research builds atop of the Low Level Virtual +Machine (LLVM) for representing C/C++ applications in an intermediate form. The +change framework consists of two parts, the application under analysis, and a monitor +process. The application under analysis is a C/C++ application compiled to LLVM +bytecodes. The monitor process communicates with the application process and is able +to dynamically instrument and analyze the application process using a domain specific +language. This change language has powerful constructs for defining and conditionally +applying application changes. An important overall goal of this system is to ease the +analysis as well as alteration of low level system software at run-time. +
+
+ A Change Framework based on the Low Level +Virtual Machine Compiler Infrastructure, Jakob Praher.+ +
+ Masters Thesis, Institute for System Software +Johannes Kepler University Linz, April 2007. +
+Instruction selection is a compiler optimisation that translates the intermediate representation of a program into a lower intermediate representation or an assembler program. We use the SSA form as an intermediate representation for instruction selection. Patterns are used for translation and are expressed as production rules in a graph grammar. The instruction selector seeks a syntax derivation with minimal cost, optimising execution time, code size, or a combination of both. Production rules are either base rules, which match nodes in the SSA graph, or chain rules, which convert results of operations.
+
+We present a new algorithm for placing chain rules in a control flow graph. This new algorithm places chain rules optimally for an arbitrary cost metric. Experiments with the MiBench and SPEC2000 benchmark suites show that our proposed algorithm is feasible and always yields better results than simple strategies currently in use. We reduce the costs for placing chain rules by 25% for the MiBench suite and by 11% for the SPEC2000 suite.
+ "Optimal chain rule placement for instruction selection based on SSA graphs" ++
+ Stefan Schafer and Bernhard Scholz. +
+ +Proceedings of the 10th international workshop on Software & compilers for embedded systems (SCOPES'07) +, Nice, France, April 2007. +
+@inproceedings{1269857,
+  author = {Sch\"{a}fer, Stefan and Scholz, Bernhard},
+  title = {Optimal chain rule placement for instruction selection based on SSA graphs},
+  booktitle = {SCOPES '07: Proceedings of the 10th international workshop on Software \& compilers for embedded systems},
+  year = {2007},
+  pages = {91--100},
+  location = {Nice, France},
+  doi = {http://doi.acm.org/10.1145/1269843.1269857},
+  publisher = {ACM},
+  address = {New York, NY, USA},
+ }
+
+
+Switch-case statements (or switches) provide a natural way to express multiway
+branching control flow semantics. They are common in many applications, including
+compilers, parsers, text processing programs, and virtual machines. Various
+optimizations for switches have been studied for many years. This paper describes
+the switch lowering refactoring recently made for the LLVM Compiler System.
+
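The two classic strategies a switch-lowering pass chooses between can be shown with a small, hypothetical C++ illustration (this is not the LLVM implementation itself): dense case values become an indexed jump table, while sparse values become a balanced tree of comparisons.

// Hypothetical illustration of switch lowering strategies.
#include <cstdio>

// Dense values 0..3: lower to an indexed jump table.
int dense_switch(int x) {
  static int (*const table[4])() = {
      [] { return 10; }, [] { return 20; }, [] { return 30; }, [] { return 40; }};
  if (x < 0 || x > 3)
    return -1;       // default case
  return table[x](); // a single indexed indirect jump
}

// Sparse values: lower to a binary tree of comparisons instead of a table.
int sparse_switch(int x) {
  if (x < 1000) {
    if (x == 5)  return 1;
    if (x == 70) return 2;
  } else {
    if (x == 1000)   return 3;
    if (x == 900000) return 4;
  }
  return -1; // default case
}

int main() {
  std::printf("%d %d\n", dense_switch(2), sparse_switch(900000));
  return 0;
}

A real lowering pass decides between these forms (and hybrids of them) using the density and count of the case values, since a jump table for the sparse example above would waste enormous amounts of space.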
+ "Improving Switch Lowering for The LLVM Compiler System"+ +
+ Anton Korobeynikov
+ Proc. of the 2007 Spring Young Researchers Colloquium on Software + Engineering (SYRCoSE'2007), Moscow, Russia, May, 2007. +
+ @InProceedings{SYRCoSE:SwitchLowering, + author = {Anton Korobeynikov}, + title = "{Improving Switch Lowering for The LLVM Compiler System}", + booktitle = "{Proceedings of the 2007 Spring Young Researchers Colloquium on Software Engineering (SYRCoSE'2007)}", + address = {Moscow, Russia}, + month = {May}, + year = {2007} + } ++ + +
+Context-sensitive pointer analysis algorithms with full "heap
+cloning" are powerful but are widely considered to be too expensive
+to include in production compilers. This paper shows, for the first
+time, that a context-sensitive, field-sensitive algorithm with full
+heap cloning (by acyclic call paths) can indeed be both scalable and
+extremely fast in practice. Overall, the algorithm is able to analyze
+programs in the range of 100K-200K lines of C code in 1-3 seconds,
+takes less than 5% of the time it takes for GCC to compile the code
+(which includes no whole-program analysis), and scales well across
+five orders of magnitude of code size. It is also able to analyze
+the Linux kernel (about 355K lines of code) in 3.1 seconds. The paper
+describes the major algorithmic and engineering design choices that
+are required to achieve these results, including (a) using
+flow-insensitive and unification-based analysis, which are essential
+to avoid exponential behavior in practice; (b) sacrificing
+context-sensitivity within strongly connected components of the call
+graph; and (c) carefully eliminating several kinds of O(N^2) behaviors
+(largely without affecting precision). The techniques used for (b) and
+(c) eliminated several major bottlenecks to scalability, and both are
+generalizable to other context-sensitive algorithms. We show that the
+engineering choices collectively reduce analysis time by factors of up
+to 3x-21x in our ten largest programs, and that the savings grow
+strongly with program size. Finally, we briefly summarize results
+demonstrating the precision of the analysis.
+
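The unification-based ingredient mentioned in point (a) is what keeps such analyses near-linear: instead of propagating points-to sets, an assignment between two pointers merges their targets in a union-find structure. The following tiny, hypothetical C++ sketch shows only that mechanism (Steensgaard-style unification), not the paper's full context-sensitive algorithm.

// Hypothetical sketch of unification-based points-to merging.
#include <cstdio>
#include <numeric>
#include <vector>

struct UnionFind {
  std::vector<int> parent;
  explicit UnionFind(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
  void unite(int a, int b) { parent[find(a)] = find(b); }
};

int main() {
  // Abstract memory objects: 0 = object A, 1 = object B, 2 = object C.
  UnionFind nodes(3);
  // Suppose p may point to A and q may point to B.
  // The statement "p = q;" unifies whatever p and q may point to:
  nodes.unite(0, 1);
  std::printf("A and B aliased: %s\n",
              nodes.find(0) == nodes.find(1) ? "yes" : "no");
  std::printf("A and C aliased: %s\n",
              nodes.find(0) == nodes.find(2) ? "yes" : "no");
  return 0;
}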
+ "Making Context-Sensitive Points-to Analysis with Heap Cloning + Practical For The Real World"+ +
+ Chris Lattner, Andrew Lenharth, and Vikram Adve.
+ Proc. of the 2007 ACM SIGPLAN Conference on Programming Language + Design and Implementation (PLDI'07), San Diego, CA, Jun, 2007. +
+ @InProceedings{DSA:PLDI07, + author = {Chris Lattner and Andrew Lenharth and Vikram Adve}, + title = "{Making Context-Sensitive Points-to Analysis with Heap Cloning Practical For The Real World}", + booktitle = "{Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'07)}", + address = {San Diego, California}, + month = {June}, + year = {2007} + } ++ + +
++ ++The LLVM 2.0 release brings a number of new features and capabilities to +the LLVM toolset. This talk briefly describes those features, then moves +on to talk about what is next: llvm 2.1, llvm-gcc 4.2, and puts a special +emphasis on the 'clang' C front-end. This describes how the 'clang' +preprocessor can be used to improve the scalability of distcc by up to +4.4x.
+ +
+ "LLVM 2.0 and Beyond!", Chris Lattner,+ +
+ Google Tech Talk, Mountain View, CA, July 2007.
+
+Precise software analysis and verification require tracking the exact +path along which a statement is executed (path-sensitivity), the different contexts +from which a function is called (context-sensitivity), and the bit-accurate +operations performed. Previously, verification with such precision has been considered +too inefficient to scale to large software. In this paper, we present +a novel approach to solving such verification conditions, based on an automatic +abstraction-checking-refinement framework that exploits natural abstraction +boundaries present in software. Experimental results show that our approach +easily scales to over 200,000 lines of real C/C++ code. ++ +
+"Structural Abstraction of Software Verification Conditions" ++ +
+Domagoj Babic and Alan J. Hu. +
+ + Proc. of the 19th International Conference on + Computer Aided Verification (CAV'07) +, +Berlin, Germany, July 3-7, 2007. +
+@inproceedings{bh07structural,
+  author = {Domagoj Babi\'c and Alan J. Hu},
+  title = {{Structural Abstraction of Software Verification Conditions}},
+  booktitle = {Proceedings of the 19th Int. Conf. on Computer Aided Verification
+               (CAV'07), Berlin, Germany},
+  publisher = {Springer},
+  series = {Lecture Notes in Computer Science},
+  year = {2007},
+  month = {July}
+}
+
+
+We present ongoing work and first results in static and detailed quantitative runtime analysis of LLVM byte code for the purpose of automatic procedural level partitioning and co-synthesis of complex software systems. Runtime behaviour is captured by reverse compilation of LLVM bytecode into augmented, self-profiling ANSI-C simulator programs retaining the LLVM instruction level. The actual global data flow is captured both in quantity and value range to guide function unit layout in the synthesis of application specific processors. Currently the implemented tool LLILA (Low Level Intermediate Language Analyzer) focuses on static code analysis on the inter-procedural data flow via e.g. function parameters and global variables to uncover a program's potential paths of data exchange. ++ +
+ "Compiled low-level virtual instruction set simulation and profiling for code partitioning and ASIP-synthesis in hardware/software co-design" ++
+ Carsten Gremzow. +
+ +Proceedings of the 2007 summer computer simulation conference (SCSC'07) +, San Diego, California, July 2007. +
+@inproceedings{1358025, + author = {Gremzow, Carsten}, + title = {Compiled low-level virtual instruction set simulation and profiling for code partitioning and ASIP-synthesis in hardware/software co-design}, + booktitle = {SCSC: Proceedings of the 2007 summer computer simulation conference}, + year = {2007}, + isbn = {1-56555-316-0}, + pages = {741--748}, + location = {San Diego, California}, + publisher = {Society for Computer Simulation International}, + address = {San Diego, CA, USA}, + } ++ + +
+Transactional memory dramatically reduces the complexity of writing concurrent
+code. Yet, seamless integration of transactional constructs in application code
+typically comes with a significant performance penalty. Recent studies have
+shown that compiler support allows producing highly efficient STM-based
+applications without putting the hassle on the programmer. So far, STM
+integration has been partially implemented in custom, proprietary compiler
+infrastructures. In this paper, we propose and evaluate the use of the LLVM
+open compiler framework to generate efficient concurrent applications using
+word-based STM libraries. Since LLVM uses the GCC compiler suite as front-end,
+it can process code written in C or C++ (with partial support for other
+languages). We also present a tool that allows "transactifying" assembly code
+and can complement LLVM for legacy code and libraries. Experiments using a
+lightweight C word-based STM library show that LLVM integration performs as
+well as hand-optimized calls to the STM library and better than assembly code
+instrumentation of the application code.
+
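A before/after sketch shows what "transactifying" means at the code level: inside an atomic block, every shared load and store is rewritten into a call into a word-based STM library. The stm_load/stm_store API below is a made-up stand-in for illustration only, not the library evaluated in the paper; a real word-based STM would log, validate, and possibly retry these accesses.

// Hypothetical before/after sketch of compiler transactification.
#include <cstdint>
#include <cstdio>

// Toy "STM" stand-ins: real implementations would track read/write sets.
static std::intptr_t stm_load(std::intptr_t *addr) { return *addr; }
static void stm_store(std::intptr_t *addr, std::intptr_t val) { *addr = val; }

static std::intptr_t counter;

// Original code:  atomic { counter = counter + 1; }
// After transactification, every word access goes through the STM library:
void increment_tx() {
  std::intptr_t tmp = stm_load(&counter);
  stm_store(&counter, tmp + 1);
}

int main() {
  increment_tx();
  std::printf("counter = %ld\n", static_cast<long>(counter));
  return 0;
}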
+@inproceedings { felber2007tanger, + title = {Transactifying Applications using an Open Compiler Framework}, + author = {Pascal Felber and Christof Fetzer and Ulrich M\"uller and +Torvald Riegel and Martin S\"u{\ss}kraut and Heiko Sturzrehm }, + booktitle = {TRANSACT}, + month = {August}, + year = {2007}, +} ++ +
+Soft real-time applications lack a formal methodology for their design optimization. Well-established techniques from hard real-time systems cannot be directly applied to soft real-time applications, without losing key benefits of the soft real-time paradigm. We introduce a statistical analysis framework that is well-suited for discovering opportunities for optimization of soft real-time applications. We demonstrate how programmers can use the analysis provided by our framework to perform aggressive soft real-time design optimizations on their applications. The paper introduces the Context Execution Tree (CET) representation for capturing the statistical properties of function calls in the context of their execution in the program. The CET is constructed from an offline-profile of the application. Statistical measures are coupled with techniques that extract runtime distinguishable call-chains. This combination of techniques is applied to the CET to find statistically significant patterns of activity that i) expose slack in the execution of the application with respect to its soft real-time requirements, and ii) can be predicted with low overhead and high reliability during the normal execution of the application. ++ +
+ "A Profile-Driven Statistical Analysis Framework for the +Design Optimization of Soft Real-Time Applications" ++
+ Tushar Kumar, Jaswanth Sreeram, Romain Cledat, and Santosh Pande. +
+ +Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering (ESEC-FSE '07) +, Dubrovnik, Croatia, September 2007. +
+@inproceedings{1287702, + author = {Kumar, Tushar and Sreeram, Jaswanth and Cledat, Romain and Pande, Santosh}, + title = {A profile-driven statistical analysis framework for the design optimization of soft real-time applications}, + booktitle = {ESEC-FSE '07: Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering}, + year = {2007}, + isbn = {978-1-59593-811-4}, + pages = {529--532}, + location = {Dubrovnik, Croatia}, + doi = {http://doi.acm.org/10.1145/1287624.1287702}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+This paper describes an efficient and robust +approach to provide a safe execution environment for an entire +operating system, such as Linux, and all its applications. The +approach, which we call Secure Virtual Architecture (SVA), +defines a virtual, low-level, typed instruction set suitable for +executing all code on a system, including kernel and +application code. SVA code is translated for execution by a virtual +machine transparently, offline or online. +SVA aims to enforce fine-grained (object level) memory safety, +control-flow integrity, +type safety for a subset of objects, and sound analysis. +A virtual machine implementing SVA achieves these goals by using a +novel approach that exploits properties of existing memory pools in +the kernel and by preserving the kernel's explicit control over +memory, including custom allocators and explicit deallocation. +Furthermore, the safety properties can be encoded compactly as +extensions to the SVA type system, +allowing the (complex) safety checking compiler to be outside +the trusted computing base. SVA also defines a set of OS interface +operations that abstract all privileged hardware instructions, +allowing the virtual machine to monitor all privileged operations +and control the physical resources on a given hardware platform. +We have ported the Linux kernel to SVA, treating it as a new +architecture, and made only minimal code changes (less than 300 lines of code) +to the machine-independent parts of the kernel and device drivers. +SVA is able to prevent 4 out of 5 memory safety exploits previously reported +for the Linux 2.4.22 kernel for which exploit code is available, and would +prevent the fifth one simply by compiling an additional kernel library. ++ +
+@inproceedings{SVA:SOSP07,
+  author = {John Criswell and Andrew Lenharth and Dinakar Dhurjati and Vikram Adve},
+  title = {Secure Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems},
+  booktitle = {SOSP '07: Proceedings of the Twenty First ACM Symposium on Operating Systems Principles},
+  month = {October},
+  year = {2007},
+  location = {Stevenson, WA},
+}
+
+
+Although the C-based interpreter of Python is reasonably fast, implementations on the CLI or the JVM platforms offers some advantages in terms of robustness and interoperability. Unfortunately, because the CLI and JVM are primarily designed to execute statically typed, object-oriented languages, most dynamic language implementations cannot use the native bytecodes for common operations like method calls and exception handling; as a result, they are not able to take full advantage of the power offered by the CLI and JVM.+ ++We describe a different approach that attempts to preserve the flexibility of Python, while still allowing for efficient execution. This is achieved by limiting the use of the more dynamic features of Python to an initial, bootstrapping phase. This phase is used to construct a final RPython (Restricted Python) program that is actually executed. RPython is a proper subset of Python, is statically typed, and does not allow dynamic modification of class or method definitions; however, it can still take advantage of Python features such as mixins and first-class methods and classes.
+This paper presents an overview of RPython, including its design and its translation to both CLI and JVM bytecode. We show how the bootstrapping phase can be used to implement advanced features, like extensible classes and generative programming. We also discuss what work remains before RPython is truly ready for general use, and compare the performance of RPython with that of other approaches. +
+ "RPython: a Step Towards Reconciling Dynamically and Statically +Typed OO Languages" ++
+ Davide Ancona, Massimo Ancona, Antonio Cuni, and Nicholas D. Matsakis. +
+ +Proceedings of the 2007 symposium on Dynamic languages (DLS'07) +, Montreal, Quebec, Canada, October 2007. +
+@inproceedings{1297091, + author = {Ancona, Davide and Ancona, Massimo and Cuni, Antonio and Matsakis, Nicholas D.}, + title = {RPython: a step towards reconciling dynamically and statically typed OO languages}, + booktitle = {DLS '07: Proceedings of the 2007 symposium on Dynamic languages}, + year = {2007}, + isbn = {978-1-59593-868-8}, + pages = {53--64}, + location = {Montreal, Quebec, Canada}, + doi = {http://doi.acm.org/10.1145/1297081.1297091}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Embedded computer systems can be found everywhere as the result of the
+need to develop ever more intelligent and complex electronic devices.
+To meet requirements for factors such as power consumption and performance,
+these systems often require customized processors which are optimized for
+a specific application. However, designing an application specific processor
+can be time-consuming and costly, and therefore the toolset used for processor
+design has an important role.
+
+
+TTA Codesign Environment (TCE) is a semi-automated toolset developed at the
+Tampere University of Technology for designing processors based on an easily
+customizable Transport Triggered Architecture (TTA) processor architecture
+template. The toolset provides a complete co-design toolchain from
+program source code to synthesizable hardware design and program binaries.
+ +One of the most important tools in the toolchain is the compiler. The compiler +is required to adapt to customized target architectures and to utilize the +available processor resources as efficiently as possible and still produce +programs with correct behavior. The compiler is therefore the most complicated +and challenging tool to design in the toolset.
+ +The work completed for this thesis consists of the design, implementation and +verification of a retargetable compiler backend for the TCE project. This +thesis describes the role of the compiler in the toolchain and presents the +design of the implemented compiler backend. In addition, the methods and +benchmark results of the compiler verification are presented. +
+ Retargetable Compiler Backend for Transport Triggered Architectures, Veli-Pekka Jaaskelainen.+ +
+ Masters Thesis, Tampere University of Technology, Oct 2007. +
+This paper describes an efficient and robust +approach to provide a safe execution environment for an entire +operating system, such as Linux, and all its applications. The +approach, which we call Secure Virtual Architecture (SVA), +defines a virtual, low-level, typed instruction set suitable for +executing all code on a system, including kernel and +application code. SVA code is translated for execution by a virtual +machine transparently, offline or online. +SVA aims to enforce fine-grained (object level) memory safety, +control-flow integrity, +type safety for a subset of objects, and sound analysis. +A virtual machine implementing SVA achieves these goals by using a +novel approach that exploits properties of existing memory pools in +the kernel and by preserving the kernel's explicit control over +memory, including custom allocators and explicit deallocation. +Furthermore, the safety properties can be encoded compactly as +extensions to the SVA type system, +allowing the (complex) safety checking compiler to be outside +the trusted computing base. SVA also defines a set of OS interface +operations that abstract all privileged hardware instructions, +allowing the virtual machine to monitor all privileged operations +and control the physical resources on a given hardware platform. +We have ported the Linux kernel to SVA, treating it as a new +architecture, and made only minimal code changes (less than 300 lines of code) +to the machine-independent parts of the kernel and device drivers. +SVA is able to prevent 4 out of 5 memory safety exploits previously reported +for the Linux 2.4.22 kernel for which exploit code is available, and would +prevent the fifth one simply by compiling an additional kernel library. ++ +
Awarded an SOSP 2007 Audience Choice Award
+
+
+@inproceedings{SVA:SOSP07,
+  author = {John Criswell and Andrew Lenharth and Dinakar Dhurjati and Vikram Adve},
+  title = {Secure Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems},
+  booktitle = {SOSP '07: Proceedings of the Twenty First ACM Symposium on Operating Systems Principles},
+  month = {October},
+  year = {2007},
+  location = {Stevenson, WA},
+}
+
+
+
+Current transactifying compilers for unmanaged environments (e.g., systems software written +in C/C++) target only word-based software transactional memories (STMs) because the +compiler cannot easily infer whether it is safe to transform a transactional access to a +certain memory location in an object-based way. To use object-based STMs in these +environments, programmers must use explicit calls to the STM or use a restricted language +dialect, both of which are not practical. In this paper, we show how an existing pointer +analysis can be used to let a transactifying compiler for C/C++ use object-based accesses +whenever this is possible and safe, while falling back to word-based accesses otherwise. +Programmers do not need to provide any annotations and do not have to use a restricted +language. Our evaluation also shows that an object-based STM can be significantly faster +than a word-based STM with an otherwise identical design and implementation, even if the +parameters of the latter have been tuned. ++ +
+@inproceedings{Riegel2008objbased, + author = {{T}orvald {R}iegel and {B}ecker de {B}rum, {D}iogo}, + title = {{M}aking {O}bject-{B}ased {STM} {P}ractical in {U}nmanaged {E}nvironments}, + booktitle = {{TRANSACT} 2008}, + year = {2008}, +} ++ +
+ +Malware programs that incorporate trigger-based behavior +initiate malicious activities based on conditions satisfied +only by specific inputs. State-of-the-art malware analyzers +discover code guarded by triggers via multiple path +exploration, symbolic execution, or forced conditional execution, +all without knowing the trigger inputs. We present +a malware obfuscation technique that automatically conceals specific +trigger-based behavior from these malware +analyzers. Our technique automatically transforms a program +by encrypting code that is conditionally dependent on an input +value with a key derived from the input and then +removing the key from the program. We have implemented +a compiler-level tool that takes a malware source program +and automatically generates an obfuscated binary. Experiments +on various existing malware samples show that our +tool can hide a significant portion of trigger based code. We +provide insight into the strengths, weaknesses, and possible +ways to strengthen current analysis approaches in order to +defeat this malware obfuscation technique. ++ +
+"Impeding Malware Analysis Using Conditional Code Obfuscation"
+Monirul Sharif, Andrea Lanzi, Jonathon Giffin and Wenke Lee
+In the Proceedings of the 15th Annual Network and Distributed System
+Security Symposium (NDSS'08), San Diego, CA, February 2008
+
+With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field faults. To be broadly deployable, the hardware reliability solution must incur low overheads, precluding use of expensive redundancy. We explore a cooperative hardware-software solution that watches for anomalous software behavior to indicate the presence of hardware faults. Fundamental to such a solution is a characterization of how hardware faults in different microarchitectural structures of a modern processor propagate through the application and OS.+ ++ +This paper aims to provide such a characterization, resulting in identifying low-cost detection methods and providing guidelines for implementation of the recovery and diagnosis components of such a reliability solution. We focus on hard faults because they are increasingly important and have different system implications than the much studied transients. We achieve our goals through fault injection experiments with a microarchitecture-level full system timing simulator. Our main results are: (1) we are able to detect 95% of the unmasked faults in 7 out of 8 studied microarchitectural structures with simple detectors that incur zero to little hardware overhead; (2) over 86% of these detections are within latencies that existing hardware checkpointing schemes can handle, while others require software checkpointing; and (3) a surprisingly large fraction of the detected faults corrupt OS state, but almost all of these are detected with latencies short enough to use hardware checkpointing, thereby enabling OS recovery in virtually all such cases.
+ "Understanding the Propagation of Hard Errors to Software and +Implications for Resilient System Design" ++
+ Man-Lap Li, Pradeep Ramachandran, Swarup K. Sahoo, Sarita V. Adve, Vikram S. Adve, Yuanyuan Zhou. +
+ +Proceedings of the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS'08) +, Seattle, WA, March 2008. +
+@inproceedings{1346315, + author = {Li, Man-Lap and Ramachandran, Pradeep and Sahoo, Swarup Kumar and Adve, Sarita V. and Adve, Vikram S. and Zhou, Yuanyuan}, + title = {Understanding the propagation of hard errors to software and implications for resilient system design}, + booktitle = {ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems}, + year = {2008}, + isbn = {978-1-59593-958-6}, + pages = {265--276}, + location = {Seattle, WA, USA}, + doi = {http://doi.acm.org/10.1145/1346281.1346315}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+This paper presents a novel cycle-approximate performance estimation technique for automatically generated +transaction level models (TLMs) for heterogeneous multicore designs. The inputs are application C processes and +their mapping to processing units in the platform. The processing unit model consists of pipelined datapath, memory +hierarchy and branch delay model. Using the processing +unit model, the basic blocks in the C processes are analyzed +and annotated with estimated delays. This is followed by +a code generation phase where delay-annotated C code is +generated and linked with a SystemC wrapper consisting of +inter-process communication channels. The generated TLM +is compiled and executed natively on the host machine. Our +key contribution is that the estimation technique is close to +cycle-accurate, it can be applied to any multi-core platform +and it produces high-speed native compiled TLMs. For experiments, timed TLMs for industrial scale designs such as +MP3 decoder were automatically generated for 4 heterogeneous multi-processor platforms with up to 5 PEs under +1 minute. Each TLM simulated under 1 second, compared +to 3-4 hrs of instruction set simulation (ISS) and 15-18 hrs +of RTL simulation. Comparison to on-board measurement +showed only 8% error on average in estimated number of +cycles. ++ +
+Despite extensive testing in the development phase, residual defects can be a great threat to dependability in the operational phase. This paper studies the utility of low-cost, generic invariants ("screeners") in their capacity of error detectors within a spectrum-based fault localization (SFL) approach aimed to diagnose program defects in the operational phase. The screeners considered are simple bit-mask and range invariants that screen every load/store and function argument/return program point. Their generic nature allows them to be automatically instrumented without any programmer-effort, while training is straightforward given the test cases available in the development phase. Experiments based on the Siemens program set demonstrate diagnostic performance that is similar to the traditional, development-time application of SFL based on the program pass/fail information known before-hand. This diagnostic performance is currently attained at an average 14% screener execution time overhead, but this overhead can be reduced at limited performance penalty. ++ +
+ "Automatic Software Fault Localization using Generic Program Invariants" ++
+ Rui Abreu, Alberto Gonzalez, Peter Zoeteweij, and Arjan J.C. van Gemund +
+ +Proceedings of the 2008 ACM symposium on Applied computing (SAC'08) +, Fortaleza, Ceara, Brazil, March 2008. +
+@inproceedings{1363855, + author = {Abreu, Rui and Gonz\'{a}lez, Alberto and Zoeteweij, Peter and van Gemund, Arjan J. C.}, + title = {Automatic software fault localization using generic program invariants}, + booktitle = {SAC '08: Proceedings of the 2008 ACM symposium on Applied computing}, + year = {2008}, + isbn = {978-1-59593-753-7}, + pages = {712--717}, + location = {Fortaleza, Ceara, Brazil}, + doi = {http://doi.acm.org/10.1145/1363686.1363855}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
++ ++Security vulnerabilities are software bugs that are exploited by an attacker. Systems software is at high risk of exploitation: attackers commonly exploit security vulnerabilities to gain control over a system, remotely, over the internet. Bug-checking tools have been used with fair success in recent years to automatically find bugs in software. However, for finding software bugs that can cause security vulnerabilities, a bug checking tool must determine whether the software bug can be controlled by user-input. +
+ ++In this paper we introduce a static program analysis for computing user-input dependencies. This analysis is used as a pre-processing filter to our static bug checking tool, currently under development, to identify bugs that can be exploited as security vulnerabilities. Runtime speed and scalability of the user-input dependence analysis is of key importance if the analysis is used for large commercial systems software. +
+
++Our user-input dependency analysis takes both data and control dependencies into account. We extend Static Single Assignment (SSA) form by augmenting phi-nodes with control dependencies of their arguments. A formal definition of user-input dependency is expressed in a dataflow analysis framework as a Meet-Over-all-Paths (MOP) solution. We reduce the equation system to a sparse equation system exploiting the properties of SSA. The sparse equation system is solved as a reachability problem that results in a fast algorithm for computing user-input dependencies. We have implemented a call-insensitive and a call-sensitive version of the analysis. The paper compares their efficiency for various systems codes.
+
+@techreport{SunTR171:2008, + author = "Bernard Scholz and Chenyi Zhang and Cristina Cifuentes", + title = "{User-Input Dependence Analysis via Graph Reachability}", + number = "TR-2008-171", + month = "March", + year = "2008", + url = "http://research.sun.com/techrep/2008/abstract-171.html" +} ++ +
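+A minimal illustration (my own, not from the report) of why the analysis above augments SSA phi-nodes with the control dependencies of their arguments: in the toy C++ fragment below no data ever flows from the user-controlled value into the result, yet the result is still user-input dependent through the branch.
+// Illustrative only: a value can become user-input dependent purely through
+// control flow, which is what the augmented phi-nodes capture.
+#include <cstdio>
+#include <cstdlib>
+
+int classify(int user_input) {
+    int category;              // no data flows from user_input into category...
+    if (user_input > 100)      // ...but the branch condition is user-controlled,
+        category = 1;          // so at the join point the SSA phi-node
+    else                       //   category = phi(1, 2)
+        category = 2;          // is control-dependent on user_input and must be
+    return category;           // reported as user-input dependent.
+}
+
+int main(int argc, char **argv) {
+    int input = (argc > 1) ? std::atoi(argv[1]) : 0;  // attacker-controlled value
+    std::printf("category: %d\n", classify(input));
+    return 0;
+}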
++ ++This talk gives a gentle introduction to LLVM and Clang, particularly suited +for those without a deep compiler hacker background. +
+ +
+ "LLVM and Clang: Next Generation Compiler Technology", Chris Lattner,+ +
+ BSDCan 2008: The BSD Conference, Ottawa, Canada, May 2008.
+
+The paper presents a deductive framework for proving program equivalence +and its application to automatic verification of transformations performed +by optimizing compilers. To leverage existing program analysis techniques, + we reduce the equivalence checking problem to +analysis of one system - a cross-product of the two input programs. We +show how the approach can be effectively used for checking equivalence of +consonant (i.e., structurally similar) programs. Finally, we report on the +prototype tool that applies the developed methodology to verify that a +compiler optimization run preserves the program semantics. Unlike existing +frameworks, CoVaC accommodates absence of compiler annotations +and handles most of the classical intraprocedural optimizations such as +constant folding, reassociation, common subexpression elimination, code +motion, dead code elimination, branch optimizations, and others. ++ +
+@inproceedings{ZP2008, + Author = {Anna Zaks and Amir Pnueli}, + Title = {{CoVaC}: Compiler Validation by Program Analysis of +the Cross-Product}, + Booktitle = {International Symposium on Formal Methods (FM 2008)}, + Address = {Turku, Finland}, + Month = {May}, + Year = 2008 +} ++ +
+Automatically detecting bugs in programs has been a long-held goal in software engineering. Many techniques exist, trading-off varying levels of automation, thoroughness of coverage of program behavior, precision of analysis, and scalability to large code bases. This paper presents the Calysto static checker, which achieves an unprecedented combination of precision and scalability in a completely automatic extended static checker. Calysto is interprocedurally path-sensitive, fully context-sensitive, and bit-accurate in modeling data operations --- comparable coverage and precision to very expensive formal analyses --- yet scales comparably to the leading, less precise, static-analysis-based tool for similar properties. Using Calysto, we have discovered dozens of bugs, completely automatically, in hundreds of thousands of lines of production, open-source applications, with a very low rate of false error reports. This paper presents the design decisions, algorithms, and optimizations behind Calysto's performance. ++ +
+ "Calysto: Scalable and Precise Extended Static Checking" ++
+ Domagoj Babic and Alan J. Hu. +
+ +Proceedings of the 30th international conference on Software engineering (ICSE'08) +, Leipzig, Germany, May 2008. +
+@inproceedings{1368118, + author = {Babic, Domagoj and Hu, Alan J.}, + title = {Calysto: scalable and precise extended static checking}, + booktitle = {ICSE '08: Proceedings of the 30th international conference on Software engineering}, + year = {2008}, + isbn = {978-1-60558-079-1}, + pages = {211--220}, + location = {Leipzig, Germany}, + doi = {http://doi.acm.org/10.1145/1368088.1368118}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+We investigate to which extent data partitioning can help improve the performance of +software transactional memory (STM). Our main idea is that the access patterns of the +various data structures of an application might be sufficiently different so that it would +be beneficial to tune the behavior of the STM for individual data partitions. We evaluate +our approach using standard transactional memory benchmarks. We show that these +applications contain partitions with different characteristics and, despite the runtime +overhead introduced by partition tracking and dynamic tuning, that partitioning provides +significant performance improvements. ++ +
+@inproceedings{Riegel2008partitioning, + author = {{T}orvald {R}iegel and {C}hristof {F}etzer and {P}ascal {F}elber}, + title = {{A}utomatic {D}ata {P}artitioning in {S}oftware {T}ransactional {M}emories}, + booktitle = {20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)}, + year = {2008}, +} ++ +
+
++Most modern compilers generate executables by targeting high level languages (C, Java, etc.) or managed virtual environments such as the JVM. In this thesis we explore an alternate approach: generating executables by targeting a typed assembly language. Typed assembly languages are low-level, machine-centric languages that abstract away from platform specifics. They provide type safety in a language neutral way and they make no assumptions regarding the runtime environment's memory model, security guarantees, etc. The LLVM is a mature optimizing compiler framework that provides a typed assembly language.
++In this talk, we will present the design and implementation of a backend for EHC that targets the LLVM. We will follow a small Haskell program as it passes through different stages of the EHC compiler, paying special attention to the code semantics and performance. Finally, we contrast the performance of executables produced by the LLVM and C backends of EHC and draw some conclusions regarding the efficiency of LLVM as a backend target for Haskell compilers. +
+ +
+ "Compiling Haskell To LLVM", John van Schie,+ +
+ Thesis Defense, Utrecht University, Netherlands, June 2008.
+
+Instruction-level derating encompasses the mechanisms by which computation on incorrect values can result in correct computation. We characterize the instruction-level derating that occurs in the SPEC CPU2000 INT benchmarks, classifying it (by source) into six categories: value comparison, sub-word operations, logical operations, overflow/precision, lucky loads, and dynamically-dead values. We also characterize the temporal nature of this derating, demonstrating that the effects of a fault persist in architectural state long after the last time they are referenced. Finally, we demonstrate how this characterization can be used to avoid unnecessary error recoveries (when a fault will be masked by software anyway) in the context of a dual modular redundant (DMR) architecture. ++ +
+ "A characterization of instruction-level error derating and its implications for error detection" ++
+ Jeffrey J. Cook and Craig Zilles. +
+ +IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN'08) +, June 2008. +
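+As a toy illustration of two of the derating categories named above (logical operations and value comparisons), the following C++ sketch, my own example rather than the paper's, injects a single-bit fault that is masked before it can affect program output.
+// Toy illustration of derating: a fault in bits that are later masked out, or
+// only compared against a threshold, never becomes visible in the output.
+#include <cstdint>
+#include <cstdio>
+
+int main() {
+    uint32_t len = 4096;                 // original value
+    uint32_t faulty = len ^ (1u << 30);  // simulate a single-bit fault in bit 30
+
+    // Logical-operation derating: only the low byte is ever used.
+    uint32_t low_ok     = len    & 0xFFu;
+    uint32_t low_faulty = faulty & 0xFFu;
+
+    // Value-comparison derating: both values fall on the same side of the test.
+    bool big_ok     = len    > 0;
+    bool big_faulty = faulty > 0;
+
+    std::printf("masked value identical: %s\n", low_ok == low_faulty ? "yes" : "no");
+    std::printf("comparison identical:   %s\n", big_ok == big_faulty ? "yes" : "no");
+    return 0;
+}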
++ +Instruction selection is a well-studied compiler phase that translates +the compiler's intermediate representation of programs to a sequence +of target-dependent machine instructions optimizing for +various compiler objectives (e.g. speed and space). Most existing +instruction selection techniques are limited to the scope of a single +statement or a basic block and cannot cope with irregular instruction +sets that are frequently found in embedded systems.
+ +We consider an optimal technique for instruction selection that +uses Static Single Assignment (SSA) graphs as an intermediate +representation of programs and employs the Partitioned Boolean +Quadratic Problem (PBQP) for finding an optimal instruction selection. +While existing approaches are limited to instruction patterns that +can be expressed in a simple tree structure, we consider complex +patterns producing multiple results at the same +time including pre/post increment addressing modes, div-mod instructions, +and SIMD extensions frequently found in embedded +systems. Although both instruction selection on SSA-graphs and +PBQP are known to be NP-complete, the problem can be solved +efficiently - even for very large instances.
+ ++Our approach has been implemented in LLVM for an embedded +ARMv5 architecture. Extensive experiments show speedups of up +to 57% on typical DSP kernels and up to 10% on SPECINT 2000 +and MiBench benchmarks. All of the test programs could be com- +piled within less than half a minute using a heuristic PBQP solver +that solves 99.83% of all instances optimally.
+ +
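+The following toy C++ sketch, using hypothetical cost numbers of my own, shows the shape of the PBQP formulation described above: cost vectors on nodes, cost matrices on edges, and an infinite cost for illegal pattern combinations. A real solver uses graph reductions rather than the brute-force enumeration used here.
+// Hypothetical PBQP sketch for instruction selection on two nodes.
+#include <cstdio>
+#include <limits>
+
+int main() {
+    const double INF = std::numeric_limits<double>::infinity();
+
+    // Node a: {0: plain add, 1: add folded into a post-increment address mode}
+    double costA[2] = {2.0, 0.0};
+    // Node b: {0: separate load, 1: load that consumes the post-increment}
+    double costB[2] = {3.0, 1.0};
+    // Edge costs: the folded alternatives are only legal when chosen together.
+    double edge[2][2] = {{0.0, INF},
+                         {INF, 0.0}};
+
+    double best = INF;
+    int bestA = 0, bestB = 0;
+    for (int i = 0; i < 2; ++i)
+        for (int j = 0; j < 2; ++j) {
+            double c = costA[i] + costB[j] + edge[i][j];
+            if (c < best) { best = c; bestA = i; bestB = j; }
+        }
+    std::printf("choose pattern %d for a and %d for b (total cost %.1f)\n",
+                bestA, bestB, best);
+    return 0;
+}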
+We have shown that register allocation can be viewed as solving a collection of puzzles. We model the register file as a puzzle board and the program variables as puzzle pieces; pre-coloring and register aliasing fit in naturally. For architectures such as x86, PowerPC, and StrongARM, we can solve the puzzles in polynomial time, and we have augmented the puzzle solver with a simple heuristic for spilling. For SPEC CPU2000, our implementation is as fast as the extended version of linear scan used by LLVM, which is the JIT compiler in the openGL stack of Mac OS 10.5. Our implementation produces Pentium code that is of similar quality to the code produced by the slower, state-of-the-art iterated register coalescing algorithm of George and Appel augmented with extensions by Smith, Ramsey, and Holloway. ++ +
+@inproceedings{Pereira08PLDI, + author = {Fernando Magno Quintao Pereira and Jens Palsberg}, + title = {Register Allocation by Puzzle Solving}, + booktitle = {ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI'08)}, + year = {2008}, +} ++ +
+In this thesis, real-time ray tracing of dynamic scenes is explored, based on separating static from animated primitives in acceleration structures suited to each type of geometry.+ ++ +For dynamic geometry, a two-level bounding volume hierarchy (BVH) is introduced that efficiently supports rigidly animated geometry, deformable geometry and fully dynamic geometry with incoherent motion and topology changes. With selective rebuilding, an updating technique for BVHs is described that limits costly rebuilding operations to degenerated parts of the hierarchy and allows for balancing updating and rendering times. Furthermore, a new ordered traversal scheme for BVHs is introduced that is based on a probabilistic model.
+ +Kd-trees are the acceleration structure of choice for static geometry and are commonly built by employing the surface area heuristic to determine optimal splitting planes. In this thesis two approaches for reducing the memory footprint of kd-trees are presented. Index list compaction compresses the list of triangle indices used by leaves to reference triangles. The cost-scaling termination criterion for kd-tree construction, on the other hand, limits the creation of deep trees by weighing the costs of splitting a node higher with an increasing depth. +
+ Real-Time Ray Tracing of Dynamic Scenes, Stephan Reiter.+ +
+ Diploma Thesis, Institute for Graphics and Parallel Processing, +Johannes Kepler University, Linz, Austria, June 2008 +
+We present the design of Parfait, a static layered program analysis framework for bug checking, designed for scalability and precision by improving false positive rates and scale to millions of lines of code. The Parfait framework is inherently parallelizable and makes use of demand driven analyses.+ ++ +In this paper we provide an example of several layers of analyses for buffer overflow, summarize our initial implementation for C, and provide preliminary results. Results are quantified in terms of correctly-reported, false positive and false negative rates against the NIST SAMATE synthetic benchmarks for C code. +
+ "Parfait – Designing a Scalable Bug Checker" ++
+ Cristina Cifuentes and Bernhard Scholz. +
+ +Proceedings of the 2008 workshop on Static analysis (SAW'08) +, Tucson, Arizona, June 2008. +
+@inproceedings{1394505, + author = {Cifuentes, Cristina and Scholz, Bernhard}, + title = {Parfait: designing a scalable bug checker}, + booktitle = {SAW '08: Proceedings of the 2008 workshop on Static analysis}, + year = {2008}, + isbn = {978-1-59593-924-1}, + pages = {4--11}, + location = {Tucson, Arizona}, + doi = {http://doi.acm.org/10.1145/1394504.1394505}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+The CHiMPS (Compiling High level language to
+Massively Pipelined System) system, developed by
+Xilinx, is gaining popularity due to its convenient
+computational model and architecture for field
+programmable gate array computing. The CHiMPS
+system utilizes the CHiMPS target language as an
+intermediate representation to bridge between the high
+level language and the data flow architecture
+generated from it. However, the CHiMPS
+frontend currently does not provide many commonly used
+optimizations and has some use restrictions. In this
+paper we present an alternative compiler environment
+based on the low level virtual machine (LLVM) compiler
+environment, extended to generate CHiMPS target
+language code for the CHiMPS architecture. Our
+implementation provides good support for global
+optimizations and analysis and overcomes many
+limitations of the original Xilinx CHiMPS compiler.
+Simulation results from codes based on this approach
+are shown to outperform those obtained with the original
+CHiMPS compiler.
+ "LLVM-CHiMPS: Compilation Environment for FPGAs Using LLVM +Compiler Infrastructure and CHiMPS Computational Model" ++
+ Seung J. Lee, David K. Raila, and Volodymyr V. Kindratenko. +
+ +Reconfigurable Systems Summer Institute 2008 (RSSI'08) +, Champaign, IL, July 2008. +
+ Typical material systems are based on a collection of properties + that are set on a per-object or per-surface basis usually by +evaluation of a description file at run-time. Shading of hit points is then + implemented by interpreting these properties. In order to be able + to support a large feature set, different code paths need to be +created, e.g. for reflection and refraction effects. Conditions for the + branches associated with these fragments of functionality need to + be evaluated for each shaded ray, which may degrade performance + considerably. Material specific optimization opportunities are also + missed out by having a generic function for all material +configurations. We propose the use of run-time code generation for materials. + ++ +
+A key challenge in model checking software is the difficulty +of verifying properties of implementation code, as opposed to checking an +abstract algorithmic description. We describe a tool for verifying +multithreaded C programs that uses the SPIN model checker. Our tool works +by compiling a multi-threaded C program into a typed bytecode format, +and then using a virtual machine that interprets the bytecode and +computes new program states under the direction of SPIN. Our virtual +machine is compatible with most of SPIN's search options and optimization +flags, such as bitstate hashing and multi-core checking. It provides +support for dynamic memory allocation (the malloc and free family of +functions), and for the pthread library, which provides primitives often +used by multi-threaded C programs. A feature of our approach is that it +can check code after compiler optimizations, which can sometimes introduce +race conditions. We describe how our tool addresses the state space +explosion problem by allowing users to define data abstraction functions +and to constrain the number of allowed context switches. We also describe +a reduction method that reduces context switches using dynamic +knowledge computed on-the-fly, while being sound for both safety and +liveness properties. Finally, we present initial experimental results with +our tool on some small examples. ++ +
+@inproceedings{ZJ2008, + Author = {Anna Zaks and Rajeev Joshi}, + Title = {Verifying Multi-threaded {C} Programs with {SPIN}}, + Booktitle = {15th International SPIN Workshop on Model Checking of Software (SPIN 2008)}, + Address = {Los Angeles, USA}, + Month = {August}, + Year = 2008 +} ++ +
+ +Many approaches to software verification are currently +semi-automatic: a human must provide key logical insights +— e.g., loop invariants, class invariants, and frame axioms +that limit the scope of changes that must be analyzed. +This paper describes a technique for automatically inferring frame +axioms of procedures and loops using static +analysis. The technique builds on a pointer analysis +that generates limited information about all data structures +in the heap. Our technique uses that information +to over-approximate a potentially unbounded set of memory +locations modified by each procedure/loop; this over-approximation +is a candidate frame axiom. +We have tested this approach on the buffer-overflow +benchmarks from ASE 2007. With manually provided specifications +and invariants/axioms, our tool could verify/falsify +226 of the 289 benchmarks. With our automatically inferred +frame axioms, the tool could verify/falsify 203 of the 289, +demonstrating the effectiveness of our approach. + ++ +
+The development of a complete Java Virtual Machine (JVM)
+implementation is a tedious process which involves knowledge in different
+areas: garbage collection, just in time compilation, interpretation, file
+parsing, data structures, etc. The result is that developing one's own virtual
+machine requires a considerable number of man-years. In this paper we show that
+one can implement a JVM with third party software and with performance
+comparable to industrial and top open-source JVMs. Our proof-of-concept
+implementation uses existing versions of a garbage collector, a just in
+time compiler, and the base library, and is robust enough to
+execute complex Java applications such as the OSGi Felix
+implementation and the Tomcat servlet container.
+@inproceedings{geoffray08ladyvm, + author = {N. Geoffray and G. Thomas and C. Cl\'ement and B. Folliot}, + title = { A Lazy Developer Approach: Building a JVM with Third Party Software }, + booktitle = {{International Conference on Principles and Practice of Programming In Java (PPPJ 2008) }}, + year = {2008}, + address = {Modena, Italy}, + month = {September}, +} ++ +
+Adobe Flash, the current de facto standard for rich content web applications, is
+powered by an ECMAScript-derived language called ActionScript. The bytecode
+for the language is designed to run on a stack-based virtual machine. We
+introduce a just-in-time compiler and runtime environment for such bytecode.
+The LLVM framework is used to generate optimized native assembly from an
+intermediate representation generated from the bytecode, while optimizing stack
+traffic and local variable accesses and exploiting implicit type information.
+ An Efficient ActionScript 3.0 Just-In-Time Compiler Implementation, + Alessandro Pignotti.+ +
+ Bachelor Thesis, Universita degli Studi di Pisa, September 2008 +
++ ++This talk gives a gentle introduction to LLVM, talking about the benefits of +using llvm-gcc and including updated compile-time and run-time performance +numbers. This talk is high level and particularly suited +for those without a deep compiler hacker background. +
+ +
+ "Introduction to the LLVM Compiler System", Chris Lattner,+ +
+ ACAT 2008: Advanced Computing and Analysis Techniques in Physics Research, Erice, Sicily, Italy, Nov 2008.
+
+Power, energy, and thermal concerns have constrained embedded systems designs. Computing capability and storage density have increased dramatically, enabling the emergence of handheld devices from special to general purpose computing. In many mobile systems, the disk is among the top energy consumers. Many previous optimizations for disk energy have assumed uniprogramming environments. However, many optimizations degrade in multiprogramming because programs are unaware of other programs (execution context). We introduce a framework to make programs aware of and adapt to their runtime execution context.+ ++ +We evaluated real workloads by collecting user activity traces and characterizing the execution contexts. The study confirms that many users run a limited number of programs concurrently. We applied execution context optimizations to eight programs and tested ten combinations. The programs ran concurrently while the disk's power was measured. Our measurement infrastructure allows interactive sessions to be scripted, recorded, and replayed to compare the optimizations' effects against the baseline. Our experiments covered two write cache policies. For write-through, energy savings was in the range 3-63% with an average of 21%. For write-back, energy savings was in the range -33-61% with an average of 8%. In all cases, our optimizations incurred less than 1% performance penalty. +
+ "Execution Context Optimization for Disk Energy" ++
+ Jerry Hom and Ulrich Kremer. +
+ +Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems (CASES'08) +, Atlanta, GA, USA, October 2008. +
+@inproceedings{1450132, + author = {Hom, Jerry and Kremer, Ulrich}, + title = {Execution context optimization for disk energy}, + booktitle = {CASES '08: Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems}, + year = {2008}, + isbn = {978-1-60558-469-0}, + pages = {255--264}, + location = {Atlanta, GA, USA}, + doi = {http://doi.acm.org/10.1145/1450095.1450132}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+C's volatile qualifier is intended to provide a reliable link between +operations at the source-code level and operations at the memory-system +level. We tested thirteen production-quality C compilers +and, for each, found situations in which the compiler generated +incorrect code for accessing volatile variables. This result is disturbing +because it implies that embedded software and operating +systems—both typically coded in C, both being bases for many +mission-critical and safety-critical applications, and both relying +on the correct translation of volatiles—may be being miscompiled. +Our contribution is centered on a novel technique for finding +volatile bugs and a novel technique for working around them. First, +we present access summary testing: an efficient, practical, and automatic +way to detect code-generation errors related to the volatile +qualifier. We have found a number of compiler bugs by performing +access summary testing on randomly generated C programs. Some +of these bugs have been confirmed and fixed by compiler developers. +Second, we present and evaluate a workaround for the compiler +defects we discovered. In 96% of the cases in which one of +our randomly generated programs is miscompiled, we can cause the +faulty C compiler to produce correctly behaving code by applying +a straightforward source-level transformation to the test program. ++ +
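+A minimal sketch of the source-level workaround direction described above, with helper names of my own choosing rather than the paper's: each volatile access is routed through a tiny access function so that the number and order of accesses stays explicit.
+// Sketch only: wrapping volatile accesses in small helpers so direct accesses,
+// which some compilers were observed to miscompile, do not appear in user code.
+#include <cstdint>
+#include <cstdio>
+
+volatile uint32_t DEVICE_STATUS = 0;   // stand-in for a memory-mapped register
+
+static uint32_t vol_read_u32(volatile uint32_t *p)              { return *p; }
+static void     vol_write_u32(volatile uint32_t *p, uint32_t v) { *p = v; }
+
+int main() {
+    // Every access goes through a helper, keeping the count and order of
+    // volatile accesses explicit to the compiler.
+    vol_write_u32(&DEVICE_STATUS, 0x1u);
+    uint32_t status = vol_read_u32(&DEVICE_STATUS);
+    std::printf("status = 0x%x\n", status);
+    return 0;
+}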
+The ability to integrate diverse components such as processor cores, memories, custom hardware blocks and complex network-on-chip (NoC) communication frameworks onto a single chip has greatly increased the design space available for system-on-chip (SoC) designers. Efficient and accurate performance estimation tools are needed to assist the designer in making design decisions. In this paper, we present MC-Sim, a heterogeneous multi-core simulator framework which is capable of accurately simulating a variety of processor, memory, NoC configurations and application specific coprocessors. We also describe a methodology to automatically generate fast, cycle-true behavioral, C-based simulators for coprocessors using a high-level synthesis tool and integrate them with MC-Sim, thus augmenting it with the capacity to simulate coprocessors. Our C-based simulators provide on an average 45x improvement in simulation speed over that of RTL descriptions. We have used this framework to simulate a number of real-life applications such as the MPEG4 decoder and litho-simulation, and experimented with a number of design choices. Our simulator framework is able to accurately model the performance of these applications (only 7% off the actual implementation) and allows us to explore the design space rapidly and achieve interesting design implementations. ++ +
+"MC-Sim: an efficient simulation tool for MPSoC designs"+
+Jason Cong, Karthik Gururaj, Guoling Han, Adam Kaplan, Mishali Naik, and +Glenn Reinman.
+ +Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design (ICCAD'08) +, San Jose, CA, November 2008. +
+@inproceedings{1509541, + author = {Cong, Jason and Gururaj, Karthik and Han, Guoling and Kaplan, Adam and Naik, Mishali and Reinman, Glenn}, + title = {MC-Sim: an efficient simulation tool for MPSoC designs}, + booktitle = {ICCAD '08: Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design}, + year = {2008}, + isbn = {978-1-4244-2820-5}, + pages = {364--371}, + location = {San Jose, California}, + publisher = {IEEE Press}, + address = {Piscataway, NJ, USA}, + } ++ + +
+The advent of multicores presents a promising opportunity for speeding up sequential programs via profile-based speculative parallelization of these programs. In this paper we present a novel solution for efficiently supporting software speculation on multicore processors. We propose the Copy or Discard (CorD) execution model in which the state of speculative parallel threads is maintained separately from the nonspeculative computation state. If speculation is successful, the results of the speculative computation are committed by copying them into the non-speculative state. If misspeculation is detected, no costly state recovery mechanisms are needed as the speculative state can be simply discarded. Optimizations are proposed to reduce the cost of data copying between nonspeculative and speculative state. A lightweight mechanism that maintains version numbers for non-speculative data values enables misspeculation detection. We also present an algorithm for profile-based speculative parallelization that is effective in extracting parallelism from sequential programs. Our experiments show that the combination of CorD and our speculative parallelization algorithm achieves speedups ranging from 3.7 to 7.8 on a Dell PowerEdge 1900 server with two Intel Xeon quad-core processors. ++ +
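+The sketch below is my own single-threaded simplification of the copy-or-discard idea, not the paper's implementation: speculative work runs on a private snapshot, version numbers of the values read are re-checked at commit time, and on a mismatch the private copy is simply dropped.
+// Rough Copy-or-Discard sketch: private copy, version check, commit or discard.
+#include <cstdio>
+#include <vector>
+
+struct Versioned { int value; unsigned version; };
+
+int main() {
+    std::vector<Versioned> shared = {{1, 0}, {2, 0}, {3, 0}};
+
+    // Speculative work: snapshot inputs (values + versions), compute privately.
+    std::vector<Versioned> snapshot = shared;
+    std::vector<int> spec_result(snapshot.size());
+    for (size_t i = 0; i < snapshot.size(); ++i)
+        spec_result[i] = snapshot[i].value * 10;
+
+    // Meanwhile, imagine the non-speculative computation updated element 1:
+    shared[1] = {20, shared[1].version + 1};
+
+    // Commit step: the versions of everything read must still match.
+    bool ok = true;
+    for (size_t i = 0; i < snapshot.size(); ++i)
+        if (snapshot[i].version != shared[i].version) ok = false;
+
+    if (ok) {
+        for (size_t i = 0; i < shared.size(); ++i)   // copy: publish results
+            shared[i].value = spec_result[i];
+        std::puts("speculation committed");
+    } else {
+        std::puts("misspeculation: private copy discarded, work re-executed");
+    }
+    return 0;
+}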
+Translation Validation is an approach to ensuring compilation correctness in which each compiler run is followed by a validation pass that proves that the target code produced by the compiler is a correct translation (implementation) of the source code. It has been previously shown that the problem of translation validation can be reduced to checking whether a single system, the cross-product of the source and target, satisfies a specific property. In this paper, we show how to adapt existing program analysis techniques to the setting of translation validation. In addition, we present a novel invariant generation algorithm that strengthens our analysis when the input programs contain dynamically allocated data structures. Finally, we report on the prototype tool that applies the developed methodology to verification of the LLVM compiler. The tool handles many of the classical intraprocedural compiler optimizations such as constant folding, reassociation, common subexpression elimination, code motion, dead code elimination, and others.
+ "Program Analysis for Compiler Validation" ++
+ Anna Zaks and Amir Pnueli. +
+ +Workshop on Program Analysis for Software Tools and Engineering (PASTE'08) +, Atlanta, GA, November 2008. +
+@inproceedings{1512477, + author = {Zaks, Anna and Pnueli, Amir}, + title = {Program analysis for compiler validation}, + booktitle = {PASTE '08: Proceedings of the 8th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering}, + year = {2008}, + isbn = {978-1-60558-382-2}, + pages = {1--7}, + location = {Atlanta, Georgia}, + doi = {http://doi.acm.org/10.1145/1512475.1512477}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+We present a new symbolic execution tool, KLEE, capable of automatically
+generating tests that achieve high coverage on a diverse set of complex and
+environmentally-intensive programs. We used KLEE to thoroughly check all
+89 stand-alone programs in the GNU COREUTILS utility suite, which form the core
+user-level environment installed on millions of Unix systems, and arguably
+are the single most heavily tested set of open-source programs in
+existence. KLEE-generated tests achieve high line coverage — on average over 90%
+per tool (median: over 94%) — and significantly beat the coverage of the
+developers' own hand-written test suites. When we did the same for 75
+equivalent tools in the BUSYBOX embedded system suite, results were even
+better, including 100% coverage on 31 of them.
+We also used KLEE as a bug finding tool, applying it to 452 applications (over
+430K total lines of code), where it found 56 serious bugs, including
+three in COREUTILS that had been missed for over 15 years. Finally, we
+used KLEE to cross-check purportedly identical BUSYBOX and COREUTILS utilities,
+finding functional correctness errors and a myriad of inconsistencies.
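+A small example of the kind of harness such a tool consumes; the klee_make_symbolic call reflects my understanding of the KLEE intrinsics and should be checked against the KLEE documentation.
+// Hedged example of a KLEE-style harness: the input is marked symbolic, and the
+// tool explores both sides of every branch, generating concrete inputs that
+// reach them, including the one that violates the assertion.
+#include <klee/klee.h>
+#include <cassert>
+
+int classify(int x) {
+    if (x > 0) {
+        if (x == 12345)
+            return -1;       // a "deep" bug: only one input value reaches it
+        return 1;
+    }
+    return 0;
+}
+
+int main() {
+    int x;
+    klee_make_symbolic(&x, sizeof(x), "x");
+    int r = classify(x);
+    assert(r >= 0);          // a test case with x = 12345 fails this assertion
+    return 0;
+}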
+Instruction selection is a key component of code generation. High
+quality instruction selection is of particular importance in the embedded space where complex instruction sets are common and code
+size is a prime concern. Although instruction selection on tree expressions is a well understood and easily solved problem, instruction selection on directed acyclic graphs is NP-complete. In this
+paper we present NOLTIS, a near-optimal, linear time instruction
+selection algorithm for DAG expressions. NOLTIS is easy to implement, fast, and effective with a demonstrated average code size
+improvement of 5.1% compared to the traditional tree decomposition and tiling approach.
+We propose an automatic instrumentation method for +embedded software annotation to enable performance modeling in +high level hardware/software co-simulation environments. The +proposed "cross-annotation" technique consists of extending a +retargetable compiler infrastructure to allow the automatic +instrumentation of embedded software at the basic block level. +Thus, target and annotated native binaries are guaranteed to have +isomorphic control flow graphs (CFG). The proposed method takes +into account the processor-specific optimizations at the compiler +level and proves to be accurate with low simulation overhead. ++ +
+ "Automatic Instrumentation of Embedded Software for High Level Hardware/Software Co-Simulation"+ +
+ Aimen Bouchhima, Patrice Gerin, Frédéric Pétrot
+  Proceedings of the 14th Asia South Pacific Design Automation Conference (ASP-DAC'09), Yokohama, Japan, January 2009
+ 
@inproceedings{bouchhima_gerin09aspdac, + author = {A. Bouchhima and P. Gerin and F. P\'etrot}, + title = { Automatic Instrumentation of Embedded Software for High Level Hardware/Software Co-Simulation }, + booktitle = {{ Proceeding of the 14th Asia South Pacific Design Automation Conference (ASP-DAC'09) }}, + year = {2009}, + address = {Yokohama, Japan}, + month = {January}, +} ++ + +
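+A hypothetical sketch of what basic-block-level annotation can look like after instrumentation; consume_cycles() and the per-block cycle counts are placeholders of my own, not the tool's actual interface.
+// Sketch of "cross-annotation": the natively compiled code is instrumented with
+// one call per basic block that charges that block's estimated target-processor
+// cycles to a simulated clock, keeping the control flow graph isomorphic.
+#include <cstdint>
+#include <cstdio>
+
+static uint64_t simulated_cycles = 0;
+static inline void consume_cycles(uint64_t n) { simulated_cycles += n; }
+
+int saturating_sum(const int *v, int n, int limit) {
+    consume_cycles(3);                 // BB0: function entry block
+    int sum = 0;
+    for (int i = 0; i < n; ++i) {
+        consume_cycles(5);             // BB1: loop body (load, add, compare)
+        sum += v[i];
+        if (sum > limit) {
+            consume_cycles(2);         // BB2: saturation path
+            return limit;
+        }
+    }
+    consume_cycles(1);                 // BB3: loop exit / return
+    return sum;
+}
+
+int main() {
+    int data[] = {10, 20, 30, 40};
+    std::printf("sum = %d, simulated cycles = %llu\n",
+                saturating_sum(data, 4, 1000),
+                (unsigned long long)simulated_cycles);
+    return 0;
+}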
+Design of Multiprocessor System-on-a-Chips requires efficient and accurate simulation of every component. Since the memory subsystem accounts for up to 50% of the performance and energy expenditures, it has to be considered in system-level design space exploration. In this paper, we present a novel technique to simulate memory accesses in software TLM/T models. We use a compiler to automatically expose all memory accesses in software and annotate them onto efficient TLM/T models. A reverse address map provides target memory addresses for accurate cache and memory simulation. Simulating at more than 10MHz, our models allow realistic architectural design space explorations on memory subsystems. We demonstrate our approach with a design exploration case study of an industrial-strength MPEG-2 decoder. ++ +
+ "Memory subsystem simulation in software TLM/T models"+ +
+ Eric Cheung, Harry Hsieh, and Felice Balarin
+  Proceedings of the 14th Asia South Pacific Design Automation Conference (ASP-DAC'09), Yokohama, Japan, January 2009
+ 
+@inproceedings{1509814, + author = {Cheung, Eric and Hsieh, Harry and Balarin, Felice}, + title = {Memory subsystem simulation in software TLM/T models}, + booktitle = {ASP-DAC '09: Proceedings of the 2009 Conference on Asia and South Pacific Design Automation}, + year = {2009}, + isbn = {978-1-4244-2748-2}, + pages = {811--816}, + location = {Yokohama, Japan}, + publisher = {IEEE Press}, + address = {Piscataway, NJ, USA}, + } ++ + +
+The goal of the Parfait project is to find bugs in C source code in a scalable and precise way. To this end, Parfait was designed as a framework with layers of sound program analyses, multiple layers per bug type, to identify bugs in a program more quickly and accurately.+ ++ +Parfait also aims to identify security bugs, i.e., bugs that may be exploited by a malicious user. To this end, an optional pre-processing step is available to reduce the scope of potential bugs of interest.
+ +To evaluate Parfait's precision and recall, we have developed BegBunch, a bug benchmarking suite that contains existing synthetic benchmarks and samples of bugs ("bug kernels") taken from open source code.
+
+"Program analysis for bug detection using Parfait"+
+Cristina Cifuentes, Nathan Keynes, Lian Li, and Bernhard Scholz.
+ +Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation (PEPM'09) +, Savannah, GA, January 2009. +
+@inproceedings{1480947, + author = {Cifuentes, Cristina and Keynes, Nathan and Li, Lian and Scholz, Bernhard}, + title = {Program analysis for bug detection using Parfait}, + booktitle = {PEPM '09: Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation}, + year = {2009}, + isbn = {978-1-60558-327-3}, + pages = {7--8}, + location = {Savannah, GA, USA}, + doi = {http://doi.acm.org/10.1145/1480945.1480947}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Pointer analysis is a prerequisite for many program analyses, and the effectiveness of these analyses depends on the precision of the pointer information they receive. Two major axes of pointer analysis precision are flow-sensitivity and context-sensitivity, and while there has been significant recent progress regarding scalable context-sensitive pointer analysis, relatively little progress has been made in improving the scalability of flow-sensitive pointer analysis.+ ++ +This paper presents a new interprocedural, flow-sensitive pointer analysis algorithm that combines two ideas-semi-sparse analysis and a novel use of BDDs-that arise from a careful understanding of the unique challenges that face flow-sensitive pointer analysis. We evaluate our algorithm on 12 C benchmarks ranging from 11K to 474K lines of code. Our fastest algorithm is on average 197x faster and uses 4.6x less memory than the state of the art, and it can analyze programs that are an order of magnitude larger than the previous state of the art. +
+"Semi-sparse flow-sensitive pointer analysis"+
+Ben Hardekopf and Calvin Lin.
+ +Proceedings of the 2009 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL'09) +, Savannah, GA, January 2009. +
+@inproceedings{1480911, + author = {Hardekopf, Ben and Lin, Calvin}, + title = {Semi-sparse flow-sensitive pointer analysis}, + booktitle = {POPL '09: Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages}, + year = {2009}, + isbn = {978-1-60558-379-2}, + pages = {226--238}, + location = {Savannah, GA, USA}, + doi = {http://doi.acm.org/10.1145/1480881.1480911}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
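+A small, self-contained example (my own, not the paper's) of why flow-sensitivity matters for pointer analysis: a flow-insensitive analysis merges all assignments to p over the whole function, while a flow-sensitive one tracks a separate points-to set at each program point.
+// Illustrative only: flow-sensitive vs. flow-insensitive points-to sets.
+#include <cstdio>
+
+int main() {
+    int a = 0, b = 0;
+    int *p = &a;
+    *p = 1;        // flow-sensitive:   p -> {a}      (the store must hit a)
+                   // flow-insensitive: p -> {a, b}   (the store may hit a or b)
+    p = &b;
+    *p = 2;        // flow-sensitive:   p -> {b}
+    std::printf("a=%d b=%d\n", a, b);   // prints a=1 b=2
+    return 0;
+}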
+As computer systems become more and more complex, it becomes harder to ensure that they are dependable i.e. reliable and secure. Existing dependability techniques do not take into account the characteristics of the application and hence detect errors that may not manifest in the application. This results in wasteful detections and high overheads. In contrast to these techniques, this dissertation proposes a novel paradigm called "Application-Aware Dependability", which leverages application properties to provide low-overhead, targeted detection of errors and attacks that impact the application. The dissertation focuses on derivation, validation and implementation of application-aware error and attack detectors.+ ++The key insight in this dissertation is that certain data in the program is more important than other data from a reliability or security point of view (we call this the critical data). Protecting only the critical data provides significant performance improvements while achieving high detection coverage. The technique derives error and attack detectors to detect corruptions of critical data at runtime using a combination of static and dynamic approaches. The derived detectors are validated using both experimental approaches and formal verification. The experimental approaches validate the detectors using random fault-injection and known security attacks. The formal approach considers the effect of all possible errors and attacks according to a given fault or threat model and finds the corner cases that escape detection. The detectors have also been implemented in reconfigurable hardware in the context of the Reliability and Security Engine (RSE) +
+ Automated Derivation of Application-Aware Error and Attack Detectors, Karthik Pattabiraman.+ +
+ Ph.D Thesis, Computer Science Dept., University of Illinois at + Urbana-Champaign, 2009. +
+Because of its critical importance underlying all other software, low-level +system software is among the most important targets for formal verification. +Low-level systems software must sometimes make type-unsafe memory accesses, +but because of the vast size of available heap memory in today's computer systems, +faithfully representing each memory allocation and access does not scale +when analyzing large programs. Instead, verification tools rely on abstract memory +models to represent the program heap. This paper reports on two related +investigations to develop an accurate (i.e., providing a useful level of soundness +and precision) and scalable memory model: First, we compare a recently introduced +memory model, specifically designed to more accurately model low-level +memory accesses in systems code, to an older, widely adopted memory model. +Unfortunately, we find that the newer memory model scales poorly compared to +the earlier, less accurate model. Next, we investigate how to improve the +soundness of the less accurate model. A direct approach is to add assertions to the code +that each memory access does not break the assumptions of the memory model, +but this causes verification complexity to blow-up. Instead, we develop a novel, +extremely lightweight static analysis that quickly and conservatively guarantees +that most memory accesses safely respect the assumptions of the memory model, +thereby eliminating almost all of these extra type-checking assertions. Furthermore, +this analysis allows us to create automatically memory models that flexibly +use the more scalable memory model for most of memory, but resorting to a more +accurate model for memory accesses that might need it. ++ +
+The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors: a data-sensitive and a data-insensitive predictor to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.
+"Mapping parallelism to multi-cores: a machine learning based approach"+ +
+Zheng Wang and Michael F.P. O'Boyle.
+ +Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP'09) +, Raleigh, NC, USA, February 2009. +
+@inproceedings{1504189, + author = {Wang, Zheng and O'Boyle, Michael F.P.}, + title = {Mapping parallelism to multi-cores: a machine learning based approach}, + booktitle = {PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming}, + year = {2009}, + isbn = {978-1-60558-397-6}, + pages = {75--84}, + location = {Raleigh, NC, USA}, + doi = {http://doi.acm.org/10.1145/1504176.1504189}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Most modern microprocessor-based systems provide support for superpages both at the hardware and software level. Judicious use of superpages can significantly cut down the number of TLB misses and improve overall system performance. However, indiscriminate superpage allocation results in page fragmentation and increased application footprint, which often outweigh the benefits of reduced TLB misses. Previous research has explored policies for smart allocation of superpages from an operating systems perspective. This paper presents a compiler-based strategy for automatic and profitable memory allocation via superpages. A significant advantage of a compiler-based approach is the availability of data-reuse information within an application. Our strategy employs data-locality analysis to estimate the TLB demands of a program and uses this metric to determine if the program will benefit from superpage allocation. Apart from its obvious utility in improving TLB performance, this strategy can be used to improve the effectiveness of certain data-layout transformations and can be a useful tool in benchmarking and empirical tuning. We demonstrate the effectiveness of this strategy with experiments on an Intel Core 2 Duo with a two-level TLB. ++ +
+ "A Case for Compiler-driven Superpage Allocation" ++
+ Joshua Magee and Apan Qasem. +
+ +Proceedings of the 47th ACM Southeast Regional Conference (ACMSE09) +, Mar 2009. +
+Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input, they can produce a different output. This frustrates debugging and limits the ability to properly test multithreaded code, becoming a major stumbling block to the much-needed widespread adoption of parallel programming.+ ++In this paper we make the case for fully deterministic shared memory multiprocessing (DMP). The behavior of an arbitrary multithreaded program on a DMP system is only a function of its inputs. The core idea is to make inter-thread communication fully deterministic. Previous approaches to coping with nondeterminism in multithreaded programs have focused on replay, a technique useful only for debugging. In contrast, while DMP systems are directly useful for debugging by offering repeatability by default, we argue that parallel programs should execute deterministically in the field as well. This has the potential to make testing more assuring and increase the reliability of deployed multithreaded software. We propose a range of approaches to enforcing determinism and discuss their implementation trade-offs. We show that determinism can be provided with little performance cost using our architecture proposals on future hardware, and that software-only approaches can be utilized on existing systems. + + +
+ "DMP: deterministic shared memory multiprocessing"+ +
+ Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin.
+ Proceedings of the Fourteenth International Conference on + Architectural Support for Programming Languages and Operating Systems (ASPLOS '09), + Washington DC, March, 2009. +
+We describe a strategy for enabling existing commodity operating +systems to recover from unexpected run-time errors in nearly any part +of the kernel, including core kernel components. Our approach is +dynamic and request-oriented, in the sense that it isolates the +effects of a fault to requests that cause a fault rather than to +static kernel components. The approach is based on a notion of +``recovery domains,'' an organizing principle to enable partial +rollback of affected state within a request in a multithreaded system. +We have applied this approach to the Linux kernel and it required less +than 126 lines of changed or new code: the other changes are all +performed by a simple instrumentation pass of a compiler. Our +experiments show that the approach is able to recover from otherwise +fatal faults with minimal collateral impact during a recovery event. ++ +
+ "Recovery Domains: An Organizing Principle for Recoverable Operating Systems"+ +
+ Andrew Lenharth, Samuel T. King, and and Vikram Adve.
+ Proceedings of the Fourteenth International Conference on + Architectural Support for Programming Languages and Operating Systems + (ASPLOS '09), + Washington DC, March, 2009. +
+As semiconductor technology scales into the deep +submicron regime the occurrence of transient or soft errors will +increase. This will require new approaches to error detection. +Software checking approaches are attractive because they require +little hardware modification and can be easily adjusted to fit different reliability and performance requirements. Unfortunately, +software checking adds a significant performance overhead. + +In this paper we present ESoftCheck, a set of compiler +optimization techniques to determine which are the vital checks, +that is, the minimum number of checks that are necessary to +detect an error and roll back to a correct program state. ESoftCheck identifies the vital checks on platforms where registers +are hardware-protected with parity or ECC, when there are +redundant checks and when checks appear in loops. ESoftCheck +also provides knobs to trade reliability for performance based +on the support for recovery and the degree of trustiness of the +operations. Our experimental results on a Pentium 4 show that +ESoftCheck can obtain 27.1% performance improvement without +losing fault coverage. + ++ +
+ "ESoftCheck: Removal of Non-vital Checks for Fault Tolerance"+ +
+ Jing Yu, Maria Jesus Garzaran, and Marc Snir
+ Proceedings of the Seventh International Symposium on + Code Generation and Optimization (CGO '09), + Seattle WA, March, 2009. +
+Register allocation is a fundamental part of any optimizing compiler. Effectively managing the limited register resources of the constrained architectures commonly found in embedded systems is essential in order to maximize code quality. In this paper we deconstruct the register allocation problem into distinct components: coalescing, spilling, move insertion, and assignment. Using an optimal register allocation framework, we empirically evaluate the importance of each of the components, the impact of component integration, and the effectiveness of existing heuristics. We evaluate code quality both in terms of code performance and code size and consider four distinct instruction set architectures: ARM, Thumb, x86, and x86-64. The results of our investigation reveal general principles for register allocation design. ++ +
+"Register Allocation Deconstructed"+ +
+David Ryan Koes and Seth Copen Goldstein
+ +Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems (SCOPES'09) +, Nice, France, April 2009. +
+@inproceedings{1543824, + author = {Koes, David Ryan and Goldstein, Seth Copen}, + title = {Register allocation deconstructed}, + booktitle = {SCOPES '09: Proceedings of th 12th International Workshop on Software and Compilers for Embedded Systems}, + year = {2009}, + isbn = {978-1-60558-696-0}, + pages = {21--30}, + location = {Nice, France}}, + } ++ + +
+Instruction set simulation based on dynamic compilation is a popular approach that focuses on fast simulation of user-visible features according to the instruction-set-architecture abstraction of a given processor. Simulation of interrupts, even though they are rare events, is very expensive for these simulators, because interrupts may occur anytime at any phase of the programs execution. Many optimizations in compiling simulators can not be applied or become less beneficial in the presence of interrupts.+ ++ +We propose a rollback mechanism in order to enable effective optimizations to be combined with cycle accurate handling of interrupts. Our simulator speculatively executes instructions of the emulated processor assuming that no interrupts will occur. At restore-points this assumption is verified and the processor state reverted to an earlier restore-point if an interrupt did actually occur. All architecture dependent simulation functions are derived using an architecture description language that is capable to automatically generate optimized simulators using our new approach.
+ +We are able to eliminate most of the overhead usually induced by interrupts. The simulation speed is improved up to a factor of 2.95 and compilation time is reduced by nearly 30% even for lower compilation thresholds. +
+"Precise simulation of interrupts using a rollback mechanism"+ +
+Florian Brandner
+ +Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems (SCOPES'09) +, Nice, France, April 2009. +
+@inproceedings{1543833, + author = {Brandner, Florian}, + title = {Precise simulation of interrupts using a rollback mechanism}, + booktitle = {SCOPES '09: Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems}, + year = {2009}, + isbn = {978-1-60558-696-0}, + pages = {71--80}, + location = {Nice, France}}, + } ++ + +
+Random access memory (RAM) is tightly constrained in the least expensive, lowest-power embedded systems such as sensor network nodes and portable consumer electronics. The most widely used sensor network nodes have only 4 to 10KB of RAM and do not contain memory management units (MMUs). It is difficult to implement complex applications under such tight memory constraints. Nonetheless, price and power-consumption constraints make it unlikely that increases in RAM in these systems will keep pace with the increasing memory requirements of applications.+ ++ +We propose the use of automated compile-time and runtime techniques to increase the amount of usable memory in MMU-less embedded systems. The proposed techniques do not increase hardware cost, and require few or no changes to existing applications. We have developed runtime library routines and compiler transformations to control and optimize the automatic migration of application data between compressed and uncompressed memory regions, as well as a fast compression algorithm well suited to this application. These techniques were experimentally evaluated on Crossbow TelosB sensor network nodes running a number of data-collection and signal-processing applications. Our results indicate that available memory can be increased by up to 50% with less than 10% performance degradation for most benchmarks. +
+"MEMMU: Memory expansion for MMU-less embedded systems"+ +
+Lan S. Bai, Lei Yang, and Robert P. Dick
+ +Proceedings of the ACM Transactions on Embedded Computing Systems (TECS) +, New York, NY, April 2009. +
+@article{1509295, + author = {Bai, Lan S. and Yang, Lei and Dick, Robert P.}, + title = {MEMMU: Memory expansion for MMU-less embedded systems}, + journal = {ACM Trans. Embed. Comput. Syst.}, + volume = {8}, + number = {3}, + year = {2009}, + issn = {1539-9087}, + pages = {1--33}, + doi = {http://doi.acm.org/10.1145/1509288.1509295}, + publisher = {ACM}, + address = {New York, NY, USA}, +} ++ + +
+In the past, implementing virtual machines has either been a custom process or an endeavour into interfacing an existing virtual machine using (relatively) low level programming languages like C. Recently there has been a boom in high level scripting languages, which claim to make a programmer more productive, yet the field of compiler design is still rooted firmly with low level languages. It remains to be seen why language processors are not implemented in high level scripting languages. ++ +
+The following report presents an investigation into designing and implementing computer languages using a modern compiler construction tool-kit called the "Low Level Virtual Machine" (LLVM), in combination with a modern scripting language. The report covers in detail traditional approaches to compiler construction, parsing and virtual machine theory. Comparisons are made between traditional approaches and the modern approach offered by LLVM, via an experimental object oriented language called 3c, developed using LLVM, the Aperiot parser, Python and an incremental development methodology. ++ +
+@techreport{EBARRETT:3C, + author = "Edward Barrett", + institution = "Bournemouth University", + year = 2009, + title = "3c - A {JIT} Compiler with {LLVM}", + url = "http://llvm.org/pubs/2009-05-21-Thesis-Barrett-3c.html" +} ++ +
++ ++Traditionally, the verification effort is applied to the abstract +algorithmic descriptions of the underlining software. However, even +well understood protocols such as Peterson’s protocol for mutual +exclusion, whose algorithmic description takes only half a page, have +published implementations that are erroneous. Furthermore, the +semantics of the implementations can be altered by optimizing +compilers, which are very large applications and, consequently, are +bound to have bugs. Thus, it is highly desirable to ensure the +correctness of the compiled code especially in safety critical and +high-assurance software. This dissertation describes two alternative +approaches that bring us closer to solving the problem.
+ +First, we present Compiler Validation via Analysis of the +Cross-Product (CoVaC) - a deductive framework for proving program +equivalence and its application to automatic verification of +transformations performed by optimizing compilers. To leverage the +existing program analysis techniques, we reduce the equivalence +checking problem to analysis of one system - a cross-product of the +two input programs. We show how the approach can be effectively used +for checking equivalence of single-threaded programs that are +structurally similar. Unlike the existing frameworks, our approach +accommodates absence of compiler annotations and handles most of the +classical intraprocedural optimizations such as constant folding, +reassociation, common subexpression elimination, code motion, branch +optimizations, and others. In addition, we have developed rules for +translation validation of interprocedural optimizations, which can be +applied when compiler annotations are available.
+ ++The second contribution is the pancam framework for model checking +multi-threaded C programs. pancam first compiles a multi-threaded C +program into optimized bytecode format. The framework relies on Spin, +an existing explicit state model checker, to orchestrate the program's +state space search. However, the program transitions and states are +computed by the pancam bytecode interpreter. A feature of our approach +is that not only pancam checks the actual implementation, but it can +also check the code after compiler optimizations. Pancam addresses the +state space explosion problem by allowing users to define data +abstraction functions and to constrain the number of allowed context +switches. We also describe a partial order reduction method that +reduces context switches using dynamic knowledge computed on-the-fly, +while being sound for both safety and liveness properties. +
+ +
+ "Ensuring Correctness of Compiled Code", Ganna Zaks,+ +
+ Ph.D. Thesis, Computer Science Dept., New York University, New York, NY, May 2009.
+
++ + + diff --git a/static/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.pdf b/static/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.pdf new file mode 100644 index 0000000..4e8eb0a Binary files /dev/null and b/static/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.pdf differ diff --git a/static/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.ps.gz b/static/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.ps.gz new file mode 100644 index 0000000..d5b82e0 Binary files /dev/null and b/static/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.ps.gz differ diff --git a/static/pubs/2009-05-IWMSE-COMPASS.html b/static/pubs/2009-05-IWMSE-COMPASS.html new file mode 100644 index 0000000..5793048 --- /dev/null +++ b/static/pubs/2009-05-IWMSE-COMPASS.html @@ -0,0 +1,63 @@ + + + + + ++@PhdThesis{GZAKS:PHD, + author = {Ganna Zaks}, + title = "{Ensuring Correctness of Compiled Code}", + school = "{Computer Science Dept., New York University}", + year = {2009}, + address = {New York, NY}, + month = {May} +} ++
+The widespread adoption of multicores has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity in hardware architectures and software environments, however, continues to pose difficulties in the effective use of parallelism thus delaying a quick and smooth transition to the concurrency era. In this paper, we describe the research being conducted at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers while they reengineer their code for parallelism. The advice proffered to the programmer is based on the wisdom collected from programmers who have already parallelized some similar code. The utility of COMPASS rests, not only on its ability to collect the wisdom unintrusively but also on its ability to automatically seek, find and synthesize this wisdom into advice that is tailored to the task at hand, i.e., the code the user is considering parallelizing and the environment in which the optimized program is planned to execute. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization - widely, and on diverse hardware and software. By leveraging the "wisdom of crowds" model [30], which has been conjectured to scale exponentially and which has successfully worked for wikis, COMPASS aims to enable rapid propagation of knowledge about code parallelization in the context of the actual parallelization reengineering, and thus continue to extend the benefits of Moores law scaling to science and society. ++ +
+ "COMPASS: A Community-driven Parallelization Advisor for Sequential Software" ++
+ Simha Sethumadhavan, Nipun Arora, Ravindra Babu Ganapathi, John Demme, and Gail E. Kaiser. +
+ +Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering (IWMSE '09) +, Washington, DC, USA, May 2009. +
+@inproceedings{1569143, + author = {Sethumadhavan, Simha and Arora, Nipun and Ganapathi, Ravindra Babu and Demme, John and Kaiser, Gail E.}, + title = {COMPASS: A Community-driven Parallelization Advisor for Sequential Software}, + booktitle = {IWMSE '09: Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering}, + year = {2009}, + isbn = {978-1-4244-3718-4}, + pages = {41--48}, + doi = {http://dx.doi.org/10.1109/IWMSE.2009.5071382}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, + } ++ + +
+We describe a tool for finding out-of-bounds memory access bugs in C programs, +which can be used to check programs with more than 100 000 lines of source code +in time comparable to compilation time. +Our tool doesn't search for out-of-bounds pointers, but only for dereferences of +out-of-bounds pointers (not including NULL dereferences). +An interprocedural analysis marks the variables that influence the validity of the accesses, +and by using slicing we reduce the program to only the code corresponding to these. +Afterwards, by using a variety of code optimization techniques, we obtain a code based on which +we create symbolic expressions, and using them we check whether accesses are within bounds using them. +If this is undecidable based purely on the symbolic expressions, we take into account +the predicates that dominate these memory accesses, and query a solver to determine the validity +of the access. +We eliminate the accesses we could prove, and repeat the above steps until we either prove all +accesses, or we reach the inlining limit. +The tool is successful in catching simple cases of common bugs encountered during application development +such as off-by-one bugs. ++ +
+@unpublished{ETOROK:BOUNDS, + author = {Edvin T\"{o}r\"{o}k}, + title = {Interprocedural bounds checker for C programs using symbolic +constraints and slicing}, + note = {final year thesis}, + school = {"Politehnica" University of Timisoara}, + year = {2009}, + month = {June} +} ++ +
+Data structures define how values being computed are stored and accessed within programs. By recognizing what data structures are being used in an application, tools can make applications more robust by enforcing data structure consistency properties, and developers can better understand and more easily modify applications to suit the target architecture for a particular application.+ ++This paper presents the design and application of DDT, a new program analysis tool that automatically identifies data structures within an application. A binary application is instrumented to dynamically monitor how the data is stored and organized for a set of sample inputs. The instrumentation detects which functions interact with the stored data, and creates a signature for these functions using dynamic invariant detection. The invariants of these functions are then matched against a library of known data structures, providing a probable identification. That is, DDT uses program consistency properties to identify what data structures an application employs. The empirical evaluation shows that this technique is highly accurate across several different implementations of standard data structures. +
+ "Toward Automatic Data Structure Replacement for Effective Parallelization" ++
+ Changhee Jung and Nathan Clark. +
+ +Workshop on Parallel Execution of Sequential Programs on Multicore Architectures (PESPMA'09) +, June 2009. +
+BIBTEX ++ + +
+Symbolic execution is a powerful technique for analyzing program behavior, finding bugs, and generating tests, but suffers from severely limited scalability: the largest programs that can be symbolically executed today are on the order of thousands of lines of code. To ensure feasibility of symbolic execution, even small programs must curtail their interactions with libraries, the operating system, and hardware devices. This paper introduces selective symbolic execution, a technique for creating the illusion of full-system symbolic execution, while symbolically running only the code that is of interest to the developer. We describe a prototype that can symbolically execute arbitrary portions of a full system, including applications, libraries, operating system, and device drivers. It seamlessly transitions back and forth between symbolic and concrete execution, while transparently converting system state from symbolic to concrete and back. Our technique makes symbolic execution practical for large software that runs in real environments, without requiring explicit modeling of these environments. ++ +
+ "Selective Symbolic Execution" ++
+ Vitaly Chipounov, Vlad Georgescu, Cristian Zamfir, and George Candea. +
+ +Fifth Workshop on Hot Topics in System Dependability +, Lisbon, Portugal, June 2009. +
+Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applications, several challenges still remain in programming the GPUs. Thus, it is clearly desirable to be able to program GPUs using a higher-level interface.+ ++ +In this paper, we offer a solution that targets a specific class of applications, which are the data mining and scientific data analysis applications. Our work is driven by the observation that a common processing structure, that of generalized reductions, fits a large number of popular data mining algorithms. In our solution, the programmers simply need to specify the sequential reduction loop(s) with some additional information about the parameters. We use program analysis and code generation to map the applications to a GPU. Several additional optimizations are also performed by the system.
+ +We have evaluated our system using three popular data mining applications, k-means clustering, EM clustering, and Principal Component Analysis (PCA). The main observations from our experiments are as follows. The speedup that each of these applications achieve over a sequential CPU version ranges between 20 and 50. The automatically generated version did not have any noticeable overheads compared to hand written codes. Finally, the optimizations performed in the system resulted in significant performance improvements. +
+"A Translation System for Enabling Data Mining Applications on GPUs"+ +
+Wenjing Ma and Gagan Agrawal.
+ +Proceedings of the 23rd International Conference on Supercomputing (ISC'09) +, Yorktown Heights, NY, June 2009. +
+@inproceedings{1542331, + author = {Ma, Wenjing and Agrawal, Gagan}, + title = {A translation system for enabling data mining applications on GPUs}, + booktitle = {ICS '09: Proceedings of the 23rd international conference on Supercomputing}, + year = {2009}, + isbn = {978-1-60558-498-0}, + pages = {400--409}, + location = {Yorktown Heights, NY, USA}, + doi = {http://doi.acm.org/10.1145/1542275.1542331}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+The advent of multicores presents a promising opportunity for speeding up the execution of sequential programs through their parallelization. In this paper we present a novel solution for efficiently supporting software-based speculative parallelization of sequential loops on multicore processors. The execution model we employ is based upon state separation, an approach for separately maintaining the speculative state of parallel threads and non-speculative state of the computation. If speculation is successful, the results produced by parallel threads in speculative state are committed by copying them into the computation’s non-speculative state. If misspeculation is detected, no costly state recovery mechanisms are needed as the speculative state can be simply discarded. Techniques are proposed to reduce the cost of data copying between non-speculative and speculative state and efficiently carrying out misspeculation detection. We apply the above approach to speculative parallelization of loops in several sequential programs which results in significant speedups on a Dell PowerEdge 1900 server with two Intel Xeon quad-core processors. ++ +
+ "Speculative Parallelization of Sequential Loops on Multicores" ++
+ Chen Tian, Min Feng, Vijay Nagarajan, Rajiv Gupta. +
+ +International Journal of Parallel Programming +, June 2009 +
+I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling algorithms like rejection sampling and Markov chain Monte Carlo, and solve difficult Bayesian inference problems.+ ++I first introduce Church, a probabilistic programming language for describing probabilistic generative processes that induce distributions, which generalizes Lisp, a language for describing deterministic procedures that induce functions. I highlight the ways randomness meshes with the reflectiveness of Lisp to support the representation of structured, uncertain knowledge, including nonparametric Bayesian models from the current literature, programs for decision making under uncertainty, and programs that learn very simple programs from data. I then introduce systematic stochastic search, a recursive algorithm for exact and approximate sampling that generalizes a popular form of backtracking search to the broader setting of stochastic simulation and recovers widely used particle filters as a special case. I use it to solve probabilistic reasoning problems from statistical physics, causal reasoning and stereo vision. Finally, I introduce stochastic digital circuits that model the probability algebra just as traditional Boolean circuits model the Boolean algebra. I show how these circuits can be used to build massively parallel, fault-tolerant machines for sampling and allow one to efficiently run Markov chain Monte Carlo methods on models with hundreds of thousands of variables in real time.
+I emphasize the ways in which these ideas fit together into a coherent software and hardware stack for natively probabilistic computing, organized around distributions and samplers rather than deterministic functions. I argue that by building uncertainty and randomness into the foundations of our programming languages and computing machines, we may arrive at ones that are more powerful, flexible and efficient than deterministic designs, and are in better alignment with the needs of computational science, statistics and artificial intelligence. +
+ Natively Probabilistic Computation, Vikash Kumar Mansinghka.+ +
+ Ph.D Thesis, Department of Brain and Cognitive Sciences, MIT, 2009. +
+High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, time-tested, meritorious code that are impractical to recreate from scratch. Cross-language bindings can expose low-level C code to high-level languages. Unfortunately, writing bindings by hand is tedious and error-prone, while mainstream binding generators require extensive manual annotation or fail to offer the language features that users of modern languages have come to expect.+ ++ +We present an improved binding-generation strategy based on static analysis of unannotated library source code. We characterize three high-level idioms that are not uniquely expressible in C's low-level type system: array parameters, resource managers, and multiple return values. We describe a suite of interprocedural analyses that recover this high-level information, and we show how the results can be used in a binding generator for the Python programming language. In experiments with four large C libraries, we find that our approach avoids the mistakes characteristic of hand-written bindings while offering a level of Python integration unmatched by prior automated approaches. Among the thousands of functions in the public interfaces of these libraries, roughly 40% exhibit the behaviors detected by our static analyses. +
+ "Automatic generation of library bindings using static analysis" ++
+ Tristan Ravitch, Steve Jackson, Eric Aderhold, and Ben Liblit. +
+ +Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI09) +, Dublin, Ireland, June 2009. +
+@article{1542516, + author = {Ravitch, Tristan and Jackson, Steve and Aderhold, Eric and Liblit, Ben}, + title = {Automatic generation of library bindings using static analysis}, + journal = {SIGPLAN Not.}, + volume = {44}, + number = {6}, + year = {2009}, + issn = {0362-1340}, + pages = {352--362}, + doi = {http://doi.acm.org/10.1145/1543135.1542516}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. The industry has already fallen short of the decades-old performance trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are only effective in the scientific and data parallel domains where array dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to expand parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a light-weight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of checking and locking overhead in conventional software transactional memory models by decoupling the commit phase from main transaction execution. Further, strong atomicity requirements for generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware. ++ +
+ "Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory" ++
+ Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, and Scott Mahlke. +
+ +Proc. of the 2009 ACM SIGPLAN conference on Programming language design and implementation (PLDI'09) +, Dublin, Ireland, June 2009. +
+@inproceedings{1542495, + author = {Mehrara, Mojtaba and Hao, Jeff and Hsu, Po-Chun and Mahlke, Scott}, + title = {Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory}, + booktitle = {PLDI '09: Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation}, + year = {2009}, + isbn = {978-1-60558-392-1}, + pages = {166--176}, + location = {Dublin, Ireland}, + doi = {http://doi.acm.org/10.1145/1542476.1542495}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+The serious bugs and security vulnerabilities facilitated by C/C++'s +lack of bounds checking are well known, yet C and C++ remain in +widespread use. Unfortunately, C's arbitrary pointer arithmetic, +conflation of pointers and arrays, and programmer-visible memory layout +make retrofitting C/C++ with spatial safety guarantees extremely +challenging. Existing approaches suffer from incompleteness, have high +runtime overhead, or require non-trivial changes to the C source code. +Thus far, these deficiencies have prevented widespread adoption of such +techniques. + +This paper proposes SoftBound, a compile-time transformation for +enforcing spatial safety of C. Inspired by HardBound, a +previously proposed hardware-assisted approach, SoftBound similarly +records base and bound information for every pointer as disjoint +metadata. This decoupling enables SoftBound to provide +spatial safety without requiring changes to C source code. Unlike +HardBound, SoftBound is a software-only approach and performs +metadata manipulation only when loading or storing pointer values. A +formal proof shows that this is sufficient to provide spatial safety +even in the presence of arbitrary casts. SoftBound's full checking +mode provides complete spatial violation detection with 67% +runtime overhead on average. To further reduce overheads, SoftBound has a store-only checking mode that successfully detects all the +security vulnerabilities in a test suite at the cost of only +21% runtime overhead on average. ++ +
+"SoftBound: Highly Compatible and Complete Spatial Memory Safety for C"+
+Santosh Nagarakatte, Jianzhou Zhao, Milo M K Martin and Steve Zdancewic.
+ +Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI09) +, Dublin, Ireland, June 2009. +
+@inproceedings{SoftBound:PLDI09, + author = {Santosh Nagarakatte and Jianzhou Zhao and Milo M.K. Martin and Steve Zdancewic}, + title = {SoftBound: Highly Compatible and Complete Spatial Safety for C}, + booktitle = {Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation}, + month = {June}, + year = {2009}, + location = {Dublin, Ireland}, +} ++ + +
+Benchmarks for bug detection tools are still in their infancy. Though in recent years various tools and techniques were introduced, little effort has been spent on creating a benchmark suite and a harness for a consistent quantitative and qualitative performance measurement. For assessing the performance of a bug detection tool and determining which tool is better than another for the type of code to be looked at, the following questions arise: 1) how many bugs are correctly found, 2) what is the tool's average false positive rate, 3) how many bugs are missed by the tool altogether, and 4) does the tool scale.+ ++ +In this paper we present our contribution to the C bug detection community: two benchmark suites that allow developers and users to evaluate accuracy and scalability of a given tool. The two suites contain buggy, mature open source code; bugs are representative of "real world" bugs. A harness accompanies each benchmark suite to compute automatically qualitative and quantitative performance of a bug detection tool.
+ +BegBunch has been tested to run on the Solaris, Mac OS X and Linux operating systems. We show the generality of the harness by evaluating it with our own Parfait and three publicly available bug detection tools developed by others.
+
+ "BegBunch: benchmarking for C bug detection tools" ++
+ Cristina Cifuentes, Christian Hoermann, Nathan Keynes, Lian Li, Simon Long, Erica Mealy, Michael Mounteney, Bernhard Scholz. +
+ +Proceedings of the 2nd International Workshop on Defects in Large Software Systems: Held in conjunction with the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2009) +, Chicago, Illinois, July 2009. +
+@inproceedings{1555866, + author = {Cifuentes, Cristina and Hoermann, Christian and Keynes, Nathan and Li, Lian and Long, Simon and Mealy, Erica and Mounteney, Michael and Scholz, Bernhard}, + title = {BegBunch: benchmarking for C bug detection tools}, + booktitle = {DEFECTS '09: Proceedings of the 2nd International Workshop on Defects in Large Software Systems}, + year = {2009}, + isbn = {978-1-60558-654-0}, + pages = {16--20}, + location = {Chicago, Illinois}, + doi = {http://doi.acm.org/10.1145/1555860.1555866}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Modern processor architectures provide the possibility to execute an instruction +on multiple values at once. So-called SIMD (Single Instruction, Multiple +Data) instructions work on packets (or vectors) of data instead +of scalar values. They offer a significant performance boost for data-parallel +algorithms that perform the same operations on large amounts of data, e.g. data +encoding and decoding, image processing, or ray tracing. +However, the performance gain comes at a price: programming languages +provide no elegant means to exploit SIMD instruction sets. Packet operations +have to be coded by hand, which is complicated, unintuitive, and error prone. +Thus, packetization—the transformation of scalar code to packet +form—is mostly applied automatically by local compiler optimizations +(e.g. during loop vectorization) or with a lot of manual effort at +performance-critical parts of a system. ++ ++This thesis describes an algorithm for automatic packetization that +allows a programmer to write scalar functions but use them on +packets of data. A compiler pass automatically transforms those +functions to work on packets of the target-architecture's SIMD width. The +resulting packetized function computes the same results as multiple executions +of the scalar code. +
+The algorithm is implemented in a source-language and target-architecture +independent intermediate representation (the Low Level Virtual Machine +(LLVM)), which enables its use in many different environments. +
+The performance of the generated code is shown in a real-world case +study in the context of real-time ray tracing: serial shader code written in +C++ is automatically specialized, optimized, and packetized at runtime. The +packetized shaders outperform their scalar counterparts by an average factor +of 3.6 on a standard SSE architecture of SIMD width 4. +
+ Automatic Packetization, Ralf Karrenberg.+ +
+ Masters Thesis, Universität des Saarlandes, July 2009. +
+@MASTERSTHESIS{Karrenberg:09:MSc, + author = {Ralf Karrenberg}, + title = {{Automatic Packetization}}, + school = {Saarland University}, + year = {2009}, + month = {July}, + webpdf = {http://www.prog.uni-saarland.de/people/karrenberg/content/karrenberg_automatic_packetization.pdf} + } ++ + + diff --git a/static/pubs/2009-07-Karrenberg-Thesis.pdf b/static/pubs/2009-07-Karrenberg-Thesis.pdf new file mode 100644 index 0000000..082c166 Binary files /dev/null and b/static/pubs/2009-07-Karrenberg-Thesis.pdf differ diff --git a/static/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.html b/static/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.html new file mode 100644 index 0000000..b0495f6 --- /dev/null +++ b/static/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.html @@ -0,0 +1,85 @@ + + + + + +
+Systems that enforce memory safety for +today's operating system kernels and other system software +do not account for the +behavior of low-level software/hardware interactions such as +memory-mapped I/O, +MMU configuration, and context switching. Bugs in +such low-level interactions can lead to violations of the memory +safety guarantees provided by a safe execution environment and +can lead to exploitable vulnerabilities in system software. +In this work, we present a set of program analysis and run-time instrumentation +techniques that ensure that errors in these low-level operations +do not violate the assumptions made by a safety checking system. +Our design introduces a small set of abstractions and interfaces for +manipulating processor state, kernel stacks, memory mapped I/O +objects, MMU mappings, and self modifying code to achieve this goal, +without moving resource allocation and management decisions out of +the kernel. +We have added these techniques to a compiler-based virtual machine +called Secure Virtual Architecture (SVA), to which the standard Linux +kernel has been ported previously. Our design changes to SVA required +only an additional 100 lines of code to be changed in this kernel. Our +experimental results show that our techniques prevent reported +memory safety violations due to low-level Linux operations and that +these violations are not prevented by SVA +without our techniques. Moreover, the new techniques in this paper +introduce very little overhead over and above the existing +overheads of SVA. Taken together, these results indicate that it is +clearly worthwhile to add these techniques to an existing memory +safety system. ++ +
+@inproceedings{SVAOS:UsenixSec09, + author = {John Criswell, Nicolas Geoffray, and Vikram Adve}, + title = {Memory Safety for Low-Level Software/Hardware Interactions}, + booktitle = {Proceedings of the Eighteenth Usenix Security Symposium}, + month = {August}, + year = {2009}, + location = {Montreal, Canada}, +} ++ + +
+Parallel programs are increasingly being written using programming frameworks and other environments that allow parallel constructs to be programmed with greater ease. The data structures used allow the modeling of complex mathematical structures like linear systems and partial differential equations using high-level programming abstractions. While this allows programmers to model complex systems in a more intuitive way, it also makes the debugging and profiling of these systems more difficult due to the complexity of mapping these high level abstractions down to the low level parallel programming constructs. This work discusses mapping mechanisms, called variable blame, for creating these mappings and using them to assist in the profiling and debugging of programs created using advanced parallel programming techniques. We also include an example of a prototype implementation of the system profiling three programs. + ++ +
+ "Assigning Blame: Mapping Performance to High Level Parallel Programming Abstractions" ++
+ Nick Rutar and Jeffrey K. Hollingsworth. +
+ +Euro-Par 2009 Parallel Processing +, Delft, August 2009. +
+This paper presents a tool Altair that automatically generates API function cross-references, which emphasizes reliable structural measures and does not depend on specific client code. Altair ranks related API functions for a given query according to pair-wise overlap, i.e., how they share state, and clusters tightly related ones into meaningful modules.+ ++ +Experiments against several popular C software packages show that Altair recommends related API functions for a given query with remarkably more precise and complete results than previous tools, that it can extract modules from moderate-sized software (e.g., Apache with 1000+ functions) at high precision and recall rates (e.g., both exceeding 70% for two modules in Apache), and that the computation can finish within a few seconds. +
+ "API hyperlinking via structural overlap" ++
+ Fan Long, Xi Wang, and Yang Cai. +
+ +Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering on European software engineering conference and foundations of software engineering symposium (FSE'09) +, Amsterdam, The Netherlands, August 2009. +
+BIBTEX ++ + +
+Many techniques for power management employed in advanced RTL synthesis tools rely explicitly or implicitly on observability don't-care (ODC) conditions. In this paper we present a systematic approach to maximizing the effectiveness of these techniques by generating power-friendly RTL descriptions in a behavioral synthesis tool. We first introduce the concept of behavior-level observability and investigate its relation with observability under a given schedule, using an extension of Boolean algebra. We then propose an efficient algorithm to compute behavior-level observability on a data-flow graph. Our algorithm exploits knowledge about select and Boolean instructions, and allows certain forms of other knowledge, once uncovered, to be considered for stronger observability conditions. We also describe a behavioral synthesis flow where behavior-level observability is used to guide the scheduler toward maximizing the likelihood that execution of power-hungry instructions will be avoided under a latency constraint. Experimental results show that our approach is able to reduce total power, and it outperforms a previous method in [15] by 17.7% on average, on a set of real-world designs. To the best of our knowledge, this is the first work to use a comprehensive behavioral-level observability analysis to guide optimizations in behavioral synthesis. ++ +
+ "Behavior-Level Observability Don't-Cares and Application to Low-Power Behavioral Synthesis" ++
+ Jason Cong, Bin Liu, Zhiru Zhang. +
+ +Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design (ISLPED'09) +, San Francisco, CA, August 2009 +
+Future computer processors may have tens or hundreds of cores, increasing the need for efficient parallel programming models. The nature of multicore processors will present applications with the challenge of diversity: a variety of operating environments, architectures, and data will be available and the compiler will have no foreknowledge of the environment until run time. Adaptive Online Parallelization (ADOPAR) is a unifying framework that attempts to overcome diversity by separating discovery and packaging of parallelism. Scheduling for execution may then occur at run time when diversity may best be resolved.+ ++This work presents a compact representation of parallelism based on the task graph programming model, tailored especially for ADOPAR and for regular and irregular parallel computations. Task graphs can be unmanageably large for fine-grained parallelism. Rather than representing each task individually, similar tasks are grouped into task descriptors. From these, a task descriptor graph, with relationship descriptors forming the edges of the graph, may be represented. While even highly irregular computations often have structure, previous representations have chosen to restrict what can be easily represented, thus limiting full exploitation by the back end. Therefore, in this work, task and relationship descriptors have been endowed with instantiation functions (methods of descriptors that act as factories) so the front end may have a full range of expression when describing the task graph. The representation uses descriptors to express a full range of regular and irregular computations in a very flexible and compact manner.
+The representation also allows for dynamic optimization and transformation, which assists ADOPAR in its goal of overcoming various forms of diversity. We have successfully implemented this representation using new compiler intrinsics, allow ADOPAR schedulers to operate on the described task graph for parallel execution, and demonstrate the low code size overhead and the necessity for native schedulers. +
+ AN INTERNAL REPRESENTATION FOR ADAPTIVE ONLINE PARALLELIZATION, Koy D. Rehme.+ +
+ M.S. Thesis, Department of Electrical and Computer Engineering, Brigham Young University, August 2009. +
+While intraprocedural Static Single Assignment (SSA) is ubiquitous in modern compilers, the use of interprocedural SSA, although seemingly a natural extension, is limited. We find that part of the impediment is due to the narrow scope of variables handled by previously reported approaches, leading to limited benefits in optimization.+ ++ +In this study, we increase the scope of Interprocedural SSA (ISSA) to record elements and singleton heap variables. We show that ISSA scales reasonably well (to all MediaBench and most of the SPEC2K), while resolving on average 1.72 times more loads to their definition. We propose and evaluate an interprocedural copy propagation and an interprocedural liveness analysis and demonstrate their effectiveness on reducing input and output instructions by 44.5% and 23.3%, respectively. ISSA is then leveraged for constant propagation and dead code removal, where 11.8% additional expressions are folded. +
+ "Increasing the scope and resolution of Interprocedural Static Single Assignment" ++
+ Silvian Calman and Jianwen Zhu. +
+ +Proceeding of the 16th International Static Analysis Symposium (SAS 2009) +, Los Angeles, CA, August, 2009. +
+Locating software components which are responsible for observed failures is the most expensive, error-prone phase in the software development life cycle. Automated diagnosis of software faults can improve the efficiency of the debugging process, and is therefore an important process for the development of dependable software. In this paper we present a toolset for automatic fault localization, dubbed Zoltar, which adopts a spectrum-based fault localization technique. The toolset provides the infrastructure to automatically instrument the source code of software programs to produce runtime data, which is subsequently analyzed to return a ranked list of likely faulty locations. Aimed at total automation (e.g., for runtime fault diagnosis), Zoltar has the capability of instrumenting the program under analysis with fault screeners, for automatic error detection. Using a small thread-based example program as well as a large realistic program, we show the applicability of the proposed toolset. ++ +
+ "Zoltar: a spectrum-based fault localization tool" ++
+ Tom Janssen, Rui Abreu, and Arjan J.C. van Gemund. +
+ +Proceedings of the 2009 ESEC/FSE workshop on Software integration and evolution @ runtime +, Amsterdam, The Netherlands, August 2009. +
+@inproceedings{1596502, + author = {Janssen, Tom and Abreu, Rui and van Gemund, Arjan J.C.}, + title = {Zoltar: a spectrum-based fault localization tool}, + booktitle = {SINTER '09: Proceedings of the 2009 ESEC/FSE workshop on Software integration and evolution @ runtime}, + year = {2009}, + isbn = {978-1-60558-681-6}, + pages = {23--30}, + location = {Amsterdam, The Netherlands}, + doi = {http://doi.acm.org/10.1145/1596495.1596502}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+With the demand for energy-efficient embedded computing and the rise of heterogeneous architectures, automatically retargetable techniques are likely to grow in importance. On the one hand, retargetable compilers do not handle real-time constraints properly. On the other hand, conventional worst-case execution time (WCET) approaches are not automatically retargetable: measurement-based methods require time-consuming dynamic characterization of target processors, whereas static program analysis and abstract interpretation are performed in a post-compiling phase, being therefore restricted to the set of supported targets. This paper proposes a retargetable technique to grant early real-time checking (ERTC) capabilities for design space exploration. The technique provides a general (minimum, maximum and exact-delay) timing analysis at compile time. It allows the early detection of inconsistent time-constraint combinations prior to the generation of binary executables, thereby promising higher design productivity. ERTC is a complement to state-of-the-art design flows, which could benefit from early infeasiblity detection and exploration of alternative target processors, before the binary executables are submitted to tight-bound BCET and WCET analyses for the selected target processor. ++ +
+ "An Early Real-Time Checker for Retargetable Compile-Time Analysis" ++
+ Emilio Wuerges, Luiz C. V. dos Santos, Olinto Furtado, and Sandro Rigo. +
+ +Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design (SBCCI'09) +, Natal, Brazil, Sep 2009. +
+BIBTEX ++ + +
+Register allocation has gained renewed attention in the recent past. Several authors propose a separation of the problem into decoupled sub-tasks including spilling, allocation, assignment, and coalescing. This approach is largely motivated by recent advances in SSA-based register allocation that suggest that a decomposition does not significantly degrade the overall allocation quality.+ ++The algorithmic challenges of intra-procedural spilling have been neglected so far and very crude heuristics were employed. In this work, (1) we introduce the constrained min-cut (CMC) problem for solving the spilling problem, (2) we provide an integer linear program formulation for computing an optimal solution of CMC, and (3) we devise a progressive Lagrangian solver that is viable for production compilers. Our experiments with Spec2k and MiBench show that optimal solutions are feasible, even for very large programs, and that heuristics leave significant potential behind for small register files. +
+ "Progressive Spill Code Placement" ++
+ Dietmar Ebner, Bernhard Scholz, and Andreas Krall. +
+ +Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems (CASES'09) +, Grenoble, France, October 2009. +
+@inproceedings{1629408, + author = {Ebner, Dietmar and Scholz, Bernhard and Krall, Andreas}, + title = {Progressive spill code placement}, + booktitle = {CASES '09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems}, + year = {2009}, + isbn = {978-1-60558-626-7}, + pages = {77--86}, + location = {Grenoble, France}, + doi = {http://doi.acm.org/10.1145/1629395.1629408}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Multiprocessor System-on-Chips (MPSoCs) are nowadays widely used, but the problem of their software development persists to be one of the biggest challenges for developers. Virtual Platforms (VPs) are introduced to the industry, which allow MPSoC software development without a hardware prototype. Nevertheless, for developers in early design stage where no VP is available, the software programming support is not satisfactory.+ ++This paper introduces a High-level Virtual Platform (HVP) which aims at early MPSoC software development. The framework provides a set of tools for abstract MPSoC simulation and the corresponding application programming support in order to enable the development of reusable C code at a high level. The case study performed on several MPSoCs shows that the code developed on the HVP can be easily reused on different target platforms. Moreover, the high simulation speed achieved by the HVP also improves the design efficiency of software developers. + +
+ "A High-Level Virtual Platform for Early MPSoC Software Development" ++
+ Jianjiang Ceng, Weihua Sheng, Jeronimo Castrillon, Anastasia Stulova, Rainer Leupers, Gerd Ascheid and Heinrich Meyr. +
+ +Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis (CODES+ISSS'09) +, Grenoble, France, October 2009 +
+@inproceedings{1629438, + author = {Ceng, Jianjiang and Sheng, Weihua and Castrillon, Jeronimo and Stulova, Anastasia and Leupers, Rainer and Ascheid, Gerd and Meyr, Heinrich}, + title = {A high-level virtual platform for early MPSoC software development}, + booktitle = {CODES+ISSS '09: Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis}, + year = {2009}, + isbn = {978-1-60558-628-1}, + pages = {11--20}, + location = {Grenoble, France}, + doi = {http://doi.acm.org/10.1145/1629435.1629438}, + publisher = {ACM}, + address = {New York, NY, USA}, + } ++ + +
+Profilers play an important role in software/hardware de- sign, optimization, and verification. Various approaches have been proposed to implement profilers. The most widespread approach adopted in the embedded domain is Instruction Set Simulation (ISS) based profiling, which pro- vides uncompromised accuracy but limited execution speed. Source code profilers, on the contrary, are fast but less accu- rate. This paper introduces TotalProf, a fast and accurate source code cross profiler that estimates the performance of an application from three aspects: First, code optimiza- tion and a novel virtual compiler backend are employed to resemble the course of target compilation. Second, an opti- mistic static scheduler is introduced to estimate the behav- ior of the target processor’s datapath. Last but not least, dynamic events, such as cache misses, bus contention and branch prediction failures, are simulated at runtime. With an abstract architecture description, the tool can be easily retargeted in a performance characteristics oriented way to estimate different processor architectures, including DSPs and VLIW machines. Multiple instances of TotalProf can be integrated with SystemC to support heterogeneous Multi- Processor System-on-Chip (MPSoC) profiling. With only about a 5 to 15% error rate introduced to the major per- formance metrics, such as cycle count, memory accesses and cache misses, a more than one Giga-Instruction-Per-Second (GIPS) execution speed is achieved. ++ +
+ "TotalProf: A Fast and Accurate Retargetable Source Code Profiler" ++
+ Lei Gao, Jia Huang, Jianjiang Ceng, Rainer Leupers, Gerd Ascheid, and Heinrich Meyr. +
+ +Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis +, Grenoble, France, October 2009. +
+The memory subsystem is one of the major performance bottlenecks in modern computer systems. While much effort is spent on the optimization of codes which access data regularly, not all codes will do so. Programs using pointer linked data structures are notorious for producing such so called irregular memory access patterns. In this paper, we present a compilation and run-time framework that enables fully automatic restructuring of pointer-linked data structures for type-unsafe languages, such as C. The restructuring framework is based on run-time restructuring using run-time trace information. The compiler transformation chain first identifies disjoint data structures that are stored in type-homogeneous memory pools. Access to these pools is traced and from these run-time traces, a permutation vector is derived. The memory pool is restructured at run-time using this permutation, after which all pointers (both stack and heap) that refer to the restructured pool must be updated. While the run-time tracing incurs a considerable overhead, we show that restructuring pointer-linked data structures can yield substantial speedups and that in general, the incurred overhead is compensated for by the performance improvements. ++ +
+ "Automatic Restructuring of Linked Data Structures" ++
+
 H.L.A. van der Spek, C.W.M. Holm and H.A.G. Wijshoff.
+ + Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC'09) +, Newark, DE, October 2009 +
+
This thesis details the motivation, design and implementation of a new back-end for the Glasgow Haskell Compiler which uses the Low Level Virtual Machine compiler infrastructure for code generation. Haskell as implemented by GHC was found to map remarkably well onto the LLVM Assembly language, although some new approaches were required. The most notable of these was the use of a custom calling convention in order to implement GHC's optimisation feature of pinning STG virtual registers to hardware registers. In an evaluation of the LLVM back-end against GHC's C and native code generator back-ends, the LLVM back-end was found to offer comparable performance in most situations, with the surprising finding that LLVM's optimisations didn't offer any improvement to the run-time of the generated code. The LLVM back-end proved to be far simpler, though, than either the native code generator or C back-ends, and as such it offers a compelling primary back-end target for GHC.
+
+
+ Low Level Virtual Machine for Glasgow Haskell Compiler, David Terei.+ +
+ Bachelor's Thesis, Computer Science and Engineering Dept., The University of New South Wales, Oct 2009. +
++ +Data structures define how values being computed are stored and accessed within programs. By recognizing what data structures are being used in an application, tools can make applications more robust by enforcing data structure consistency properties, and developers can better understand and more easily modify applications to suit the target architecture for a particular application.
+
+This paper presents the design and application of DDT, a new program analysis tool that automatically identifies data structures within an application. An application binary is instrumented to dynamically monitor how the data is stored and organized for a set of sample inputs. The instrumentation detects which functions interact with the stored data, and creates a signature for these functions using dynamic invariant detection. The invariants of these functions are then matched against a library of known data structures, providing a probable identification. That is, DDT uses program consistency properties to identify what data structures an application employs. The empirical evaluation shows that this technique is highly accurate across several different implementations of standard data structures, enabling aggressive optimizations in many situations.
+ +
+ "DDT: Design and Evaluation of a Dynamic Program Analysis for Optimizing Data Structure Usage" ++
+ Changhee Jung, Nathan Clark. +
+ +Proc. of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture +, New York, NY, December 2009 +
+@inproceedings{1669122, + author = {Jung, Changhee and Clark, Nathan}, + title = {DDT: design and evaluation of a dynamic program analysis for optimizing data structure usage}, + booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture}, + year = {2009}, + isbn = {978-1-60558-798-1}, + pages = {56--66}, + location = {New York, New York}, + doi = {http://doi.acm.org/10.1145/1669112.1669122}, + publisher = {ACM}, + address = {New York, NY, USA}, +} ++ + +
+
This thesis investigates compiler optimisations applied to a JIT-compiling emulator which is used in a video coding system. The JIT compiler is built using LLVM which provides some readily available optimisation passes. Some of these are evaluated, some are patched, and some new ones are implemented. The optimisations are evaluated by running a number of benchmarks. By applying optimisations that target redundancies introduced by the JIT-compiler, as well as some optimisations not performed on the firmware originally, a benchmark execution speed-up of between 20 and 70% is achieved.
+
+
+ Emulator Speed-up Using JIT and LLVM, Hans Wennborg.+ +
+ Master's Thesis, Lund University, Lund, Sweden, January 2010 +
+
+Automated hardware design from behavior-level abstraction has drawn wide interest in the FPGA-based acceleration and configurable computing research fields. However, for many high-level programming languages, such as C/C++, the description of bitwise access and computation is not as direct as in hardware description languages, and high-level synthesis of algorithmic descriptions may generate suboptimal implementations for bitwise computation-intensive applications. In this paper we introduce a bit-level transformation and optimization approach to assisting high-level synthesis of algorithmic descriptions. We introduce a bit-flow graph to capture bit-value information. Analysis and optimizing transformations can be performed on this representation, and the optimized results are transformed back to the standard data-flow graphs extended with a few instructions representing bitwise access. This allows high-level synthesis tools to automatically generate circuits with higher quality. Experiments show that our algorithm can reduce slice usage by 29.8% on average for a set of real-life benchmarks on Xilinx Virtex-4 FPGAs. At the same time, the clock period is reduced by 13.6% on average, with an 11.4% latency reduction.
+
+ "Bit-Level Optimization for High-Level Synthesis and FPGA-Based Acceleration" ++
+ Jiyu Zhang, Zhiru Zhang, Sheng Zhou, Mingxing Tan, Xianhua Liu, Xu Cheng, Jason Cong +
+ +Proc. of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays +, Monterey, CA, February 2010. +
+
+This paper presents a novel framework for generating efficient custom instructions for common configurable processors with limited numbers of I/O ports in the register files and fixed-length instruction formats, such as RISCs. Unlike previous approaches which generate a single custom instruction from each subgraph, our approach generates a sequence of multiple custom instructions from each subgraph by applying high-level synthesis techniques such as scheduling and binding to the subgraphs. Because of this feature, our approach can provide both of the following advantages simultaneously: (1) generation of effective custom instructions from Multiple Inputs Multiple Outputs (MIMO) subgraphs without any change in the configurable processor hardware and the instruction format, and (2) resource sharing among custom instructions. We performed synthesis, placement and routing of the automatically generated Custom Functional Units (CFUs) on an FPGA. Experimental results showed that our approach could generate custom instructions with significant speedups of 28% on average compared to a state-of-the-art framework of custom instruction generation for configurable processors with limited numbers of I/O ports in the register file and fixed-length instruction formats.
+
+ "Custom Instruction Generation for Configurable Processors with Limited Numbers of Operands" ++
+ Kenshu Seto and Masahiro Fujita +
+ +IPSJ Transactions on System LSI Design Methodology +, Volume 3. +
++ +Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
+
+ "Conservation Cores: Reducing the Energy of Mature Computations" ++
+ Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, Michael Bedford Taylor +
+ +Proc. of the 15th international conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, March 2010. +
++ +This paper proposes an efficient hardware/software system that significantly enhances software security through diversified replication on multi-cores. Recent studies show that a large class of software attacks can be detected by running multiple versions of a program simultaneously and checking the consistency of their behaviors. However, execution of multiple replicas incurs significant overheads on today's computing platforms, especially with fine-grained comparisons necessary for high security. Orthrus exploits similarities in automatically generated replicas to enable simultaneous execution of those replicas with minimal overheads; the architecture reduces memory and bandwidth overheads by compressing multiple memory spaces together, and additional power consumption and silicon area by eliminating redundant computations. Utilizing the hardware architecture, Orthrus implements a fine-grained memory layout diversification with the LLVM compiler and can detect corruptions in both pointers and critical data. Experiments indicate that the Orthrus architecture incurs minimal overheads and provides a protection against a broad range of attacks.
+
+ "Orthrus: Efficient Software Integrity Protection on Multi-core" ++
+ Ruirui Huang, Daniel Y. Deng, G. Edward Suh +
+ +Proc. of the 15th international conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, March 2010. +
++ +Aggressive technology scaling provides designers with an ever increasing budget of cheaper and faster transistors. Unfortunately, this trend is accompanied by a decline in individual device reliability as transistors become increasingly susceptible to soft errors. We are quickly approaching a new era where resilience to soft errors is no longer a luxury that can be reserved for just processors in high-reliability, mission-critical domains. Even processors used in mainstream computing will soon require protection. However, due to tighter profit margins, reliable operation for these devices must come at little or no cost. This paper presents Shoestring, a minimally invasive software solution that provides high soft error coverage with very little overhead, enabling its deployment even in commodity processors with "shoestring" reliability budgets. Leveraging intelligent analysis at compile time, and exploiting low-cost, symptom-based error detection, Shoestring is able to focus its efforts on protecting statistically-vulnerable portions of program code. Shoestring effectively applies instruction duplication to protect only those segments of code that, when subjected to a soft error, are likely to result in user-visible faults without first exhibiting symptomatic behavior. Shoestring is able to recover from an additional 33.9% of soft errors that are undetected by a symptom-only approach, achieving an overall user-visible failure rate of 1.6%. This reliability improvement comes at a modest performance overhead of 15.8%.
+
+ "Shoestring: Probabilistic Soft Error Reliability on the Cheap" ++
+ Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke +
+ +Proc. of the 15th international conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, March 2010. +
++ +With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative techniques, speculative parallelism appears to be the key to continuing this trend for general-purpose applications. Recently-proposed code parallelization techniques, such as those by Bridges et al. and by Thies et al., demonstrate scalable performance on multiple cores by using speculation to divide code into atomic units (transactions) that span multiple threads in order to expose data parallelism. Unfortunately, most software and hardware Thread-Level Speculation (TLS) memory systems and transactional memories are not sufficient because they only support single-threaded atomic units. Multi-threaded Transactions (MTXs) address this problem, but they require expensive hardware support as currently proposed in the literature. This paper proposes a Software MTX (SMTX) system that captures the applicability and performance of hardware MTX, but on existing multicore machines. The SMTX system yields a harmonic mean speedup of 13.36x on native hardware with four 6-core processors (24 cores in total) running speculatively parallelized applications.
+
+ "Speculative Parallelization using Software Multi-Threaded Transactions" ++
+ Arun Raman, Hanjun Kim, Thomas R. Mason, Thomas B. Jablin, David I. August. +
+ +Proc. of the 15th international conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, March 2010. +
+
+Heterogeneous systems, systems with multiple processors tailored for specialized tasks, are challenging programming environments. While it may be possible for domain experts to optimize a high performance application for a very specific and well documented system, it may not perform as well or even function on a different system. Developers who have less experience with either the application domain or the system architecture may devote a significant effort to writing a program that merely functions correctly. We believe that a comprehensive analysis and modeling framework is necessary to ease application development and automate program optimization on heterogeneous platforms.
+
+This paper reports on an empirical evaluation of 25 CUDA applications on four GPUs and three CPUs, leveraging the Ocelot dynamic compiler infrastructure which can execute and instrument the same CUDA applications on either target. Using a combination of instrumentation and statistical analysis, we record 37 different metrics for each application and use them to derive relationships between program behavior and performance on heterogeneous processors. These relationships are then fed into a modeling framework that attempts to predict the performance of similar classes of applications on different processors. Most significantly, this study identifies several non-intuitive relationships between program characteristics and demonstrates that it is possible to accurately model CUDA kernel performance using only metrics that are available before a kernel is executed.
+
+ "Modeling GPU-CPU Workloads and Systems" ++
+ Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili +
+ +Proc. of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), Pittsburgh, PA, March 2010. +
+Managed Runtime Environments (MREs), such as the JVM and the CLI, form an +attractive environment for program execution, by providing portability and +safety, via the use of a bytecode language and automatic memory management, +as well as good performance, via just-in-time (JIT) compilation. +Nevertheless, developing a fully featured MRE, including e.g. +a garbage collector and JIT compiler, is a herculean +task. As a result, new languages cannot easily take advantage of the +benefits of MREs, and it is difficult to experiment with +extensions of existing MRE based languages. + +This paper describes and evaluates VMKit, a first attempt to build a common +substrate that eases the development of high-level MREs. We have successfully +used VMKit to build two MREs: a Java Virtual Machine and a Common +Language Runtime. We provide an extensive study of the lessons learned in +developing this infrastructure, and assess the ease of implementing new +MREs or MRE extensions and the resulting performance. In +particular, it took one of the authors only one month to develop a Common +Language Runtime using VMKit. VMKit furthermore has performance comparable +to the well established open source MREs Cacao, Apache Harmony and Mono, and is +1.2 to 3 times slower than JikesRVM on most of the DaCapo +benchmarks. + ++ +
+
@inproceedings{geoffray10vmkit,
  author = {N. Geoffray and G. Thomas and J. Lawall and G. Muller and B. Folliot},
  title = {{VMKit: a Substrate for Managed Runtime Environments}},
  booktitle = {Virtual Execution Environment Conference (VEE 2010)},
  publisher = {ACM Press},
  year = {2010},
  month = {March},
  address = {Pittsburgh, USA}
}
+
++ +The behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in the execution of multithreaded programs. This severely complicates many tasks, including debugging, testing, and automatic replication. In this work, we avoid these complications by eliminating their root cause: we develop a compiler and runtime system that runs arbitrary multithreaded C/C++ POSIX Threads programs deterministically.
+ +A trivial non-performant approach to providing determinism is simply deterministically serializing execution. Instead, we present a compiler and runtime infrastructure that ensures determinism but resorts to serialization rarely, for handling interthread communication and synchronization. We develop two basic approaches, both of which are largely dynamic with performance improved by some static compiler optimizations. First, an ownership-based approach detects interthread communication via an evolving table that tracks ownership of memory regions by threads. Second, a buffering approach uses versioned memory and employs a deterministic commit protocol to make changes visible to other threads. While buffering has larger single-threaded overhead than ownership, it tends to scale better (serializing less often). A hybrid system sometimes performs and scales better than either approach individually.
+
+ "CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution" ++
+ Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan Grossman. +
+ +Proc. of the 15th international conference on Architectural Support for Programming Languages and Operating Systems +, Pittsburgh, PA, March 2010. +
+AMD's Advanced Synchronization Facility (ASF) is an x86 instruction set +extension proposal intended to simplify and speed up the +synchronization of concurrent programs. In this paper, we report our +experiences using ASF for implementing transactional memory. We have extended +a C/C++ compiler to support language-level transactions and generate +code that takes advantage of ASF. We use a software fallback mechanism for +transactions that cannot be committed within ASF (e.g., because of hardware +capacity limitations). Our evaluation uses a cycle-accurate x86 simulator +that we have extended with ASF support. Building a complete ASF-based software +stack allows us to evaluate the performance gains that a user-level program +can obtain from ASF. Our measurements on a wide range of benchmarks indicate +that the overheads traditionally associated with software transactional +memories can be significantly reduced with the help of ASF. ++ +
+ "Evaluation of AMD's Advanced Synchronization +Facility within a Complete Transactional Memory Stack" ++
+
 Dave Christie, Jae-Woong Chung, Stephan Diestelhorst,
+Michael Hohmuth, Martin Pohlack, Christof Fetzer, Martin Nowack,
+Torvald Riegel, Pascal Felber, Patrick Marlier, and Etienne Riviere
+ +Proc. of the 5th ACM European Conference on Computer Systems +, Paris, France, April 2010. +
++ +Debugging real systems is hard, requires deep knowledge of the code, and is time-consuming. Bug reports rarely provide sufficient information, thus forcing developers to turn into detectives searching for an explanation of how the program could have arrived at the reported failure point.
+ +Execution synthesis is a technique for automating this detective work: given a program and a bug report, it automatically produces an execution of the program that leads to the reported bug symptoms. Using a combination of static analysis and symbolic execution, it “synthesizes” a thread schedule and various required program inputs that cause the bug to manifest. The synthesized execution can be played back deterministically in a regular debugger, like gdb. This is particularly useful in debugging concurrency bugs.
+
+Our technique requires no runtime tracing or program modifications, thus incurring no runtime overhead and being practical for use in production systems. We evaluate ESD, a debugger based on execution synthesis, on popular software (e.g., the SQLite database, ghttpd Web server, HawkNL network library, UNIX utilities): starting from mere bug reports, ESD reproduces on its own several real concurrency and memory safety bugs in less than three minutes.
+
+ "Execution Synthesis: A Technique for Automated Software Debugging" ++
+ Cristian Zamfir and George Candea +
+ +Proc. of the 5th ACM European Conference on Computer Systems +, Paris, France, April 2010. +
++ ++Targeting the operating system (OS) kernel, kernel rootkits pose a formidable +threat to computer systems and their users. Recent efforts have made +significant progress in blocking them from injecting malicious code into the OS +kernel for execution. Unfortunately, they cannot block the emerging so-called +return-oriented rootkits (RORs). Without the need of injecting their own +malicious code, these rootkits can discover and chain together "return-oriented +gadgets" (that consist of only legitimate kernel code) for rootkit computation. +
+
+In this paper, we propose a compiler-based approach to defeat these
+return-oriented rootkits. Our approach recognizes the hallmark of
+return-oriented rootkits, i.e., the ret instruction, and accordingly aims to
+completely remove them in a running OS kernel. Specifically, one key technique
+named return indirection is to replace the return address in a stack frame with
+a return index and disallow a ROR from using its own return addresses to
+locate and assemble return-oriented gadgets. Further, to prevent legitimate
+instructions that happen to contain return opcodes from being misused, we also
+propose two other techniques, register allocation and peephole optimization, to
+avoid introducing them in the first place. We have developed an LLVM-based
+prototype and used it to generate a return-less FreeBSD kernel. Our evaluation
+results indicate that the proposed approach is generic, effective, and can be
+implemented on commodity hardware with a low performance overhead.
+
+
+ "Defeating Return-Oriented Rootkits with "Return-Less" Kernels" ++
+ Jinku Li, Zhi Wang, Xuxian Jiang, Michael Grace, and Sina Bahram +
+ +Proc. of the 5th ACM European Conference on Computer Systems +, Paris, France, April 2010. +
++ +This paper presents a technique that helps automate the reverse engineering of device drivers. It takes a closed-source binary driver, automatically reverse engineers the driver’s logic, and synthesizes new device driver code that implements the exact same hardware protocol as the original driver. This code can be targeted at the same or a different OS. No vendor documentation or source code is required.
+ +Drivers are often proprietary and available for only one or two operating systems, thus restricting the range of device support on all other OSes. Restricted device support leads to low market viability of new OSes and hampers OS researchers in their efforts to make their ideas available to the “real world.” Reverse engineering can help automate the porting of drivers, as well as produce replacement drivers with fewer bugs and fewer security vulnerabilities.
+ +Our technique is embodied in RevNIC, a tool for reverse engineering network drivers. We use RevNIC to reverse engineer four proprietary Windows drivers and port them to four different OSes, both for PCs and embedded systems. The synthesized network drivers deliver performance nearly identical to that of the original drivers.
+
+ "Reverse Engineering of Binary Device Drivers with RevNIC" ++
+ Vitaly Chipounov and George Candea +
+ +Proc. of the 5th ACM European Conference on Computer Systems +, Paris, France, April 2010. +
+In computer science profiling is the process of determining the execution frequencies of parts of a program. This can be done by instrumenting the program code with counters that are incremented when a part of the program is executed or by sampling the program counter at certain time intervals. From this data it is possible to calculate exact (in the case of counters) or relative (in the case of sampling) execution frequencies of all parts of the program.+ ++ +Currently the LLVM Compiler Infrastructure supports the profiling of programs by placing counters in the code and reading the resulting profiling data during consecutive compilations. But these counters are placed with a naive and inefficient algorithm that uses more counters than necessary. Also the recorded profiling information is not used in the compiler during optimisation or in the code generating backend when recompiling the program.
+
+This work tries to improve the existing profiling support in LLVM in several ways. First, the number of counters placed in the code is decreased as presented by Ball and Larus. Counters are placed only at the leaves of each function's control flow graph (CFG), which gives an incomplete profile after the program execution. This incomplete profile can be completed by propagating the values of the leaves back into the tree.
+ +Secondly, the profiling information is made available to the code generating backend. The CFG modifications and instruction selection passes are modified where necessary to preserve the profiling information so that backend passes and code generation can benefit from it. For example the register allocator is one such backend pass that could benefit since the spilling decisions are based on the execution frequency information.
+
+Thirdly, a compile time estimator to predict execution frequencies when no profiling information is available is implemented and evaluated as proposed by Wu et al. This estimator is based on statistical data which is combined in order to give more accurate branch predictions as compared to methods where only a single heuristic is used for prediction.
+
+The efficiency of the implemented counter placing algorithm is evaluated by measuring profiling overhead for the naive and for the improved counter placement. The improvements from having the profiling information in the code generating backend are measured by the program performance for code which was compiled without and with profiling information as well as for code that was compiled using the compile time estimator.
+
+ Efficient Profiling in the LLVM Compiler Infrastructure, Andreas Neustifter.+ +
+
 Master's Thesis, Vienna University of Technology, April 2010.
+
+This talk describes the status and progress of LLVM and Clang
+and their integration into the FreeBSD base.
+ +
+ "ClangBSD", Roman Divacky,+ +
+
 BSDCan, May 2010.
+
++ +Many computations exhibit a trade off between execution time and quality of service. A video encoder, for example, can often encode frames more quickly if it is given the freedom to produce slightly lower quality video. A developer attempting to optimize such computations must navigate a complex trade-off space to find optimizations that appropriately balance quality of service and performance.
+ +We present a new quality of service profiler that is designed to help developers identify promising optimization opportunities in such computations. In contrast to standard profilers, which simply identify time-consuming parts of the computation, a quality of service profiler is designed to identify subcomputations that can be replaced with new (and potentially less accurate) subcomputations that deliver significantly increased performance in return for acceptably small quality of service losses.
+ +Our quality of service profiler uses loop perforation (which transforms loops to perform fewer iterations than the original loop) to obtain implementations that occupy different points in the performance/quality of service trade-off space. The rationale is that optimizable computations often contain loops that perform extra iterations, and that removing iterations, then observing the resulting effect on the quality of service, is an effective way to identify such optimizable subcomputations. Our experimental results from applying our implemented quality of service profiler to a challenging set of benchmark applications show that it can enable developers to identify promising optimization opportunities and deliver successful optimizations that substantially increase the performance with only small quality of service losses.
+
+ "Quality of Service Profiling" ++
+ Sasa Misailovic, Stelios Sidiroglou, Henry Hoffmann, Martin Rinard +
+
+Proc. of the 2010 IEEE 32nd International Conference on Software Engineering
+, Cape Town, South Africa, May 2010.
+
+
+
+Virtualization is being widely adopted in today’s computing systems. Its
+unique security advantages in isolating and introspecting commodity OSes as
+virtual machines (VMs) have enabled a wide spectrum of applications. However,
+a common, fundamental assumption is the presence of a trustworthy hypervisor.
+Unfortunately, the large code base of commodity hypervisors and recent
+successful hypervisor attacks (e.g., VM escape) seriously question the
+validity of this assumption.
+In this paper, we present HyperSafe, a lightweight approach that endows
+existing Type-I bare-metal hypervisors with a unique self-protection
+capability to provide lifetime control-flow integrity. Specifically, we
+propose two key techniques. The first one – non-bypassable memory lockdown –
+reliably protects the hypervisor’s code and static data from being compromised
+even in the presence of exploitable memory corruption bugs (e.g., buffer
+overflows), therefore successfully providing hypervisor code integrity. The
+second one – restricted pointer indexing – introduces one layer of indirection
+to convert the control data into pointer indexes. These pointer indexes are
+restricted such that the corresponding call/return targets strictly follow the
+hypervisor control flow graph, hence expanding protection to control-flow
+integrity. We have built a prototype and used it to protect two open-source
+Type-I hypervisors: BitVisor and Xen. The experimental results with synthetic
+hypervisor exploits and benchmarking programs show HyperSafe can reliably
+enable the hypervisor self-protection and provide the integrity guarantee
+with a small performance overhead.
+
+
+ "HyperSafe: A Lightweight Approach to Provide Lifetime Hypervisor Control-Flow Integrity" ++
+ Zhi Wang and Xuxian Jiang +
+ +Proceedings of the Thirty First IEEE Symposium on Security & Privacy (Oakland +2010), + Oakland, CA, May 2010. +
+This report describes the current maturity state of the Clang/LLVM C/C++ +compiler. The main focus is on compile time and memory consumption compared +to GCC 4.2, the version used by the FreeBSD operating system project to avoid +the integration of GPLv3 code. ++ +
+ Clang/LLVM Maturity Report, Dominic Fandrey+ +
+ Proceedings of the Summer 2010 Research Seminar, Computer Science Dept., University of Applied Sciences Karlsruhe, June 2010. +
+ @Proceedings{Fandrey, + author = {Dominic Fandrey}, + title = "{Clang/LLVM Maturity Report}", + school = "{Computer Science Dept., University of Applied Sciences Karlsruhe}", + year = {2010}, + address = {Moltkestr. 30, 76133 Karlsruhe - Germany}, + month = {June}, + note = {{\em See {\tt http://www.iwi.hs-karlsruhe.de}.}} + } ++ + + diff --git a/static/pubs/2010-06-06-Clang-LLVM.pdf b/static/pubs/2010-06-06-Clang-LLVM.pdf new file mode 100644 index 0000000..165c971 Binary files /dev/null and b/static/pubs/2010-06-06-Clang-LLVM.pdf differ diff --git a/static/pubs/2010-06-ISCA-Relax.html b/static/pubs/2010-06-ISCA-Relax.html new file mode 100644 index 0000000..c6f9ed9 --- /dev/null +++ b/static/pubs/2010-06-ISCA-Relax.html @@ -0,0 +1,70 @@ + + + + + +
++ ++As technology scales ever further, device unreliability is creating excessive +complexity for hardware to maintain the illusion of perfect operation. In this +paper, we consider whether exposing hardware fault information to software and +allowing software to control fault recovery simplifies hardware design and helps +technology scaling.
+ +The combination of emerging applications and emerging many-core architectures +makes software recovery a viable alternative to hardware-based fault recovery. +Emerging applications tend to have few I/O and memory side-effects, which limits +the amount of information that needs checkpointing, and they allow discarding +individual sub-computations with small qualitative impact. Software recovery can +harness these properties in ways that hardware recovery cannot.
+ ++We describe Relax, an architectural framework for software recovery of hardware +faults. Relax includes three core components: (1) an ISA extension that allows +software to mark regions of code for software recovery, (2) a hardware +organization that simplifies reliability considerations and provides energy +efficiency with hardware recovery support removed, and (3) software support for +compilers and programmers to utilize the Relax ISA. Applying Relax to counter +the effects of process variation, our results show a 20% energy efficiency +improvement for PARSEC applications with only minimal source code changes and +simpler hardware. +
+
+ "Relax: An Architectural Framework for Software Recovery of Hardware Faults"+ +
+
 M. de Kruijf, S. Nomura, and K. Sankaralingam
+In Proceedings of ISCA '10: International Symposium on Computer Architecture, +June 2010. +
++ ++Temporal memory safety errors, such as dangling pointer dereferences and double +frees, are a prevalent source of software bugs in unmanaged languages such as +C. Existing schemes that attempt to retrofit temporal safety for such languages +have high runtime overheads and/or are incomplete, thereby limiting their +effectiveness as debugging aids. This paper presents CETS, a compile-time +transformation for detecting all violations of temporal safety in C programs. +Inspired by existing approaches, CETS maintains a unique identifier with each +object, associates this metadata with the pointers in a disjoint metadata space +to retain memory layout compatibility, and checks that the object is still +allocated on pointer dereferences. A formal proof shows that this is sufficient +to provide temporal safety even in the presence of arbitrary casts if the +program contains no spatial safety violations. Our CETS prototype employs both +temporal check removal optimizations and traditional compiler optimizations to +achieve a runtime overhead of just 48% on average. When combined with a +spatial-checking system, the average overall overhead is 116% for complete +memory safety. +
+
+ "CETS: Compiler Enforced Temporal Safety for C" ++
+
 Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic.
+ +Proceedings of the International Symposium on Memory Management, + Toronto, Canada, June 2010. +
++ ++With the availability of chip multiprocessor (CMP) and simultaneous +multithreading (SMT) machines, extracting thread level parallelism from a +sequential program has become crucial for improving performance. However, many +sequential programs cannot be easily parallelized due to the presence of +dependences. To solve this problem, different solutions have been proposed. Some +of them make the optimistic assumption that such dependences rarely manifest +themselves at runtime. However, when this assumption is violated, the recovery +causes very large overhead. Other approaches incur large synchronization or +computation overhead when resolving the dependences. Consequently, for a loop +with frequently arising cross-iteration dependences, previous techniques are not +able to speed up the execution. In this paper we propose a compiler technique +which uses state separation and multiple value prediction to speculatively +parallelize loops in sequential programs that contain frequently arising +cross-iteration dependences. The key idea is to generate multiple versions of a +loop iteration based on multiple predictions of values of variables involved in +cross-iteration dependences (i.e., live-in variables). These speculative +versions and the preceding loop iteration are executed in separate memory states +simultaneously. After the execution, if one of these versions is correct (i.e., +its predicted values are found to be correct), then we merge its state and the +state of the preceding iteration because the dependence between the two +iterations is correctly resolved. The memory states of other incorrect versions +are completely discarded. Based on this idea, we further propose a runtime +adaptive scheme that not only gives a good performance but also achieves better +CPU utilization. We conducted experiments on 10 benchmark programs on a real +machine. The results show that our technique can achieve 1.7x speedup on average +across all used benchmarks. +
+
+ "Speculative Parallelization Using State Separation and Multiple Value Prediction" ++
+ Chen Tian, Min Feng and Rajiv Gupta. +
+ +Proceedings of the International Symposium on Memory Management, + Toronto, Canada, June 2010. +
++ ++We describe an interpolant-based approach to test generation and model checking for sequential programs. The method generates Floyd/Hoare style annotations of the program on demand, as a result of failure to achieve goals, in a manner analogous to conflict clause learning in a DPLL style SAT solver. +
+
+ "Lazy Annotation for Program Testing and Verification"+ +
+ Kenneth McMillan
+In Proceedings of Computer Aided Verification, +Edinburgh, UK, July 15-19, 2010. +
+@incollection {springerlink:10.1007/978-3-642-14295-6_10, + author = {McMillan, Kenneth}, + affiliation = {Cadence Berkeley Labs}, + title = {Lazy Annotation for Program Testing and Verification}, + booktitle = {Computer Aided Verification}, + series = {Lecture Notes in Computer Science}, + editor = {Touili, Tayssir and Cook, Byron and Jackson, Paul}, + publisher = {Springer Berlin / Heidelberg}, + isbn = {}, + pages = {104-118}, + volume = {6174}, + url = {http://dx.doi.org/10.1007/978-3-642-14295-6_10}, + note = {10.1007/978-3-642-14295-6_10}, + abstract = {We describe an interpolant-based approach to test generation and model checking for sequential programs. The method generates Floyd/Hoare style annotations of the program on demand, as a result of failure to achieve goals, in a manner analogous to conflict clause learning in a DPLL style SAT solver.}, + year = {2010} +} ++ + + +
+
+
+Static Single Information form (SSI) is a program representation that enables
+optimizations such as array bound checking elimination and conditional
+constant propagation. Transforming a program into SSI form has a
+non-negligible impact on compilation time, but only a few SSI clients, that
+is, optimizations that use SSI, require a full conversion. This paper
+describes the SSI framework we have implemented for the LLVM compiler, which
+is now part of this compiler's standard distribution. In our design,
+optimizing passes inform the compiler of a list of variables of interest,
+which are then transformed to present, fully or partially, the SSI
+properties. Each client is provided only the subset of SSI that it needs. Our
+implementation orchestrates the execution of clients in sequence, avoiding
+redundant work when two clients request the conversion of the same variable.
+As empirically demonstrated, in the context of an industrial strength
+compiler, our approach saves compilation time and keeps the program
+representation small, while enabling a vast array of code optimizations.
+
+
+ "Efficient SSI Conversion" ++
+
 André Tavares, Fernando Magno Pereira, Mariza Bigonha, and
+Roberto Bigonha
+ +In Proceedings of the 14th Brazilian Symposium on Programming Languages, + Salvador, Brazil, September 2010. +
+@INPROCEEDINGS{x:2010, + AUTHOR="André Tavares and Fernando Magno Pereira and Mariza Bigonha and + Roberto Bigonha", + TITLE="Efficient SSI Conversion", + BOOKTITLE="SBLP 2010", + ADDRESS="", + DAYS="27-29", + MONTH="sep", + YEAR="2010", + ABSTRACT="Static Single Information form (SSI) is a program + representation that enables optimizations such as array bound checking + elimination and conditional constant propagation. Transforming a program + into SSI form has a non-negligible impact on compilation time; but, only + a few SSI clients, that is, optimizations that use SSI, require a full + conversion. This paper describes the SSI framework we have implemented + for the LLVM compiler, and that is now part of this compiler's standard + distribution.In our design, optimizing passes inform the compiler a list + of variables of interest, which are then transformed to present, fully + or partially, the SSI properties. It is provided to each client only the + subset of SSI that the client needs. Our implementation orchestrates the + execution of clients in sequence, avoiding redundant work when two + clients request the conversion of the same variable. As empirically + demonstrated, in the context of an industrial strength compiler, our + approach saves compilation time and keeps the program representation + small, while enabling a vast array of code optimizations.", + KEYWORDS="Program transformations; Program analysis and verification; + Compilation and interpretation techniques", + URL="http://llvm.org/pubs/2010-08-SBLP-SSI.pdf", + } ++ + + +
+
+The Integer-Overflow-to-Buffer-Overflow (IO2BO) vulnerability is an underestimated threat. Automatically identifying and fixing this kind of vulnerability are critical for software security. In this paper, we present the design and implementation of IntPatch, a compiler extension for automatically fixing IO2BO vulnerabilities in C/C++ programs at compile time. IntPatch utilizes classic type theory and a dataflow analysis framework to identify potential IO2BO vulnerabilities, and then instruments programs with runtime checks. Moreover, IntPatch provides an interface for programmers to facilitate checking integer overflows. We evaluate IntPatch on a number of real-world applications. It has caught all 46 previously known IO2BO vulnerabilities in our test suite and found 21 new bugs. Applications patched by IntPatch have a negligible runtime performance loss, averaging about 1%.
+
+
+ "IntPatch: Automatically Fix Integer-Overflow-to-Buffer-Overflow Vulnerability at Compile-Time"+ +
+ Chao Zhang, Tielei Wang, Tao Wei, Yu Chen, and Wei Zou
+
Proc. of the 15th European Symposium on Research in Computer Security (ESORICS 2010),
+Athens, Greece, Sep. 2010.
+
++ ++In the presence of ever-changing computer architectures, high-quality optimising +compiler backends are moving targets that require specialist knowledge and +sophisticated algorithms. In this paper, we explore a new backend for the +Glasgow Haskell Compiler (GHC) that leverages the Low Level Virtual Machine +(LLVM), a new breed of compiler written explicitly for use by other compiler +writers, not high-level programmers, that promises to enable outsourcing of +low-level and architecture-dependent aspects of code generation. We discuss the +conceptual challenges and our backend design. We also provide an extensive +quantitative evaluation of the performance of the backend and of the code it +produces. +
+
+ "An LLVM Backend for GHC"+ +
+ David A. Terei and Manuel M. T. Chakravarty
+In Proceedings ACM SIGPLAN Haskell Symposium 2010, +Baltimore MD, United States, September 2010. +
++ ++We present an in-depth analysis of the crash-recovery +problem and propose a novel approach to recover from +otherwise fatal operating system (OS) crashes. We show +how an unconventional, but careful, OS design, aided by +automatic compiler-based code instrumentation, offers a +practical solution towards the survivability of the entire +system. Current results are encouraging and show that +our approach is able to recover even the most critical OS +subsystems without exposing the failure to user applications +or hampering the scalability of the system. +
+
+ "We Crashed, Now What?"+ +
+ Cristiano Giuffrida, Lorenzo Cavallaro, and Andrew S. Tanenbaum
+In the Proceedings of the 6th Workshop on Hot Topics in System Dependability + (HotDep '10), +October 3, 2010, Vancouver, BC, Canada +
++ ++Deployed multithreaded applications contain many races because these applications are difficult to write, test, and debug. Worse, the number of races in deployed applications may drastically increase due to the rise of multicore hardware and the immaturity of current race detectors.
+ +LOOM is a "live-workaround" system designed to quickly and safely bypass application races at runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. It then uses an evacuation algorithm to safely install the filters to live applications to avoid races. It reduces its performance overhead using hybrid instrumentation that combines static and dynamic instrumentation.
+ +We evaluated LOOM on nine real races from a diverse set of six applications, including MySQL and Apache. Our results show that (1) LOOM can safely fix all evaluated races in a timely manner, thereby increasing application availability; (2) LOOM incurs little performance overhead; (3) LOOM scales well with the number of application threads; and (4) LOOM is easy to use. +
+
+ "Bypassing Races in Live Applications with Execution Filters"+ +
+ Jingyue Wu, Heming Cui, Junfeng Yang
+In Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), +Vancouver, BC, Canada, Oct 2010 +
++ ++A deterministic multithreading (DMT) system eliminates nondeterminism in thread scheduling, simplifying the development of multithreaded programs. However, existing DMT systems are unstable; they may force a program to (ad)venture into vastly different schedules even for slightly different inputs or execution environments, defeating many benefits of determinism. Moreover, few existing DMT systems work with server programs whose inputs arrive continuously and nondeterministically.
+ +TERN is a stable DMT system. The key novelty in TERN is the idea of schedule memoization that memoizes past working schedules and reuses them on future inputs, making program behaviors stable across different inputs. A second novelty in TERN is the idea of windowing that extends schedule memoization to server programs by splitting continuous request streams into windows of requests. Our TERN implementation runs on Linux. It operates as user-space schedulers, requiring no changes to the OS and only a few lines of changes to the application programs. We evaluated TERN on a diverse set of 14 programs (e.g., Apache and MySQL) with real and synthetic workloads. Our results show that TERN is easy to use, makes programs more deterministic and stable, and has reasonable overhead. +
+
+ "Stable Deterministic Multithreading through Schedule Memoization"+ +
+ Heming Cui, Jingyue Wu, Chia-che Tsai, Junfeng Yang
+In Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), +Vancouver, BC, Canada, Oct 2010 +
++ ++Profiling monitors a program’s execution flow via the insertion of +counters at key points in the program. Profiling information can then +be used by a compiler’s optimization passes to increase the +performance of frequently executed sections of code. This document +describes the implementation of edge profiling, path profiling and a +method with which to combine profiles in the Low Level Virtual Machine +(LLVM) compiler infrastructure. +
+
+ "Implementation of Path Profiling in the Low-Level Virtual-Machine (LLVM) Compiler Infrastructure"+ +
+ Adam Preuss
+Technical Report #10-05, University of Alberta, +December, 2010, Alberta, Canada +
++ ++This talk introduces LLVM, giving a brief sense for its library based +design. It then dives into Clang to describe the end-user benefits of LLVM +compiler technology, finally wrapping up with mentions of Clang Static Analyzer, +LLDB, libc++ and the LLVM MC projects. +
+ +
+ "LLVM and Clang: Advancing Compiler Technology", Chris Lattner,+ +
+ FOSDEM 2011: Free and Open Source Developers' European Meeting, Brussels, Belgium, Feb 2011.
+
+
+SIMD parallelism has become an increasingly important mechanism for delivering
+performance in modern CPUs, due to its power efficiency and relatively low cost
+in die area compared to other forms of parallelism. Unfortunately, languages and
+compilers for CPUs have not kept up with the hardware's capabilities. Existing
+CPU parallel programming models focus primarily on multi-core parallelism,
+neglecting the substantial computational capabilities that are available in CPU
+SIMD vector units. GPU-oriented languages like OpenCL support SIMD but lack
+capabilities needed to achieve maximum efficiency on CPUs and suffer from
+GPU-driven constraints that impair ease of use on CPUs.
+
+ ++We have developed a compiler, the Intel SPMD Program Compiler (ispc), that +delivers very high performance on CPUs thanks to effective use of both multiple +processor cores and SIMD vector units. ispc draws from GPU programming +languages, which have +shown that for many applications the easiest way to program SIMD units is to +use a single-program, multiple-data (SPMD) model, with each instance of the +program mapped to one SIMD lane. We discuss language features that make ispc +easy to adopt and use productively with existing software systems and show that +ispc delivers up to 35x speedups on a 4-core system and up to 240x speedups on +a 40-core system for complex workloads (compared to serial C++ code). +
+
+ "ispc: A SPMD Compiler for High-Performance CPU Programming"+ +
+ Matt Pharr and William R. Mark
+In Proceedings Innovative Parallel Computing (InPar), +San Jose, CA, May 2012. +
++ ++Integer overflow bugs in C and C++ programs are difficult to track down and may lead to fatal errors or exploitable vulnerabilities. Although a number of tools for finding these bugs exist, the situation is complicated because not all overflows are bugs. Better tools need to be constructed—but a thorough understanding of the issues behind these errors does not yet exist. We developed IOC, a dynamic checking tool for integer overflows, and used it to conduct the first detailed empirical study of the prevalence and patterns of occurrence of integer overflows in C and C++ code. Our results show that intentional uses of wraparound behaviors are more common than is widely believed; for example, there are over 200 distinct locations in the SPEC CINT2000 benchmarks where overflow occurs. Although many overflows are intentional, a large number of accidental overflows also occur. Orthogonal to programmers' intent, overflows are found in both well-defined and undefined flavors. Applications executing undefined operations can be, and have been, broken by improvements in compiler optimizations. Looking beyond SPEC, we found and reported undefined integer overflows in SQLite, PostgreSQL, SafeInt, GNU MPC and GMP, Firefox, GCC, LLVM, Python, BIND, and OpenSSL; many of these have since been fixed. Our results show that integer overflow issues in C and C++ are subtle and complex, that they are common even in mature, widely used programs, and that they are widely misunderstood by developers. +
+
+ "Understanding Integer Overflow in C/C++"+ +
+ Will Dietz, Peng Li, John Regehr, and Vikram Adve
+Proc. of the 2012 International Conference on Software Engineering (ICSE'12) +Zurich, Switzerland, June 2012. +
Awarded an ACM SIGSOFT Distinguished Paper Award
+ ++@InProceedings{DietzLi:ICSE12, + author = {Dietz, Will and Li, Peng and Regehr, John and Adve, Vikram}, + title = {Understanding Integer Overflow in C/C++}, + booktitle = {Proceedings of the 2012 International Conference on Software Engineering}, + series = {ICSE 2012}, + year = {2012}, + isbn = {978-1-4673-1067-3}, + location = {Zurich, Switzerland}, + pages = {760--770}, + numpages = {11}, + url = {http://dl.acm.org/citation.cfm?id=2337223.2337313}, + acmid = {2337313}, + publisher = {IEEE Press}, + address = {Piscataway, NJ, USA}, +} ++ + + +