Statistical Program Slicing: a Hybrid Slicing Technique for Analyzing Deployed Software

Bogdan Alexandru Stoica University of Chicago , Swarup K. Sahoo DeepFence Inc. , James R. Larus EPFL and Vikram S. Adve University of Illinois at Urbana-Champaign

Abstract.

Dynamic program slicing can significantly reduce the code developers need to inspect, by narrowing it down to only a subset of relevant program statements. However, despite an extensive body of research showing its usefulness, dynamic slicing is still short from production-level use due to the high cost of runtime instrumentation.

As an alternative, we propose statistical program slicing, a novel hybrid dynamic-static slicing technique that explores the trade-off between accuracy and runtime cost. Our approach relies on modern hardware support for control flow monitoring and a novel, cooperative heap memory tracing mechanism combined with static program analysis for data flow tracking. We evaluate statistical slicing for debugging on $\nrBugs$ failures from $\nrApps$ widely deployed applications and show it recovers $\woktAvgRRsliceT$ of the program statements on a dynamic slice with only $\wokAvgOverhead$ overhead.

Abstract.

Dynamic program slicing can significantly reduce the code developers need to inspect by narrowing it down to only a subset of relevant program statements. However, despite an extensive body of research showing its usefulness, dynamic slicing is still short from production-level use due to the high cost of runtime instrumentation.

program slicing, program analysis, hardware execution tracing, debugging

1. Introduction

Program slicing is a decomposition technique that extracts all program statements relevant to a particular seed instruction $\sigma$ . Informally, program slicing answers the following question: “Which instructions directly or indirectly affect the computations performed by $\sigma$ ?”. Initially proposed by Mark Weiser (weiser81, ) as a mechanism for debugging, program slicing has been successfully applied to multiple areas of software engineering such as testing, compiler optimizations, security, software maintenance, or code integration (tip94, ).

Broadly, there are two main flavors of program slicing: static and dynamic. Static slicing makes conservative assumptions about which program statements affect the seed. Specifically, this approach relies on static program analysis to determine any program statement that can potentially affect a target instruction regardless of the program execution. This, in turn, causes static slicing to quickly lose precision, particularly when applied to programs that make extensive use of pointers (zhang05, ). In contrast, dynamic slicing identifies the minimal set of statements that affect the seed for a particular program run (korel88, ; agrawal90, ). A dynamic slice is input-sensitive and uses program flow information captured at runtime. This makes dynamic slicing a more precise alternative when analyzing real-world programs (zhang03, ; zhang05, ).

Dynamic slicing is an elegant and intuitive mechanism to reason about a particular program execution. A vast amount of research was been dedicated to improving the technique and implementing efficient slicing tools. Significant improvements focused on precision and cost-effectiveness of dynamic slicing for in-house testing (zhang03, ; zhang06, ; xin09, ). Later, these algorithms were implemented as state-of-the-art dynamic slicing tools for source-level (swarup13, ; kasikci15, ) and machine-level code (reps16, ). Other tools tackled slicing multi-language applications (binkley14, ) by iteratively removing statements while ensuring semantics preservation. Several works proposed combining dynamic and static program analysis (mock05, ; wee10, ; kasikci17, ; devecsery18, ) or reconstruct slices at a coarser abstraction level (manu07, ; zhang06, ) in order to improve scalability.

Refer to caption — Figure 1. Example of a statistical program slice (Curl bug #965 (curlbug, )). The code snippet on the left highlights the data flow observed over multiple runs and depicts several data dependencies (blue arcs). The snippet in the center represents the control flow trace for the faulty run. The snippet on the right depicts a statistical slice after combing data and control flow observed at runtime, complemented by static program analysis.

1.1. Motivation

Despite significant improvements over the past decades, dynamic program slicing is still unsuited for analyzing production failures (ernst19, ; devecsery18, ). In particular, developers face two main challenges in the context of today’s modern software.

The first is related to the increasing difficulty of recovering faulty inputs in order to accurately reproduce production bugs offline. Recent studies (zhang17, ; cui18, ; kasikci17, ) show that for an increasing number of user-site bugs, failure-inducing inputs are difficult to record either due to the high cost of capturing context and environment information (zhang17, ; cui18, ), or because of privacy concerns (kasikci17, ).

The second is associated with the high cost of the underlying control and data flow instrumentation for computing full dynamic slices which is unsuited for production use (devecsery18, ; kasikci17, ). When failures cannot be reproduced offline because of missing or incomplete faulty inputs, developers default on runtime monitoring. This, in turn, creates a tension between deploying low-cost instrumentation and recording enough program behavior to reconstruct informative failure paths (orso15, ; parnin11, ; ernst19, ).

We believe it is worthwhile to revisit dynamic slicing at its core and design algorithms that yield useful program slices at low (production-grade) cost, without burdening developers to obtain failure-inducing inputs.

1.2. Contributions

In this paper, we propose statistical program slicing, a hybrid dynamic-static program slicing technique that explores the trade-offs between runtime overhead and slice accuracy. The key insight of our technique is that we can reconstruct dynamic slices incrementally from observed program behavior through a combination of cooperative dynamic analysis and static program analysis. Our approach is lightweight, with minimal effect on program execution; transparent, tracing executions without changes to the target binary; and, crucially, requires no knowledge of the faulty input.

We implement statistical program slicing in a prototype called Wok (section 3). Wok relies on Intel Processor Trace (intel17, )—an efficient branch recording extension available on modern hardware—to collect precise control flow from faulty runs. Since Intel PT lacks memory tracing capabilities, Wok further performs data flow analysis by using pointer tagging to strategically sample and track a subset of heap-allocated objects across multiple regular runs. Finally, Wok relies on offline static alias analysis to complement dynamic data flow collection.

Conceptually, Wok reconstructs an unsound approximation of the traditional program dependency graph (Horwitz1990, ) in an incremental fashion. For brevity, we call this new program abstraction an observed dependency graph.

Although distributing data flow monitoring reduces the overhead to under $5\%$ (see section 5), aggregating data dependencies over multiple runs poses an additional challenge: We cannot reliably compare memory accesses between two distinct runs due to program relocation techniques (e.g., address space layout randomization). To mitigate this issue, we discard memory address information and keep data dependencies as pairs of static program statements instead (similar to (agrawal90, )). We later partially recover control flow sensitivity by relying on the precise control flow from the faulty run collected by Intel PT and pruning data dependencies unrelated to the failure. As a consequence, our slices represent a projection of the dynamic slice on static code. Thus, conceptually, statistical slices dwell in the space between static and dynamic slices.

We use Wok to reconstruct dynamic slices for $\nrBugs$ bugs from $\nrApps$ widely used open source applications. We compare Wok’s output against Giri, a state-of-the-art program slicing tool (swarup13, ). Specifically, we (1) measure how many program statements of the original dynamic slice can our tool recover, (2) the size of our resulting statistical slices, as well as (3) the accuracy of the recovered failure path (i.e., the set of instructions between the fault and the root cause). Our evaluation (section 5) shows that Wok recovers $\woktAvgRRsliceT$ of the unique program statements of the dynamic slice with only $5\%$ runtime overhead. Overall, statistical slices are, on average, $89\%$ larger than original dynamic slices. However, the corresponding failure paths are, on average, only $21\%$ larger. We consider this is an acceptable trade-off between over-approximation and low overhead (i.e., $\leq 5\%$ ) that makes our technique suitable for production use.

To summarize, we make the following contributions:

—

We present statistical program slicing, a hybrid dynamic–static slicing technique that leverages efficient hardware support for runtime monitoring combined with a customized static alias analysis to improve slice accuracy (section 3);
—

We develop a novel technique for runtime data flow analysis by relying on pointer tagging and distributing memory tracing across multiple runs of the same program (section 3.1);
—

We evaluate our approach on $\nrBugs$ failures from $\nrApps$ widely deployed applications and show its usefulness for bug diagnosis. On average, we recover $\woktAvgRRsliceT$ of the program statements present on the corresponding dynamic slice with only $\wokAvgOverhead$ instrumentation overhead (section 5).

2. Running Example

We illustrate statistical slices on an excerpt from Curl bug #965 (curlbug, ). Curl allows users to fetch multiple URLs that share a common pattern (e.g., enumerating pages of a particular root domain) by using special symbols such as ”{}” (curlurl, ). For instance, fetching pages from multiple ICSE editions can be encoded as ”curl -0 https://conf.researchr.org/home/icse-{2019, 2020, 2021}”. In this particular code version, having an unmatched bracket causes urls->content in function next_url (line $3$ ) to become NULL which subsequently triggers a segmentation fault (curlbug, ).

Figure 1 depicts the tree key stages of our algorithm: data flow tracking (left), control flow tracing (center) and combining the two together in a program slice (right), While statistical slices are projected onto the source code, we collect control and data flow in the form of dependencies between program statements (blue and orange dotted arcs). This helps us establish causality and decide which instructions to include in the final slice.

The snippet on the left highlights Wok’s data flow tracking mechanism. Our tool aggregates data flow from multiple runs to establish dependencies between program statements accessing the same memory location. For example, tagging url during allocation (line $9$ ), enables Wok can subsequently track all direct references of url (lines $11$ , $14$ , $18$ ) and derivatives (line $12$ ), and establish read/write dependencies between program statements (e.g., lines $11$ and $18$ ).

The snippet in the center illustrates control flow tracing via Intel PT. While monitoring all runs, Wok retains control flow only when failures occurs. In this particular example, the program statements highlighted in yellow show the code blocks executed when bug #965 occurs.

The snippet on the right depicts the resulting statistical slice. Wok combines data flow aggregated from multiple runs with the precise control flow information from the faulty run. Data flow not exercised by the failure (e.g., line $14$ ) is discarded and control flow involving variables unrelated to the segmentation fault at line $3$ is disregarded (e.g., lines $8$ , $9$ , $20$ , etc.).

Finally, we rely on static analysis to include program dependencies not captured by our runtime tracing strategies. Typically these are relevant dependencies related to stack variables. In this particular example, static analysis helps us include the declaration line $7$ in the final slice.

3. Design

Wok operates in a client-server environment where runtime monitoring happens on the client (always-on) and slices are computed offline on the server whenever requested by developers. Client-side, Wok relies on a combination of hardware support, sampling, and cooperative tracing to capture control and data flow at runtime. Server-side, Wok combines execution information gathered cooperatively from multiple runs of the same program with customized static analysis, and computes a statistical slice on demand. We thus expect Wok to run in a cooperative environment such as a datacenter or deployed on multiple user end-points, similar to prior failure path reconstruction tools (ChilimbiLMNV09Holmes, ; kasikci15, ; liblit2007CBI, ; zhang17, ).

Client-side runtime monitoring (always-on). Wok monitors a target program on the client-side to record control and data flow information. Wok tracks control flow using Intel Processor Trace, a new hardware extension which allows always-on execution monitoring at low overhead (intel17, ) (section 3.2).

As Intel PT has no memory monitoring capabilities, Wok tracks data flow by adding an extra level of indirection to memory addressing and leveraging the hardware protection mechanism available on x86-64 architectures (section 3.1). Specifically, Wok intercepts memory allocations and tags the high order bits thus making pointers uncanonical (intel17, ). This forces the CPU to raise a protection fault and interrupt the target program every time a uncanonical address gets accessed. When a trap occurs, Wok extracts information about the memory access (program counter, memory address, etc.) to infer data dependencies between program statements accessing the same memory location. For brevity, we call this mechanism pointer poisoning.

The key advantage of pointer poisoning is to enable monitoring for a range of addresses simultaneously and allows setting watchpoints at data-structure granularity. Tagged bytes propagate virally every time the base pointer is copied or involved in pointer arithmetic. One drawback is the overhead associated with hardware interrupts as every CPU fault triggers a context switch. To reduce the cost, Wok samples a subset of the pointers to track per run over multiple executions.

Pointer poisoning is similar in spirit with behavioral watchpoints (kumar13, ). However, there are several key differences. Our mechanism generalizes to the entire heap and enables tracing heap memory arbitrarily. Pointer poisoning further avoids the high cost of interrupts by selectively monitoring only a fraction of the pointers per run. Moreover, the goal of our approach is to infer READ–WRITE dependencies between instructions that operate on the same bytes, as opposed to being switching dynamic binary translation on and off (kumar13, ).

Server-side offline processing. Wok computes statistical slices offline, on demand (e.g., after a failure occurs). Our tool combines data flow collected cooperatively from multiple runs and precise control flow from the faulty execution into an observed dependency graph where nodes represent static program statements and arcs represent control or data dependencies observed between them. This graph is further augmented with statically inferred memory dependencies (e.g., stack variables) using a specialized static alias analysis restricted to the faulty code recorded by the PT trace (section 3.3).

In order to compare and combine data flow from multiple runs, Wok keeps data dependencies as static pairs of program statements that accessed the same memory location thus discarding memory addresses, similar to (agrawal90, ). Thus, statistical slices represent a projected (i.e., flattened) version of their dynamic counterparts onto the source code (see Figure 1).

Usage model. Figure 2 illustrates the usage model of Wok. Developers deploy two pieces of software on the client’s machine: a shared library to perform pointer poisoning ( , Fig. 2) and a kernel driver to enable the Intel PT tracing ( ). When a failure occurs, developers instruct Wok to process the information collected at runtime and to compute a statistical program slice. First, Wok combines the control flow and data flow into arcs in the observed dependency graph ( ). Second, Wok augments the graph with statically inferred memory dependencies (e.g., stack variables) using a specialized static alias analysis restricted to faulty code only ( ). Third, Wok performs a backward traversal on this augmented observed dependency graph starting from the seed instruction and walking backwards on observed dependencies ( ). Lastly, developers are presented with a slice report detailing the static program statements that directly or indirectly affect the seed ( ).

3.1. Data–Flow Monitoring

The virtual address space on x86-64 architectures can accommodate up to $2^{64}$ addresses (i.e., $16$ exabytes) which far exceeds the memory needs of modern applications This far exceeds the needs of modern applications as current memory management implementations use only $48$ bits to represent memory addresses (intel17, ). Current systems us only $48$ bits to represent addresses. The reminder high order $16$ bits are either all $0$ or $1$ . Departing from this specification makes the pointers uncanonical which raises a protection fault interrupting normal execution (intel17, ; kumar13, ).

⬇

1ptr = (unsigned int *)malloc(mySize);

2...

3// Pointer poisoning

4 ptag = rand(0x0001, 0xfffe);

5 ptr = ptr | (ptag << 48);

6...

7// Pointer unpoisoning

8 ptr = ptr & 0x0000ffffffffffff;

9*ptr = 42;

Listing 1: Pointer Poisoning Operations

Pointer Poisoning. We take advantage of the memory protection mechanism described above to implement pointer poisoning. We use a shared library to intercept heap allocation and tag base pointers. Adding and removing tags are simply bitwise operations as illustrated in Listing LABEL:lst:pp. Crucially, our approach makes no changes to the target binary and does not require recompilation.

Wok intercepts protection faults to peak at the current state of the execution. Specifically, Wok records the program counter, memory location and type of memory access. This information allows Wok to infer data dependencies between instructions that write and read the same address. Wok preserves tags for the entire life time of the corresponding object in order to ensure continuous monitoring and preserve memory consistency. Removing tags could further force us to maintain two versions of the same pointer— tagged and un-tagged—and ensure compiler optimizations such as direct pointer comparisons work correctly.

The benefits of pointer poisoning are two-fold. First, it allows us to trace memory virally: if $p$ is poisoned, its tag propagates whenever a new pointer is derived from $p$ . Second, unlike traditional hardware breakpoints that can trace a limited number of bytes at a time, poisoning enables coarse grain memory monitoring at the object level with potentially unlimited number of watchpoints.

The drawback is the associated overhead penalty incurred by each protection fault.

Selective Memory Instrumentation. To reduce the overhead, we use poisoned pointers economically. We strategically sample a subset of pointers to monitor per run. This allows us to disseminate an expensive memory tracing procedure across multiple runs of the same program in a cooperative setting (e.g., a datacenter).

To avoid under-sampling infrequent code paths and increase coverage, we rely on an adaptive sampling strategy that ensures objects are sampled inversely proportional to their access frequency. We rely on calling context (i.e., callstack) which is proven to be an effective predictor of objects with similar runtime behavior (hauswirth04, ; novark09, ; sumner08, ) to differentiate between allocation contexts and re-adjust sampling rates. Specifically, our policy oscillates between two phases:

—

low frequency: starting from $1.0$ (i.e., indiscriminate pointer poisoning), the sampling rate decreases exponentially if no memory accesses that lead to new data dependencies are being observed;
—

high frequency: once new memory accesses occur the sampling rate gets reset to $1.0$ and the process repeats.

We argue this best-effort strategy yields a reasonable approximation of the set of data dependencies executed during a faulty run and becomes more accurate with the number of runs observed. Intuitively, memory objects are often accessed from a limited number of places in the source code (barr13, ) (e.g., a linked list is accessed through dedicated member functions). Prior work has found that for short-running programs the set of memory accessing instructions is small and for long-running programs that set is mostly stable across different phases of the execution (novark09, ; chilimbi02, ). For this reason, we expect the number of data dependencies between instructions accessing memory objects to eventually converge.

We observe similar behavior when analyzing memory access patterns for our evaluations (section 5). Figure 3 illustrates the cumulative distribution of data dependencies collected as a function of the number of runs. As the number of runs increases, data dependency coverage begins to stabilize. And a little over half the inputs account for $\sim$ $80\%$ of the data dependencies for $5$ out of $\nrApps$ . SQLite’s behavior is an artifact of the available test suite that include numerous inputs designed to maximize branch coverage and test functionality in isolation (th3, ).

3.2. Control Flow Monitoring

We rely on Intel PT for runtime control flow tracing with low overhead. Intel PT is an extension of the Intel PMU architecture available on Broadwell and Skylake CPU families (intel17, ). Intel PT records control flow information in a highly compressed format ( $\sim$ $0.5$ bits per retired assembly instruction (intel17, )) which encodes all branches executed by the program. Compressed traces are available for offline decoding to precisely recover the execution. Moreover, Intel PT incurs a modest overhead of $3$ - $4\%$ (intel17, ; zhang17, ; kasikci15, ; kasikci17, ; ge2017, ).

In our usage model (Fig. 2), Wok decodes Intel PT traces on the server-side. Our tool uses this information to reconstruct precise control flow for the fault currently under investigation. Precise control flow is a necessary ingredient for computing program slices. Knowing which instructions executed during the faulty run helps Wok improve the accuracy of final slices in two ways. First, it allows our tool to filter data flow collected in aggregate at runtime (i.e., in step , Fig. 2) by pruning data dependencies that were not exercised during the faulty execution. Second, it improves the precision of static alias analysis by restricting the scope of the procedure to executed code only (i.e., step , Fig. 2).

3.3. Execution-driven Alias Analysis

The main source of imprecision when computing statistical program slices comes from sampling. In addition, the data flow analysis described in section 3.1 focuses on heap memory (i.e., by tracking pointers) and, up until now, ignores data dependencies that can be inferred statically (e.g., those related to stack variables).

We mitigate these shortcomings by using inter-procedural, flow-insensitive static alias analysis (andersen94, ). Static alias analysis is known to be imprecise, especially in the presence of pointers (weiser81, ; Korel1988, ; mock05, ; zhang05, ; kasikci13, ; nainar10, ; liblit03, ). To improve precision we restrict the scope of the procedure to code executed during the faulty run and retain only must aliases. Specifically, we use the PT trace to prune must aliasing pairs not exercised by the failure ( , Figure 2) when iteratively computing points-to sets. This allows us to perform an otherwise unscalable inter-procedural analysis using control flow information recorded by Intel PT for the faulty execution.

By disregarding may aliases, our procedure can potentially miss data dependencies for heap memory locations. We take this conservative approach in order to preserve soundness with respect to observed program execution. However, we will show that this restriction minimally impacts statistical slice accuracy in our evaluation (section 5).

3.4. Computing Statistical Program Slices

Wok aggregates data dependencies collected cooperatively over multiple runs and combines them with the precise control flow information from the failing run to form an observed dependency graph. Wok transforms each program statement into a node and each control and data dependency observed at runtime into an arc. To account for data dependencies potentially missed during sampling, Wok further complements this graph with must aliases arcs computed by the execution-driven static alias analysis described above (3.3). Finally, Wok performs a backward traversal on the observed dependency graph starting from the seed instruction (horwitz98, ).

We discard dynamic instance information for data dependencies since we cannot reliably order memory accesses from multiple runs. Typically, “flattening” program flow can potentially lead to a significant size increase for the final slice size (agrawal90, ; zhang05, ). However, we mitigate most of these effects by pruning spurious data flow not exercised during the target execution (section 5).

Algorithm 1 describes how Wok computes statistical program slices. Initially, our prototype adds the $seed$ to a queue. At each step it pops an unprocessed element from the queue and adds the relevant dependencies from the complemented observed dependency graph. Using LLVM conventions (lattner2004llvm, ) these dependencies can be either a conditional clause (e.g., if-then-else), a function call/return, a function argument, or a memory access.

slice\leftarrow\emptyset

queue\leftarrow\{seed\}

5 while $queue\neq\emptyset$ do

stmt\leftarrow queue.pop()

7 if $isConditional(stmt)$ then

cond\leftarrow getCondition(stmt)

queue\leftarrow queue\cup getDep(trace,cond)

11 else if $isCall(stmt)$ then

ret\leftarrow getRet(stmt)

queue\leftarrow queue\cup getDep(trace,ret)

15 else if $isArgument(stmt)$ then

args\leftarrow getArgs(stmt)

queue\leftarrow queue\cup getDep(trace,args)

19 else if $readsMemory(stmt)$ then

queue\leftarrow queue\cup getStores(stmt,obs\_dep\_graph)

22 else if $writesMemory(stmt)$ then

queue\leftarrow queue\cup getWriteOpnd(stmt)

queue\leftarrow queue\cup getDep(stmt,trace)

slice\leftarrow slice\cup stmt

28 end while

Algorithm 1 Statistical Slicing (simplified)

4. Implementation

We implement Wok using $2,643$ lines of C and C++ code for the runtime heap memory monitoring library, $342$ lines of Python code for the cooperative framework, $652$ lines of C code for the Intel PT driver (kleen, ), and $412$ lines of C++ code for the scope–restricted static alias analysis. Wok is designed to be lightweight and transparent, making no changes to the underlying target application. Rather, it uses a combination of function wrappers and the hardware memory protection mechanism (i.e., signals) in Linux for data flow tracing, and a loadable kernel driver to capture control flow.

Wok’s static alias analysis and backward slicing (i.e., Algorithm 1) are built on top of the LLVM framework (lattner2004llvm, ). Consequently, Wok relies on LLVM’s code analysis passes to infer static dependencies between instructions (e.g., def-use chains). Since Wok is designed as a development tool, we expect it to have access to the source code of the target program.

Wok uses a dynamically loaded library to wrap heap memory functions (i.e., malloc() and free()) to poison and unpoison heap pointers. Wok shifts valid pointers into the non–canonical address space by tagging their upper $4$ bytes to a non–zero value less than $65,535$ (0xfffff). Thus, it employs the hardware memory protection mechanism which interrupts execution by issuing a SIGSEGV signal to be handled in user–space. Wok intercepts the signal to patch the offending memory operand, record the instruction pointer (i,e., RIP register), the memory address and the type of access. It later uses this information to establish READ–WRITE dependencies between instructions that operate on the same memory location. To allow further monitoring, Wok preserves the tag by emulating in software the most frequently executed x86-64 instructions (e.g., mov, cmp, add, etc.). For the others, Wok lets the CPU execute the patched instruction and enables single–stepping (i.e., TF flag) to reapply the tag at the next instruction. Note that we cannot reliably remove tags after an interrupt without (i) preventing our tool from monitoring future memory accesses made through the newly sanitized pointer and (ii) driving program execution into an inconsistent state by potentially keeping two versions of the same pointer in memory (e.g., tagged and non-tagged).

Intel PT tracing is enabled and disabled via a kernel module that uses the Machine State Register (MSR) interface. Branches are stored in a $128MB$ hardware buffer that is occasionally flushed to RAM. Intel PT can trace both the main binary and library code. For our evaluation purposes we focus on computing slices for the main binary only.

5. Evaluation

In this section we aim to answer the following research questions about statistical slicing: How accurate is our technique when compared to textbook program slicing and does it help developers reconstruct the failure paths for remote, hard-to-reproduce bugs (5.2)? Also, can we compute statistical slices efficiently (5.3)?

5.1. Methodology

Benchmarks. To answer these questions, we evaluate Wok on $\nrApps$ widely utilized open-source applications: Clang (clangurl, ), a C/C++ compiler designed as drop-in replacement for GCC; SQLite (sqliteurl, ), an embedded database used in Chrome, Firefox, and Android; the Python interpreter (pythonurl, ); Cppcheck (cppurl, ), a C++ static analysis tool integrated with Visual Studio and Eclipse; Curl (curlurl, ) a data transfer tool for FTP and HTTP protocols; and Gzip (gzipurl, ), a data compression tool.

Workloads. We collect control and data flow information by running our test applications on a diverse set of inputs. We select inputs that maximize the functionality exercised at runtime (i.e., coverage). Our workloads include test suites, open-source and proprietary benchmarks (th3, ), and regular application-specific inputs collected online (e.g., most $1,000$ popular C/C++ applications on Github). In total, we track $3,873,616$ executions at less than $5\%$ overhead each.

Ground truth. We evaluate Wok on $\nrBugs$ software failures randomly sampled from bug repositories or faults used in previous research studies (swarup13, ; kasikci15, ). To minimize bias, we consider bugs already fixed that can be reproduced. We rely on the bug report to extract the seed instruction and the code patch that indicates the root cause as identified by the developer.

We compare Wokagainst textbook static and dynamic program slices generated by Giri, a state-of-the-art program slicing tool (swarup13, ). We evaluate our technique in terms of recovery rate, slice size and the ability to faithfully reconstruct failure paths.

Environment setup. We run our experiments on a machine with to $8$ -core Intel Xeon E5-2680 CPU with Intel PT support and $32$ GB of RAM running Ubuntu 18.04.01 LTS with kernel version 4.15.0-58.

5.2. Accuracy

We evaluate the accuracy of our technique on three different levels:

Bug ID	Giri — static	Giri — dynamic		Wok — failure only		Wok — cooperative
Bug ID	Giri — static	Slice	Root cause	Slice (%)	Root cause (%)	Slice (%)	Root cause (%)
gzip-46312	771	130	60	130 (100%)	60 (100%)	130 (98%)	58 (95%)
sqlite-be84e3	9,024	4,151	320	4,282 (99%)	318 (98%)	5,789 (94%)	391 (95%)
sqlite-78c0c8	28,278	3,783	178	3,931 (99%)	176 (98%)	5,305 (96%)	201 (98%)
sqlite-32b63d	28,045	2,035	35	2,160 (99%)	40 (100%)	3,423 (94%)	45 (97%)
sqlite-9f2eb3	26,534	3,972	118	4,089 (99%)	120 (100%)	5,623 (96%)	131 (95%)
sqlite-264b97	9,158	1,090	26	1,132 (99%)	27 (100%)	1,913 (91%)	25 (88%)
curl-965	771	130	60	130 (100%)	60 (100%)	130 (98%)	58 (95%)
python-12608	22,103	9,073	229	9,167 (100%)	216 (92%)	15,579 (95%)	232 (92%)
python-29028	63,640	7,226	724	7,431 (100%)	727 (100%)	14,274 (96%)	888 (95%)
python-27867	63,704	7,164	705	7,394 (100%)	708 (100%)	14,146 (97%)	865 (99%)
python-27945	59,899	7,076	39	7,289 (100%)	43 (100%)	18,094 (95%)	46 (96%)
cppchk-5780	43,904	628	48	1,718 (99%)	49 (100%)	3,031 (92%)	61 (94%)
cppchk-5909	59,926	241	45	267 (100%)	45 (100%)	1,097 (90%)	47 (100%)
cppchk-5950	65,123	1,914	9	1,951 (100%)	43 (100%)	3,245 (92%)	48 (100%)
cppchk-6059	53,322	1,189	10	1,217 (100%)	36 (100%)	2,021 (94%)	43 (90%)
cppchk-6106	42,805	1,227	26	1,239 (100%)	26 (100%)	2,960 (91%)	23 (89%)
clang-25156	536,170	3,935	29	5,307 (99%)	25 (78%)	5,336 (90%)	23 (74%)
clang-28116	565,205	6,291	30	9,416 (99%)	27 (90%)	10,061 (89%)	34 (83%)
clang-32638	43,053	5,545	53	6,716 (100%)	50 (87%)	6,922 (90%)	47 (81%)
clang-33082	498,586	4,946	32	6,045 (98%)	26 (90%)	9,652 (90%)	25 (87%)
clang-33471	568,555	6,794	71	7,946 (99%)	65 (90%)	8,883 (90%)	66 (91%)

Table 1. Statistical slicing evaluation. Slice sizes (Slice) and distances to root causes measured in lines of code (Root cause) for static, dynamic and statistical slices. Recovery rates between statistical and dynamic slices are shown in brackets.

Recovery rate. Given a precise dynamic slice as ground truth, we calculate what fraction of the program statements we actually recover. Specifically, if $\sigma_{wok}$ and $\sigma_{giri}$ two sets of unique program statements produced by Wok and Giri respectively, we defined the recovery rate as the ratio between $\big{|}\sigma_{wok}\cap\sigma_{giri}\big{|}$ and $\big{|}\sigma_{giri}\big{|}$

Slice size. Statistical program slices live in the space between static and dynamic slices. Thus, we further evaluate our slices in terms of size when compared to their textbook static and dynamic counterparts.

Failure path reconstruction. Finally, we evaluate how helpful statistical slices are during the fault diagnosis process. Specifically, we measure the number of static program statements developers have to inspect in order to identify the root cause. Similar to other hybrid code analysis tools (kasikci17, ; manu07, ; devecsery18, ), we use a breadth-first traversal to explore instructions in increasing distance from the slicing seed (e.g., the smallest “sphere” (binkley14, ; manu07, ))

Our evaluation works in three stages. First, we run each test program against the training workloads described above. Second, we compute statistical slices for each tested bug using the failing program statements as slice seeds. Finally, we compare the output against the program slices produced by Giri.

Typically, we expect Wok to be configured in an environment where it tracks a large and diverse set of inputs before computing slices. For completeness, we also consider a different, albeit less frequent scenario where a failure with the same symptoms is encountered repeatedly (liblit05, ; kasikci15, ; ChilimbiLMNV09Holmes, ; kasikci13, ). Thus, instead of observing program behavior during normal utilization, we track data flow statistically from a particular faulty execution.

Note that our central findings, discussion and conclusions focus on the former, workload-driven scenario, rather than the latter. However, allowing Wok to monitor the same faulty execution repeatedly is not without merit. Intuitively, our algorithm performs best when reconstructing slices using the minimal set of data dependencies necessary to compute precise dynamic slices. Therefore, such a setup provides us with an “upper bound” on the accuracy and a “lower bound” on the size of statistical slices. This also helps strengthen the hypothesis that as bug-related program flow coverage increases, Wok is able to better approximate dynamic slices (see Table 2).

Table 1 compares program slices, recovery rates and root cause diagnosis capabilities between Wok and Giri. We break measurements down into slice sizes (columns labeled ‘Slice’) and distance to the root cause (columns labeled ‘Root cause’). Recovery rates between statistical and dynamic slices are reported in brackets and calculated using the $\mu$ ratio described above. For each failure, we instruct Wok to operate in both the workload-driven and failure only scenarios descried above. Columns labeled ‘Wok — cooperative’ report numbers after collecting data flow information over the entire set of training inputs. Columns labeled ‘Wok — failure only’ show measurements when operating exclusively over the set of data dependencies exercised during the failing run. Finally, we compare those against the program slices generated by Giri (columns ‘Giri — static’ and ‘Giri — dynamic’).

Wok recovers $\woktAvgRRsliceT$ of the program statements present on a precise dynamic slice (i.e., ‘Wok — cooperative’ columns). Wok further recovers an average of $92\%$ of the instructions linking the symptom to the root cause. In terms of slice sizes, Wok’s output is, on average, $89\%$ larger than that of Giri. This is still $21\times$ smaller when compared to static slices. While larger than textbook dynamic slices, we argue this is an acceptable trade-off between over-approximation and a $\wokAvgOverhead$ runtime overhead (section 5.3).

Our evaluation further reveals that having data flow exclusively from the faulty run enables Wok to recover almost $100\%$ of the corresponding dynamic slices. As expected, having $100\%$ failure-related data flow coverage translates into near-perfect dynamic slice recovery rate. The few program statements missed are stack variable declarations which Wok is unable to detect through static program analysis. We plan to mitigate such side effects by switching to a more powerful alias analysis.

In part, missing program statements are caused by sampling. The other cause is the quality of the static program analysis which is inherently imprecise (devecsery18, ; kasikci17, ). A close inspection revealed that the current alias analysis implementation excludes some stack variable allocation instructions. However, we believe these to be the least critical components of a slice which developers can easily infer from the control flow portion of a statistical slice.

Ultimately, reconstruction accuracy is influenced by the ability of our tool to collect enough relevant data dependencies. As coverage increases, our prototype is able to better approximate dynamic slices.

Program slices are dependent upon the seed statements for which slicing is performed (zhang03, ), potentially impacting measurements in Table 1. To mitigate biases, we also devise a slice-independent evaluation and measure recovery rates between sets of data dependencies. This is further motivated by the fact that the data flow instrumentation is the main source of approximation for statistical slices.

Table 2 compares the set of dynamic data dependencies constructed by Wok and Giri respectively. On average, Wok recovers $82\%$ of the data dependencies located on the precise dynamic slices, while the recovery rate for full slices goes up to $\woktAvgRRsliceT$ . This increase can be attributed to the number of program dependencies linking two statements (typically, $1$ - $4$ in our measurements). Consequently, Wok needs only to capture only one of them to include the two instructions.

Bug ID	Giri	Wok	Overlap (%)
gzip-46312	39	41	39 (100%)
sqlite-be84e3	1,024	1,240	841 (82%)
sqlite-78c0c8	852	1,061	735 (87%)
sqlite-32b63d	781	1,094	648 (83%)
sqlite-9f2eb3	916	1,208	770 (86%)
sqlite-264b97	121	198	107 (88%)
curl-965	512	493	445 (88%)
python-12608	2,376	2,740	2,051 (86%)
python-29028	2,259	2,704	1,944 (87%)
python-27867	2,233	2,656	1,930 (86%)
python-27945	2,223	2,650	1,929 (85%)
cppchk-5780	712	1,617	529 (75%)
cppchk-5909	55	252	45 (80%)
cppchk-5950	612	1,195	507 (83%)
cppchk-6059	219	724	173 (79%)
cppchk-6106	158	218	173 (78%)
clang-25156	710	843	531 (75%)
clang-28116	984	1,663	777 (80%)
clang-32638	922	1113	718 (78%)
clang-33082	811	1019	631 (78%)
clang-33471	967	1303	777 (80%)

Table 2. Slice–independent evaluation. The size and overlap of the dynamic data dependency sets present on Giri and Wok’s dependence graphs, repsectively.

5.3. Efficiency

Figure 4 shows the runtime penalty incurred by Wok. The overhead decreases with the sampling rate from $2.05\times$ , $1.30\times$ , $1.08\times$ , to $1.05\times$ (geometric average). Assuming an overhead budget of $\leq 5\%$ , we chose an optimal sampling rate between $\tfrac{1}{10,000}$ (i.e., $1$ in $10,000$ pointers allocated) and $\tfrac{1}{100,000}$ .

At higher sampling rates, the “lion’s share” of the overhead is due to the hardware interrupts triggered by dereferencing poisoned pointers. On average, Wok incurs a penalty of $1.8$ $\mu$ seconds per dereference. At a close inspection we discovered that $50\%$ of the time is spent in kernel space, inside Linux’s signal handling mechanism (do_signal – $6.5\%$ , sys_rt_sigreturn – $3.3\%$ , and general_protection – $2.7\%$ ).

At lower sampling rates, the overhead is dominated by Intel PT. Intel PT incurs $5.8\%$ geometrical average slowdown with slight performance degradation for short running tasks (i.e., $\leq 1.0$ seconds). This is due to the start-up and tear-down costs that cannot be amortized by the branch tracing itself.

Developers using our tool can perform a similar sensitivity analysis to determine which sampling rate to use. The overhead of pointer poisoning for a given workload can be computed analytically as

exp\_overhead=1+\frac{avg\_itrp\_cost\times exp\_hmem\_acc}{orig\_time}

where $orig\_time$ is the running time of the application without instrumentation, $avg\_trap\_cost$ is the average cost of handling a single interrupt, and $exp\_hmem\_acc$ is the expected number of heap memory accesses for the current workload.

6. Discussion

In this section we address some remaining open questions.

Usage scenarios: We designed Wok to operate primarily in production distributed environment (e.g., a datacenter) where target applications can be monitored continuously across multiple executions. The main goal of our work is to help developers analyze remote failures when they lack the fault-inducing inputs or alternative means to reproduce bugs locally. While our approach addresses program slice without making assumptions about the type of failures or the utilization model, developers can use our tool in a variety of scenarios.

Wok can also be leveraged during in-house testing. Our configurable sampling policy make it particularly suited for adaptively more aggressive data flow instrumentation. Recent works (zhao17, ; cui18, ) suggests that programmers can tolerate up to $50\%$ performance degradation when using integrated test and build frameworks.

We envision statistical program slices as building blocks for additional program analysis for root cause diagnosis. For instance, rank-based fault localization techniques achieve better precision when operating on a near-minimal set of program statements relevant to a failure (ernst19, ; swarup13, ; manu07, ).

Alternatively, developers can also use our technique to perform more targeted monitoring by narrowing the scope of the underlying instrumentation with minimal manual intervention. For example, instead of tracing memory indiscriminately, programmers can easily restrict pointer poisoning to a smaller subset of data structures (e.g., using type information). Similarly, developers can chose which traces to include in the final observed dependency graph (e.g., faulty executions only).

Definition of a root cause: Providing a universal definition for root causes is still an open problem. Bugs are fixed differently by different developers and can potentially exhibit multiple root causes. Throughout this paper, we define the root cause as the set of program statements altered by the developer when patching the bug (swarup13, ). We believe this approach minimizes bias and allows us to compare Wok against a pre-established ground truth, independent from our technique.

Fail-stop bugs: Wok is a program slicing tool that requires an initial program statement to bootstrap its algorithm. If the bugs investigated are not fail-stop, developers need to define custom failure models (e.g., assertions) to establish an appropriate slicing seed. Like similar tools (liblit03, ; ChilimbiLMNV09Holmes, ; kasikci15, ; binkley14, ; madsen16, ; cui18, ), our prototype cannot help analyze latent failures that silently corrupt program state without externally-observable effects.

Availability of Intel PT: Our prototype focuses on Intel architectures and relies on Intel PT, a hardware extension available on the Broadwell microarchitecture (i.e., from 2014 onwards (intel17, )). However, statistical program slicing is not limited to a particular platform. Recent hardware-assisted control flow capabilities were added by other CPU manufacturers (e.g., ARM’s EMT(arm16, )) and existing implementations enable their utilization within virtual environments (kwon18, ; schumilo17, ).

PT trace size: For our evaluation, we configured Intel PT to record control flow in a $2$ MB ring buffer which was sufficient to reconstruct slices with the accuracy and efficiency numbers reported. This is far below the maximum buffer size of $128$ MB supported by the current PT generation. However, for long-running executions such a size may prove insufficient. The alternative proposed by the original system designers is to periodically save the contents of the ring buffer to disk, with a modest performance penalty. Recent works suggest that such cost gets amortized for long-running executions (cui18, ; ge2017, ).

7. Related Work

Program slicing. Static slicing was initially introduced by Weiser (weiser81, ) as a program decomposition technique to help with bug diagnosis. Ottenstein et al. (Ottenstein1984, ) refined the initial algorithm by recasting it as graph reachability problem. Horwitz et al. (Horwitz1990, ) further extended the approach by computing slices inter-procedurally. Despite substantial improvements, static program slicing lacks adequate precision as it still relies on conservative program analysis to maintain soundness (tip1994survey, ; zhang05, ).

Korel et al. (korel88, ) were the first to suggest using dynamic program flow information and coined the term dynamic program slicing. Agrawal et al. (agrawal90, ) explored various several trade-offs between precision and runtime overhead for this new technique. Zhang et al. (zhang03, ; zhang05, ) developed more efficient dynamic slicing algorithms without sacrificing precision. Although state-of-the-art, these techniques are prohibitively expensive and cannot be readily deployed in production.

Recent works use various strategies to prune spurious program statements (mock05, ; manu07, ; swarup13, ) or combine static and dynamic slicing for increased precision (wee10, ; devecsery18, ). Mock et al. (mock05, ) improve accuracy by trimming static slices using dynamic points-to information. Thin slicing (manu07, ) attempts to reduce the size of static slices by including producer statements only, namely program dependencies relevant to track value flow. Dual slicing (wee10, ) finds causal path for concurrent failures by alternating between slicing failing and successful executions. Sahoo et al. (swarup13, ) implement Giri – an efficient dynamic slicer based on (zhang03, ) – and use slicing to narrow down the set of potential root causes, offline. Optimistic Hybrid Analysis (devecsery18, ) improves dynamic slicing by using an underlying unsound static analysis, yet disregards control–flow information. These techniques are orthogonal to statistical program slicing and can be combined for additional benefits.

Statistical slicing can be viewed as an instantiation of observation-based slicing (ORBS) (binkley14, ). Unlike ORBS, our approach operates at the program binaries, constructing slices by collecting program dependencies over multiple unmodified runs, rather than iteratively deleting a maximum set of statements, recompile and check to retain execution behavior.

Adaptive failure path reconstruction. Our work draws inspiration by the Cooperative Bug Isolation framework (CBI) (liblit03, ; liblit2007CBI, ) and refinements (liblit05, ; nainar07, ; liu05, ; jiang07, ; chilimbi09, ; nainar10, ). Typically, CBI-based techniques achieve low overhead by using sparse sampling which, in turn, requires a failure to manifest tens or even hundreds of thousands of times to narrow down the root cause. In contrast, statistical program slicing can help developers diagnose bugs by observing a particular failure only once.

Kasikci et al. (kasikci15, ) attempt to reconstruct the tail end of a dynamic slice relying on hardware breakpoints and Intel Processor Trace, a successor of Intel’s Last Branch Record store (intel17, ). While significant improvements over CBI-based approaches, these techniques rely on the hypothesis that the distance between root causes and symptoms is small. However, assuming short failure propagation paths has proven inaccurate for complex bugs (cui18, ).

In a follow-up work, Kasikci et al. (kasikci17, ) attempt to correct this shortcoming by combining static program analysis with PT-based control flow monitoring to test pseudo–invariants based related to thread interleavings. However, this improved approach is still restrictive as it is tailored for a particular subclass of concurrency bugs that follow a specific timing hypothesis.

Statistical program slicing makes no assumptions about the type of the bug or the length of the failure path.

Hardware Memory Tracing. Pointer poisoning is inspired from behavioral watchpoints (kumar13, ). Similar to behavioral watchpoints, we tag the upper bytes of a pointer to track its behavior. Unlike behavioral watchpoints, we collect information about which instructions access heap memory in user–space to infer read–write relationships between them. We also instrument a small fraction of (user–space) pointers to instrument over multiple runs, in contrast to tagging only certain pointers from infrequently accessed data structures in the kernel.

Devietti et.al. (devietti12, ) also uses hardware instrumentation to monitor memory accesses. However, their approach requires custom hardware support unavailable on commodity hardware. In contrast, support for pointer poisoning (tagging) and hardware-assisted branch tracing are already available off-the-shelf on several platforms (e.g., Intel, ARM, AMD).

8. Conclusions

In this paper we presented statistical program slicing, a novel hybrid static–dynamic program slicing technique which leverages hardware support for control–flow tracing (Intel PT) and a cooperative selective heap memory tracking mechanism (pointer poisoning) for low overhead. We described Wok, a tool that generates statistical program slices.

We tested Wok on $\nrBugs$ failures observed in production from $\nrApps$ real–world applications. We showed that Wok efficiently recovers $\woktAvgRRsliceT$ of the statements typically present on a dynamic program slice and $92\%$ of the statements linking the symptom to the root cause with only $5\%$ instrumentation overhead.

References

[1] The clang project. https://clang.llvm.org/index.html. Accessed: 2020-08-28.
[2] Cppcheck static analysis tool. https://curl.haxx.se. Accessed: 2020-08-28.
[3] Curl bug report #965. https://sourceforge.net/p/curl/bugs/965/. Accessed: 2020-08-28.
[4] Curl data transfer tool. https://curl.haxx.se. Accessed: 2020-08-28.
[5] Gzip gnu tool. https://www.gnu.org/software/gzip/. Accessed: 2020-08-28.
[6] The python interpreter. https://docs.python.org/3/tutorial/interpreter.html. Accessed: 2020-08-28.
[7] The sqlite database engine. https://www.sqlite.org/index.html. Accessed: 2020-08-28.
[8] Th3: Test harnest for sqlite3. https://www.sqlite.org/th3/doc/trunk/www/th3.wiki. Accessed: 2020-08-28.
[9] Hiralal Agrawal and Joseph R. Horgan. Dynamic program slicing. In Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, PLDI ’90, pages 246–256, New York, NY, USA, 1990. ACM.
[10] Lars Ole Andersen. Program analysis and specialization for the c programming language. Technical report, 1994.
[11] ARM. Technical reference manual (ch. 15). 2016.
[12] Piramanayagam Arumuga Nainar, Ting Chen, Jake Rosin, and Ben Liblit. Statistical debugging using compound boolean predicates. In Proceedings of the 2007 International Symposium on Software Testing and Analysis, ISSTA ’07, pages 5–15, New York, NY, USA, 2007. ACM.
[13] Piramanayagam Arumuga Nainar and Ben Liblit. Adaptive bug isolation. In Proceedings of the 32Nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE ’10, pages 255–264, New York, NY, USA, 2010. ACM.
[14] Earl T. Barr, Christian Bird, and Mark Marron. Collecting a heap of shapes. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, page 123–133, New York, NY, USA, 2013. Association for Computing Machinery.
[15] David Binkley, Nicolas Gold, Mark Harman, Syed Islam, Jens Krinke, and Shin Yoo. Orbs: Language-independent program slicing. In Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 109–120, New York, NY, USA, 2014. ACM.
[16] Trishul M. Chilimbi and Martin Hirzel. Dynamic hot data stream prefetching for general-purpose programs. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 199–209, New York, NY, USA, 2002. ACM.
[17] Trishul M. Chilimbi, Ben Liblit, Krishna Mehra, Aditya V. Nori, and Kapil Vaswani. Holmes: Effective statistical debugging via efficient path profiling. In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pages 34–44, Washington, DC, USA, 2009. IEEE Computer Society.
[18] Trishul M. Chilimbi, Ben Liblit, Krishna K. Mehra, Aditya V. Nori, and Kapil Vaswani. HOLMES: effective statistical debugging via efficient path profiling. In 31st International Conference on Software Engineering, ICSE 2009, May 16-24, 2009, Vancouver, Canada, Proceedings, pages 34–44, 2009.
[19] Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, and Insu Yun. REPT: Reverse debugging of failures in deployed software. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 17–32, Carlsbad, CA, October 2018. USENIX Association.
[20] David Devecsery, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. Optimistic hybrid analysis: Accelerating dynamic analysis through predicated static analysis. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’18, pages 348–362, New York, NY, USA, 2018. ACM.
[21] Joseph Devietti, Benjamin P. Wood, Karin Strauss, Luis Ceze, Dan Grossman, and Shaz Qadeer. Radish: Always-on sound and complete ra detection in software and hardware. In Proceedings of the 39th Annual International Symposium on Computer Architecture, ISCA ’12, page 201–212, USA, 2012. IEEE Computer Society.
[22] Xinyang Ge, Weidong Cui, and Trent Jaeger. Griffin: Guarding control flows using intel processor trace. SIGPLAN Not., 52(4):585–598, April 2017.
[23] Matthias Hauswirth and Trishul M. Chilimbi. Low-overhead memory leak detection using adaptive statistical profiling. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XI, pages 156–164, New York, NY, USA, 2004. ACM.
[24] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation, PLDI ’88, pages 35–46, New York, NY, USA, 1988. ACM.
[25] Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. ACM Trans. Program. Lang. Syst., 12(1):26–60, January 1990.
[26] Intel. Intel 64 and ia-32 architectures software developer’s manual - volume 3b. 2017.
[27] Lingxiao Jiang and Zhendong Su. Context-aware statistical debugging: From bug predictors to faulty control flow paths. In Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering, ASE ’07, pages 184–193, New York, NY, USA, 2007. ACM.
[28] Baris Kasikci, Weidong Cui, Xinyang Ge, and Ben Niu. Lazy diagnosis of in-production concurrency bugs. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, pages 582–598, New York, NY, USA, 2017. ACM.
[29] Baris Kasikci, Benjamin Schubert, Cristiano Pereira, Gilles Pokam, and George Candea. Failure sketching: A technique for automated root cause diagnosis of in-production failures. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP ’15, pages 344–360, New York, NY, USA, 2015. ACM.
[30] Baris Kasikci, Cristian Zamfir, and George Candea. Racemob: Crowdsourced data race detection. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 406–422, New York, NY, USA, 2013. ACM.
[31] Andi Kleen. Simple-PT: Simple intel cpu processor tracing on linux. https://github.com/andikleen/simple-pt, 2018.
[32] B. Korel and J. Laski. Dynamic program slicing. Inf. Process. Lett., 29(3):155–163, October 1988.
[33] B. Korel and J. Laski. Dynamic program slicing. Inf. Process. Lett., 29(3):155–163, October 1988.
[34] Akshay Kumar, Peter Goodman, Ashvin Goel, and Angela Demke Brown. Behave or be watched: Debugging with behavioral watchpoints. In Proceedings of the 9th Workshop on Hot Topics in Dependable Systems, HotDep ’13, pages 111–116, New York, NY, USA, 2013. ACM.
[35] Donghyun Kwon, Jiwon Seo, Sehyun Baek, Giyeol Kim, Sunwoo Ahn, and Yunheung Paek. VM-CFI: control flow Integrity for Virtual Machine Kernel Using Intel PT, pages 127–137. 07 2018.
[36] Chris Lattner and Vikram Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, page 75. IEEE Computer Society, 2004.
[37] Ben Liblit. Cooperative bug isolation: winning thesis of the 2005 ACM doctoral dissertation competition, volume 4440. Springer, 2007.
[38] Ben Liblit, Alex Aiken, Alice X. Zheng, and Michael I. Jordan. Bug isolation via remote program sampling. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, PLDI ’03, pages 141–154, New York, NY, USA, 2003. ACM.
[39] Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. Scalable statistical bug isolation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, pages 15–26, New York, NY, USA, 2005. ACM.
[40] Chao Liu, Xifeng Yan, Long Fei, Jiawei Han, and Samuel P. Midkiff. Sober: Statistical model-based bug localization. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 286–295, New York, NY, USA, 2005. ACM.
[41] Magnus Madsen, Frank Tip, Esben Andreasen, Koushik Sen, and Anders Møller. Feedback-directed instrumentation for deployed javascript applications. In Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pages 899–910, New York, NY, USA, 2016. ACM.
[42] Markus Mock, Darren C. Atkinson, Craig Chambers, and Susan J. Eggers. Program slicing with dynamic points-to sets. IEEE Trans. Softw. Eng., 31(8):657–678, August 2005.
[43] Gene Novark, Emery D. Berger, and Benjamin G. Zorn. Efficiently and precisely locating memory leaks and bloat. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’09, pages 397–407, New York, NY, USA, 2009. ACM.
[44] Karl J. Ottenstein and Linda M. Ottenstein. The program dependence graph in a software development environment. In Proceedings of the First ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, SDE 1, pages 177–184, New York, NY, USA, 1984. ACM.
[45] Chris Parnin and Alessandro Orso. Are automated debugging techniques actually helping programmers? In Proceedings of the 2011 International Symposium on Software Testing and Analysis, ISSTA ’11, page 199–209, New York, NY, USA, 2011. Association for Computing Machinery.
[46] Swarup Kumar Sahoo, John Criswell, Chase Geigle, and Vikram Adve. Using likely invariants for automated software fault localization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, pages 139–152, New York, NY, USA, 2013. ACM.
[47] Sergej Schumilo, Cornelius Aschermann, Robert Gawlik, Sebastian Schinzel, and Thorsten Holz. kafl: Hardware-assisted feedback fuzzing for OS kernels. In 26th USENIX Security Symposium (USENIX Security 17), pages 167–182, Vancouver, BC, 2017. USENIX Association.
[48] Manu Sridharan, Stephen J. Fink, and Rastislav Bodik. Thin slicing. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’07, pages 112–122, New York, NY, USA, 2007. ACM.
[49] Venkatesh Srinivasan and Thomas Reps. An improved algorithm for slicing machine code. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, pages 378–393, New York, NY, USA, 2016. ACM.
[50] Frank Tip. A survey of program slicing techniques. Technical report, Amsterdam, The Netherlands, The Netherlands, 1994.
[51] Frank Tip. A survey of program slicing techniques. Centrum voor Wiskunde en Informatica, 1994.
[52] Qianqian Wang, Chris Parnin, and Alessandro Orso. Evaluating the usefulness of ir-based fault localization techniques. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, page 1–11, New York, NY, USA, 2015. Association for Computing Machinery.
[53] Dasarath Weeratunge, Xiangyu Zhang, William N. Sumner, and Suresh Jagannathan. Analyzing concurrency bugs using dual slicing. In Proceedings of the 19th International Symposium on Software Testing and Analysis, ISSTA ’10, pages 253–264, New York, NY, USA, 2010. ACM.
[54] Mark Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, ICSE ’81, pages 439–449, Piscataway, NJ, USA, 1981. IEEE Press.
[55] Bin Xin, William N. Sumner, and Xiangyu Zhang. Efficient program execution indexing. SIGPLAN Not., 43(6):238–248, June 2008.
[56] Bin Xin and Xiangyu Zhang. Memory slicing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA ’09, pages 165–176, New York, NY, USA, 2009. ACM.
[57] Tong Zhang, Changhee Jung, and Dongyoon Lee. Prorace: Practical data race detection for production use. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’17, pages 149–162, New York, NY, USA, 2017. ACM.
[58] Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. Pruning dynamic slices with confidence. SIGPLAN Not., 41(6):169–180, June 2006.
[59] Xiangyu Zhang, Rajiv Gupta, and Youtao Zhang. Precise dynamic slicing algorithms. In Proceedings of the 25th International Conference on Software Engineering, ICSE ’03, pages 319–329, Washington, DC, USA, 2003. IEEE Computer Society.
[60] Xiangyu Zhang, Rajiv Gupta, and Youtao Zhang. Cost and precision tradeoffs of dynamic data slicing algorithms. ACM Trans. Program. Lang. Syst., 27(4):631–661, July 2005.
[61] Xu Zhao, Kirk Rodrigues, Yu Luo, Michael Stumm, Ding Yuan, and Yuanyuan Zhou. Log20: Fully automated optimal placement of log printing statements under specified overhead threshold. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, page 565–581, New York, NY, USA, 2017. Association for Computing Machinery.
[62] Daming Zou, , Jingjing Liang, Yingfei Xiong, Michael Ernst, and Lu Zhang. An empirical study of fault localization families and their combinations. IEEE Transactions on Software Engineering, pages 1–21, 2019.