BinGo: Pinpointing Concurrency Bugs in Go via Binary Analysis
Abstract
Golang (also known as Go for short) has become popular for building concurrent programs in distributed systems. Among its unique features, Go employs lightweight Goroutines to support a high degree of parallelism in user space. Moreover, Go leverages channels to enable explicit communication among threads. However, recent studies show that concurrency bugs are not uncommon in Go applications. Pinpointing these concurrency bugs in real Go applications is both important and challenging. Existing approaches are mostly based on compiler-aided static or dynamic analysis, which has two limitations. First, existing approaches require the availability and recompilation of the source code, so they work well in testing environments but not in production environments, where no source code is available for either applications or external libraries. Second, existing approaches work on pure Go code bases only, not programs mixing Go with other languages. To address these limitations, we develop BinGo, the first tool to identify concurrency bugs in Go applications via dynamic binary analysis. BinGo correlates binary execution with Go semantics and employs novel bug detection algorithms. BinGo is an end-to-end tool that is ready for deployment in production environments, with no modification to source code, compilers, or runtimes in the Go ecosystem. Our experiments show that BinGo has high coverage of concurrency bugs with no false positives. We are able to use BinGo to identify concurrency bugs in real applications with moderate overhead.
1 Introduction
Golang (Go for short) is a modern open-source programming language created by Google in 2007. Go has become increasingly popular recently as it supports building fast, reliable, and efficient software at scale [9]. A variety of microservices running in the cloud adopt Go as the default language for transaction processing with high concurrency. Popular software written in Go includes CockroachDB [2], Dropbox [4], AresDB [1], gRPC [13], Docker [3], Kubernetes [14], and many more. Leading companies, such as Google, Facebook (Meta), Uber, and Microsoft, form an active community to support Go.
Among its unique features, Go facilitates concurrent programming with the go keyword, which can be placed before a function call to indicate that the function can run asynchronously. Functions invoked with the go prefix run as Goroutines. Application developers can exploit concurrency by creating multiple Goroutines. Compared to traditional OS threads (e.g., pthreads), Goroutines are lightweight because the Go runtime system context-switches Goroutines in user space, with no kernel involved. There are two mechanisms to communicate data between Goroutines: (1) shared memory, which is similar to traditional threads, and (2) channels, which support explicit message passing.
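The two communication mechanisms can be illustrated with a minimal Go program; the function names are ours for illustration only:

```go
package main

import (
	"fmt"
	"sync"
)

// sumViaChannel spawns a Goroutine with the go keyword and receives
// its result over an unbuffered channel (explicit message passing).
func sumViaChannel(a, b int) int {
	result := make(chan int)
	go func() { result <- a + b }() // runs asynchronously as a Goroutine
	return <-result                 // blocks until the Goroutine sends
}

// sumViaSharedMemory communicates through shared memory instead,
// guarding the shared variable with a sync.Mutex.
func sumViaSharedMemory(a, b int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	total := 0
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.Lock()
		total = a + b
		mu.Unlock()
	}()
	wg.Wait() // wait for the Goroutine to finish
	return total
}

func main() {
	fmt.Println(sumViaChannel(1, 2), sumViaSharedMemory(3, 4))
}
```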
Like other parallel programs, Go applications are not free of concurrency bugs. Concurrency bugs occur due to the misuse of synchronization across concurrent Goroutines and can show blocking or non-blocking symptoms. Blocking bugs, similar to deadlocks, result in no progress of Goroutines, while non-blocking bugs, similar to data races, result in incorrect results. We focus our study on blocking bugs in this paper. Recent studies [40, 44] show that blocking bugs in Go applications are related not only to mutex primitives but also to channels. These blocking bugs involve multiple synchronization contexts buried deep in code bases, which are difficult to identify.
A variety of tools have been developed to identify concurrency bugs in Go, mostly using two techniques—static and dynamic analysis. Static analysis leverages compilers to analyze Go source code without executing it. However, static analysis has limited coverage, identifying concurrency bugs only from a small set of predefined patterns. Moreover, static analysis can introduce many false positives or negatives due to imprecise reasoning about aliases and pointers. Such imprecision quickly accumulates as the analysis scope enlarges in real Go applications.
To overcome the limitations of static analysis, dynamic analysis [10, 11] monitors Go program execution with additional runtime information for accurate analysis of aliases and pointers. With representative or fuzzed inputs, dynamic analysis minimizes false positives and negatives with high bug coverage. However, existing dynamic approaches require modifying Go compilers or runtimes to collect the necessary data. This requirement largely impedes the adoption of these tools in production environments with the off-the-shelf Go software stack.
In this paper, we propose BinGo, a new bug detector for Go applications that addresses the challenges in existing static and dynamic tools. The key idea of BinGo is to decouple data collection from analysis: BinGo collects runtime information with binary analysis and detects concurrency bugs by integrating Go semantics. The binary analysis allows BinGo to work on the unmodified, off-the-shelf Go ecosystem, and the semantic-aware analysis enables BinGo to accurately pinpoint concurrency bugs with no false positives and provide intuitive guidance for bug fixes.
BinGo advances existing approaches in three aspects. First, BinGo can handle legacy or commercial Go code packages that are only available as binary code. Second, BinGo can analyze not only pure Go applications but also code mixing Go with other languages, such as C via cgo. Third, BinGo requires no code change, specific compilation options, or code recompilation, so BinGo is more suitable for practical deployment.
In the rest of this section, we show a motivating example, highlight the challenges BinGo addresses, and summarize our contributions.
Motivation
Figure 1 shows an example Go program from etcd [7] with a concurrency bug involving both mutex locks and channels. With manual code analysis, we find that one execution order triggers the bug. Goroutine2 first acquires the mutex lock wbs.mu at line 12 and gets blocked receiving messages from channel wbs.donec at line 15. Meanwhile, Goroutine1 calls function wbs.coalesce(), where it also tries to acquire the mutex lock wbs.mu at line 8 before closing wbs.donec at line 2. Thus, Goroutine1 and Goroutine2 are both blocked with no progress. Manual analysis of such concurrency bugs, which requires understanding the program semantics of synchronization primitives and exploring every execution order of these primitives, is both tedious and error-prone.
Challenges in analyzing Go binaries
The challenges of enabling binary analysis for Go applications reside in correlating low-level binary instruction behaviors with high-level Go semantics, such as the operations on mutex locks and channels shown in Figure 1. Analyzing native binary traces alone can neither accurately identify concurrency bugs nor provide intuitive guidance for bug fixes. To understand whether concurrency bugs exist, BinGo needs to interpret synchronization primitives (mutexes and channels) and understand the behaviors of Goroutines (e.g., creation, scheduling, and destruction). To provide intuitive guidance, BinGo needs to understand the logical call paths rather than the native call paths constructed from the binary. BinGo addresses these challenges with pure binary analysis by leveraging the open-source Go compilers and runtime systems. Details are in Sections 3 and 4.
Contributions
BinGo advances the state of the art with the following contributions.
• BinGo is the first tool that analyzes binaries of Go applications. BinGo supports a hierarchy of tracepoints, devised by investigating Go runtime binaries, to associate binary execution with Go semantics.
• BinGo employs a novel predictive algorithm combining online and offline analysis to detect various concurrency bugs in Go applications that involve Goroutines, channels, and synchronization primitives.
• BinGo adopts many optimization techniques to reduce its overhead and enhance its scalability, and integrates the bug report into a user-friendly GUI for applicability. BinGo is an end-to-end bug detector running on the off-the-shelf Go ecosystem. BinGo will be open source upon paper acceptance.
• BinGo has been evaluated extensively with well-known Go benchmarks. BinGo shows high bug coverage, affordable overhead, and deep insights to guide bug fixes. Furthermore, BinGo is deployed to analyze real Go applications; BinGo can identify concurrency bugs and provide intuitive guidance for fixes.
Paper organization
The rest of the paper is organized as follows. Section 2 reviews existing approaches and distinguishes BinGo from them. Section 3 describes the methodology of BinGo. Section 4 elaborates on the implementation details of BinGo. Section 5 evaluates the coverage and overhead of BinGo. Section 6 studies several cases of concurrency bugs in real Go applications. Section 7 provides further discussion. Finally, Section 8 concludes the paper and previews future work.
2 Related Work
Concurrency bug detection has been extensively studied recently [27, 45, 30, 25, 32]. However, these approaches cannot directly apply to Go applications due to the lack of support for Go language features. In this section, we only review the most related approaches to BinGo, including concurrency bug detection in Go and binary analysis for software bugs.
2.1 Concurrency Bug Detection in Go
Staticcheck [17] and Vet [8] analyze Go source code statically to identify concurrency bugs. These two approaches, based on pattern recognition, have limited coverage of bug detection. Tools [26, 28, 29, 37, 38] based on model checking address the coverage limitation but incur tremendous overhead by investigating each input and modeling the execution of all synchronization primitives, which impedes them from applying to real Go applications.
GCatch [34] is the state-of-the-art static bug detector for Go applications. It uses the Go compiler to analyze source code and build a constraint solver to detect blocking bugs. However, it (1) only detects channel-related bugs, (2) has limited analysis scope, and (3) yields imprecise analysis with many false positives. Section 5 gives more quantitative comparison between BinGo and GCatch.
Moreover, due to the static analysis, these approaches do not provide sufficient information, e.g., calling contexts, to help understand the root causes of concurrency bugs. Confirming and fixing these bugs require substantial manual efforts. Thus, dynamic analysis is proposed as an alternative solution, which BinGo is also built on.
The Go runtime by default can detect concurrency bugs in its Goroutine scheduler dynamically [10]. When all Goroutines are blocked, the runtime reports a concurrency bug. However, this scheme only detects a small set of concurrency bugs; it cannot identify blocking that occurs among a subset of Goroutines. Goleak [11], a dynamic tool developed by Uber, can detect Goroutine leaks. Similar to the Go runtime, Goleak can only detect a small set of concurrency bugs. It is worth noting that most concurrency bugs studied in this paper cannot be identified by these tools. Moreover, these tools, which leverage the Go compiler for code instrumentation, target applications written purely in Go and cannot handle applications mixing languages such as Go and C (programmed with cgo).
Unlike existing approaches, BinGo adopts dynamic binary analysis with no compiler involved. Moreover, BinGo correlates the binary analysis with Go semantics to yield accurate bug detection with rich insights.
2.2 Binary Analysis for Software Bugs
Dynamic binary analysis has become an indispensable component of the modern system software stack. Binary engines, such as Pin [35], DynamoRIO [5], Dyninst [6], and Valgrind [19], can instrument any instruction of interest for performance characterization and bug detection. Performance characterization tools include redundancy detection [42, 39, 41, 24], data locality measurement [33, 43], and cache simulation [23]. Bug detection includes the aforementioned concurrency bug analysis, taint analysis [21], reverse engineering [31], and execution replay [36].
However, these tools target applications written in native languages only. As discussed in Section 4, they do not apply to Go programs because they capture raw measurements only, lacking Go semantics.
3 Methodology
We describe the methodology of BinGo by answering two research questions: (1) how to associate binary execution with Go semantics, and (2) given the Go semantics, how to identify concurrency bugs in Go with high accuracy and coverage.
3.1 Recovering Go Semantics in Binary
Since most concurrency bugs are related to Go semantics, such as the operations on Goroutines and channels [40], it is important for BinGo to identify these semantics when monitoring Go binary execution. However, Go binaries, like binaries of other languages, typically contain no program semantics, except symbol tables and debugging information if not stripped, which raises unique challenges for BinGo. Fortunately, the mainstream Go ecosystem developed by Google, including the compiler and runtime, is open source. BinGo leverages this opportunity to recover Go semantics from binary analysis alone. The key idea behind BinGo is to recognize certain semantics by interpreting Go runtime source code. BinGo then intercepts the binary by adding a hierarchy of tracepoints to uncover the semantics.
Semantic Name | Description |
---|---|
Goroutine creation | A new Goroutine is created |
Goroutine exit | A Goroutine finishes execution |
main.main function start | The program entry function begins |
Object creation | A new object (e.g., mutex, channel) is allocated |
Mutex Lock | Acquire a Mutex |
Mutex Unlock | Release a Mutex |
RWMutex RLock | Acquire a RWMutex in read mode |
RWMutex RUnlock | Release a RWMutex in read mode |
RWMutex Lock | Acquire a RWMutex in write mode |
RWMutex Unlock | Release a RWMutex in write mode |
Channel creation | A new channel is created |
Channel send | Send a message to a channel |
Channel receive | Receive a message from a channel |
Close Channel | Close a channel |
Select | Select statement |
WaitGroup Add | Add Goroutines to a WaitGroup |
WaitGroup Wait | Wait for Goroutines in a WaitGroup |
Context Creation | A new context is created |
Context Cancel | Cancel a context |
Cond Wait | Wait on a condition variable |
Cond Signal | Signal one waiter of a condition variable |
Cond Broadcast | Wake all waiters of a condition variable |
Recognizing semantics in Go runtime
Table 1 lists the main semantics BinGo needs to recognize for identifying concurrency bugs. The semantics are represented as actions (e.g., Goroutine creation/destruction, lock acquisition/release, channel selection). Moreover, the objects handled by each action should be captured and assigned an ID to understand the correlation among various actions. Such objects include Goroutines, mutexes, channels, etc. BinGo outputs the function signatures related to these semantics. It is worth noting that BinGo needs to identify the Go version used in the system first and analyze its runtime source code only once; all follow-up analyses on this system can reuse the output.

Recovering semantics via tracepoints
The semantics of an application are composed of actions (e.g., Goroutine creation) and objects (e.g., mutex locks or channels with their IDs). BinGo develops a mechanism that can freely arm tracepoints in Go binaries to obtain actions and objects from the Go runtime. Figure 2 overviews the tracepoint mechanism. Tracepoints can occur at the Goroutine level, function level, or instruction level. At each tracepoint, BinGo can collect calling contexts, memory information, and register states. BinGo then activates a callback function (also known as tracepoint activation) that handles the information collected at the tracepoint and returns actions and objects. With these actions and objects, BinGo can recover semantic details about the analyzed application for blocking bug detection.
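The action/object abstraction can be sketched in plain Go types. This is only a conceptual model of the events the tracepoint mechanism produces; the actual tool is a binary-analysis client, and all names below are ours:

```go
package main

import "fmt"

// Action names a semantic operation recovered at a tracepoint.
type Action string

const (
	GoroutineCreate Action = "goroutine.create"
	MutexLock       Action = "mutex.lock"
	ChannelSend     Action = "channel.send"
)

// Event pairs an action with the object it operates on; the object
// ID (e.g., a mutex or channel address) correlates related actions.
type Event struct {
	Action   Action
	ObjectID uint64 // address of the Goroutine/mutex/channel
	Goid     int64  // Goroutine executing the action
}

// Callback models tracepoint activation: it turns raw tracepoint
// state (calling context, memory, registers) into an Event.
type Callback func(raw map[string]uint64) Event

// dispatch runs the registered callback when a tracepoint is hit.
func dispatch(cb Callback, raw map[string]uint64) Event {
	return cb(raw)
}

func main() {
	lockCB := func(raw map[string]uint64) Event {
		return Event{MutexLock, raw["arg0"], int64(raw["goid"])}
	}
	e := dispatch(lockCB, map[string]uint64{"arg0": 0xc000123456, "goid": 7})
	fmt.Printf("%s on %#x by g%d\n", e.Action, e.ObjectID, e.Goid)
}
```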
3.2 Detecting Blocking Bugs
The blocking bugs can happen among locks, channels, and other synchronization primitives. Prior work [40] categorizes different blocking bugs in Go; BinGo is able to detect all of them with a hybrid online and offline analysis, with the support of the tracepoints.
Algorithm 1 shows a generic algorithm that BinGo uses to handle each action and object. BinGo begins monitoring when the main.main tracepoint is triggered, i.e., at the entrance function of Go applications. In the rest of this section, we show how BinGo identifies different kinds of blocking bugs.
3.2.1 Blocked Channel Operations
When the Channel send, Channel receive, or Select tracepoints are triggered, BinGo records the context of these tracepoints. If the function enclosing these tracepoints does not return before the whole program execution finishes, BinGo reports a blocked channel operation and uses the context information of these operations to associate the bug with the source code.
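The symptom this rule catches can be reproduced with a small Go sketch: a send on an unbuffered channel with no receiver never returns, so a counter incremented at the operation's start is never decremented. The counter is our stand-in for BinGo's recorded tracepoint context, and the sleep stands in for the rest of the program execution:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// pendingSends counts channel sends whose enclosing function has not
// returned, mimicking the record BinGo keeps per tracepoint.
var pendingSends int64

// leakySend sends on an unbuffered channel nobody receives from; the
// send blocks forever, so the counter is never decremented.
func leakySend(ch chan int) {
	atomic.AddInt64(&pendingSends, 1)
	ch <- 42 // blocked: no receiver exists
	atomic.AddInt64(&pendingSends, -1)
}

// blockedAtExit reports whether any send was still blocked when the
// "program" (here, a short window) finished.
func blockedAtExit() bool {
	go leakySend(make(chan int))
	time.Sleep(50 * time.Millisecond) // stand-in for program execution
	return atomic.LoadInt64(&pendingSends) > 0
}

func main() {
	fmt.Println("blocked channel operation:", blockedAtExit())
}
```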
3.2.2 Blocked WaitGroup
BinGo installs the WaitGroup Add and WaitGroup Wait tracepoints to track the number of Goroutines in a WaitGroup. If the number is not 0 at the end of program execution, some Goroutines are blocked in the WaitGroup. BinGo reports the context of all related tracepoints, which are the potential cause of the blocking.
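This check can be sketched in Go with a wrapper of our own that mirrors the WaitGroup counter the way BinGo derives it from the Add tracepoint's argument; the bug is a missing Done call, which leaves the counter nonzero at exit:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countedWG wraps sync.WaitGroup and mirrors its counter, the way
// BinGo derives the count from the Add tracepoint's argument.
type countedWG struct {
	wg      sync.WaitGroup
	counter int64
}

func (c *countedWG) Add(n int) {
	atomic.AddInt64(&c.counter, int64(n))
	c.wg.Add(n)
}

func (c *countedWG) Done() {
	atomic.AddInt64(&c.counter, -1)
	c.wg.Done()
}

// buggyRun registers two Goroutines but only one calls Done, so
// Wait would block forever; we return the leftover counter instead
// of calling Wait, which is what BinGo inspects at program exit.
func buggyRun() int64 {
	c := &countedWG{}
	c.Add(2) // expects two Done calls
	done := make(chan struct{})
	go func() { c.Done(); close(done) }()
	<-done
	// the second Goroutine is never spawned: one Done is missing
	return atomic.LoadInt64(&c.counter)
}

func main() {
	fmt.Println("WaitGroup counter at exit:", buggyRun())
}
```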
3.2.3 Blocked Cond
For each Cond variable, BinGo keeps track of its operations. If Cond Wait is triggered in a Goroutine but no Cond Signal or Cond Broadcast is triggered in other Goroutines before the program ends, BinGo reports that the wait operation is blocked.
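The per-Cond bookkeeping can be sketched as a simple tally; the type and field names are ours, standing in for the counts BinGo accumulates from the three Cond tracepoints:

```go
package main

import "fmt"

// condRecord tallies per-Cond operations as observed from the
// Wait/Signal/Broadcast tracepoints.
type condRecord struct {
	waits, signals, broadcasts int
}

// blockedWait reports a Wait with no Signal or Broadcast from any
// other Goroutine before the program ended.
func (c condRecord) blockedWait() bool {
	return c.waits > 0 && c.signals == 0 && c.broadcasts == 0
}

func main() {
	buggy := condRecord{waits: 1}             // Wait but no wake-up
	ok := condRecord{waits: 1, signals: 1}    // Wait paired with Signal
	fmt.Println(buggy.blockedWait(), ok.blockedWait())
}
```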
3.2.4 Uncanceled Context
Go programs use context to pass values, cancellation signals, and deadlines across different Goroutines. Upon a Context Creation tracepoint, BinGo obtains the context address and its cancel function. This cancel function enables Goroutines blocked on the context to receive messages from the function Done and resume execution. Thus, if a context is not canceled (i.e., the Context Cancel tracepoint is not triggered) when the program finishes, BinGo treats all the Goroutines waiting for this context as blocking bugs.
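The pattern can be reproduced with the standard context package. The counter below is our stand-in for the set of Goroutines BinGo sees blocked on a context's Done channel; when cancel is never called, the waiter stays blocked at program exit:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// waitingOnCtx counts Goroutines blocked on a context's Done channel,
// the set BinGo flags when no matching cancel tracepoint fires.
var waitingOnCtx int64

// spawnWaiter blocks a Goroutine on ctx.Done().
func spawnWaiter(ctx context.Context) {
	atomic.AddInt64(&waitingOnCtx, 1)
	go func() {
		<-ctx.Done()
		atomic.AddInt64(&waitingOnCtx, -1)
	}()
}

// run simulates a program that forgets (or remembers) to call cancel,
// returning how many waiters are still blocked at "program exit".
func run(callCancel bool) int64 {
	atomic.StoreInt64(&waitingOnCtx, 0)
	ctx, cancel := context.WithCancel(context.Background())
	spawnWaiter(ctx)
	if callCancel {
		cancel()
	}
	time.Sleep(50 * time.Millisecond) // let the waiter observe Done
	return atomic.LoadInt64(&waitingOnCtx)
}

func main() {
	fmt.Println("uncanceled:", run(false), "canceled:", run(true))
}
```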
3.2.5 Missing Unlock and Double Lock
To reduce the runtime overhead and apply the prediction mechanism, BinGo analyzes the tracepoints related to Mutex offline to detect missing unlock and double lock in each Goroutine. Section 3.3 shows more algorithm details.
3.2.6 Goroutine Leak
BinGo keeps track of the status of each Goroutine via the Goroutine creation and Goroutine exit tracepoints. Upon creation, the newly created Goroutine (except for the main Goroutine) is added to a list. Upon exit, the Goroutine is removed from that list. If the list is not empty after the entire program execution finishes, the Goroutines in the list may be leaked. However, the program may end shortly after a Goroutine finishes its work but before the exit tracepoint is triggered, which is not a Goroutine leak. Thus, BinGo cross-checks Goroutine leaks with other blocking bugs: if a Goroutine has a blocked operation and does not exit before the program ends, BinGo reports a leak.
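The list-based tracking and the cross-check can be sketched in Go (the type and method names are ours, modeling the creation/exit tracepoint handlers):

```go
package main

import (
	"fmt"
	"sync"
)

// liveSet mimics BinGo's Goroutine list: IDs are added at the
// creation tracepoint and removed at the exit tracepoint.
type liveSet struct {
	mu   sync.Mutex
	live map[int64]bool
}

func newLiveSet() *liveSet { return &liveSet{live: map[int64]bool{}} }

func (s *liveSet) created(goid int64) {
	s.mu.Lock()
	s.live[goid] = true
	s.mu.Unlock()
}

func (s *liveSet) exited(goid int64) {
	s.mu.Lock()
	delete(s.live, goid)
	s.mu.Unlock()
}

// leaked reports Goroutines still alive at program end that also had
// a blocked operation, matching the cross-check against other
// blocking bugs so late-exiting Goroutines are not flagged.
func (s *liveSet) leaked(blocked map[int64]bool) []int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []int64
	for goid := range s.live {
		if blocked[goid] {
			out = append(out, goid)
		}
	}
	return out
}

func main() {
	s := newLiveSet()
	s.created(2) // g2: exits normally
	s.created(3) // g3: blocks on a channel and never exits
	s.exited(2)
	fmt.Println("leaked:", s.leaked(map[int64]bool{3: true}))
}
```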
3.3 Predicting Blocking Bugs
To enlarge the bug coverage, BinGo identifies blocking bugs due to deadlocks across mutex locks and channels by analyzing the trace, even when the blocking does not occur in the monitored execution. We elaborate on the algorithms BinGo uses to predict deadlocks among locks and channels.
3.3.1 Mutex Deadlock
BinGo employs a dynamic deadlock detection mechanism, which is a variant of prior work [22]. Besides adapting the algorithm to Go semantics, BinGo also extends the algorithm to support both traditional Mutex locks and RWMutex reader-writer locks in Go. For simplicity, we refer to both Mutex and RWMutex as mutex in the following text. We define a lock tuple t = (g, H, m, y), which means that Goroutine g holds all the mutexes in the mutex set H and tries to acquire the mutex m; y represents the lock type: mutex, read, or write. We further define the lock sequence LS = <t_1, t_2, ..., t_n>: a sequence of lock tuples where m_i ∈ H_{i+1} for 1 ≤ i < n and m_n ∈ H_1, i.e., each requested mutex is held by the next tuple in the cycle. If, for all i ≠ j, g_i ≠ g_j, H_i ∩ H_j = ∅, and the operation requesting each m_i and the operation holding it are not both read locks, then LS indicates a potential deadlock. Similarly, BinGo also detects missing unlock and double lock in a lock sequence. The correctness of this algorithm is out of scope; one can refer to the prior work [22] for more details.
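The pairwise (n = 2) case of this check can be sketched in Go. The field names are ours, the lock type is omitted for brevity (all operations are treated as write locks), and the tuples are a hand-built trace rather than output of the real tool:

```go
package main

import "fmt"

// lockTuple records that Goroutine g tried to acquire mutex m while
// holding the set held — a simplified form of the (g, H, m, y)
// tuples with the lock type omitted.
type lockTuple struct {
	g    int64
	held map[uint64]bool
	m    uint64
}

// potentialDeadlock implements the pairwise case of the check: two
// tuples from different Goroutines, disjoint held sets, and each
// requesting a mutex the other holds.
func potentialDeadlock(a, b lockTuple) bool {
	if a.g == b.g {
		return false // one Goroutine cannot race itself
	}
	for m := range a.held {
		if b.held[m] {
			return false // a common (gate) lock serializes them
		}
	}
	return b.held[a.m] && a.held[b.m] // cyclic request/hold pattern
}

func main() {
	const lockA, lockB = 0x100, 0x200
	// Goroutine 1 held A then requested B; Goroutine 2 held B then
	// requested A. No deadlock manifested, but the trace predicts one.
	t1 := lockTuple{1, map[uint64]bool{lockA: true}, lockB}
	t2 := lockTuple{2, map[uint64]bool{lockB: true}, lockA}
	fmt.Println("potential deadlock:", potentialDeadlock(t1, t2))
}
```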
3.3.2 Channel-Mutex Deadlock
To predict whether blocking bugs exist due to channels, BinGo collects the sequence of both mutex lock and channel operations along with their timestamps. BinGo then constructs the lock and unlock dependencies of each channel operation and detects deadlocks if a cyclic dependency exists. Algorithm 2 elaborates on this algorithm.
4 BinGo Implementation
In this section, we elaborate on the implementation details of BinGo. BinGo is implemented in 6.4K lines of C++ code and maintained as an open-source project. Figure 3 overviews the structure of BinGo. It mainly consists of four components: (1) a semantic analyzer that extracts Go semantics, (2) a tracepoint manager that activates necessary tracepoints according to Go semantics, (3) a bug detector that identifies blocking bugs, and (4) a visualizer that presents the analysis results in a user-friendly GUI.

4.1 Semantic Analyzer
Semantic Name | Function |
---|---|
Goroutine creation | runtime.newproc1 |
Goroutine exit | runtime.goexit0 |
main.main function start | main.main |
Object creation | runtime.newobject |
Mutex Lock | sync.(*Mutex).Lock |
Mutex Unlock | sync.(*Mutex).Unlock |
RWMutex RLock | sync.(*RWMutex).RLock |
RWMutex RUnlock | sync.(*RWMutex).RUnlock |
RWMutex Lock | sync.(*RWMutex).Lock |
RWMutex Unlock | sync.(*RWMutex).Unlock |
Channel creation | runtime.makechan |
Channel send | runtime.chansend |
Channel receive | runtime.chanrecv |
Close Channel | runtime.closechan |
Select | runtime.selectgo |
WaitGroup Add | sync.(*WaitGroup).Add |
WaitGroup Wait | sync.(*WaitGroup).Wait |
Context Creation | context.(*cancelCtx).cancel |
Context Cancel | context.(*cancelCtx).cancel |
Cond Wait | sync.(*Cond).Wait |
Cond Signal | sync.(*Cond).Signal |
Cond Broadcast | sync.(*Cond).Broadcast |
This component identifies all the interesting semantics in the Go runtime. To guarantee high accuracy, we manually analyze the Go runtime source code to recognize all the functions related to the semantics. It is worth noting that the manual source code analysis only needs to be done once per Go release version. Table 2 shows the semantics and their related functions in Go v1.17.3. This table (also known as the semantic table) is maintained in a text file for later use.
4.2 Tracepoint Manager
The tracepoint manager takes the semantic table as input, and then inserts and activates corresponding tracepoints in the Go binaries under analysis. The foundation of BinGo’s tracepoint manager is DynamoRIO [5], a state-of-the-art binary analysis engine maintained by Google. DynamoRIO, which works on both Linux and Windows, is able to instrument x86_64 and aarch64 instructions, functions, and load modules.
Naive implementation
As a straightforward scheme, the tracepoint manager utilizes DynamoRIO to overload the Go runtime functions in the semantic table. As a built-in capability, DynamoRIO is able to instrument any function by inserting a pre-function callback at the function entrance to capture the function arguments and a post-function callback at the function exit to capture the return values and changed arguments. However, this scheme does not work for Go programs because Go has a different calling convention from native C/C++ binaries.
Issues raised by Go calling convention
The Go compiler generates a prologue for each Go function to check for stack overflow in the current Goroutine (Go functions with the compiler directive “//go:nosplit” skip this check, but the directive is not commonly used in the Go runtime). If there is insufficient space on the stack for the function, the Go runtime calls runtime.morestack() to allocate a larger stack and moves the function from the old stack to the new space. Such an operation triggers an unexpected invocation of the post-function callback earlier than the function exit point; thus, the tracepoint manager can capture wrong return values of the instrumented Go function. Moreover, if the Goroutine is scheduled back after the runtime.morestack() operation, it rolls back execution to the function entrance, causing the pre-function callback to be triggered twice. It is worth noting that this issue occurs not only in DynamoRIO but also in other popular binary instrumentation engines, such as Pin [35] and Valgrind [19]. Without carefully handling the Go calling convention, BinGo would produce incorrect analysis results.
Solution
We address the issues raised by the Go calling convention by designing a new function instrumentation strategy. For each Goroutine, BinGo maintains a shadow stack (named sp-stack) that stores the stack pointers of all currently active wrapped functions (i.e., activation records). For each Go function, BinGo leverages DynamoRIO to instrument its first basic block for the function entrance and all of its return instructions for the function exit. Upon the function entrance, BinGo compares the current stack pointer with the top of the Goroutine’s sp-stack. If they have the same value, a rollback occurred after a runtime.morestack() invocation, as the current stack pointer has already been pushed onto sp-stack. In that case, BinGo does not trigger the pre-function callback of the current Go function because it was already triggered. Otherwise, the current function executed with no rollback; BinGo pushes the current stack pointer onto sp-stack and triggers the pre-function callback.
Upon each return instruction, BinGo compares the current stack pointer with the top of sp-stack in the same Goroutine. If they are the same, the function exit matches the function entrance; BinGo pops the top of sp-stack and triggers the post-function callback. Otherwise, the return happens in a callee of the current function; BinGo ignores this return and continues monitoring until it captures the return instruction of the current function. This shadow stack approach always installs function-level tracepoints correctly. Moreover, since sp-stack is per Goroutine, it requires no synchronization, incurring minimal overhead.
BinGo uses goid to index the sp-stack of each Goroutine, except for g0, a dedicated Goroutine in each OS thread that executes Go runtime functions on the system stack. BinGo maintains the sp-stacks of all g0s in OS-supported thread-local storage. With this technique, BinGo manages tracepoints correctly and efficiently even with massive numbers of Goroutines executing simultaneously.
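The sp-stack logic can be sketched in Go, following the description above; the real implementation lives in a DynamoRIO client written in C++, and the names and stack-pointer values below are ours:

```go
package main

import "fmt"

// spStack is a per-Goroutine shadow stack of stack pointers for
// currently active wrapped functions.
type spStack []uint64

// onEntry is called at a function's first basic block. It returns
// true if the pre-function callback should fire, and false if this
// entry is a rollback after runtime.morestack() re-ran the prologue
// (the same stack pointer is already on top of the shadow stack).
func (s *spStack) onEntry(sp uint64) bool {
	if n := len(*s); n > 0 && (*s)[n-1] == sp {
		return false // rollback: callback was already triggered
	}
	*s = append(*s, sp)
	return true
}

// onReturn is called at every return instruction. It pops and
// returns true only when the return matches the tracked entrance;
// returns inside callees see a different stack pointer and are
// ignored.
func (s *spStack) onReturn(sp uint64) bool {
	if n := len(*s); n > 0 && (*s)[n-1] == sp {
		*s = (*s)[:n-1]
		return true
	}
	return false
}

func main() {
	var s spStack
	fmt.Println(s.onEntry(0xc000400000))  // first entrance: fire callback
	fmt.Println(s.onEntry(0xc000400000))  // morestack rollback: suppress
	fmt.Println(s.onReturn(0xc0003ff000)) // callee return: ignore
	fmt.Println(s.onReturn(0xc000400000)) // matching exit: fire callback
}
```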
Tracepoint installation
With our function-level instrumentation, BinGo can install most of the tracepoints and obtain the necessary semantics via function arguments and return values at runtime. Besides function-level tracepoints, BinGo also supports instruction-level tracepoints, as described in Section 3.1. It is worth noting that BinGo provides extensible APIs for adding new tracepoints, not only in the Go runtime but also in other native synchronization libraries (e.g., locks, condition variables, and barriers in the pthread library) if the program under analysis mixes Go with other native libraries.
CockroachDB | etcd | gRPC | Hugo | Istio | Kubernetes | Moby | Knative Serving | Syncthing | |
---|---|---|---|---|---|---|---|---|---|
Missing Unlock | 1 | 1 | 2 | 0 | 0 | 0 | 2 | 0 | 0 |
WaitGroup | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Channel | 9 | 1 | 4 | 0 | 2 | 4 | 5 | 0 | 1 |
Double Lock | 4 | 2 | 0 | 2 | 0 | 1 | 1 | 0 | 1 |
Mutex Deadlock | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 |
Channel-mutex Deadlock | 0 | 4 | 2 | 0 | 1 | 4 | 1 | 1 | 0 |
Cond | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 |
Total | 18 | 8 | 8 | 2 | 3 | 13 | 13 | 1 | 2 |
4.3 Bug Detector
To detect the blocking bugs, BinGo first leverages the tracepoints to identify all the synchronization primitives and then applies the algorithms described in Section 3.
Identifying synchronization primitives
BinGo monitors both mutex locks and channels. To capture mutex lock creation, acquisition, and release, BinGo activates a tracepoint at function runtime.newobject() in the Go runtime, which creates various objects. BinGo obtains the object type from the argument and the object address from the return value. The object type tells whether the created object is a mutex lock and, further, whether it is a normal lock or a reader-writer lock. If BinGo identifies a mutex lock creation, it adds the lock address as the ID to a set (named mutex-set) for later use. Furthermore, BinGo activates tracepoints at atomic instructions such as CMPXCHG and XADD with the LOCK prefix to capture lock acquisition and release. We cannot directly use function-level tracepoints for this purpose because some lock and unlock functions are inlined. To filter out atomic instructions that are not related to mutex locks, BinGo investigates the operands of these instructions: only atomic instructions operating on a lock word in the mutex-set are considered lock operations. BinGo further examines the semantics of the instruction to determine whether it is a lock acquisition or a release.


BinGo also monitors channel operations, such as creation, send, receive, and destruction. BinGo activates a tracepoint at function runtime.makechan() to capture the channel creation point. BinGo obtains the return value of this function, which is a pointer to the channel structure hchan, and decodes hchan for the channel ID (using the channel address). Moreover, BinGo activates a tracepoint at function runtime.closechan() for closing a channel (note that this function is not always called to destruct a channel; a channel can remain active until the end of the program execution). By investigating the function arguments, BinGo learns the ID of the destructed channel. Furthermore, BinGo instruments the functions runtime.chansend() and runtime.chanrecv() to monitor channel send and receive operations. Some complexity arises from the semantics of the select statement, which schedules multiple active channels (specified in case statements) for the current Goroutine. The Go compiler compiles the select statement into the function runtime.selectgo. BinGo activates a tracepoint at this function to obtain the IDs of all the active channels for selection and analyzes all these channels. If one channel can result in blocking, BinGo reports a blocking bug, which aligns with our prediction algorithm described in Section 3. Moreover, every channel operation has a boolean argument named block, indicating whether the Goroutine can be blocked on this channel operation. BinGo only investigates channels with the block flag set to true and the buffer size set to 0.
Devising bug analysis
BinGo leverages a framework-based design for high extensibility. Figure 3 illustrates the design principle. BinGo implements the common features used for analyzing different blocking bugs in the framework, including activating tracepoints, obtaining lock/channel operations, and associating analysis results with source code. BinGo builds clients atop the framework to analyze different types of blocking bugs. Each client can customize the use of tracepoints and devise its algorithm as described in Section 3. To support a new analysis, one only needs to implement a client, which minimizes the coding effort.
4.4 Analysis Presentation
As an end-to-end tool, BinGo needs to present the analysis results in a user-friendly way. We tightly integrate BinGo into Microsoft Visual Studio Code (vscode) [20], which is one of the most popular code editors for Go. The analysis results, shown as the sequence of synchronization primitives across different Goroutines, are presented in a floating window in the program source code pane. The synchronization primitives involved in the blocking bugs are highlighted and one can easily associate them with source code with a simple mouse click. Figure 5 shows a snapshot of BinGo’s output, which is described in detail in Section 6.1.
5 Evaluation
We evaluate BinGo on an x86 machine, which employs a 3.0 GHz 18-core Intel Xeon W-2295 processor, 32 KB private L1 data cache, 1 MB private L2 cache, 25 MB shared L3 cache, and 256 GB Memory. The OS is Ubuntu 20.04 with Linux kernel 5.11.0 and Go 1.17.3. We evaluate BinGo from two aspects: bug coverage and measurement overhead.
Bug Coverage
We evaluate the coverage of BinGo using GoKer from GoBench [44], which consists of bug kernels from well-known open-source Go applications. Table 3 summarizes all the blocking bugs in GoKer. We exclude non-blocking bugs, such as data races, because they are not the target of BinGo. BinGo is able to identify all 68 blocking bugs with no false positives and no false negatives. We compare BinGo with the state-of-the-art blocking bug detector—GCatch [34]. GCatch can detect blocking bugs related to channels and mutex locks only, but not other blocking bugs. Moreover, we find that GCatch produces a large number of false positives due to its static analysis, which adds an additional burden on users to identify real bugs.
| Go applications | Description |
|---|---|
| Prometheus [16] | A systems and service monitoring system |
| etcd [7] | A distributed key-value store system |
| Go Ethereum [12] | Official Go implementation of the Ethereum protocol |
| gRPC-Go [13] | The Go implementation of gRPC |
| TiDB [18] | An open-source NewSQL database |
| Kubernetes [14] | An open-source container orchestration system |
Overhead Measurement
We quantify the overhead of BinGo using the real Go applications listed in Table 4. The overhead is shown in Figure 4. The runtime overhead, computed as the execution time under online analysis divided by the native execution time, is typically 1.38×. We also show the overhead variance for each application that has multiple test inputs. Similarly, we measure the memory overhead, defined as the peak memory consumption of the application monitored by BinGo divided by the peak memory consumption of the native execution. From the figure, we can see that BinGo typically incurs 1.32× memory overhead. The overhead is proportional to the number of tracepoints enabled in the execution, so the variance for some applications is large. Both runtime and memory overheads are affordable for production use.
6 Case Studies
We study three Go applications to show how BinGo can pinpoint blocking bugs of different types and provide intuitive guidance for bug fixing.
6.1 Channel-Mutex Deadlock in etcd
Figure 5 shows the snapshot of BinGo that reports a blocking bug in etcd. As described in Section 4.4, the analysis results are visualized in vscode. Figure 5 shows the source code pane of vscode, where the source code of etcd is presented. BinGo's analysis results appear in a floating window, which shows only the problematic Goroutines with their synchronization primitive sequences. In this report, Goroutines 18 and 19 are involved in a blocking bug. The synchronization primitives highlighted in yellow indicate the locations where the blocking occurs. The synchronization primitives highlighted in red are the cause of the blocking. One can click the label of any synchronization primitive to jump to the corresponding source code line in the vscode source code pane.
From this figure, we can see that this blocking bug involves both a mutex lock and a channel. The problematic Goroutines 18 and 19 are blocked due to contention on the same mutex lock at lines 38 and 43, respectively. The channel operations, the close at line 29 and the receive at line 46, cause the blocking, as Goroutine 19 must wait for a channel send or close event that can never occur.
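The cyclic wait in this bug class can be distilled into a small, self-contained Go sketch. The program below is not etcd's code; it only reproduces the shape of the cycle (one goroutine holds a mutex while receiving from a channel whose close requires that same mutex). The auxiliary `held` channel and the timeout are demo aids added solely to make the interleaving deterministic and the program terminate.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoChannelMutexDeadlock reproduces the shape of the etcd bug: one
// goroutine holds a mutex while waiting to receive from a channel, while
// the goroutine that would close the channel first needs that same mutex.
func demoChannelMutexDeadlock() string {
	var mu sync.Mutex
	ch := make(chan struct{})   // the channel involved in the cycle
	held := make(chan struct{}) // demo aid: orders the interleaving
	done := make(chan struct{})

	go func() { // plays the role of Goroutine 19
		mu.Lock()
		close(held) // signal that the mutex is now held
		<-ch        // waits for a close that can never happen (cf. line 46)
		mu.Unlock()
		close(done)
	}()

	go func() { // plays the role of Goroutine 18
		<-held
		mu.Lock() // blocks forever: the holder is waiting on ch (cf. line 38)
		close(ch) // the close (cf. line 29) is never reached
		mu.Unlock()
	}()

	select {
	case <-done:
		return "completed"
	case <-time.After(300 * time.Millisecond):
		return "deadlock: channel wait and mutex wait form a cycle"
	}
}

func main() {
	fmt.Println(demoChannelMutexDeadlock())
}
```

The timeout exists only so the demo can report the stuck state; in the real bug, both goroutines remain blocked indefinitely.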


6.2 Double Reader Locks in Kubernetes
Figure 6 shows the snapshot of BinGo that reports a blocking bug in Kubernetes. From this figure, we can see that the blocking bug is caused by interleaving two reader locks. The problematic Goroutines 17 and 18 are blocked because the writer lock acquisition at line 57 in Goroutine 18 falls between the two reader lock acquisitions at lines 33 and 42 in Goroutine 17. Because a pending writer lock has priority over new reader locks, the two Goroutines end up waiting on each other cyclically.
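This interleaving can be reproduced with a minimal `sync.RWMutex` sketch. The code below is illustrative, not Kubernetes code; the signal channel, the sleep, and the timeout only force the interleaving the report describes (reader lock, then a pending writer, then a second reader lock) and let the demo terminate. Go's `sync.RWMutex` blocks new `RLock` calls once a writer is waiting, which is exactly what turns the recursive read lock into a deadlock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoDoubleRLock shows how a writer lock requested between two reader
// lock acquisitions by the same goroutine produces a cyclic wait.
func demoDoubleRLock() string {
	var mu sync.RWMutex
	readerHolds := make(chan struct{}) // demo aid: orders the interleaving
	done := make(chan struct{})

	go func() { // plays the role of Goroutine 17
		mu.RLock()                         // first reader lock (cf. line 33)
		close(readerHolds)                 // tell the writer the lock is held
		time.Sleep(100 * time.Millisecond) // let the writer block on Lock
		mu.RLock()                         // second reader lock (cf. line 42): queued behind the writer
		mu.RUnlock()
		mu.RUnlock()
		close(done)
	}()

	go func() { // plays the role of Goroutine 18
		<-readerHolds
		mu.Lock() // writer lock (cf. line 57): waits for the held reader lock
		mu.Unlock()
	}()

	select {
	case <-done:
		return "completed"
	case <-time.After(500 * time.Millisecond):
		return "deadlock: second RLock queued behind the pending writer"
	}
}

func main() {
	fmt.Println(demoDoubleRLock())
}
```

A common fix for this pattern is to release the first reader lock before any code path that may request the writer lock can run, or to avoid recursive read locking altogether.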
6.3 Blocked Channel in Moby
Figure 7 shows a snapshot of BinGo reporting a blocking bug in Moby. From the figure, we can see that Goroutine 19 is blocked on a channel send operation at line 33. The statements highlighted in red show that there is no corresponding channel receive operation for the blocked channel send. Thus, the send blocks forever and the Goroutine leaks.
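This bug has a simple minimal form: a send on an unbuffered channel whose only receiver never arrives. The sketch below is illustrative, not Moby's code; it detects the stuck sender by comparing goroutine counts before and after, and the sleep merely gives the worker time to reach the send.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// demoChannelLeak spawns a worker that sends on an unbuffered channel
// while the caller returns without ever receiving, so the worker blocks
// on the send permanently: a goroutine leak.
func demoChannelLeak() string {
	before := runtime.NumGoroutine()

	results := make(chan int) // unbuffered: a send needs a ready receiver
	go func() {
		results <- 42 // mirrors the send at line 33: no receiver ever arrives
	}()

	// The caller proceeds without receiving from results.
	time.Sleep(100 * time.Millisecond) // give the worker time to block
	if runtime.NumGoroutine() > before {
		return "sender goroutine is blocked on the channel send (leaked)"
	}
	return "no leak observed"
}

func main() {
	fmt.Println(demoChannelLeak())
}
```

Typical fixes are to make the channel buffered so the send completes even when the caller bails out early, or to guarantee a receive on every caller path.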
7 Discussions
In this section, we discuss some limitations of BinGo. First, BinGo is a dynamic bug detector, so it cannot detect concurrency bugs on paths that are never executed. This limitation is common to all dynamic analysis tools. To increase bug coverage, one can explore different inputs (i.e., input fuzzing) or different scheduling strategies (i.e., schedule fuzzing) to increase the path coverage for analysis. Second, BinGo requires some manual effort to identify the interesting tracepoints in the Go runtime, which is tedious and error-prone. Fortunately, such effort is needed only once per Go release. We maintain the tracepoint lists for several recent Go versions, so users need no additional effort to use BinGo. Third, BinGo requires the availability of the Go application's source code to interpret the root causes of concurrency bugs. This is common to all bug detectors, as reports are usually associated with source code. However, as a unique advantage, BinGo does not need the application's source code to run the detection, because all measurement and analysis are performed on the binary. Lastly, BinGo is a bug detector; it does not automatically fix bugs like the prior tool [34]. However, BinGo produces accurate analysis reports, and its tight integration with vscode provides an opportunity to guide automatic bug fixing.
8 Conclusions and Future Work
This paper presents BinGo, a novel bug detector for Go applications. As its unique feature, BinGo analyzes Go binaries directly, which makes it suitable for deployment in complex production environments that mix Go with native languages. BinGo recovers Go semantics, devises efficient bug detection algorithms, and presents analysis results in a user-friendly GUI. Our experiments show that BinGo has high bug coverage, with no false positives or false negatives, and incurs moderate overhead. The case studies show that BinGo provides intuitive guidance for fixing Go bugs. BinGo will be open-sourced upon paper acceptance. As future work, we will extend BinGo to detect non-blocking bugs in Go applications, such as data races. Moreover, we will explore techniques to feed BinGo's analysis results into an automatic bug-fixing tool.
References
- [1] AresDB. https://github.com/uber/aresdb.
- [2] CockroachDB. https://www.cockroachlabs.com.
- [3] Docker. https://www.docker.com.
- [4] Dropbox. https://www.dropbox.com.
- [5] DynamoRIO. https://dynamorio.org.
- [6] Dyninst. https://www.dyninst.org.
- [7] etcd. https://etcd.io.
- [8] Go command vet. https://pkg.go.dev/cmd/vet.
- [9] The go programming language. https://go.dev.
- [10] Go runtime. https://pkg.go.dev/runtime.
- [11] goleak. https://github.com/uber-go/goleak.
- [12] Go Ethereum. https://geth.ethereum.org.
- [13] grpc-go. https://github.com/grpc/grpc-go.
- [14] Kubernetes. https://kubernetes.io.
- [15] Moby. https://mobyproject.org.
- [16] Prometheus. https://prometheus.io.
- [17] Staticcheck. https://github.com/dominikh/go-tools.
- [18] TiDB. https://get.pingcap.com/tidb-developer.
- [19] Valgrind. https://valgrind.org.
- [20] Visual studio code. https://code.visualstudio.com.
- [21] David Brumley et al. Towards automatic generation of vulnerability-based signatures. In Proc. of the 2006 IEEE Symp. on Security and Privacy, SP ’06, pages 2–16, 2006.
- [22] Yan Cai and W.K. Chan. Magiclock: Scalable detection of potential deadlocks in large-scale multithreaded programs. IEEE Transactions on Software Engineering, 40(3):266–281, 2014.
- [23] Trevor E. Carlson, Wim Heirman, and Lieven Eeckhout. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In SC ’11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, 2011.
- [24] Milind Chabbi and John Mellor-Crummey. DeadSpy: A Tool to Pinpoint Program Inefficiencies. In Proceedings of the Tenth International Symposium on Code Generation and Optimization, CGO ’12, pages 124–134, New York, NY, USA, 2012. ACM.
- [25] Yuxi Chen, Shu Wang, Shan Lu, and Karthikeyan Sankaralingam. Applying hardware transactional memory for concurrency-bug failure recovery in production runs. In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’18, page 837–850, USA, 2018. USENIX Association.
- [26] Julia Gabet and Nobuko Yoshida. Static Race Detection and Mutex Safety and Liveness for Go Programs. In Robert Hirschfeld and Tobias Pape, editors, 34th European Conference on Object-Oriented Programming (ECOOP 2020), volume 166 of Leibniz International Proceedings in Informatics (LIPIcs), pages 4:1–4:30, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
- [27] Guoliang Jin, Wei Zhang, Dongdong Deng, Ben Liblit, and Shan Lu. Automated concurrency-bug fixing. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, page 221–236, USA, 2012. USENIX Association.
- [28] Julien Lange, Nicholas Ng, Bernardo Toninho, and Nobuko Yoshida. Fencing off go: Liveness and safety for channel-based programming. SIGPLAN Not., 52(1):748–761, January 2017.
- [29] Julien Lange, Nicholas Ng, Bernardo Toninho, and Nobuko Yoshida. A static verification framework for message passing in go using behavioural types. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 1137–1148, 2018.
- [30] Guangpu Li, Shan Lu, Madanlal Musuvathi, Suman Nath, and Rohan Padhye. Efficient scalable thread-safety-violation detection: Finding thousands of concurrency bugs during testing. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, page 162–180, New York, NY, USA, 2019. Association for Computing Machinery.
- [31] Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang. Automatic protocol format reverse engineering through context-aware monitored execution. In Network and Distributed System Security, 2008.
- [32] Haopeng Liu, Guangpu Li, Jeffrey F. Lukman, Jiaxin Li, Shan Lu, Haryadi S. Gunawi, and Chen Tian. Dcatch: Automatically detecting distributed concurrency bugs in cloud systems. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’17, page 677–691, New York, NY, USA, 2017. Association for Computing Machinery.
- [33] Xu Liu and John Mellor-Crummey. Pinpointing data locality bottlenecks with low overheads. In Proc. of the 2013 IEEE Intl. Symp. on Performance Analysis of Systems and Software, Austin, TX, USA, April 21-23, 2013.
- [34] Ziheng Liu, Shuofei Zhu, Boqin Qin, Hao Chen, and Linhai Song. Automatically detecting and fixing concurrency bugs in go software systems. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021, page 616–629, New York, NY, USA, 2021. Association for Computing Machinery.
- [35] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In Proc. of the 2005 ACM SIGPLAN conference on Programming Language Design and Implementation, pages 190–200, New York, NY, USA, 2005. ACM Press.
- [36] Satish Narayanasamy, Gilles Pokam, and Brad Calder. Bugnet: continuously recording program execution for deterministic replay debugging. In Proc. of the 32nd Annual Intl. Symposium on Computer Architecture, ISCA ’05, pages 284–295, 2005.
- [37] Nicholas Ng and Nobuko Yoshida. Static deadlock detection for concurrent go by global session graph synthesis. In Proceedings of the 25th International Conference on Compiler Construction, CC 2016, page 174–184, New York, NY, USA, 2016. Association for Computing Machinery.
- [38] Alceste Scalas, Nobuko Yoshida, and Elias Benussi. Verifying message-passing programs with dependent behavioural types. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, page 502–516, New York, NY, USA, 2019. Association for Computing Machinery.
- [39] Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. Redundant Loads: A Software Inefficiency Indicator. In Proceedings of the 41st International Conference on Software Engineering, ICSE ’19, pages 982–993, Piscataway, NJ, USA, 2019. IEEE Press.
- [40] Tengfei Tu, Xiaoyu Liu, Linhai Song, and Yiying Zhang. Understanding real-world concurrency bugs in go. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, page 865–878, New York, NY, USA, 2019. Association for Computing Machinery.
- [41] S. Wen, X. Liu, and M. Chabbi. Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations. In 2015 International Conference on Parallel Architecture and Compilation (PACT), pages 254–265, Oct 2015.
- [42] Shasha Wen, Milind Chabbi, and Xu Liu. REDSPY: Exploring Value Locality in Software. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’17, pages 47–61, New York, NY, USA, 2017. ACM.
- [43] Xiaoya Xiang, Chen Ding, Hao Luo, and Bin Bao. HOTL: A higher order theory of locality. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, pages 343–356, New York, NY, USA, 2013. ACM.
- [44] Ting Yuan, Guangwei Li, Jie Lu, Chen Liu, Lian Li, and Jingling Xue. Gobench: A benchmark suite of real-world go concurrency bugs. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 187–199, 2021.
- [45] Wei Zhang, Junghee Lim, Ramya Olichandran, Joel Scherpelz, Guoliang Jin, Shan Lu, and Thomas Reps. Conseq: Detecting concurrency bugs through sequential errors. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, page 251–264, New York, NY, USA, 2011. Association for Computing Machinery.