
Separation of concerning things: a simpler basis for defining and programming with the C/C++ memory model (extended version)

Robert J. Colvin, Defence Science and Technology Group, and The School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia. [email protected]
Abstract.

The C/C++ memory model provides an interface and execution model for programmers of concurrent (shared-variable) code. It provides a range of mechanisms that abstract from underlying hardware memory models – which govern how multicore architectures handle concurrent accesses to main memory – as well as abstracting from compiler transformations. The C standard describes the memory model in terms of cross-thread relationships between events, and has been influenced by several research works that are similarly based. In this paper we provide a thread-local definition of the fundamental principles of the C memory model, which, for concise concurrent code, serves as a basis for relatively straightforward reasoning about the effects of the C ordering mechanisms. We argue that this definition is more practical from a programming perspective and is amenable to analysis by already established techniques for concurrent code. The key aspect is that the memory model definition is separate from other considerations of a rich programming language such as C, in particular, expression evaluation and optimisations, though we show how to reason about those considerations in the presence of C concurrency. A major simplification of our framework compared to the description in the C standard and related work in the literature is separating out considerations around the “lack of multicopy atomicity”, a concept that is in any case irrelevant to developers of code for x86, Arm, RISC-V or SPARC architectures. We show how the framework is convenient for reasoning about well-structured code, and for formally addressing unintuitive behaviours such as “out-of-thin-air” writes.


1. Introduction

C/C++ is one of the most widely used programming languages, including for low-level concurrent code with a strong imperative for efficiency. The C (weak) memory model, governed by an ISO standard, provides an interface (atomics.h) for instrumenting shared-variable concurrency that abstracts from the multitude of multicore architectures it may be compiled to, each with its own guarantees about and mechanisms for controlling accesses to shared variables.

The C memory model standard is described in terms of a cross-thread “happens before” relationship, relating stores and loads within and between threads, and “release sequences”. The fundamentals of this approach were established by Boehm & Adve (Boehm and Adve, 2008), and the standard has been influenced and improved by various research works (e.g., (Batty et al., 2011; Vafeiadis et al., 2015)). However, because it is cross-thread, verification techniques are often complex to apply, and the resulting formal semantics are often highly specialised, involve global data structures capturing a partial order on events, and rarely cover the full range of features available in C’s atomics.h (Chakraborty and Vafeiadis, 2019; Lahav et al., 2017). In many cases this is because such formalisms attempt to explain how reordering can occur by mixing considerations such as expression optimisations and Power’s cache coherence system, alongside local compiler-induced and processor pipeline reorderings. We instead take a separation-of-concerns approach, where the three fundamental principles involved in C concurrent code – data dependencies, fences, and memory ordering constraints – are specified separately from other aspects of C which may complicate reasoning, such as expression evaluation, expression optimisations, and arbitrary compiler transformations. Provided a programmer steers clear of concurrent code that is subject to these extra factors, reasoning about their code is relatively straightforward in our framework; but if the full richness of C is insisted upon, our framework is also applicable.

Syntactically and semantically the key aspect is the parallelized sequential composition operator, formalising a processor-pipeline concept that has been around since the 1960s, and which has been previously shown to have good explanatory power for most behaviours observed on hardware weak memory models (Colvin, 2021a, b). Reasoning in this setting involves making explicit at the program level what effects the memory model has, either reducing to a sequential form where the use of atomics.h features prevents the compiler or hardware from making problematic reorderings, or making the residual parallelism explicit; in either case, standard techniques (such as Owicki-Gries or rely/guarantee) apply to the reduced program. We cover a significant portion of the C weak memory model, including release/acquire and release/consume synchronisation, and sequentially consistent accesses and fences. We demonstrate verification of some simple behaviours (litmus tests), a spin lock implementation, and also explain how two types of related problematic behaviours – out-of-thin-air stores and read-from-untaken-branch – can be analysed and addressed. We argue that this foundation provides a simpler and more direct basis for discussion of the consequences of choices within the C memory model, and in particular for analysing the soundness of compiler transformations for concurrent code.

We explain the main aspects of the C memory model using a simple list-of-instructions language in Sect. 2, covering all relevant aspects of the C memory model’s principles. We then give the syntax and semantics of a more realistic imperative language (with conditionals and loops) in Sect. 3. We give a set of reduction rules for reasoning about the effects of the memory model in Sect. 4, and explain how standard reasoning techniques can be applied in Sect. 5. We show some illustrative examples in Sect. 6, including the often-discussed “out-of-thin-air” behaviours, showing how in our framework an allowed version of the pattern arises naturally, and a disallowed version is similarly disallowed. In Sections 7, 8, and 9 we extend the language of Sect. 3 with other features of programming in C such as incremental (usually called non-atomic) expression evaluation and instruction execution, expression optimisations, and forwarding of values from earlier instructions to later ones. Crucially, despite the complexities these features introduce, the fundamental principles of the C memory model from Sect. 2 do not change. In Sect. 10 we give a formal discussion of the “read-from-untaken branch behaviour” which exposes the often problematic interactions between standard compiler optimisations and controlling shared-variable concurrency.

2. A simple language with thread-local reordering

In this section we give a simple language formed from primitive actions and “parallelized sequential prefixing”, which serves to explain the crucial parts of reordering due to the C memory model. In Sect. 3 we extend the language to include standard imperative constructs such as conditionals, loops, and composite actions.

2.1. Syntax and semantics of a language with instruction reordering

To focus on the semantic point of reorderings we introduce a very basic language formed from primitive instructions representing assignments, branches, and fences, which are composed solely by a prefix operator that allows reordering (early execution) of later instructions.

$e ::= v \mid x \mid \ominus e \mid e_1 \oplus e_2$

$\mathsf{f} ::= \mathsf{store_{fnc}} \mid \mathsf{load_{fnc}} \mid \mathsf{full_{fnc}} \mid \ldots$

$\alpha ::= x := e \mid \llparenthesis e \rrparenthesis \mid \mathsf{f}$

$c ::= \mathbf{nil} \mid \alpha \overset{\textsc{m}}{\triangleright} c$

Expressions $e$ can be base values $v$, variables $x$, or involve the usual unary ($\ominus$) or binary ($\oplus$) operators. A primitive action $\alpha$ in this language is either an assignment $x := e$, where $x$ is a variable and $e$ an expression, a guard $\llparenthesis e \rrparenthesis$, where $e$ is an expression, or a fence $\mathsf{f}$, where $\mathsf{f}$ is some fence type, described below. A command can be the terminated command $\mathbf{nil}$, or a simple prefixing of a primitive instruction $\alpha$ before command $c$, parameterised by some memory model $\textsc{m}$, written $\alpha \overset{\textsc{m}}{\triangleright} c$. As we show elsewhere (Colvin and Smith, 2018; Colvin, 2021b, a), the parameter $\textsc{m}$ can be instantiated to give the behaviour of hardware weak memory models, but in this paper we focus mostly on C’s memory model, denoted formally by ‘$\textsc{c}$’, and the special cases of sequential and parallel composition. The $\overset{\textsc{m}}{\triangleright}$ operator essentially allows the construction of a sequence of instructions that may be reordered under some circumstances, similar to a hardware pipeline.

The semantics of the language is given operationally below. All primitive actions are executed in a single indivisible step.[1]

[1] Normally called atomic, but we avoid that term to keep the notion separate from C’s notion of atomics.

$(i) \quad \alpha \overset{\textsc{m}}{\triangleright} c \xrightarrow{\ \alpha\ } c \qquad\qquad (ii) \quad \dfrac{c \xrightarrow{\ \beta\ } c' \qquad \alpha \overset{\textsc{m}}{\Leftarrow} \beta}{\alpha \overset{\textsc{m}}{\triangleright} c \xrightarrow{\ \beta\ } \alpha \overset{\textsc{m}}{\triangleright} c'}$

A command $\alpha \overset{\textsc{m}}{\triangleright} c$ may either immediately execute $\alpha$ (rule $(i)$), or it may execute some action $\beta$ of $c$ provided that $\beta$ may reorder with $\alpha$ with respect to $\textsc{m}$, written $\alpha \overset{\textsc{m}}{\Leftarrow} \beta$ (rule $(ii)$). The conditions under which reordering may occur are specific to the memory model under consideration, and we define these below for C. The memory model parameter is defined pointwise on instruction types. This is a relatively convenient way to express reorderings, especially as it is agnostic about global traces and behaviours. As shown empirically in (Colvin, 2021b) it is suitable for explaining observed behaviours on architectures such as Arm, x86 and RISC-V. It specialises to the notions of sequential and parallel composition straightforwardly.

As an example, consider a memory model $\textsc{m}$ such that assignments of values to different variables can be executed in either order (as on Arm, but not on x86, for instance), that is, $x := v \overset{\textsc{m}}{\Leftarrow} y := w$ for $x \neq y$. Then we have two possible terminating traces (traces ending in $\mathbf{nil}$) for the program $x := 1 \overset{\textsc{m}}{\triangleright} y := 1 \overset{\textsc{m}}{\triangleright} \mathbf{nil}$.

$x := 1 \overset{\textsc{m}}{\triangleright} y := 1 \overset{\textsc{m}}{\triangleright} \mathbf{nil} \xrightarrow{\ x := 1\ } y := 1 \overset{\textsc{m}}{\triangleright} \mathbf{nil} \xrightarrow{\ y := 1\ } \mathbf{nil}$

$x := 1 \overset{\textsc{m}}{\triangleright} y := 1 \overset{\textsc{m}}{\triangleright} \mathbf{nil} \xrightarrow{\ y := 1\ } x := 1 \overset{\textsc{m}}{\triangleright} \mathbf{nil} \xrightarrow{\ x := 1\ } \mathbf{nil}$

The first behaviour results from two applications of rule $(i)$ above (as in prefixing in CSP (Hoare, 1985) or CCS (Milner, 1982)). The second behaviour results from applying rule $(ii)$, noting that by assumption $x := 1 \overset{\textsc{m}}{\Leftarrow} y := 1$, and then rule $(i)$.
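Rules $(i)$ and $(ii)$ can be prototyped directly. The following sketch (illustrative only, not an artifact of this paper; names are our own) enumerates every terminating trace of a list of prefixed actions, pulling a later action forward only when the reordering relation of the memory model permits it to pass everything still ahead of it:

```python
# Illustrative sketch of rules (i) and (ii): a program is a list of
# actions, and `traces` returns every order in which they can execute.
def traces(prog, ro):
    if not prog:
        return [[]]
    out = []
    for i, act in enumerate(prog):
        # rule (i) when i == 0; otherwise rule (ii): `act` must reorder
        # (under `ro`) with every action still ahead of it.
        if all(ro(earlier, act) for earlier in prog[:i]):
            out += [[act] + t for t in traces(prog[:i] + prog[i + 1:], ro)]
    return out

FENCE = ("fence",)

def ro_m(a, b):
    """Example model m: stores to distinct variables may reorder;
    a full fence reorders with nothing."""
    if a == FENCE or b == FENCE:
        return False
    return a[0] != b[0]  # non-fence actions are (variable, value) stores

print(traces([("x", 1), ("y", 1)], ro_m))         # two traces
print(traces([("x", 1), FENCE, ("y", 1)], ro_m))  # exactly one trace
```

Under `ro_m` the unfenced program has both orders of the two stores, while inserting `FENCE` between them restores the single sequential trace, matching the discussion above.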

We define $\mathbf{fence} \mathrel{\widehat{=}} \mathsf{full_{fnc}}$, and subsequently treat it as a full fence in $\textsc{m}$ by defining $\mathbf{fence} \overset{\textsc{m}}{\not\Leftarrow} \alpha$ and $\alpha \overset{\textsc{m}}{\not\Leftarrow} \mathbf{fence}$ for all $\alpha$ (where $\alpha \overset{\textsc{m}}{\not\Leftarrow} \beta$ abbreviates $\neg(\alpha \overset{\textsc{m}}{\Leftarrow} \beta)$). Then we have exactly one possible trace in the following circumstance. For convenience below we omit the trailing $\mathbf{nil}$.

(2.1) $\quad x := 1 \overset{\textsc{m}}{\triangleright} \mathbf{fence} \overset{\textsc{m}}{\triangleright} y := 1 \xrightarrow{\ x := 1\ } \mathbf{fence} \overset{\textsc{m}}{\triangleright} y := 1 \xrightarrow{\ \mathbf{fence}\ } y := 1 \xrightarrow{\ y := 1\ } \mathbf{nil}$

The fence has prevented application of the second rule (since by definition both $x := 1 \overset{\textsc{m}}{\not\Leftarrow} \mathbf{fence}$ and $\mathbf{fence} \overset{\textsc{m}}{\not\Leftarrow} y := 1$) and hence restored sequential order on the instructions.

The framework admits the definition of standard sequential and parallel composition as (extreme) memory models. Let the “sequential consistency” model sc be the model that prevents all reordering, and introduce a special operator for that case. We define the complement of sc to be par, i.e., the memory model that allows all reordering, which corresponds to parallel execution.

(2.2) $\alpha \overset{\textsc{sc}}{\Leftarrow} \beta \Leftrightarrow \mathsf{False}$, for all $\alpha, \beta$

(2.3) $\alpha \overset{\textsc{par}}{\Leftarrow} \beta \Leftrightarrow \mathsf{True}$, for all $\alpha, \beta$

(2.4) $\alpha \mathop{\blacktriangleright} c \mathrel{\widehat{=}} \alpha \overset{\textsc{sc}}{\triangleright} c$

Using $\equiv$ for trace equality (defined formally later),

(2.5) $x := 1 \overset{\textsc{m}}{\triangleright} \mathbf{fence} \overset{\textsc{m}}{\triangleright} y := 1 \quad\equiv\quad x := 1 \mathop{\blacktriangleright} \mathbf{fence} \mathop{\blacktriangleright} y := 1$

Without the fence the two assignments effectively execute in parallel (under $\textsc{m}$), that is,

(2.6) $x := 1 \overset{\textsc{m}}{\triangleright} y := 1 \quad\equiv\quad x := 1 \parallel y := 1$

Whether $x := 1 \overset{\textsc{m}}{\triangleright} y := 1$ satisfies some property depends exactly on whether the parallel execution satisfies the property.

2.2. Reordering in C

We now consider the specifics of reordering in the $\textsc{c}$ memory model, which considers three aspects: (i) variable (data) dependencies; (ii) fences; and (iii) “memory ordering constraints”, which can be used to annotate variables or fences. We cover each of these three aspects in turn.

2.2.1. Data dependencies/respecting sequential semantics

A key concept underpinning both processor pipelines and compiler transformations is that of data dependencies, where one instruction depends on a value calculated by another. To capture this we write $\alpha \rightsquigarrow \beta$ if instruction $\beta$ depends on a value that instruction $\alpha$ writes to. We define a range of foundational syntactic and semantic concepts below. In a concurrent setting we distinguish between local and shared variables; that is, the set $\mathsf{Var}$ is divided into two mutually exclusive and exhaustive sets $\mathsf{Local}$ and $\mathsf{Shared}$. By convention we let $x, y, z$ be shared variables and $r, r_1, r_2, \ldots$ be local variables. For convenience we introduce the syntax $s_1 \mathbin{\not\cap} s_2$ to mean that sets $s_1$ and $s_2$ are mutually exclusive.

$s_1 \mathbin{\not\cap} s_2 \mathrel{\widehat{=}} s_1 \cap s_2 = \varnothing$

(2.7) $\mathsf{Var} = \mathsf{Local} \cup \mathsf{Shared}$ (with $\mathsf{Local} \mathbin{\not\cap} \mathsf{Shared}$)

(2.8) $\mathsf{wv}(x := e) = \{x\} \qquad \mathsf{wv}(\llparenthesis e \rrparenthesis) = \varnothing \qquad \mathsf{wv}(\mathsf{f}) = \varnothing$

(2.9) $\mathsf{rv}(x := e) = \mathsf{fv}(e) \qquad \mathsf{rv}(\llparenthesis e \rrparenthesis) = \mathsf{fv}(e) \qquad \mathsf{rv}(\mathsf{f}) = \varnothing$

(2.10) $\mathsf{fv}(\alpha) = \mathsf{wv}(\alpha) \cup \mathsf{rv}(\alpha)$

(2.11) $\mathsf{sv}(\alpha) = \mathsf{fv}(\alpha) \cap \mathsf{Shared} \qquad \mathsf{rsv}(\alpha) = \mathsf{rv}(\alpha) \cap \mathsf{Shared} \qquad \mathsf{wsv}(\alpha) = \mathsf{wv}(\alpha) \cap \mathsf{Shared}$

(2.12) $\mathbb{S} \mathrel{\widehat{=}} \{\alpha \mid \mathsf{wsv}(\alpha) \neq \varnothing\} \qquad \mathbb{L} \mathrel{\widehat{=}} \{\alpha \mid \mathsf{rsv}(\alpha) \neq \varnothing\}$

(2.13) $\alpha \rightsquigarrow \beta \mathrel{\widehat{=}} \mathsf{wv}(\alpha) \cap \mathsf{rv}(\beta) \neq \varnothing$

(2.14) $\alpha \not\rightsquigarrow \beta \mathrel{\widehat{=}} \neg(\alpha \rightsquigarrow \beta)$

(2.15) $\mathit{interference\_free}(\alpha, \beta) \mathrel{\widehat{=}} \mathsf{wv}(\alpha) \mathbin{\not\cap} \mathsf{fv}(\beta) \wedge \mathsf{wv}(\beta) \mathbin{\not\cap} \mathsf{fv}(\alpha)$
$\phantom{(2.15) \mathit{interference\_free}(\alpha, \beta) \mathrel{\widehat{=}}} = \alpha \not\rightsquigarrow \beta \wedge \beta \not\rightsquigarrow \alpha \wedge \mathsf{wv}(\alpha) \mathbin{\not\cap} \mathsf{wv}(\beta)$

(2.16) $\mathit{load\_indep}(\alpha, \beta) \mathrel{\widehat{=}} \mathsf{rsv}(\alpha) \mathbin{\not\cap} \mathsf{rsv}(\beta)$

(2.21) $\mathit{order\_indep}(\alpha, \beta) \mathrel{\widehat{=}} \mathsf{eff}(\alpha) \fatsemi \mathsf{eff}(\beta) = \mathsf{eff}(\beta) \fatsemi \mathsf{eff}(\alpha)$

The write variables of an instruction (written $\mathsf{wv}(\alpha)$) are collected syntactically (2.8), as are the read variables (written $\mathsf{rv}(\alpha)$) (2.9), which depend on the usual notion of the free variables of an expression (written $\mathsf{fv}(e)$, defined straightforwardly over the syntax of expressions). The free variables of an instruction are the union of its write and read variables (2.10). Shared and local variables have different requirements from a reordering perspective, and so we introduce specialisations of these concepts to just the shared variables (2.11). We define “store” instructions ($\mathbb{S}$) as those that write to a shared variable, and “load” instructions ($\mathbb{L}$) as those that read shared variables (hence an instruction such as $x := y$ is both a store and a load).

Using these definitions we can describe various relationships between actions. One of the key notions is that of “data dependence”: we write $\alpha \rightsquigarrow \beta$ if instruction $\beta$ references a variable modified by instruction $\alpha$ (2.13), and $\alpha \not\rightsquigarrow \beta$ if there is no data dependence (2.14). For instance, $x := 1 \rightsquigarrow r := x$ but $x := 1 \not\rightsquigarrow r := y$. The former can be expressed as “$x := 1$ carries a dependency into $r := x$”. Two instructions are interference-free if there is no data dependence in either direction and they write to different variables (2.15).[2] Note that instructions that are interference-free may still load the same variables; we say they are load independent if they access distinct shared variables (2.16). Finally, two instructions are “order independent” if the effect of executing them is independent of the execution order (2.21). The “effect” function $\mathsf{eff}(\alpha)$ denotes actions as a set of pairs of states in the usual imperative style (defined later in Fig. 4), using ‘$\fatsemi$’ for relational composition. Note that order-independence is a weaker condition than non-interference; for example, $x := 1$ and $x := 1$ are order-independent but not interference-free. All of these concepts defined for instructions can be lifted straightforwardly to commands by induction on the syntax (see Appendix A).

[2] This is a well-known property, the earliest example being Hoare’s disjointness (Hoare, 1972, 2002; Apt and Olderog, 2019), and is also called non-interference in separation logic (Brookes, 2007). The condition, formally discussed since the 1960s, is remarkably powerful for explaining the majority of observed behaviours of basic instruction types on modern hardware, although this author did not find evidence of cross-over in older references ((Thornton, 1964; Tomasulo, 1967), etc.).
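These syntactic functions can be transcribed almost literally as a concreteness check. The following sketch (an illustrative encoding of our own, not an artifact of the paper) represents an assignment as `("asgn", target, read-vars)`, a guard as `("guard", read-vars)`, and a fence as `("fence", kind)`:

```python
SHARED = {"x", "y", "z"}  # assumed partition of Var; everything else is local

def wv(a):  # write variables (2.8)
    return {a[1]} if a[0] == "asgn" else set()

def rv(a):  # read variables (2.9)
    if a[0] == "asgn":
        return set(a[2])
    if a[0] == "guard":
        return set(a[1])
    return set()          # fences read nothing

def fv(a):  # free variables (2.10)
    return wv(a) | rv(a)

def rsv(a):  # shared read variables (2.11)
    return rv(a) & SHARED

def depends(a, b):  # data dependence a ~> b (2.13)
    return bool(wv(a) & rv(b))

def interference_free(a, b):  # (2.15)
    return not (wv(a) & fv(b)) and not (wv(b) & fv(a))

def load_indep(a, b):  # (2.16)
    return not (rsv(a) & rsv(b))

x1  = ("asgn", "x", ())        # x := 1
r_x = ("asgn", "r", ("x",))    # r := x
s_y = ("asgn", "s", ("y",))    # s := y

assert depends(x1, r_x) and not depends(x1, s_y)
assert interference_free(x1, s_y) and not interference_free(x1, r_x)
```

The assertions mirror the worked examples in the text: $x := 1$ carries a dependency into $r := x$, and so the two interfere, while $x := 1$ and an unrelated load are interference-free.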

The key aspect of interference-freedom is the following.

Theorem 2.1 (Disjointness).
$\mathit{interference\_free}(\alpha, \beta) \Rightarrow \mathit{order\_indep}(\alpha, \beta)$
Proof.

Straightforward: independent actions do not change each other’s outcomes.

We say $\beta$ may be reordered before $\alpha$ with respect to sequential semantics, written $\alpha \overset{\textsc{g}}{\Leftarrow} \beta$, if they are interference-free and load-independent; the latter constraint maintains coherence of loads.

Definition 2.2 (g).

For instructions $\alpha, \beta$,

(2.22) $\alpha \overset{\textsc{g}}{\Leftarrow} \beta \quad\mathrel{\widehat{=}}\quad \mathit{interference\_free}(\alpha, \beta) \wedge \mathit{load\_indep}(\alpha, \beta)$

A syntactic check may be used to validate $\alpha \overset{\textsc{g}}{\Leftarrow} \beta$. One could instead do a semantic check to establish the weaker property of order-independence; however, this is typically not done on a case-by-case basis but rather is used to justify compiler transformations, since it is not feasible to check order-independence directly for all cases.

We can derive the following, where we assume $x, y \in \mathsf{Shared}$, $r, r_1, r_2 \in \mathsf{Local}$, all distinct, and $v, w \in \mathsf{Val}$. We write $\alpha \overset{\textsc{g}}{\not\Leftarrow} \beta$ if $\neg(\alpha \overset{\textsc{g}}{\Leftarrow} \beta)$.

(2.25) $x := v \overset{\textsc{g}}{\Leftarrow} y := w$ but $x := v \overset{\textsc{g}}{\not\Leftarrow} x := w$

(2.28) $x := v \overset{\textsc{g}}{\Leftarrow} r := y$ but $x := v \overset{\textsc{g}}{\not\Leftarrow} r := x$

(2.31) $r_1 := x \overset{\textsc{g}}{\Leftarrow} r_2 := y$ but $r_1 := x \overset{\textsc{g}}{\not\Leftarrow} r_2 := x$

(2.32) $x := r \overset{\textsc{g}}{\Leftarrow} y := r$

(2.33) $\llparenthesis r = v \rrparenthesis \overset{\textsc{g}}{\Leftarrow} x := w$

This shows that independent stores can be reordered, but not stores to the same variable (2.25); similarly, independent stores and loads can be reordered, as can loads, provided they do not reference the same (shared) variable ((2.28) and (2.31)). Accesses of the same local variable, however, can be reordered (2.32), since no interference is possible. Finally, stores can be reordered before (independent) guards (2.33). Since, as we show later, guards model branch points, this is a significant aspect of $\textsc{c}$. For hardware weak memory models (2.33) is not allowed, as it implies a “speculative” write that may not be valid if $r = v$ is eventually found not to hold; at compile time, however, it may be possible to determine in advance whether a branch condition will hold.
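These judgments can be confirmed mechanically. In this sketch (an illustrative check of our own, not the paper's artifact) each action is abstracted to the triple of its write variables, read variables, and shared read variables, which is all the relation inspects:

```python
def reorder_g(a, b):  # definition (2.22), on (writes, reads, shared-reads)
    wa, ra, sa = a
    wb, rb, sb = b
    interference_free = not (wa & (wb | rb)) and not (wb & (wa | ra))
    load_indep = not (sa & sb)
    return interference_free and load_indep

def asgn(x, reads=(), shared=()):   # x := e, with fv(e) = reads
    return ({x}, set(reads), set(shared))

def guard(reads=(), shared=()):     # guard, with fv(e) = reads
    return (set(), set(reads), set(shared))

x_v, x_w, y_w = asgn("x"), asgn("x"), asgn("y")   # x := v, x := w, y := w
r_x  = asgn("r",  ("x",), ("x",))                 # r  := x
r_y  = asgn("r",  ("y",), ("y",))                 # r  := y
r1_x = asgn("r1", ("x",), ("x",))                 # r1 := x
r2_x = asgn("r2", ("x",), ("x",))                 # r2 := x
r2_y = asgn("r2", ("y",), ("y",))                 # r2 := y
x_r, y_r = asgn("x", ("r",)), asgn("y", ("r",))   # x := r, y := r

assert reorder_g(x_v, y_w) and not reorder_g(x_v, x_w)      # (2.25)
assert reorder_g(x_v, r_y) and not reorder_g(x_v, r_x)      # (2.28)
assert reorder_g(r1_x, r2_y) and not reorder_g(r1_x, r2_x)  # (2.31)
assert reorder_g(x_r, y_r)                                  # (2.32)
assert reorder_g(guard(("r",)), asgn("x"))                  # (2.33)
```

Each assertion corresponds line-by-line to the table above, including the guard case (2.33), which is permitted because the guard writes nothing and reads no shared variable.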

2.2.2. Respecting fences

Since the reorderings allowed by weak memory models may be problematic for establishing communication protocols between processes, such models typically have their own “fence” instruction types, which impose artificial constraints on reordering (as opposed to the “natural” data-dependence constraint). C has fences specifically to establish order between stores, between loads, or between either type. We define $\overset{\textsc{fnc}}{\Leftarrow}$ below, relating fences and instructions.

$\overset{\textsc{fnc}}{\Leftarrow} \in \mathit{Fence} \times \mathit{Instr}$ (with $\mathsf{f} \overset{\textsc{fnc}}{\Leftarrow} \alpha \mathrel{\widehat{=}} (\mathsf{f}, \alpha) \in \overset{\textsc{fnc}}{\Leftarrow}$)

(2.36) $\mathsf{store_{fnc}} \overset{\textsc{fnc}}{\not\Leftarrow} \alpha \Leftrightarrow \alpha \in \mathbb{S}$

(2.39) $\mathsf{load_{fnc}} \overset{\textsc{fnc}}{\not\Leftarrow} \alpha \Leftrightarrow \alpha \in \mathbb{L}$

(2.42) $\mathsf{full_{fnc}} \overset{\textsc{fnc}}{\not\Leftarrow} \alpha$, for all $\alpha$

Eq. (2.36) states that a store fence blocks store instructions (recall (2.12)), while (2.39) similarly states that load fences block loads. Eq. (2.42) states that a “full” fence blocks all instruction types.

We use this base definition to define when two instructions can be reordered according to their respective fences (lifting the relation name $\overset{\textsc{fnc}}{\Leftarrow}$).

Definition 2.3 (Fence reorderings).
$\alpha \overset{\textsc{fnc}}{\Leftarrow} \beta \quad\Leftrightarrow\quad (\forall\, \mathsf{f} \in |\alpha| \bullet \mathsf{f} \overset{\textsc{fnc}}{\Leftarrow} \beta) \wedge (\forall\, \mathsf{f} \in |\beta| \bullet \mathsf{f} \overset{\textsc{fnc}}{\Leftarrow} \alpha)$

where |α| extracts the fences present in α, i.e.,

(2.43)  |x := e| = ∅    |⦇e⦈| = ∅    |f| = {f}

Hence α and β can be reordered (considering fences only) if they each respect the other’s fences. Note that all C fences are defined symmetrically, so we simply check the pair in each direction; for asymmetric fences, such as Arm’s isb instruction, both directions need to be specified (Colvin, 2021b).

Based on these definitions we can determine the following special cases, where x, y ∈ Shared and r, r_i ∈ Local.

(2.48)  x := 1 ⇍fnc store_fnc ⇍fnc y := 1
(2.53)  r_1 := x ⇍fnc load_fnc ⇍fnc r_2 := y
(2.58)  x := 1 ⇍fnc full_fnc ⇍fnc r := y

Statements of the form α ⇍c β ⇍c γ should be read as shorthand for α ⇍c β and β ⇍c γ. Inserting fences limits the number of possible traces of a program, which may restore intended relationships between variables.

2.3. Memory ordering constraints

C introduces “atomics” in the stdatomic.h library, which provides several kinds of “memory ordering constraints” (henceforth, ordering constraints) that can tag loads and stores of variables declared to be “atomic” (e.g., type atomic_int for a shared integer). These ordering constraints control how accesses to atomic variables interact. We start with an informal overview of how ordering constraints are intended to work, and then show how we incorporate them into the syntax of the language and define a reordering relation over them.

For the discussions below we use a more compact notation for stores and loads, as exemplified by the following C equivalents. (In C++, these would be written r = y.load(std::memory_order_relaxed) and y.store(3, std::memory_order_seq_cst).)

(2.59)  r = atomic_load_explicit(&y, memory_order_relaxed)  ≙  r := y^rlx
(2.60)  atomic_store_explicit(&y, 3, memory_order_seq_cst)  ≙  y^sc := 3

2.3.1. Informal overview of ordering constraints

We describe the six ordering constraints defined in type memory_order below.

  • Relaxed (rlx, memory_order_relaxed). A relaxed access is the simplest, adding no extra constraints to the variable access and allowing both the hardware and the compiler to reorder independent relaxed accesses. For instance, consider the following program snippet (where ‘;’ is C’s semicolon).

    (2.61)  x^rlx := 1 ; flag^rlx := True

    If the programmer’s intention is that the flag variable is set to True after the data x is set to 1, then they have erred: either the compiler or the hardware may reorder the apparently independent stores to x and flag.

  • Release (rel, memory_order_release). The release tag can be applied to stores, indicating the end of a set of shared accesses (and hence control can be “released” to the system). Typically this tag is used when setting a flag variable, e.g., modifying the above case,

    (2.62)  x^rlx := 1 ; flag^rel := True

    now ensures that any other process that sees flag = True knows that the update (x^rlx := 1) preceding the change to flag has taken effect.

  • Acquire (acq, memory_order_acquire). The acquire tag can be applied to loads, and is the reciprocal of a release: any subsequent loads will see everything the acquire can see. For example, continuing from above, a simple process that reads the flag in parallel with the above process may be written as follows:

    (2.63)  f := flag^acq ; r := x^rlx

    In this case the loads are kept in order by the acquire constraint, and hence at the end of the program, in the absence of any other interference, f = 1 ⇒ r = 1.

  • Consume (con, memory_order_consume). The consume tag is similar to, but weaker than, acq, in that it is intended to be partnered with a rel, but only subsequent loads that are data-dependent on the loaded value are guaranteed to see the change. Hence,

    (2.64)  f := flag^con ; r := x^rlx

    does not give f = 1 ⇒ r = 1. However, after

    (2.65)  f := flag^con ; y^rlx := f

    then f = 1 ⇒ y = 1. Data dependence is maintained by the ⇐g relation, and in that sense the con constraint provides no extra reordering information beyond that of rlx. However, the intention is that a con load indicates to the compiler that it must not lose, via optimisations, data dependencies to later instructions. We return to con in Sect. 8.2, in the context of expression optimisations; for concision in the rest of the paper we omit consideration of con, which otherwise behaves as a rlx constraint.

  • Sequentially consistent (sc, memory_order_seq_cst). The sequentially consistent constraint is the strongest, forcing order between sc-tagged instructions and any other instructions. For example, the snippet

    (2.66)  x^sc := 1 ; flag^sc := True

    ensures flag = 1 ⇒ x = 1 (in fact only one of the two instructions needs the constraint for this to hold). This is considered a more “heavyweight” method for enforcing order than the acquire/release constraints.

  • Acquire-release (acqrel, memory_order_acq_rel). This constraint is used on instructions that have both an acquire and a release component; as will be seen, it is straightforward to combine the two in our syntax.

Non-atomics

Additionally, shared data can be “non-atomic”, i.e., variables not declared of an atomic* type, which therefore cannot be directly associated with the ordering constraints above. Programs that attempt to concurrently access shared data without any of the ordering mechanisms of stdatomic.h essentially have no guarantees about their behaviour, and so we ignore such programs. Shared non-atomic variables that are correctly synchronised, i.e., associated with some ordering mechanism (e.g., barriers or acquire/release flag variables), can be treated as if relaxed, and are subject to potential optimisations as outlined in Sect. 8. Identifying correctly synchronised shared non-atomic variables is a syntactic check that compilers carry out, which could be carried over into our syntactic framework, but it is not directly relevant to the question of reorderings.

2.3.2. Formalising memory order constraints

The relationship between the ordering constraints can be thought of in terms of a reordering relation, ⇐oc, as in the previous sections.

Definition 2.4 (Memory ordering constraints).
⇐oc  ≙  {(rlx, rlx), (rlx, acq), (rel, rlx), (rel, acq)}

We write oc_1 ⇐oc oc_2 if (oc_1, oc_2) ∈ ⇐oc, and as before oc_1 ⇍oc oc_2 otherwise. Expressing the relation negatively is perhaps more intuitive: for all constraints oc, oc ⇍oc sc ⇍oc oc, oc ⇍oc rel, and acq ⇍oc oc. Additionally, the con constraint is equal to a rlx constraint for the purposes of ordering (but see Sect. 8.2), and acqrel is the combination of both acq and rel. An alternative presentation of the relation oc_1 ⇐oc oc_2 is as a grid, i.e.,

(2.67)
  ↓ oc_1 \ oc_2 →   rlx   rel   acq   sc
  rlx                ✓     ×     ✓     ×
  rel                ✓     ×     ✓     ×
  acq                ×     ×     ×     ×
  sc                 ×     ×     ×     ×

Note that acquire loads may come before release stores; that is, C follows the RCpc model rather than the stronger RCsc model of (Gharachorloo et al., 1990) (it is of course straightforward to accommodate either; see also (Colvin, 2021b)).

We extend the syntax of instructions to incorporate ordering constraints, allowing variables and fences to be annotated with a set thereof, as shown in Fig. 1. The reordering relation for instructions, considering only ordering constraints, can be defined as below.

Definition 2.5.

Reordering instructions with respect to ordering constraints

(2.68)  α ⇐ocs β  ≙  ⌈α⌉ × ⌈β⌉ ⊆ ⇐oc

where the ⌈.⌉ function extracts the ordering constraints from the expression and instruction syntax.

        ⌈v⌉ = ∅        ⌈x^ocs⌉ = ocs
(2.69)  ⌈⊖e⌉ = ⌈e⌉     ⌈e_1 ⊕ e_2⌉ = ⌈e_1⌉ ∪ ⌈e_2⌉
        ⌈x^ocs := e⌉ = ocs ∪ ⌈e⌉    ⌈⦇e⦈⌉ = ⌈e⌉    ⌈f^ocs⌉ = ocs

For example, ⌈x^rel := y^acq⌉ = {rel, acq}. Thus α ⇐ocs β is checked by comparing point-wise each pair of ordering constraints in α and β. If α or β has no ordering constraints then α ⇐ocs β holds vacuously.

For example,

  x^rel := y^acq ⇐ocs x^rlx := 1
    ⇔  ⌈x^rel := y^acq⌉ × ⌈x^rlx := 1⌉ ⊆ ⇐oc
    ⇔  {rel, acq} × {rlx} ⊆ ⇐oc
    ⇔  rel ⇐oc rlx ∧ acq ⇐oc rlx
    ⇔  False    (by Defn. 2.4, since acq ⇍oc rlx)

Note that a locals-only instruction, such as r_1 := r_2 * 2, does not have any ordering constraints, and so is not affected by them; that is, ⌈r_1 := r_2 * 2⌉ = ∅ and hence r_1 := r_2 * 2 ⇐ocs β for all β.

x ∈ Var    ocs ∈ ℙ OC

oc      ::=  rlx | rel | acq | con | sc
acqrel   ≙   {acq, rel}
e       ::=  v | x^ocs | ⊖e | e_1 ⊕ e_2
α       ::=  x^ocs := e | ⦇e⦈ | f^ocs
rel_fence  ≙  store_fnc^rel
acq_fence  ≙  load_fnc^acq
sc_fence   ≙  full_fnc^sc

for r ∈ Local, r abbreviates r^∅
for x ∈ Shared, x^oc abbreviates x^{oc}, and x abbreviates x^rlx
for f a fence, f^oc abbreviates f^{oc}

Figure 1. Syntax extensions for C ordering constraints

For convenience we define some abbreviations and conventions at the bottom of Fig. 1: we require every reference to a shared variable to have (at least one) ordering constraint, with the default being rlx; when there is exactly one ordering constraint we omit the set brackets in the syntax; and typically, when the types are clear, we abbreviate x^rlx to plain x since rlx is the default. Local variables are by definition never declared “atomic” and hence their set of ordering constraints is always empty; thus, when r is a local variable, we abbreviate r^∅ to r. Similarly we abbreviate ordering constraints on fences. We can now define release, acquire and sequentially consistent fences as the combination of a fence and an ordering constraint. A “release fence” (rel_fence) operates according to the rel semantics above and in addition blocks stores, and is hence the combination of a store_fnc and rel ordering; similarly an “acquire fence” (acq_fence) acts as a load_fnc with acq ordering. A “sequentially consistent” fence (sc_fence) is defined analogously. These fences map to C’s atomic_thread_fence(...) primitive.

To complete the syntax extension we update the syntax-based definitions for extracting variables from expressions and instructions by defining wv(x^ocs := e) = {x} and rv(x^ocs) = {x}; that is, read/write variables do not include the ordering constraints (and wv(f^ocs) = rv(f^ocs) = ∅).

2.4. The complete reordering relation

We can now define the C memory model as the combination of the three aspects above.

Definition 2.6 (Reordering of instructions in c).
(2.70)  α ⇐c β  iff  (i) α ⇐g β,  (ii) α ⇐fnc β,  and (iii) α ⇐ocs β

Hence reordering of instructions within a C program can occur provided the sequential semantics, fences, and ordering constraints are respected. We show in later sections how this principle does not change for more complex language features, though, of course, the semantics and hence the analysis is correspondingly more complex.

As examples, for distinct x, y ∈ Shared and r, r_i ∈ Local,

x := 1 ⇐c y := 1                 r_1 := x ⇐c r_2 := y
y := 1 ⇍c x^rel := 1        but  x^rel := 1 ⇐c r := y
r_1 := x^acq ⇍c r_2 := y    but  r_1 := y ⇐c r_2 := x^acq
α ⇍c rel_fence                   rel_fence ⇐c r := x
acq_fence ⇍c α                   x := 1 ⇐c acq_fence

Hence, following the earlier definitions, we have various ways of enforcing program order using the flag set/check (message passing) pattern from earlier. We leave rlx accesses of shared variables x and flag implicit.

(2.75)
  x := 1 ▹c rel_fence ▹c flag := True   ≡   x := 1 ▸ rel_fence ▸ flag := True
  f := flag ▹c acq_fence ▹c r := x      ≡   f := flag ▸ acq_fence ▸ r := x
  x := 1 ▹c flag^rel := True            ≡   x := 1 ▸ flag^rel := True
  f := flag^acq ▹c r := x               ≡   f := flag^acq ▸ r := x

3. An imperative language with reordering

We now show how reordering according to the C memory model can be embedded into a more realistic imperative language that has conditionals and loops, based on the previously described wide-spectrum language IMP+pseq (Colvin, 2021a; Colvin and Smith, 2018; Colvin, 2021b). We give a small-step operational semantics and define trace equivalence for its notion of correctness.

3.1. Syntax

(3.1)   c  ::=  nil | α⃗ | c_1 ;m c_2 | c_1 ⊓ c_2 | c^ω_m
(3.2)   τ  ≙  ⦇True⦈
(3.3)   c_1 ; c_2  ≙  c_1 ;c c_2
(3.4)   c_1 • c_2  ≙  c_1 ;sc c_2
(3.5)   c_1 ∥ c_2  ≙  c_1 ;par c_2
(3.6)   c^0_m ≙ nil        c^{n+1}_m ≙ c ;m c^n_m
(3.7)   if^m b then c_1 else c_2  ≙  ⦇b⦈ ;m c_1  ⊓  ⦇¬b⦈ ;m c_2
(3.8)   while^m b do c  ≙  (⦇b⦈ ;m c)^ω_m ;m ⦇¬b⦈
(3.9)   cas(x, e, e′)  ≙  ⟨⦇x = e⦈, x := e′⟩ ⊓ ⦇x ≠ e⦈
(3.10)  cas(x, e, e′)^acqrel  ≙  ⟨⦇x^acq = e⦈, x^rel := e′⟩ ⊓ ⦇x^acq ≠ e⦈
(3.11)  r := cas(x, e, e′)  ≙  (⟨⦇x = e⦈, x := e′⟩ ; r := True) ⊓ (⦇x ≠ e⦈ ; r := False)
(3.12)  r := faa(x, e)  ≙  ⟨r := x, x := x + e⟩
Figure 2. Syntax of IMP+pseqC (building on Fig. 1)

The syntax of IMP+pseqC is given in Fig. 2, with expressions and instructions remaining as shown in Fig. 1.

Commands

The command syntax (defn. (3.1)) includes the terminated command, nil; a sequence of instructions, α⃗ (allowing composite actions to be defined); the parallelized sequential composition of two commands according to some memory model m, c_1 ;m c_2; a choice between two commands, c_1 ⊓ c_2; and the parallelized iteration of a command according to some memory model m, c^ω_m. From this language we can build an imperative language with conditionals and loops following algebraic patterns (Fischer and Ladner, 1979; Kozen, 2000). We define the special action τ as a True guard (3.2); this action has no observable effect and is not considered for the purposes of determining (trace) equivalence. We allow the abbreviation c_1 ; c_2 for the case where the model parameter is c (3.3). We also introduce abbreviations for strict (traditional) sequential order (3.4) and parallel composition (3.5), based on the memory models sc and par introduced earlier (defns. (2.2) and (2.3)). The (parallelized) iteration of a command a finite number of times is defined inductively in defn. (3.6). Conditionals (defn. (3.7)) and while loops (3.8) are constructed in the usual way using guards and iteration. We define a conditional with an empty False branch, if^m b then c, as if^m b then c else nil.

Guards

The use of a guard action ⦇e⦈ allows established encodings of conditionals and loops as described above. Treating a guard as a separate action is also useful when considering reorderings, and in particular when understanding the interaction of conditionals with compiler optimisations: the fundamentals of reorderings involving guards are based on the principles of preserving sequential semantics (on a single thread) as in Sect. 2.2.1, and these lift straightforwardly to the conditional and loop command types, without needing to treat them monolithically. Note that if a guard evaluates to False this represents a behaviour that cannot occur.

Composite actions

Since we allow sequences of instructions α\vec{\alpha} to be the basic building block of the language, with the intention that all instructions in the sequence are executed, in order, as a single indivisible step, we can straightforwardly define complex instruction types such as “compare-and-swap”. For lists we write \langle\rangle for the empty list, Γ\mathbin{\raise 3.44444pt\hbox{$\mathchar 0\relax\@@cat$}} for concatenation, and \langle\ldots\rangle as the list constructor. For notational ease we let a singleton sequence of actions α\langle\alpha\rangle just be written α\alpha where the intended type is clear from the context. For brevity we allow Γ\mathbin{\raise 3.44444pt\hbox{$\mathchar 0\relax\@@cat$}} to accept single elements in place of singleton lists. Note that an instruction such as x:=yx\mathop{{:}{=}}y happens as a single indivisible step, that is, the value for yy is fetched and written into xx in one step. This is not realistic for C, and as such we later (Sect. 7, see also Appendix B) show how instructions can be incrementally executed (in the above case, with the value for yy fetched, and only later updating xx to that value). For now we assume that anything evaluated incrementally is written out to make the granularity explicit, for example, the above assignment becomes tmp:=y;x:=tmptmp\mathop{{:}{=}}y\mathchar 24635\relax\;x\mathop{{:}{=}}tmp, for some fresh identifier tmptmp; see further discussion in Sect. 7.3.

The composite compare-and-swap command 𝐜𝐚𝐬(x,e,e)\mathbf{cas}(x,e,e^{\prime}) is defined as a choice between determining that x=ex=e and updating xx to ee^{\prime} in a single indivisible step, or determining that xex\neq e (defn. (3.9)). This can be generalised to include ordering constraints, for instance, in the acqrel case (3.10). Updates to a local variable can be included to record the result (3.11). It is of course straightforward to define other composite commands, such as fetch-and-add (3.12), which map to C’s inbuilt functions such as atomic_compare_exchange_strong_explicit(…) and atomic_fetch_add_explicit(…). We show these to emphasise that the definition of reordering does not have to change whenever a new instruction type is added; we can easily syntactically extract the required elements. For instance, given the above definitions, we can determine the following.

(3.13) 𝐜𝐚𝐬(x,v,v)\ext@arrow0055\Leftarrowfill@cr:=ybut𝐜𝐚𝐬(x,v,v)/\ext@arrow0055\Leftarrowfill@cr:=xand𝐜𝐚𝐬(x,v,v)acqrel/\ext@arrow0055\Leftarrowfill@cr:=y\mathbf{cas}(x,v,v^{\prime})\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}r\mathop{{:}{=}}y\quad\mbox{but}\quad\mathbf{cas}(x,v,v^{\prime})\mathrel{{\color[rgb]{1.,0.,0.}\ooalign{\hss{/\hss}\cr{$\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}$}}}}r\mathop{{:}{=}}x\quad\mbox{and}\quad{\mathbf{cas}(x,v,v^{\prime})}^{{\textsc{acqrel}}}\mathrel{{\color[rgb]{1.,0.,0.}\ooalign{\hss{/\hss}\cr{$\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}$}}}}r\mathop{{:}{=}}y

since 𝗐𝗏(𝐜𝐚𝐬(x,v,v))={x}{\sf wv}(\mathbf{cas}(x,v,v^{\prime}))=\{x\}, and acq𝐜𝐚𝐬(x,v,v)acqrel{\textsc{acq}}\in\lceil{\mathbf{cas}(x,v,v^{\prime})}^{{\textsc{acqrel}}}\rceil. We lift the write/read variables of instructions to commands straightforwardly (see Appendix A), and as such reordering on commands can be calculated, for example,

(3.14) (𝐢𝐟cr>0𝐭𝐡𝐞𝐧x:=1𝐞𝐥𝐬𝐞y:=1)\ext@arrow0055\Leftarrowfill@cz:=1(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r>0\mathrel{\mathbf{then}}x\mathop{{:}{=}}1\mathrel{\mathbf{else}}y\mathop{{:}{=}}1)\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}z\mathop{{:}{=}}1

since the assignment to zz is not dependent on anything in the conditional statement.

3.2. Small-step operational semantics

Rule 3.15 (Action).
αα𝐧𝐢𝐥\vec{\alpha}\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\vec{\alpha}}}{{\longrightarrow}}}\mathbf{nil}
Rule 3.16 (Choice).
cdτccdτdc\sqcap d\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}c\qquad c\sqcap d\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}d
Rule 3.17 (Iterate).
\bodycmτcmn\body{c}{{{\textsc{m}}}}\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}c^{n}_{{{\textsc{m}}}}
Rule 3.18 (Parallelized sequential composition).
c_1αc_1c_1;mc_2αc_1;mc_2𝐧𝐢𝐥;mc_2τc_2c_2βc_2c_1\ext@arrow0055\Leftarrowfill@mβc_1;mc_2βc_1;mc_2\begin{array}[]{c}c_{\_}1\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}c_{\_}1^{\prime}\\ \cline{1-1}\cr c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}c_{\_}1^{\prime}\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2\end{array}\qquad\mathop{\bf nil}\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}c_{\_}2\qquad\begin{array}[]{c}c_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\beta}}{{\longrightarrow}}}c_{\_}2^{\prime}\qquad c_{\_}1\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{m}}}}}}\beta\\ \cline{1-1}\cr c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\beta}}{{\longrightarrow}}}c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2^{\prime}\end{array}
Figure 3. Operational semantics

The semantics of IMP+pseq is given in Fig. 3 (an adaptation of (Colvin, 2021b)). A step (action) is a sequence of instructions which are considered to be executed together, without interference; in the majority of cases the sequences (actions) are singletons. Rule 3.15 places a list of instructions into the trace as a single list (so a trace is a list of lists). Rule 3.16 states that a nondeterministic choice is resolved silently to either branch (in the rules we let τ\tau abbreviate the action τ\langle\tau\rangle; alternatively we could define τ\tau as the empty sequence). Rule 3.17 nondeterministically picks a finite number (nn) of times to iterate cc, where finite iteration is defined in defn. (3.6).

Rule 3.18 is the interesting rule, which generalises the earlier rule for prefixing: a command c_1;mc_2c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2 can take a step of c_1c_{\_}1, or begin execution of c_2c_{\_}2 if c_1c_{\_}1 is terminated, or execute a step β\beta of c_2c_{\_}2 if β\beta reorders with c_1c_{\_}1. Reordering of an action (list of instructions) with a command is lifted from reordering on instructions straightforwardly (Appendix A).

As an example, from (3.14) we can deduce the following.

(𝐢𝐟cr>0𝐭𝐡𝐞𝐧x:=1𝐞𝐥𝐬𝐞y:=1);z:=1z:=1(𝐢𝐟cr>0𝐭𝐡𝐞𝐧x:=1𝐞𝐥𝐬𝐞y:=1);𝐧𝐢𝐥(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r>0\mathrel{\mathbf{then}}x\mathop{{:}{=}}1\mathrel{\mathbf{else}}y\mathop{{:}{=}}1)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}z\mathop{{:}{=}}1\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{z\mathop{{:}{=}}1}$}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r>0\mathrel{\mathbf{then}}x\mathop{{:}{=}}1\mathrel{\mathbf{else}}y\mathop{{:}{=}}1)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathop{\bf nil}

That is, the assignment to zz can occur before the conditional; this represents the compiler deciding to move the store before the test since it will happen on either path.

3.3. Trace semantics

Given a program c_0c_{\_}0 the operational semantics generates a trace, that is, a finite sequence of steps c_0α_1c_1α_2c_{\_}0\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha_{\_}1}}{{\longrightarrow}}}c_{\_}1\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha_{\_}2}}{{\longrightarrow}}}\ldots where the labels in the trace are actions555 Since infinite traces do not add anything of special interest to the discussion of weak memory models over and above finite traces, we focus on finite traces only to avoid the usual extra complications that infinite traces introduce. . We write c\ext@arrow0359\Rightarrowfill@tcc\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,t\,}$}}c^{\prime} to say that cc executes the actions in trace tt and evolves to cc^{\prime}, inductively constructed below. The base case for the induction is given by c\ext@arrow0359\Rightarrowfill@cc\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,\langle\rangle\,}$}}c.

(3.19) cαc𝗏𝗂𝗌𝗂𝖻𝗅𝖾αc\ext@arrow0359\Rightarrowfill@tc′′\displaystyle c\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}c^{\prime}\mathrel{\mathstrut{\wedge}}{\sf visible}~{}\alpha\mathrel{\mathstrut{\wedge}}c^{\prime}\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,t\,}$}}c^{\prime\prime} \displaystyle\Rightarrow c\ext@arrow0359\Rightarrowfill@αΓtc′′\displaystyle c\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,\alpha\mathbin{\raise 2.41112pt\hbox{$\mathchar 0\relax\@@cat$}}t\,}$}}c^{\prime\prime}
(3.20) cαc𝗌𝗂𝗅𝖾𝗇𝗍αc\ext@arrow0359\Rightarrowfill@tc′′\displaystyle c\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}c^{\prime}\mathrel{\mathstrut{\wedge}}{\sf silent}~{}\alpha\mathrel{\mathstrut{\wedge}}c^{\prime}\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,t\,}$}}c^{\prime\prime} \displaystyle\Rightarrow c\ext@arrow0359\Rightarrowfill@tc′′\displaystyle c\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,t\,}$}}c^{\prime\prime}
(3.21) c=^{t|c\ext@arrow0359\Rightarrowfill@t𝐧𝐢𝐥}cd\displaystyle\llbracket c\rrbracket~{}~{}\mathrel{\mathstrut{\widehat{=}}}~{}~{}\{t|c\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,t\,}$}}\mathop{\bf nil}\}\quad\qquad c\mathrel{\sqsubseteq}d =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} dccd=^cddc\displaystyle\llbracket d\rrbracket\subseteq\llbracket c\rrbracket\quad\qquad c\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}d~{}~{}\mathrel{\mathstrut{\widehat{=}}}~{}~{}c\mathrel{\sqsubseteq}d\mathrel{\mathstrut{\wedge}}d\mathrel{\sqsubseteq}c

Traces of visible actions are accumulated into the trace (using ‘Γ\mathbin{\raise 3.44444pt\hbox{$\mathchar 0\relax\@@cat$}}’ for list concatenation) (3.19), and silent actions (such as τ\tau) are discarded (3.20), i.e., we have a “weak” notion of equivalence (Milner, 1982). A visible action is any action with a visible effect, for instance, fences, assignments, and guards with free variables. Silent actions include any guard which is 𝖳𝗋𝗎𝖾\mathsf{True} in any state and contains no free variables; for instance, 0=0\llparenthesis 0=0\rrparenthesis is silent while x=x\llparenthesis x=x\rrparenthesis is not. A third category of actions, 𝗂𝗇𝖿𝖾𝖺𝗌𝗂𝖻𝗅𝖾α{\sf infeasible}~{}\alpha, includes exactly those guards b\llparenthesis b\rrparenthesis where bb evaluates to 𝖥𝖺𝗅𝗌𝖾\mathsf{False} in every state. This includes actions such as xx\llparenthesis x\neq x\rrparenthesis, with the simplest example being 𝖥𝖺𝗅𝗌𝖾\llparenthesis\mathsf{False}\rrparenthesis, which we abbreviate to 𝐦𝐚𝐠𝐢𝐜{\bf magic} (Morgan, 1994). Any behaviour of cc in which an infeasible action occurs does not result in a finite terminating trace, and hence is excluded from consideration. Such behaviours include those where a branch is taken that eventually evaluates to 𝖥𝖺𝗅𝗌𝖾\mathsf{False}.

The meaning of a command cc is its set of all possible terminating behaviours c\llbracket c\rrbracket, leading to the usual (reverse) subset inclusion notion of refinement, where cdc\mathrel{\sqsubseteq}d if every behaviour of dd is a behaviour of cc; our notion of command equivalence is refinement in both directions (3.21).

From the semantics we can derive the usual properties such as cd=cd\llbracket c\sqcap d\rrbracket=\llbracket c\rrbracket\mathbin{\mathstrut{\cup}}\llbracket d\rrbracket and c;d=cΓd\llbracket c\mathrel{\mathchar 24635\relax}d\rrbracket=\llbracket c\rrbracket\mathbin{\raise 3.44444pt\hbox{$\mathchar 0\relax\@@cat$}}\llbracket d\rrbracket (overloading ‘Γ\mathbin{\raise 3.44444pt\hbox{$\mathchar 0\relax\@@cat$}}’ to mean pairwise concatenation of lists). We can use trace equivalence to define a set of rules for manipulating a program under refinement or equivalence; we elucidate a general set of these in the following section.

4. Reduction rules

Using the notion of trace equivalence the following properties can be derived for the language, and verified in Isabelle/HOL (Colvin, 2021b). The usual properties of commutativity, associativity, etc., for the standard operators of the language hold, and so we focus below on properties involving parallelized sequential composition.

(4.1) c_1;mc_2\displaystyle c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2 \displaystyle\mathrel{\sqsubseteq} c_1c_2\displaystyle c_{\_}1\centerdot c_{\_}2
(4.2) cdc\displaystyle c\sqcap d~{}~{}\mathrel{\sqsubseteq}~{}~{}c cdd\displaystyle c\sqcap d~{}~{}\mathrel{\sqsubseteq}~{}~{}d
(4.3) (αc)dα(cd)\displaystyle(\alpha\centerdot c)\parallel d\hskip 5.69054pt\mathrel{\sqsubseteq}\hskip 5.69054pt\alpha\centerdot(c\parallel d) c(βd)β(cd)\displaystyle c\parallel(\beta\centerdot d)\hskip 5.69054pt\mathrel{\sqsubseteq}\hskip 5.69054pt\beta\centerdot(c\parallel d)
(4.4) (c_1c_2)d\displaystyle(c_{\_}1\sqcap c_{\_}2)\parallel d \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} (c_1d)(c_2d)\displaystyle(c_{\_}1\parallel d)\sqcap(c_{\_}2\parallel d)
(4.5) (c_1;mc_2);mc_3\displaystyle(c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}2)\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}3 \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} c_1;m(c_2;mc_3)\displaystyle c_{\_}1\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}(c_{\_}2\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c_{\_}3)
(4.8) α/\ext@arrow0055\Leftarrowfill@cβα;β\displaystyle\alpha\mathrel{{\color[rgb]{1.,0.,0.}\ooalign{\hss{/\hss}\cr{$\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}$}}}}\beta\Rrightarrow\hskip 5.69054pt\hskip 5.69054pt\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\beta \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} αβ\displaystyle\alpha\centerdot\beta
(4.9) α\ext@arrow0055\Leftarrowfill@cβα;β\displaystyle\alpha\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}\beta\Rrightarrow\hskip 5.69054pt\hskip 5.69054pt\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\beta \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} αβ\displaystyle\alpha\parallel\beta
(4.10) α\ext@arrow0055\Leftarrowfill@cβα;(βc)\displaystyle\alpha\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}\beta\Rrightarrow\hskip 5.69054pt\hskip 5.69054pt\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\beta\centerdot c) \displaystyle\mathrel{\sqsubseteq} β(α;c)\displaystyle\beta\centerdot(\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c)

Law 4.1 states that a parallelized sequential composition can always be refined to a strict ordering. Law 4.2 states that a choice can be refined to its left branch (a symmetric rule holds for the right branch). Law 4.3 says that the first instruction of either process in a parallel composition can be the first step of the composition as a whole. Such refinement rules are useful for elucidating specific reorderings and interleavings of parallel processes that lead to particular behaviours (essentially reducing to a particular trace). Law 4.4 is an equality, which states that if there is nondeterminism in a parallel process the effect can be understood by lifting the nondeterminism to the top level; such a rule is useful for the application of, for instance, Owicki-Gries reasoning (Sect. 5.4). Law 4.5 states that parallelized sequential composition is associative (provided the same model m is used on both instances). Laws 4.8 and 4.9 are special cases where, given two actions α\alpha and β\beta, if they cannot reorder then they are executed in order, and if they can it is as if they are executed in parallel. Law 4.10 straightforwardly promotes action β\beta of βc\beta\centerdot c before α\alpha, and depending on the structure of cc, further actions of cc may be reordered before α\alpha.

We can extend these rules to more complex structures.

(4.11) c_1;𝐬𝐜 𝐟𝐞𝐧𝐜𝐞;c_2\displaystyle c_{\_}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c_{\_}2 \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} c_1𝐬𝐜 𝐟𝐞𝐧𝐜𝐞c_2\displaystyle c_{\_}1\centerdot\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\centerdot c_{\_}2
(4.12) 𝐢𝐟my0𝐭𝐡𝐞𝐧x:=y\displaystyle\mathrel{\mathbf{if}}^{{{\textsc{m}}}}y\geq 0\mathrel{\mathbf{then}}x\mathop{{:}{=}}y \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} 𝐢𝐟scy0𝐭𝐡𝐞𝐧x:=y\displaystyle\mathrel{\mathbf{if}}^{{{\textsc{sc}}}}y\geq 0\mathrel{\mathbf{then}}x\mathop{{:}{=}}y
b/𝖿𝗏(e)𝖿𝗏(f)\displaystyle b\mathbin{/\!\!\!\in}{\sf fv}(e)\mathbin{\mathstrut{\cup}}{\sf fv}(f)\hskip 5.69054pt\Rightarrow\hskip 5.69054pt\qquad
(4.13) 𝐢𝐟cb𝐭𝐡𝐞𝐧x:=e𝐞𝐥𝐬𝐞y:=f\displaystyle\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}x\mathop{{:}{=}}e\mathrel{\mathbf{else}}y\mathop{{:}{=}}f \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} (bx:=e)(¬by:=f)\displaystyle(\llparenthesis b\rrparenthesis\parallel x\mathop{{:}{=}}e)\sqcap(\llparenthesis\neg b\rrparenthesis\parallel y\mathop{{:}{=}}f)

Law 4.11 shows that (full) fences enforce ordering. Law 4.12 gives a special case of a conditional where the 𝖳𝗋𝗎𝖾\mathsf{True} branch depends on a shared variable in the condition, in which case the command is executed in-order (assuming model m respects data dependencies). Law 4.13 elucidates the potentially complex case of reasoning about conditionals in which there are no dependencies: theoretically the compiler could allow inner instructions to appear to be executed before the evaluation of the condition.

Assuming x𝖲𝗁𝖺𝗋𝖾𝖽x\in{\sf Shared}, b𝖫𝗈𝖼𝖺𝗅b\in{\sf Local}, and that xx is independent of commands c_1c_{\_}1 and c_2c_{\_}2 (i.e., x/𝗐𝗏(c_1)𝗐𝗏(c_2)x\mathbin{/\!\!\!\in}{\sf wv}(c_{\_}1)\mathbin{\mathstrut{\cup}}{\sf wv}(c_{\_}2)), we can derive the following.

(4.14) (𝐢𝐟cb𝐭𝐡𝐞𝐧c_1𝐞𝐥𝐬𝐞c_2);x:=v\displaystyle(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}c_{\_}1\mathrel{\mathbf{else}}c_{\_}2)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}v \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} (𝐢𝐟cb𝐭𝐡𝐞𝐧c_1𝐞𝐥𝐬𝐞c_2)x:=v\displaystyle(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}c_{\_}1\mathrel{\mathbf{else}}c_{\_}2)\parallel x\mathop{{:}{=}}v
(4.15) 𝐢𝐟cb𝐭𝐡𝐞𝐧(c_1;x:=v)𝐞𝐥𝐬𝐞(c_2;x:=v)\displaystyle\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}(c_{\_}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}v)\mathrel{\mathbf{else}}(c_{\_}2\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}v) \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} (𝐢𝐟cb𝐭𝐡𝐞𝐧c_1𝐞𝐥𝐬𝐞c_2)x:=v\displaystyle(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}c_{\_}1\mathrel{\mathbf{else}}c_{\_}2)\parallel x\mathop{{:}{=}}v

These sort of structural rules help elucidate consequences of the memory model for programmers at a level that is easily understood.

The next few laws allow reasoning about “await”-style loops, as used in some lock implementations.

(4.16) \bodyαm\displaystyle\body{\alpha}{{{\textsc{m}}}} \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} \bodyαsc\displaystyle\body{\alpha}{{{\textsc{sc}}}}
(4.17) 𝐰𝐡𝐢𝐥𝐞cb𝐝𝐨𝐧𝐢𝐥\displaystyle\mathrel{\mathbf{while}}^{{{\textsc{c}}}}b\mathrel{\mathbf{do}}\mathop{\bf nil} \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} \bodybc;¬b\displaystyle\body{\llparenthesis b\rrparenthesis}{{{\textsc{c}}}}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis\neg b\rrparenthesis
(4.18) 𝗌𝗏(b)𝐰𝐡𝐢𝐥𝐞cb𝐝𝐨𝐧𝐢𝐥\displaystyle{\sf sv}(b)\neq\varnothing\hskip 5.69054pt\Rightarrow\hskip 5.69054pt\mathrel{\mathbf{while}}^{{{\textsc{c}}}}b\mathrel{\mathbf{do}}\mathop{\bf nil} \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}} 𝐰𝐡𝐢𝐥𝐞scb𝐝𝐨𝐧𝐢𝐥\displaystyle\mathrel{\mathbf{while}}^{{{\textsc{sc}}}}b\mathrel{\mathbf{do}}\mathop{\bf nil}

Law 4.16 states that repeated iteration of a single action, according to any model m, can be treated as executing in strict order. Using this property, and others like 𝐧𝐢𝐥;mcc\mathop{\bf nil}\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}c (we omit such trivial laws that do not involve reordering), we can deduce Law 4.18, which states that a spin-loop that polls a shared variable can be treated as if executed in strict order.

Lifting to commands

So far we have considered reduction laws that apply to relatively simple cases involving individual actions. Lifting the concepts to commands is nontrivial in general: for c_1;c_2c_{\_}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c_{\_}2 there could be arbitrary dependencies between c_1c_{\_}1 and c_2c_{\_}2, which mean the two commands may partially overlap, perhaps with pre- or post-sequences of non-overlapping instructions. Here associativity (Law 4.5) may help, allowing rearrangement of the text to split into sections, for instance,

(4.19) (c_1;𝐬𝐜 𝐟𝐞𝐧𝐜𝐞;c_2);(c_3;𝐬𝐜 𝐟𝐞𝐧𝐜𝐞;c_4)c_1𝐬𝐜 𝐟𝐞𝐧𝐜𝐞(c_2;c_3)𝐬𝐜 𝐟𝐞𝐧𝐜𝐞c_4(c_{\_}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c_{\_}2)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(c_{\_}3\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c_{\_}4)~{}~{}\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}~{}~{}c_{\_}1\centerdot\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\centerdot(c_{\_}2\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c_{\_}3)\centerdot\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\centerdot c_{\_}4

We now consider the cases where two commands are completely independent, and where one is always blocked by the other. Independence can be established by straightforwardly lifting \ext@arrow0055\Leftarrowfill@c\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}} from instructions to commands, using lifting conventions in Appendix A. The key property is that

(4.20) c\ext@arrow0055\Leftarrowfill@cdc;dcdc\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}d~{}~{}\Rightarrow~{}~{}c\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}d\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}c\parallel d

This follows from partial execution of the semantics: at no point is there an instruction within cc that prevents an instruction in dd from executing (a trivial case is where c=𝐧𝐢𝐥c=\mathop{\bf nil}).

To define the converse case, where cc always prevents dd from executing, consider the following. We write c×dc{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}d to indicate that at no point can dd reorder during the execution of cc (a trivial case is where c=𝖿𝗎𝗅𝗅𝖿𝗇𝖼c=\mathsf{full_{fnc}}). If c×dc{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}d then one can treat them as sequentially composed. We define these concepts with respect to the operational semantics.

(4.25) c×d\displaystyle c\mathrel{\ooalign{\hss{$\,\times$\hss}\cr{$\leftarrow$}}}d =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} α,d(dαd𝗏𝗂𝗌𝗂𝖻𝗅𝖾α)c/\ext@arrow0055\Leftarrowfill@cα\displaystyle\mathop{\mathstrut{\forall}}\nolimits\alpha,d^{\prime}\mathrel{\mathstrut{\bullet}}(d\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}d^{\prime}\mathrel{\mathstrut{\wedge}}{\sf visible}~{}\alpha)\Rightarrow c\mathrel{{\color[rgb]{1.,0.,0.}\ooalign{\hss{/\hss}\cr{$\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}$}}}}\alpha
(4.30) c×d\displaystyle c{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}d =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} t,c(c\ext@arrow0359\Rightarrowfill@tcc𝐧𝐢𝐥)c×d\displaystyle\mathop{\mathstrut{\forall}}\nolimits t,c^{\prime}\mathrel{\mathstrut{\bullet}}(c\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,t\,}$}}c^{\prime}\mathrel{\mathstrut{\wedge}}c^{\prime}\not\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}\mathop{\bf nil})\Rightarrow c^{\prime}\mathrel{\ooalign{\hss{$\,\times$\hss}\cr{$\leftarrow$}}}d

We write c×dc\mathrel{\ooalign{\hss{$\,\times$\hss}\cr{$\leftarrow$}}}d when all immediate next possible steps of dd are blocked by cc (4.25). Thus c×dc{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}d holds when, after any unfinished partial execution of cc via some trace tt resulting in cc^{\prime}, cc^{\prime} continues to block dd (4.30). We exclude from the set of partial executions the cases where execution is effectively finished, i.e., when cc^{\prime} is 𝐧𝐢𝐥\mathop{\bf nil} or equivalent (otherwise c×dc{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}d would never hold as the final 𝐧𝐢𝐥\mathop{\bf nil} allows reordering). From these we can derive:

(4.33) c×d\displaystyle c{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}d \displaystyle\Rightarrow c;dcd\displaystyle c\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}d\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}c\centerdot d

Such reduction may lift to more complex structures, for instance, the following law is useful for sequentialising loops when each iteration has no overlap with the preceding or succeeding ones.

(4.36) c×c\bodycm\bodycsc\displaystyle c{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}c\Rightarrow\body{c}{{{\textsc{m}}}}\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}\body{c}{{{\textsc{sc}}}}

We now show how the application of reduction laws to eliminate (elucidate) the allowed reorderings in terms of the familiar sequential and parallel composition enables the application of standard techniques for analysis.

5. Applying concurrent reasoning techniques

In this section we show how the reduction rules in the previous section can be used as the precursor to the application of already established techniques for proving properties of concurrent programs. In correct, real-world algorithms influenced by weak memory models, reordering arises only where it does not violate the desired property (be that a postcondition, or a condition controlling allowed interference). In such circumstances our framework supports making the allowed reorderings explicit in the structure of the program, with a corresponding influence on the proofs of correctness. Where a violation of a desired property occurs due to reordering, the framework also supports the construction of the particular reordering that leads to the problematic behaviour.

The IMP+pseq language includes several aspects which do not exist in a standard imperative language, namely, fences and the ordering constraints that annotate variables and fences. We start with a simple syntactic notion of equivalence modulo these features, reducing a program in IMP+pseq into its underlying ‘plain’ equivalent (fences and ordering constraints have no effect on sequential semantics directly). We then explain how techniques such as Owicki-Gries and rely/guarantee may be applied.

5.1. Predicate transformer semantics and weakest preconditions

The action-trace semantics of Sect. 3.3 can be converted into a typical pairs-of-states semantics straightforwardly, as shown in Fig. 4. Let the type Σ\Sigma be the set of total mappings from variables to values, and let the effect function 𝖾𝖿𝖿:Instr𝑃(Σ×Σ){\sf eff}:Instr\rightarrow\mathop{\mathstrut{\mathbb P}}\nolimits(\Sigma\times\Sigma) return a relation on states given an instruction. We let ‘𝗂𝖽{\sf id}’ be the identity relation on states, and given a Boolean expression ee we write σe\sigma\in e if ee is 𝖳𝗋𝗎𝖾\mathsf{True} in state σ\sigma, and eσ{e}_{\sigma} for the evaluation of ee within state σ\sigma (note that ordering constraints are ignored for the purposes of evaluation). The effect of an assignment xocs:=e{x}^{ocs}\mathop{{:}{=}}e is a straightforward update of xx to the evaluation of ee (defn. (5.1)), where σ[x:=v]\sigma\left[x\mathop{{:}{=}}v\right] is σ\sigma overwritten so that xx maps to vv. A guard e\llparenthesis e\rrparenthesis is interpreted as a set of pairs of identical states that satisfy ee (defn. (5.2)), giving trivial cases 𝖾𝖿𝖿(τ)=𝗂𝖽{\sf eff}(\tau)={\sf id} and 𝖾𝖿𝖿(𝐦𝐚𝐠𝐢𝐜)={\sf eff}({\bf magic})=\varnothing. A fence 𝖿\mathsf{f} has no effect on the state (defn. (5.3)). Conceptually, mapping 𝖾𝖿𝖿{\sf eff} onto an action trace tt yields a sequence of relations corresponding to a set of sequences of pairs of states in a standard Plotkin-style treatment (Plotkin, 2004). We can lift 𝖾𝖿𝖿{\sf eff} to traces by composing such a sequence of relations, which is defined recursively in (5.7), and the effect of a command is given by the union of the effect of its traces (5.8).

(5.1) eff(x^ocs := e) = {(σ, σ[x := e_σ])}
(5.2) eff(⦇e⦈) = {(σ, σ) | σ ∈ e}
(5.3) eff(f) = id
(5.4) eff(⟨⟩) = id
(5.7) eff(a # t) = eff(a) ⨾ eff(t)
(5.8) eff(c) = ⋃{eff(t) | t ∈ ⟦c⟧}
(5.9) wp(c, q) ≙ {σ | ∀σ′ • (σ, σ′) ∈ eff(c) ⇒ σ′ ∈ q}
Figure 4. Sequential semantics

The predicate transformer for weakest-precondition semantics is given in defn. (5.9). A predicate is a set of states, so that given a command c and predicate q, wp(c, q) returns the set of (pre) states σ where every post-state σ′ related to σ by eff(c) satisfies q (following, e.g., (Dijkstra and Scholten, 1990)). We define Hoare logic judgements with respect to this definition (note that we deal only with partial correctness, as we consider only finite traces). From these definitions we can derive the standard rules of weakest preconditions and Hoare logic for commands such as nondeterministic choice and sequential composition, but there are no general compositional rules for parallelized sequential composition.
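The relational reading of Fig. 4 is small enough to prototype directly. The following sketch (ours, not the paper's mechanisation) enumerates a tiny finite state space and interprets assignments, guards, trace composition and wp as in defns (5.1), (5.2), (5.7) and (5.9); the variable and value ranges are illustrative assumptions.

```python
from itertools import product

# Illustrative finite state space: total maps from VARS to VALS.
VARS = ("x", "y")
VALS = (0, 1)

def states():
    for vs in product(VALS, repeat=len(VARS)):
        yield dict(zip(VARS, vs))

def freeze(s):
    """Hashable representation of a state."""
    return tuple(sorted(s.items()))

def eff_assign(x, e):
    """defn. (5.1): eff(x := e) updates x to the evaluation of e."""
    return {(freeze(s), freeze({**s, x: e(s)})) for s in states()}

def eff_guard(b):
    """defn. (5.2): eff of a guard is the identity on states satisfying b."""
    return {(freeze(s), freeze(s)) for s in states() if b(s)}

def compose(r1, r2):
    """Relational composition, lifting eff to traces (defn. (5.7))."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def wp(rel, q):
    """defn. (5.9): pre-states whose every rel-successor lies in q."""
    return {freeze(s) for s in states()
            if all(post in q for (pre, post) in rel if pre == freeze(s))}
```

For instance, wp of x := 1 with respect to the predicate x = 1 is the whole state space, as expected.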

Trace refinement is related to these notions as follows.

Theorem 5.1 (Refinement preserves sequential semantics).

Assuming c ⊑ c′ then

eff(c′) ⊆ eff(c)   and   wp(c, q) ⇒ wp(c′, q)
Proof.

Straightforward from definitions.

The relationship with a standard Plotkin-style operational semantics (Plotkin, 2004) is also straightforward, i.e.,

if c →^α c′ and (σ, σ′) ∈ eff(α) then ⟨c, σ⟩ → ⟨c′, σ′⟩

The advantage of our approach is that the syntax of the action α can be used to reason about allowed reorderings using Rule 3.18, whereas in general one cannot reconstruct or deduce an action from a pair of states.

5.2. Relating to standard imperative languages

The language IMP+pseq extends a typical imperative language with four features that allow it to capture weak memory models: ordering constraints on variables; fence instructions; parallelized sequential composition; and parallelized iteration. The first two have no direct effect on the values of variables (the sequential semantics essentially ignores them), while the latter two affect the allowed traces and hence indirectly affect the values of variables. However, both parallelized sequential composition and parallelized iteration can be instantiated to correspond to usual notions of execution; hence we consider a plain subset of IMP+pseq which maps to a typical imperative program.

Definition 5.2 (Plain imperative programs).

A command c of IMP+pseq is plain if:

  (i) all instances of parallelized sequential composition in c are parameterised by either sc or par, i.e., correspond to standard sequential or parallel composition; and

  (ii) all instances of parallelized iteration in c are parameterised by sc, i.e., loops are executed sequentially.

Definition 5.3 (Imperative syntax equivalence).

Given a plain command c in IMP+pseq, a command c′ in a standard imperative language (which does not contain memory ordering constraints on variables or fences) is imperative-syntax-equivalent to c if c and c′ are structurally and syntactically equivalent except that:

  (i) all variable references in c (which are of the form x^ocs for some set of ordering constraints ocs) appear in c′ as simply x; and

  (ii) no fence instructions are present in c′, that is, they can be thought of as no-ops and removed.

We write c ≈ᵖˡᵃⁱⁿ d if c is imperative-syntax-equivalent to d, or if there exists a c′ where c ⊑⊒ c′ and c′ is imperative-syntax-equivalent to d.

For example, the following two programs are imperative-syntax-equivalent (where ‘;’ in the program on the right should be interpreted as standard sequential composition, i.e., ‘;ˢᶜ’ or ‘⋅’ in this paper).

(5.10) x^rel := 1 ;ˢᶜ sc_fence ;ˢᶜ y^rlx := 1 ;ᵖᵃʳ r := z^acq   ≈ᵖˡᵃⁱⁿ   x := 1 ; y := 1 ∥ r := z

The program on the left is plain because all instances of parallelized sequential composition correspond to sequential or parallel execution.

Note that there is no imperative-syntax-equivalent form unless all instances of parallelized sequential composition have been eliminated/reduced to sequential or parallel forms. A structurally typical case is the following, where c reduces to a straightforward imperative-syntax-equivalent program.

(5.11) (α_1 ; sc_fence ; α_2) ∥ (β_1 ; sc_fence ; β_2) = (α_1 ⋅ sc_fence ⋅ α_2) ∥ (β_1 ⋅ sc_fence ⋅ β_2)
(5.12)   ≈ᵖˡᵃⁱⁿ (α_1 ⋅ α_2) ∥ (β_1 ⋅ β_2)

For reasoning purposes one can use the latter program rather than the former. (Strictly speaking we should not use α_1, etc., on both sides of ≈ᵖˡᵃⁱⁿ because the plain version of α_1 does not contain ordering constraints on variables; for brevity we ignore the straightforward modifications required in this case.) This is because for a plain program in IMP+pseq the semantics corresponds to the usual semantic interpretation of imperative programs, ignoring ordering constraints and treating fences as no-ops.

Theorem 5.4 (Reduction to standard imperative constructs).

If c ≈ᵖˡᵃⁱⁿ c′, then eff(c) is exactly equal to the usual denotation of c′ as a relation (or sequence of relations) on states; hence any state-based property of c based on eff(c) can be established for c′ using a standard denotation for programs.

Proof.

From Defn. (5.3) and Fig. 4 we can see that: i) ordering constraints have no effect on states; ii) fence instructions are equivalent to a no-op (in terms of states) and hence can be ignored; iii) Rule 3.18 reduces to the usual rule for sequential composition when reordering is never allowed, and to parallel composition when reordering is always allowed; and iv) Rule 3.17 and defn. (3.6) reduce to the usual sequential execution of a loop when instantiated with sc.

We introduce some syntax to help with describing examples.

Definition 5.5 (Plain interpretation).

Given a command c in IMP+pseq, which need not be plain, we let c⁻ be the plain interpretation of c, that is, where: i) all instances of parallelized sequential composition and parallelized iteration in c are instantiated by sc, except for instances of parallelized sequential composition instantiated by par (corresponding to parallel composition), which remain as-is; and ii) fences in c do not appear in c⁻.

Establishing properties about programs may proceed as outlined below. Assume for some program c in IMP+pseq that its plain interpretation c⁻ satisfies a property P under the usual imperative program semantics according to some method M. There are three approaches:

  (1) Reduction to plain equivalence. Show that the ordering constraints (due to variables or fences) within c are such that it reduces equivalently to c⁻ (i.e., no reordering is possible within c). Hence c satisfies P using M in C.

  (2) Reduction to some plain form. Reduce c equivalently to some plain command c′ and apply M (or some other known method) to c′⁻ to show that P holds.

  (3) Reduce and deny. Refine c to some plain command c′ and apply M (or some other known method) to c′⁻ to show that P does not hold. This corresponds to finding some new behaviour (due to a new ordering of instructions) that breaks the original property P; in this paper c′ tends to be a particular path and we straightforwardly apply Hoare logic to deny the original property P.

We now explain how standard notions of correctness can be applied in our framework.

5.3. Hoare logic

We define a Hoare triple using weakest preconditions (because we only consider finite traces we do not deal with potential non-termination).

{p} c {q} ≙ p ⇒ wp(c, q)

As with Theorem 5.1 refinement preserves Hoare-triple inferences.

(5.13) c ⊑ c′ ⇒ ({p} c {q} ⇒ {p} c′ {q})

Given some plain c′ that is equivalent to c we can apply Hoare logic.

c ≈ᵖˡᵃⁱⁿ c′ ⇒ ({p} c {q} ⇔ {p} c′ {q})

Traditional Hoare logic is used for checking that every final state satisfies a postcondition, but it is also useful to consider reachable states, which can be defined using the conjugate pattern (see, e.g., (Hoare, 1978; Morgan, 1990; Winter et al., 2013)); it is related to, but different from, O’Hearn’s concept of incorrectness logic (O’Hearn, 2019) (which is stronger except in the special case when the post-state q is False).

⟨⟨p⟩⟩ c ⟨⟨q⟩⟩ ≙ ¬{p} c {¬q}

Given these definitions we naturally derive the following properties in terms of a relational interpretation of programs.

(5.14) {p} c {q} ⇔ ∀(σ, σ′) ∈ eff(c) • σ ∈ p ⇒ σ′ ∈ q
(5.15) ⟨⟨p⟩⟩ c ⟨⟨q⟩⟩ ⇔ ∃(σ, σ′) ∈ eff(c) • σ ∈ p ∧ σ′ ∈ q
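As a quick illustration of (5.14) and (5.15) in a toy model of our own (states collapsed to single integer values, with a relation playing the role of eff(c)): the Hoare triple universally quantifies over the eff-pairs starting in p, while the conjugate form existentially quantifies.

```python
def hoare(p, rel, q):
    """(5.14): every rel-pair starting in p ends in q."""
    return all(post in q for (pre, post) in rel if pre in p)

def reachable(p, rel, q):
    """(5.15): some rel-pair starts in p and ends in q."""
    return any(pre in p and post in q for (pre, post) in rel)

# A command that nondeterministically increments or doubles its state.
rel = {(s, s + 1) for s in range(3)} | {(s, 2 * s) for s in range(3)}
```

From state 1 the postcondition "state is 2" holds on every branch (1 + 1 = 2·1 = 2), whereas from state 2 only some behaviours reach 4, so only the conjugate judgement holds there.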

Due to its familiarity we use Hoare logic for top-level specifications, although other choices could be made. However, Hoare logic is of course lacking in the presence of concurrency (due to parallelism or reordering), and hence we later consider concurrent verification techniques. At the top level we use the following theorems to formally express our desired (or undesired) behaviours.

Theorem 5.6 (Verification).
(5.16) c ⊑⊒ c′ ⇒ ({p} c {q} ⇔ {p} c′ {q})
(5.17) c ⊑ c′ ∧ {p} c′ {¬q} ⇒ ¬{p} c {q}
(5.18) c ⊑ c′ ∧ {p} c′ {q} ⇒ ⟨⟨p⟩⟩ c ⟨⟨q⟩⟩
Proof.

All are automatic from defn. (3.21) and (5.14) and (5.15).

Theorem 5.16 allows properties of a reduced program (c′) to carry over to the original program (c). Alternatively, by Theorem 5.17, if any behaviour (c′) is found to violate a property then that property cannot hold for the original (c). Finally, by Theorem 5.18, if any behaviour (c′) is found to satisfy a property then it is a possible behaviour of the original (c).

5.4. Owicki-Gries

The Owicki-Gries method (Owicki and Gries, 1976) can be used to prove a top-level Hoare-triple specification of a parallel program c. If c involves reordering, then reducing c to some plain form allows the application of Owicki-Gries; several examples of this approach appear in (Colvin, 2021a).

For instance, recalling (5.11), given a program of the form (α_1 ; sc_fence ; α_2) ∥ (β_1 ; sc_fence ; β_2) for which one wants to establish some property specified as a Hoare triple, one can use Owicki-Gries on the plain program (α_1 ⋅ α_2) ∥ (β_1 ⋅ β_2) by Theorem 5.4: the fences enforce program order, but can be ignored from the perspective of correctness.

Now consider the following slightly more complex case with nested parallelism, where one process uses a form of critical section. Assume α ⇐ᶜ β.

c ≙ (γ_1 ; sc_fence ; α ; β ; sc_fence ; γ_2) ∥ (γ_3 ; sc_fence ; γ_4)
  ⊑⊒ (γ_1 ⋅ sc_fence ⋅ (α ∥ β) ⋅ fence ⋅ γ_2) ∥ (γ_3 ⋅ sc_fence ⋅ γ_4)
  ≈ᵖˡᵃⁱⁿ (γ_1 ⋅ (α ∥ β) ⋅ γ_2) ∥ (γ_3 ⋅ γ_4)

Unfortunately the Owicki-Gries method is not directly applicable due to the nested parallelism, but in this simple case we can use the fact that execution of two (atomic) actions in parallel means that either order can be chosen.

(5.20) α ∥ β ⊑⊒ (α ⋅ β) ⊓ (β ⋅ α)

Given an Owicki-Gries rule for nondeterministic choice this can be reasoned about directly, or alternatively we can lift this choice to the top level. Continuing from (5.4):

(5.21) ⊑⊒ (γ_1 ⋅ α ⋅ β ⋅ γ_2) ∥ (γ_3 ⋅ γ_4) ⊓ (γ_1 ⋅ β ⋅ α ⋅ γ_2) ∥ (γ_3 ⋅ γ_4)

Now Owicki-Gries can be applied to both top-level possibilities. Clearly this is not desirable in general, however, and in the next section we show how the compositional rely/guarantee method can be used to handle nested parallelism.

5.5. Rely/guarantee

Rely/guarantee (Jones, 1983a, b) (see also (Coleman and Jones, 2007; Hayes et al., 2012, 2014)) is a compositional proof method for parallel programs. A rely/guarantee quintuple {p, r} c {g, q} states that program c establishes postcondition q provided it is executed from a state satisfying p and within an environment that satisfies the (rely) relation r on each step; in addition c also satisfies the (guarantee) relation g on each of its own steps. For instance, {x = 0, x ≤ x′} c {y′ = y, x ≥ 10} states that c establishes x ≥ 10 when it finishes execution, and guarantees that it will not modify y, provided initially x = 0 and the environment only ever increases x. Here we follow the common convention for relations that primed variables (x′) refer to their value in the post-state and unprimed variables (x) refer to their value in the pre-state. A top-level Hoare-triple specification {p} c {q} can be related to a rely/guarantee quintuple by noting {p, id} c {True, q} ⇒ {p} c {q}, that is, if p holds initially in some top-level context that does not modify any variables, then c establishes q (with the weakest possible guarantee).
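The guarantee component of a quintuple can be checked step by step on a concrete trace. The sketch below is an illustrative harness of our own (not part of the paper's framework): it runs a trace of atomic updates over a state and checks each program step against a guarantee relation such as y′ = y.

```python
def satisfies_guarantee(trace, s0, g):
    """Check that every program step (pre, post) of the trace lies in g."""
    s = dict(s0)
    for step in trace:
        pre = dict(s)
        step(s)                 # execute one atomic program step
        if not g(pre, s):
            return False
    return True

# Guarantee from the example quintuple: the program never modifies y.
g_y_unchanged = lambda pre, post: post["y"] == pre["y"]

inc_x = lambda s: s.update(x=s["x"] + 1)
set_y = lambda s: s.update(y=1)
```

A trace that only increments x satisfies the guarantee, while any trace that writes y violates it at that step.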

Reasoning using the rely/guarantee method is compositional over parallel composition; for instance, without going into detail about rely/guarantee inference rules, consider the plain program from (5.4). We can show a rely/guarantee quintuple holds provided we can find rely/guarantee relations that control the communication between the two subprocesses; we abstract from these nested relations using different names.

{p, r} c {g, q}
  ⇔ {p, r} (γ_1 ⋅ (α ∥ β) ⋅ γ_2) ∥ (γ_3 ⋅ γ_4) {g, q}   (by (5.4) and Theorem 5.4)
  ⇐ {p, r_1} (γ_1 ⋅ (α ∥ β) ⋅ γ_2) {g_1, q_1} ∧ {p, r_2} (γ_3 ⋅ γ_4) {g_2, q_2}
  ⇐ {p, r_1} γ_1 {g_1, q_1′} ∧ {q_1′, r_1} α ∥ β {g_1, q_1″} ∧ {q_1″, r_1} γ_2 {g_1, q_1} ∧
    {p, r_2} (γ_3 ⋅ γ_4) {g_2, q_2}

This is a straightforward application of standard rely/guarantee inference rules (where we leave the format of the predicates and relations unspecified). Reasoning may proceed from this point, in particular, analysing the quintuple containing the nested parallel composition α ∥ β. The question becomes whether the guarantee and the intermediate relation q_1″ are satisfied regardless of the order in which α and β are executed; if not, a fence or ordering constraints will need to be introduced to enforce order and eliminate the undesirable proof obligation. While completing the proof may or may not be straightforward, the point is that it will be a matter of the specifics of the example in question, and not of a tailor-made inference system to manage a complex semantic representation. The initial work (as shown in (5.4)) is to elucidate the allowed reorderings as either sequential or parallel composition.

5.6. Linearisability

Linearisability is a correctness condition for concurrent objects (Herlihy and Wing, 1990); it is not an “end-to-end” property such as Hoare triples and rely/guarantee quintuples, but rather requires that operations on some data structure appear to take place atomically, with weakened liveness guarantees. The following abstract program op follows a common pattern for operations on lock-free, linearisable data structures (in this case x), where there may be other processes also executing op.

(5.22)
repeat
    r_1 := x ;
    r_2 := f(r_1) ;
    b := cas(x, r_1, r_2) ;
until b

The algorithm repeatedly reads the value of x (into local variable r_1), calculates a new value for x (applying some function f, and storing the result in local variable r_2), and attempts to atomically update x to r_2; if interference is detected (x has changed since being read into r_1) then the loop is retried, otherwise the operation may complete.
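The retry pattern of (5.22) can be sketched concretely; the rendering below is our own illustrative model (the Cell class, cas method and the interference hook are modelling assumptions, standing in for an atomic compare-and-swap and for concurrent writers).

```python
class Cell:
    """A shared location supporting an (atomic, in this model) compare-and-swap."""
    def __init__(self, v):
        self.v = v

    def cas(self, old, new):
        """Set v to new iff it still equals old; return success."""
        if self.v == old:
            self.v = new
            return True
        return False

def op(x, f, interfere=lambda: None):
    """The pattern of (5.22): read x, compute f, attempt cas; retry on interference."""
    while True:
        r1 = x.v              # r_1 := x
        r2 = f(r1)            # r_2 := f(r_1)
        interfere()           # point where other processes may run
        if x.cas(r1, r2):     # b := cas(x, r_1, r_2); until b
            return r2
```

With an interfering write injected before the first cas, the operation detects the change, retries, and applies f to the latest value of x.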

The dependencies arising from the above structure naturally maintain order, that is, r_1 := x ↝ r_2 := f(r_1) ↝ b := cas(x, r_1, r_2) ↝ ⦇b⦈, hence op ≈ᵖˡᵃⁱⁿ op⁻, and any reasoning about op executing under sequential consistency – in a vacuum – applies to the C version of op. (Other algorithms may of course have fewer dependencies, which will manifest as more complex parallel structures that must be elucidated via reduction.) Alternative approaches to reasoning about linearisability under weak memory models typically modify the original definition in some way to account for reorderings (Derrick and Smith, 2017; Derrick et al., 2014).

However, when considering a calling context, reordering must also be taken into account. That is, assume c is some program which contains a call to op, and that c is just one of many parallel processes which may be calling op. Multiple calls to op (or other operations on x) within c are unlikely to cause a problem because the natural dependencies on x will prevent problematic reordering; however, one must consider the possibility of unrelated instructions in c reordering with (internal) instructions of op. For instance, consider a case where x implements a queue and op places a new value into the queue, and within c a flag is set immediately after calling op to indicate the enqueue has happened, i.e.,

c ≙ op() ; flag := True

The assignment flag := True can be reordered with instructions within op, destroying any reasoning that uses flag to determine when op has been called. Placing a release constraint on the flag update resolves the issue (this can be shown straightforwardly in our framework since op() ; flag^rel := True ⊑⊒ op() ⋅ flag^rel := True, or in other words, c ≈ᵖˡᵃⁱⁿ c⁻), but more generally this leaves a question about how to capture dependencies within the specification of op. Hence, while one can argue that op is “linearisable” in the sense that if it is executed in parallel with other linearisable operations on x it will operate correctly, whether or not a calling context c works correctly will still depend on reordering (Batty et al., 2013; Smith and Groves, 2020); this can also be addressed in our framework, using reduction and applying an established technique (e.g., (Schellhorn et al., 2014; Smith, 2016; Filipović et al., 2010)).

6. Examples

In this section we explore the behaviour of small illustrative examples from the literature, specifically classic “litmus test” patterns that readily show the weak behaviours of memory models, and also examples taken from the C standard.

To save space we define the following initialisation predicate for a list of variables x, y, ….

(6.1) 0_{x,y,…} ≙ x = 0 ∧ y = 0 ∧ …

6.1. Message passing pattern

Consider the message passing (“MP”) communication pattern.

(6.2) mp ≙ (x := 1 ; flag := 1) ∥ (f := flag ; r := x)

The question is whether in the final state f = 1 ⇒ r = 1, i.e., if the “flag” is observed to have been set, can one assume that the “data” (x) has been transferred? This is of course expected under a plain interpretation (Defn. (5.5)).

Theorem 6.1 (Message passing under sequential consistency).
(6.3) {0_{x,flag,r,f}} mp⁻ {f = 1 ⇒ r = 1}
Proof.

The proof is checked in Isabelle/HOL using Nieto’s encoding of rely/guarantee inference (Nieto, 2003); the key part of the proof is that the left process guarantees x = 0 ⇒ flag = 0 and flag = 1 ⇒ x = 1 (translated into relational form).

If we naively code this pattern in C this property no longer holds.

Theorem 6.2 (Naive message passing fails under c ).
¬{0_{x,flag,r,f}} mp {f = 1 ⇒ r = 1}
Proof.

By the definition of c (Model 2.6) we have both (x := 1) ⇐ᶜ (flag := 1) and (f := flag) ⇐ᶜ (r := x).

(6.4) mp ≙ (x := 1 ; flag := 1) ∥ (f := flag ; r := x)      defn. (6.2)
(6.5)    ⊑⊒ x := 1 ∥ flag := 1 ∥ f := flag ∥ r := x       by Law 4.9
(6.6)    ⊑ flag := 1 ⋅ f := flag ⋅ r := x ⋅ x := 1        by Law 4.3

All instructions effectively execute in parallel; in the final step we have picked one particular interleaving that breaks the expected postcondition, that is,

(6.7) {0_{x,flag,r,f}} flag := 1 ⋅ f := flag ⋅ r := x ⋅ x := 1 {f = 1 ∧ r = 0}

Since f = 1 ∧ r = 0 ⇒ ¬(f = 1 ⇒ r = 1) we complete the proof by Theorem 5.17.
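The counterexample can also be found mechanically. The brute-force sketch below (our own illustration, not the paper's tooling) enumerates interleavings of the two threads' actions, modelling the relaxed version's intra-thread reordering (per Laws 4.9 and 4.3) by permuting each thread's actions, and confirms that the outcome f = 1 ∧ r = 0 is reachable only when reordering is allowed.

```python
from itertools import permutations

def run(trace):
    s = {"x": 0, "flag": 0, "f": 0, "r": 0}
    for a in trace:
        a(s)
    return (s["f"], s["r"])

# Thread 1: x := 1 ; flag := 1      Thread 2: f := flag ; r := x
t1 = [lambda s: s.update(x=1), lambda s: s.update(flag=1)]
t2 = [lambda s: s.update(f=s["flag"]), lambda s: s.update(r=s["x"])]

def interleavings(a, b):
    if not a or not b:
        return [a + b]
    return ([a[:1] + t for t in interleavings(a[1:], b)] +
            [b[:1] + t for t in interleavings(a, b[1:])])

def finals(orders1, orders2):
    return {run(t)
            for p in orders1 for q in orders2
            for t in interleavings(list(p), list(q))}

sc_outcomes = finals([t1], [t2])                       # program order kept
relaxed = finals(permutations(t1), permutations(t2))   # intra-thread reordering
```

Under program order every outcome satisfies f = 1 ⇒ r = 1, while the relaxed version exhibits the violating outcome f = 1 ∧ r = 0 of (6.7).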

C’s release/acquire atomics are the recommended way to instrument message passing; we make this explicit below using release-acquire constraints on flag (leaving x relaxed).

(6.8) mp^RA ≙ (x := 1 ; flag^rel := 1) ∥ (f := flag^acq ; r := x)

The new constraints prevent reordering in each branch and therefore the expected outcome is reached.

Theorem 6.3 (Release/acquire message passing maintains sequential consistency).
(6.9) {0_{x,flag,r,f}} mp^RA {f = 1 ⇒ r = 1}
Proof.

By the definition of c we have both (x := 1) ⇍ᶜ (flag^rel := 1) and (f := flag^acq) ⇍ᶜ (r := x). Hence

(6.10) mp^RA = (x := 1 ⋅ flag^rel := 1) ∥ (f := flag^acq ⋅ r := x) ≈ᵖˡᵃⁱⁿ mp⁻

The proof follows immediately from Theorems 6.1 and 5.4.

An alternative approach to restoring order is to insert fences, e.g.,

(6.11) (x:=1;𝐬𝐜 𝐟𝐞𝐧𝐜𝐞;flag:=1)(f:=flag;𝐬𝐜 𝐟𝐞𝐧𝐜𝐞;r:=x)\displaystyle(x\mathop{{:}{=}}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}flag\mathop{{:}{=}}1)\parallel(f\mathop{{:}{=}}flag\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{sc\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}x)
(6.12) (x:=1;𝐫𝐞𝐥 𝐟𝐞𝐧𝐜𝐞;flag:=1)(f:=flag;𝐚𝐜𝐪 𝐟𝐞𝐧𝐜𝐞;r:=x)\displaystyle(x\mathop{{:}{=}}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{rel\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}flag\mathop{{:}{=}}1)\parallel(f\mathop{{:}{=}}flag\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\mathbf{acq\leavevmode\vbox{\hrule width=3.99994pt}fence}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}x)

No reordering is possible within each thread and hence these cases also reduce to 𝚖𝚙{{\tt mp}}^{-}. We emphasise that these proofs are as straightforward as one would expect: the programmer has enforced order and so properties of the sequential version of the program carry over. There is no need to appeal to complex global data types that maintain information about fences, orderings, etc.

6.2. Test-and-set lock

We now consider the application of the framework to more realistic code, in this case a lock implementation taken from (Herlihy and Shavit, 2011, Sect. 7.3). Conceptually the shared lock \ell is represented as a boolean which is 𝖳𝗋𝗎𝖾\mathsf{True} when some process holds the lock, and 𝖥𝖺𝗅𝗌𝖾\mathsf{False} otherwise.

(6.13) 𝐫𝐞𝐩𝐞𝐚𝐭mc𝐮𝐧𝐭𝐢𝐥b\displaystyle{\mathop{\mathbf{repeat}}}^{{{\textsc{m}}}}~{}c\mathop{\mathbf{until}}b =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} c;m\body(¬b;mc)m;mb\displaystyle c\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}\body{(\llparenthesis\neg b\rrparenthesis\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}c)}{{{\textsc{m}}}}\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{{{\textsc{m}}}}}$}}{\mathchar 24635\relax}$}}\llparenthesis b\rrparenthesis
(6.14) r:=x.getAndSet(v)\displaystyle r\mathop{{:}{=}}x.getAndSet(v) =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} r:=xacq,xrel:=v\displaystyle\langle r\mathop{{:}{=}}{x}^{{\textsc{acq}}}~{},~{}{x}^{{\textsc{rel}}}\mathop{{:}{=}}v\rangle
(6.15) 𝚕𝚘𝚌𝚔()\displaystyle{\tt lock()} =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} 𝐫𝐞𝐩𝐞𝐚𝐭ctaken:=.getAndSet(𝖳𝗋𝗎𝖾)𝐮𝐧𝐭𝐢𝐥¬taken\displaystyle{\mathop{\mathbf{repeat}}}^{{{\textsc{c}}}}~{}taken\mathop{{:}{=}}\ell.getAndSet(\mathsf{True})\mathop{\mathbf{until}}\neg taken
(6.16) 𝚞𝚗𝚕𝚘𝚌𝚔()\displaystyle{\tt unlock()} =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} rel:=𝖥𝖺𝗅𝗌𝖾\displaystyle{\ell}^{{\textsc{rel}}}\mathop{{:}{=}}\mathsf{False}

A “repeat-until” command 𝐫𝐞𝐩𝐞𝐚𝐭mc𝐮𝐧𝐭𝐢𝐥b{\mathop{\mathbf{repeat}}}^{{{\textsc{m}}}}~{}c\mathop{\mathbf{until}}b repeatedly executes cc (at least once) until bb is true, under memory model m, which we encode using parallelized iteration (defn. (6.13)). A “get-and-set” command r:=x.getAndSet(e)r\mathop{{:}{=}}x.getAndSet(e) returns the initial value of xx into rr and updates xx to the value of ee, as a single step (defn. (6.14)). The load of xx is defined as an acq access and the update is a rel write. Using these, the concurrent 𝚕𝚘𝚌𝚔(){\tt lock()} procedure is defined to repeatedly set \ell to 𝖳𝗋𝗎𝖾\mathsf{True}, finishing when a 𝖥𝖺𝗅𝗌𝖾\mathsf{False} value for \ell is read. If the lock is already held by another process then the get-and-set has no effect and the loop continues; when \ell is read as 𝖥𝖺𝗅𝗌𝖾\mathsf{False} (as recorded in the local variable takentaken), \ell is simultaneously updated to 𝖳𝗋𝗎𝖾\mathsf{True}, indicating that the current process now holds the lock. Unlocking is implemented by simply setting \ell to 𝖥𝖺𝗅𝗌𝖾\mathsf{False}.

We have the following general rule for spin loops of the form in lock().

(6.17) αb𝐫𝐞𝐩𝐞𝐚𝐭cα𝐮𝐧𝐭𝐢𝐥b𝐫𝐞𝐩𝐞𝐚𝐭scα𝐮𝐧𝐭𝐢𝐥b\alpha\rightsquigarrow\llparenthesis b\rrparenthesis\quad\Rightarrow\quad{\mathop{\mathbf{repeat}}}^{{{\textsc{c}}}}~{}\alpha\mathop{\mathbf{until}}b\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}{\mathop{\mathbf{repeat}}}^{{{\textsc{sc}}}}~{}\alpha\mathop{\mathbf{until}}b

Intuitively, since each iteration of the loop updates a variable that is then checked in the guard, no reordering is possible, and it is as if the loop is executed in sequential order. This holds by the following reasoning.

  • 𝐫𝐞𝐩𝐞𝐚𝐭cα𝐮𝐧𝐭𝐢𝐥b\displaystyle{\mathop{\mathbf{repeat}}}^{{{\textsc{c}}}}~{}\alpha\mathop{\mathbf{until}}b

    =^\displaystyle\mathrel{\mathstrut{\widehat{=}}}  defn. (6.13)
    α;\body(¬b;α)c;b\displaystyle\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\body{(\llparenthesis\neg b\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\alpha)}{{{\textsc{c}}}}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis b\rrparenthesis

    \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}  Using Law 4.8 by assumption αb\alpha\rightsquigarrow\llparenthesis b\rrparenthesis (and hence also α¬b\alpha\rightsquigarrow\llparenthesis\neg b\rrparenthesis)
    α;\body(¬bα)c;b\displaystyle\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\body{(\llparenthesis\neg b\rrparenthesis\centerdot\alpha)}{{{\textsc{c}}}}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis b\rrparenthesis

    \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}  Using Law 4.36, since αb\alpha\rightsquigarrow\llparenthesis b\rrparenthesis implies (¬bα)×(¬bα)(\llparenthesis\neg b\rrparenthesis\centerdot\alpha){\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}(\llparenthesis\neg b\rrparenthesis\centerdot\alpha).
    α;\body(¬bα)sc;b\displaystyle\alpha\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\body{(\llparenthesis\neg b\rrparenthesis\centerdot\alpha)}{{{\textsc{sc}}}}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis b\rrparenthesis

    \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}  Similarly using Law 4.33 and α×(¬bα)×b\alpha{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}(\llparenthesis\neg b\rrparenthesis\centerdot\alpha){\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}\llparenthesis b\rrparenthesis.
    α\body(¬bα)scb\displaystyle\alpha\centerdot\body{(\llparenthesis\neg b\rrparenthesis\centerdot\alpha)}{{{\textsc{sc}}}}\centerdot\llparenthesis b\rrparenthesis

    =^\displaystyle\mathrel{\mathstrut{\widehat{=}}}  defn. (6.13)
    𝐫𝐞𝐩𝐞𝐚𝐭scα𝐮𝐧𝐭𝐢𝐥b\displaystyle{\mathop{\mathbf{repeat}}}^{{{\textsc{sc}}}}~{}\alpha\mathop{\mathbf{until}}b

Recalling that 𝚕𝚘𝚌𝚔(){{\tt lock()}}^{-} is the plain version of 𝚕𝚘𝚌𝚔(){\tt lock()}, i.e., ignoring the ordering constraints and using sequential composition only, we have the following.

Theorem 6.4.

𝚕𝚘𝚌𝚔()plain𝚕𝚘𝚌𝚔(){\tt lock()}\overset{{plain}}{\approx}{{\tt lock()}}^{-}

Proof.

𝚕𝚘𝚌𝚔(){\tt lock()} reduces to sequential form by taken:=l.getAndSet(𝖳𝗋𝗎𝖾)takentaken\mathop{{:}{=}}l.getAndSet(\mathsf{True})\rightsquigarrow\llparenthesis taken\rrparenthesis and Law 6.17, which is imperative-syntax-equivalent to 𝚕𝚘𝚌𝚔(){{\tt lock()}}^{-} by Defn. (5.3).

Therefore this c implementation is ‘correct’ if the original is. However, as discussed in Sect. 5.6, one must consider the calling context. Theorem 6.4 does not depend on the ordering constraints in the definition of lock(), as its natural data dependencies maintain order. However, this does not in itself guarantee that, for instance, 𝚕𝚘𝚌𝚔();c;𝚞𝚗𝚕𝚘𝚌𝚔()𝚕𝚘𝚌𝚔()c𝚞𝚗𝚕𝚘𝚌𝚔(){\tt lock()}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}c\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}{\tt unlock()}\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}{\tt lock()}\centerdot c\centerdot{\tt unlock()}, which may be a desirable property if the behaviour of cc assumes the lock is held. By defn. (6.15) we have 𝚕𝚘𝚌𝚔()={rel,acq}\lceil{\tt lock()}\rceil=\{{\textsc{rel}},{\textsc{acq}}\} and 𝚞𝚗𝚕𝚘𝚌𝚔()={rel}\lceil{\tt unlock()}\rceil=\{{\textsc{rel}}\}, which in many cases will be enough to allow sequential reasoning in the calling context; alternatively fences could be inserted around cc.

6.3. Out-of-thin-air behaviours

We now turn our attention to the “out of thin air” problem, where some memory model specifications allow values to be assigned that appear nowhere in the program text. First, consider the following program, which appears in the 𝙲{\tt C} standard.

(6.18) 𝗈𝗈𝗍𝖺\displaystyle\mathsf{oota} =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} r_1:=x;(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=42)r_2:=y;(𝐢𝐟cr_2=42𝐭𝐡𝐞𝐧x:=42)\displaystyle r_{\_}1\mathop{{:}{=}}x\mathrel{\mathchar 24635\relax\;}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}42)~{}\parallel~{}r_{\_}2\mathop{{:}{=}}y\mathrel{\mathchar 24635\relax\;}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}2=42\mathrel{\mathbf{then}}x\mathop{{:}{=}}42)

Under a plain interpretation neither store ever happens: one of the loads, together with its subsequent (failing) test, must occur first, preventing the condition in the other process from succeeding.

Theorem 6.5.

{0_x,y,r_1,r_2}𝗈𝗈𝗍𝖺{x=0y=0}\{0_{\_}{x,y,r_{\_}1,r_{\_}2}\}\,{\mathsf{oota}}^{-}\,\{x=0\mathrel{\mathstrut{\wedge}}y=0\}.

Proof.

Trivial using Owicki-Gries reasoning, checked in Isabelle/HOL (Nipkow and Nieto, 1999).

However, this behaviour is allowed under the C memory model according to the specification (although compiler writers are discouraged from implementing it!). The behaviour is also allowed in our framework; that is, it is possible for both r_1r_{\_}1 and r_2r_{\_}2 to read 42. This is because (unlike hardware memory models (Colvin, 2021b)) we allow stores to come before guards (via the \ext@arrow0055\Leftarrowfill@g\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{g}}}}}} relation, e.g., (2.33)).

Theorem 6.6.

0_x,y,r_1,r_2𝗈𝗈𝗍𝖺x=42y=42\langle\!\langle 0_{\_}{x,y,r_{\_}1,r_{\_}2}\rangle\!\rangle\,\mathsf{oota}\,\langle\!\langle x=42\mathrel{\mathstrut{\wedge}}y=42\rangle\!\rangle.

Proof.

We have r_1=42\ext@arrow0055\Leftarrowfill@cy:=42\llparenthesis r_{\_}1=42\rrparenthesis\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}y\mathop{{:}{=}}42, and similarly for xx, hence

  • r_1:=x;(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=42)\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}42)

    \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}  Defn. (3.7)
    r_1:=x;(r_1=42;y:=42)r_142\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\llparenthesis r_{\_}1=42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}y\mathop{{:}{=}}42)\sqcap\llparenthesis r_{\_}1\neq 42\rrparenthesis

    \displaystyle\mathrel{\sqsubseteq}  Law 4.2
    r_1:=x;r_1=42;y:=42\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis r_{\_}1=42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}y\mathop{{:}{=}}42

    \displaystyle\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}  Law 4.9
    r_1:=x;(r_1=42y:=42)\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\llparenthesis r_{\_}1=42\rrparenthesis\parallel y\mathop{{:}{=}}42)

    \displaystyle\mathrel{\sqsubseteq}  Law 4.3 (taking the right-hand action); Law 4.10 from r_1:=x\ext@arrow0055\Leftarrowfill@cy:=42r_{\_}1\mathop{{:}{=}}x\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}y\mathop{{:}{=}}42
    y:=42r_1:=xr_1=42\displaystyle y\mathop{{:}{=}}42\centerdot r_{\_}1\mathop{{:}{=}}x\centerdot\llparenthesis r_{\_}1=42\rrparenthesis

The second process reduces similarly to x:=42r_2:=yr_2=42x\mathop{{:}{=}}42\centerdot r_{\_}2\mathop{{:}{=}}y\centerdot\llparenthesis r_{\_}2=42\rrparenthesis. Interleaving the two processes (Law 4.3) gives the following reduction to a particular execution.

𝗈𝗈𝗍𝖺\displaystyle\mathsf{oota} \displaystyle\mathrel{\sqsubseteq} (y:=42r_1:=xr_1=42)(x:=42r_2:=yr_2=42)\displaystyle(y\mathop{{:}{=}}42\centerdot r_{\_}1\mathop{{:}{=}}x\centerdot\llparenthesis r_{\_}1=42\rrparenthesis)\parallel(x\mathop{{:}{=}}42\centerdot r_{\_}2\mathop{{:}{=}}y\centerdot\llparenthesis r_{\_}2=42\rrparenthesis)
\displaystyle\mathrel{\sqsubseteq} y:=42x:=42r_1:=xr_2:=yr_1=42r_2=42\displaystyle y\mathop{{:}{=}}42\centerdot x\mathop{{:}{=}}42\centerdot r_{\_}1\mathop{{:}{=}}x\centerdot r_{\_}2\mathop{{:}{=}}y\centerdot\llparenthesis r_{\_}1=42\rrparenthesis\centerdot\llparenthesis r_{\_}2=42\rrparenthesis

Straightforward sequential reasoning gives the following.

(6.19) {0_x,y,r_1,r_2}y:=42x:=42r_1:=xr_2:=yr_1=42r_2=42{x=42y=42}.\{0_{\_}{x,y,r_{\_}1,r_{\_}2}\}\,y\mathop{{:}{=}}42\centerdot x\mathop{{:}{=}}42\centerdot r_{\_}1\mathop{{:}{=}}x\centerdot r_{\_}2\mathop{{:}{=}}y\centerdot\llparenthesis r_{\_}1=42\rrparenthesis\centerdot\llparenthesis r_{\_}2=42\rrparenthesis\,\{x=42\mathrel{\mathstrut{\wedge}}y=42\}.

The final state is therefore possible by Theorem 5.18.

Under hardware weak memory models (the observable effects of) writes cannot happen before branch points, and so out-of-thin-air behaviours are not possible.

Consider the following variant of 𝗈𝗈𝗍𝖺\mathsf{oota}.

(6.20) 𝗈𝗈𝗍𝖺_𝖣=^r_1:=x;(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=r_1)r_2:=y;(𝐢𝐟cr_2=42𝐭𝐡𝐞𝐧x:=r_2)\mathsf{oota_{\_}D}~{}~{}\mathrel{\mathstrut{\widehat{=}}}~{}~{}r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}r_{\_}1)~{}~{}\parallel~{}~{}r_{\_}2\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}2=42\mathrel{\mathbf{then}}x\mathop{{:}{=}}r_{\_}2)

The inner assignments have changed from y:=42y\mathop{{:}{=}}42 (resp. x:=42x\mathop{{:}{=}}42) to y:=r_1y\mathop{{:}{=}}r_{\_}1 (resp. x:=r_2x\mathop{{:}{=}}r_{\_}2). Arguably the compiler knows that within the true branch of the conditional it must be the case that r_1=42r_{\_}1=42, and thus the assignment y:=r_1y\mathop{{:}{=}}r_{\_}1 can be treated as y:=42y\mathop{{:}{=}}42, reducing to the original 𝗈𝗈𝗍𝖺\mathsf{oota}. But this outcome is expressly disallowed by the standard, and is naturally excluded in our framework. That is, we can show that every possible final state satisfies x=y=0x=y=0.

Theorem 6.7.

{0_x,y,r_1,r_2}𝗈𝗈𝗍𝖺_𝖣{x=0y=0}\{0_{\_}{x,y,r_{\_}1,r_{\_}2}\}\,\mathsf{oota_{\_}D}\,\{x=0\mathrel{\mathstrut{\wedge}}y=0\}.

Proof.

The initial load into r_1r_{\_}1 creates a dependency with the rest of the code, i.e., r_1:=x×(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=r_1𝐞𝐥𝐬𝐞)r_{\_}1\mathop{{:}{=}}x{\color[rgb]{1.,0.,0.}\mathrel{\ooalign{\hss{$\times$\hss}\cr{\kern 0.59998pt$\parallel$}}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}r_{\_}1\mathrel{\mathbf{else}}). Hence we can sequence the initial load of xx (into r_1r_{\_}1) with the remaining code.

(6.21) r_1:=x;(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=r_1)r_1:=x(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=r_1)\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}r_{\_}1)~{}~{}\mathrel{\hbox{\hbox to0.0pt{$\sqsubseteq$\hss}$\sqsupseteq$}}~{}~{}r_{\_}1\mathop{{:}{=}}x\centerdot(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}r_{\_}1)

Although the guard and the assignment in the conditional may be reordered with each other (i.e., r_1=42\ext@arrow0055\Leftarrowfill@cy:=r_1\llparenthesis r_{\_}1=42\rrparenthesis\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}y\mathop{{:}{=}}r_{\_}1), the fact that the initial load must happen first means that, similarly to Theorem 6.5, there is no execution of 𝗈𝗈𝗍𝖺_𝖣\mathsf{oota_{\_}D} in which a non-zero value is written to any variable.

Note that the C standard allows the behaviour of 𝗈𝗈𝗍𝖺\mathsf{oota} and forbids the behaviour of 𝗈𝗈𝗍𝖺_𝖣\mathsf{oota_{\_}D}, and both results arise naturally in our semantics framework without the introduction of any ad-hoc mechanisms. We return to the question of whether guards should be allowed to simplify instructions in Sect. 10.

7. Incremental evaluation of code

In this and subsequent sections we consider more complex aspects of the C language and execution where they relate to concurrent behaviour, namely, in this section, non-atomic evaluation of instructions (until now we have assumed all instructions appearing in the text of the program are executed in a single indivisible step), in Sect. 8 optimisations of expressions (reducing expressions to improve efficiency), and in Sect. 9 forwarding (using earlier program text to simplify later instructions). We show how each can be incorporated into the framework straightforwardly without any need for change to the underlying definition of the c memory model, although the consequences for reasoning about particular programs may not be straightforward.

7.1. Incremental evaluation of expressions

In C one cannot assume expressions are evaluated in a single state. That is, programmers are allowed to write “complex” assignments and conditions but these may be compiled into multiple (indivisible) assembler instructions. For instance, the assignment z:=x+yz\mathop{{:}{=}}x+y may compile into (at least) three separate instructions, one to load the value of xx, one to load the value of yy, and one to finally store the result to zz.

We give an operational semantics for incremental expression evaluation in Fig. 5. (We use the term “incremental” rather than the more usual “non-atomic” to avoid terminology clash with C’s atomics.) Recall the syntax of an expression ee from Fig. 1. Each evaluation step of ee either loads a value of a free variable or reduces the expression in some way, and evaluation stops when a single value remains. Rule 7.1 states that a variable access xocs{x}^{ocs} is evaluated to a value vv via the guard xocs=v\llparenthesis{x}^{ocs}=v\rrparenthesis. Any choice of vv that is not the value of xx will result in a false guard, i.e., leads to an infeasible behaviour, which can be ignored. Only the correct value for vv leads to a feasible behaviour. Note that the set of ordering constraints on xx also appears in the label. Rule 7.4 evaluates a unary expression e\ominus e by simply inductively evaluating ee, while Rule 7.9 similarly evaluates each operand of a binary expression, in either order (it is of course straightforward to instead insist on left-to-right evaluation). Rule 7.14 uses a meta-level functor application method 𝖺𝗉𝗉𝗅𝗒(.)\mathsf{apply}(.) to calculate the final value of an expression once all variables have been evaluated.

(7.1) xocsxocs=vv\displaystyle{x}^{ocs}\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\llparenthesis{x}^{ocs}=v\rrparenthesis}$}}v
(7.4) eαeeαe\displaystyle\begin{array}[]{c}e\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e^{\prime}\\ \cline{1-1}\cr\ominus e\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}\ominus e^{\prime}\end{array}
(7.9) e_1αe_1e_1e_2αe_1e_2e_2αe_2e_1e_2αe_1e_2\displaystyle\begin{array}[]{c}e_{\_}1\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e_{\_}1^{\prime}\\ \cline{1-1}\cr e_{\_}1\oplus e_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e_{\_}1^{\prime}\oplus e_{\_}2\end{array}\qquad\begin{array}[]{c}e_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e_{\_}2^{\prime}\\ \cline{1-1}\cr e_{\_}1\oplus e_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e_{\_}1\oplus e_{\_}2^{\prime}\end{array}
(7.14) 𝖺𝗉𝗉𝗅𝗒(,v)=vvτv𝖺𝗉𝗉𝗅𝗒(,v_1,v_2)=vv_1v_2τv\displaystyle\begin{array}[]{c}\mathsf{apply}(\ominus,v)=v^{\prime}\\ \cline{1-1}\cr\ominus v\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}v^{\prime}\end{array}\qquad\begin{array}[]{c}\mathsf{apply}(\oplus,v_{\_}1,v_{\_}2)=v^{\prime}\\ \cline{1-1}\cr v_{\_}1\oplus v_{\_}2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}v^{\prime}\end{array}
Figure 5. Incremental expression evaluation semantics

As an example, consider the following possible evaluation of the expression xacq+yrlx{x}^{{\textsc{acq}}}+{y}^{{\textsc{rlx}}}.

(7.15) xacq+yrlxxacq=33+yrlxyrlx=23+2τ5{x}^{{\textsc{acq}}}+{y}^{{\textsc{rlx}}}\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\llparenthesis{x}^{{\textsc{acq}}}=3\rrparenthesis}$}}3+{y}^{{\textsc{rlx}}}\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\llparenthesis{y}^{{\textsc{rlx}}}=2\rrparenthesis}$}}3+2\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\tau}}{{\longrightarrow}}}5

The first step applies Rule 7.9 and (inductively) Rule 7.1, and the second step is similar. The choice of values 3 and 2 is arbitrary; any values could be chosen, reflecting the behaviour of the code in any state. The third and final step follows from (the assumed interpretation) 𝖺𝗉𝗉𝗅𝗒(+,3,2)=5\mathsf{apply}(+,3,2)=5. Note that the (relaxed) load of yy is not restricted by the acquire of xx appearing within the same expression, since we have given a nondeterministic expression evaluation order. Of course, one can change the rule for evaluating binary expressions to evaluate left-to-right and consider constraints appearing “earlier” in the expression.

In practice C is not restricted to “laboriously” evaluating each instruction step by step; in some cases evaluation can be wrapped into a single optimisation. We give such a rule in Sect. 8, which subsumes Rule 7.14.

7.2. Incremental execution of instructions

Now consider the incremental execution of a single instruction (for brevity, in this section we assume a single instruction α\alpha is the base action of the language, rather than a list α\vec{\alpha} as in previous sections (defn. (3.1)); we give a full semantics for incremental execution of lists of instructions in Appendix B). We have the concept of indivisible (𝗂𝗇𝖽𝗂𝗏𝗂𝗌\mathsf{indivis}) actions, which are the only instructions that may be executed directly in the operational semantics. We define an indivisible instruction as one where there are no shared variables to be read, i.e.,

𝗂𝗇𝖽𝗂𝗏𝗂𝗌(α)\displaystyle\mathsf{indivis}(\alpha) =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} 𝗋𝗌𝗏(α)=\displaystyle{\sf rsv}(\alpha)=\varnothing

From this we can derive

(7.16) 𝗂𝗇𝖽𝗂𝗏𝗂𝗌(x:=e)𝗌𝗏(e)=𝗂𝗇𝖽𝗂𝗏𝗂𝗌(e)𝗌𝗏(e)=𝗂𝗇𝖽𝗂𝗏𝗂𝗌(𝖿)𝖳𝗋𝗎𝖾\mathsf{indivis}(x\mathop{{:}{=}}e)\Leftrightarrow{\sf sv}(e)=\varnothing\qquad\mathsf{indivis}(\llparenthesis e\rrparenthesis)\Leftrightarrow{\sf sv}(e)=\varnothing\qquad\mathsf{indivis}(\mathsf{f})\Leftrightarrow\mathsf{True}

For instance, x:=1x\mathop{{:}{=}}1 is indivisible, while r:=yr\mathop{{:}{=}}y is not – yy must be separately loaded and the result later stored into rr.

(7.19) eαex:=eαx:=e\displaystyle\begin{array}[]{c}e\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e^{\prime}\\ \cline{1-1}\cr x\mathop{{:}{=}}e\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}x\mathop{{:}{=}}e^{\prime}\end{array}
(7.22) eαeeαe\displaystyle\begin{array}[]{c}e\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}e^{\prime}\\ \cline{1-1}\cr\llparenthesis e\rrparenthesis\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}\llparenthesis e^{\prime}\rrparenthesis\end{array}
(7.25) 𝗂𝗇𝖽𝗂𝗏𝗂𝗌(α)αα𝐧𝐢𝐥\displaystyle\begin{array}[]{c}\mathsf{indivis}(\alpha)\\ \cline{1-1}\cr\alpha\mathrel{\stackrel{{\scriptstyle\raisebox{5.69054pt}{\hbox{}}\alpha}}{{\longrightarrow}}}\mathop{\bf nil}\end{array}
Figure 6. Incremental instruction execution semantics

Incremental execution rules for instructions are given in Fig. 6. Rule 7.19 states that an assignment instruction is evaluated by first evaluating the assigned expression, and similarly Rule 7.22 states that a guard is executed by first incrementally evaluating the expression. Rule 7.25 states that directly executable instructions can be executed as a single action. This rule applies for fences, and when evaluation of assignment or guard instructions has reduced them to an indivisible form. Note that we allow the (final) evaluation steps to include an arbitrary number of local variables. We insist only on shared variables being evaluated in separate steps, as these involve interactions with the memory system.

As an example of instruction evaluation, recalling (7.15), we place this expression evaluation in a release write.

zrel:=xacq+yrlx\ext@arrow0359\Rightarrowfill@xacq=3yrlx=2zrel:=5zrel:=5𝐧𝐢𝐥{z}^{{\textsc{rel}}}\mathop{{:}{=}}{x}^{{\textsc{acq}}}+{y}^{{\textsc{rlx}}}\ext@arrow 0359\Rightarrowfill@{}{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\,\llparenthesis{x}^{{\textsc{acq}}}=3\rrparenthesis~{}~{}~{}\llparenthesis{y}^{{\textsc{rlx}}}=2\rrparenthesis\,}$}}{z}^{{\textsc{rel}}}\mathop{{:}{=}}5\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{{z}^{{\textsc{rel}}}\mathop{{:}{=}}5}$}}\mathop{\bf nil}

The first two steps are inherited from the expression evaluation semantics (via Rule 7.19). The final step is via Rule 7.25, noting 𝗂𝗇𝖽𝗂𝗏𝗂𝗌(zrel:=5)\mathsf{indivis}({z}^{{\textsc{rel}}}\mathop{{:}{=}}5). The individual evaluation of local variables is not affected by the shared memory system and hence the following incremental execution is also allowed (where we leave implicit rlx accesses),

z:=x+r⦇x=3⦈z:=3+rz:=3+r𝐧𝐢𝐥z\mathop{{:}{=}}x+r\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{\llparenthesis x=3\rrparenthesis}$}}z\mathop{{:}{=}}3+r\xrightarrow{\raisebox{5.69054pt}{\hbox{}}\hbox{${}_{z\mathop{{:}{=}}3+r}$}}\mathop{\bf nil}

Incremental execution makes explicit what a compiler might do with C code involving shared variables. Note that, importantly, the reordering relation did not have to change or be reconsidered at all, even though the scope of C considered was increased (and made more complex to reason about, as is always the case when considering incremental evaluation).

7.3. Reasoning about incremental evaluation

Syntax-directed, general inference rules (e.g., in an Owicki-Gries or rely/guarantee framework) are rare for concurrent languages with incremental evaluation of expressions and instructions, irrespective of weak memory models. For instance, a simple program like x:=x+1x:=x+1x\mathop{{:}{=}}x+1\parallel x\mathop{{:}{=}}x+1 can result in final states where x{1,2}x\in\{1,2\} when executed incrementally, but this is not immediately derivable from typical compositional proof rules (Hayes et al., 2013). The problem is that, in general, syntax-directed proof rules do not directly apply as there are many places where interference may occur, which do not neatly align with syntactic terms. The situation is further complicated by non-deterministic evaluation order, as specified by the C standard.

From a practical perspective the issue is typically resolved by making the atomicity explicit (possibly requiring the introduction of new temporary variables), i.e., we may rewrite the above program as follows.

(7.26) (r_1:=x;x:=r_1+1)(r_2:=x;x:=r_2+1)\displaystyle(r_{\_}1\mathop{{:}{=}}x\mathchar 24635\relax\;x\mathop{{:}{=}}r_{\_}1+1)\parallel(r_{\_}2\mathop{{:}{=}}x\mathchar 24635\relax\;x\mathop{{:}{=}}r_{\_}2+1)

In this format one can apply standard (non-incrementally evaluated) syntax-directed proof rules, such as Owicki-Gries or rely/guarantee, and show that x{1,2}x\in\{1,2\} in the final state (possibly requiring the further introduction of auxiliary variables or other techniques). Of course specialised rules to handle particular forms of incrementally-evaluated instructions can be derived, and these may be applied in some cases, but in general the intent is to deal precisely with communication/behaviour as written in the text of the program.

The difficulty of reasoning about possibly dynamically changing code structure impacts reasoning about C programs, especially with reordering and incremental evaluation; we make this clearer by expressing it in terms of a definition and some remarks.

Definition 7.1 (Atomic-syntax-structured code).

A command in standard imperative programming syntax, where all basic building blocks (conditions, assignments) of command cc are evaluated/executed atomically, and execution proceeds in program-order, is atomic-syntax-structured code.

Note that a subset of IMP+pseq can be atomic-syntax-structured, especially if taking the plain subset (Defn. (5.2)) and using the semantics of Sect. 3.2.

Remark 7.1.

Most inference systems for concurrent code work on the basis that the code is atomic-syntax-structured; it is non-trivial to apply syntax-based approaches if the syntax does not directly map to execution. Often the atomicity is made explicit by introducing temporary variables, or a non-syntax-based approach is used for verification, e.g., translating into automaton systems where, again, the atomicity is explicit.

Remark 7.2.

Many other approaches are still applicable to non-atomic-syntax-structured code, for instance, model checking.

We emphasise that C programs are in general not atomic-syntax-structured, which complicates analysis by some techniques, regardless of whether or not the c memory model is taken into account.

It is beyond the scope of this paper to develop rules that handle non-atomic-syntax-structured code but, as before, one may apply such rules after reduction. We argue that the level of granularity should be made explicit, or in other words, programmers (who wish to do analysis) should restrict themselves to instructions that are directly executable. For instance, normal assignments that reference at most one shared variable (see, e.g., (Hayes et al., 2021) for rules coping with such situations).

If the developer insists on reasoning about code that is not atomic-syntax-structured then some of the reduction rules need provisos under incremental execution. For instance, Law 4.8 holds only when both instructions are indivisible, i.e., it must be updated to ensure the relevant instructions are 𝗂𝗇𝖽𝗂𝗏𝗂𝗌\mathsf{indivis}.

¬(α ⇚c β) ∧ indivis(α, β)  ⇒  α ; β ⊑⊒ α · β

If either is not indivisible then there may be parts of β that can be incrementally evaluated before α; for instance, consider:

x := 1 ; z := x + y

Although ¬(x := 1 ⇚c z := x + y) due to x, the load of y can come before x := 1; that is, the two instructions are not strictly ordered under incremental execution. The reference to y can be incrementally evaluated and reordered before the store to x.

(7.27)  x := 1 ; z := x + y  —⦇y = 3⦈→  x := 1 ; z := x + 3  ⟹_{x:=1  ⦇x=1⦈  z:=4}  nil

As with proof methods, specific reduction rules to handle incremental evaluation can be derived (possibly using a program-level encoding of evaluation as given in (Colvin et al., 2017; Hayes et al., 2019)).

8. Expression optimisations

We now consider a further important factor influencing execution of programs under the C memory model: expression optimisations (we consider structural optimisations, for instance changing loop structures, in Sect. 11.2). There are three principles to consider when “optimising” expression e to e′:

  • Value equality. Expression e′ must be equal to e in all states; however, as we see below, extra contextual information can be used.

  • Lexicographic simplification. We say e ≻lex e′ if e′ is a “more optimised” expression than e, in the sense that it is less computationally intensive to evaluate. A precise definition of ≻lex for C is beyond the scope of this work, but we assume it is irreflexive and transitive, and that intuitive properties such as 3 + 2 ≻lex 5 and 0 * r ≻lex 0 hold. An important aspect is that ≻lex may allow the removal of variables (as in 0 * r ≻lex 0), and this could have an effect on allowed reorderings according to Model 2.2, g (i.e., one cannot rely on “false dependencies” (Alglave et al., 2014)).

  • Memory ordering constraint simplification. We say e ⪰oc e′ if e′ does not lose any significant memory ordering constraints of e. For instance, it may be the case that compilers should not “optimise away” an explicit sc constraint, even if doing so is valid according to the other optimisations. Again, the precise definition is a matter for the C committee, but we explore some of the options and their consequences below in Sect. 8.1.

These three constraints must be satisfied before an expression e is ‘optimised’ to expression e′, written e ≻opt e′.

(8.1)  e ≻opt e′  ≙  e = e′ ∧ e ≻lex e′ ∧ e ⪰oc e′

The following operational rule allows optimisations as an expression evaluation step, superseding Rule 7.14 and in some cases removing the need for Rule 7.9, etc.

Rule 8.2 (Optimise expression).
b ⇛ e ≻opt e′
―――――――――――――
e —⦇b⦈→ e′

Rule 8.2 states that an expression e can be optimised to some expression e′, in the process emitting a guard ⦇b⦈, where b provides the context which makes the optimisation valid (the expression b is used only to show e = e′ in defn. (8.1)). The guard acts as a check that the optimisation is valid in the current state; for many optimisations b will simply be True.

As an example, assuming a definition of ⪰oc where e ⪰oc e′ holds provided e contains only relaxed or no ordering constraints, and e′ has a subset of those, the following steps are allowed by Rule 8.2.

x := r − r  —τ→  x := 0        Since True ⇛ r − r = 0 ∧ r − r ≻lex 0 ∧ r − r ⪰oc 0
x := r₁ − r₂  —⦇r₁ = r₂⦈→  x := 0        Since r₁ = r₂ ⇛ r₁ − r₂ = 0 ∧ r₁ − r₂ ≻lex 0 ∧ r₁ − r₂ ⪰oc 0
r := x^rlx * 0  —τ→  r := 0        Since True ⇛ x^rlx * 0 = 0 ∧ x^rlx * 0 ≻lex 0 ∧ x^rlx * 0 ⪰oc 0

On the other hand, given the above assumption about ⪰oc, r := x^sc * 0 cannot be optimised to r := 0 as this would lose a significant ordering constraint (¬(x^sc * 0 ⪰oc 0) and hence ¬(x^sc * 0 ≻opt 0)). We consider other examples in the subsequent section.

8.1. Defining allowed changes to ordering constraints

One of the tensions in the development of the C memory model is how accepted compiler optimisations interact with memory ordering constraints. Rather than take a particular position, or try to be exhaustive, we show how different options can be expressed and their consequences enumerated formally and (relatively) straightforwardly.

Consider the following five possible definitions for ⪰oc. Recall that ⌈e⌉ extracts the memory ordering constraints from expression e (defn. (2.69)). We abbreviate acs ≙ {acq, con, sc}, as these are the significant ordering constraints that may appear in expressions (rel constraints appear only on the left-hand side of assignments and thus are not subject to expression optimisation).

(8.8)  e ⪰oc e′  ≙
    (a)  ⌈e′⌉ = ⌈e⌉                 Do not modify constraints
    (b)  ⌈e′⌉ ⊆ ⌈e⌉ ⊆ {rlx}        Simplify/eliminate relaxed
    (c)  ⌈e⌉ ⊆ {rlx}               Strengthening allowed
    (d)  ⌈e⌉ ∩ acs = ⌈e′⌉ ∩ acs    Never optimise acs away
    (e)  True                      Do not constrain the compiler
  1. Option (a)

    is the most conservative option and simply says the compiler must not change the constraints in e at all. This would be simple to implement and reason about, but possibly prevents some sensible/expected optimisations.

  2. Option (b)

    says that only relaxed or non-atomic accesses in e may be removed, and the optimised expression e′ can either remain relaxed or become non-atomic itself. Stronger constraints (acq, con, and sc) will not be “optimised away”.

  3. Option (c)

    says that only relaxed or non-atomic expressions can be optimised; however, any such constraints can be strengthened (e.g., a rlx access can be strengthened to sc). While more subtle than the other options, it imposes fewer constraints on the compiler writer, and would only have the effect of reducing the number of behaviours of code.

  4. Option (d)

    requires acq, con, and sc constraints to be maintained, if they occur, but other parts of a complex expression can be optimised.

  5. Option (e)

    allows full freedom to the compiler, leaving the programmer unable to rely on memory ordering constraints to enforce order.

We give some examples in tabular form to understand the consequences of these choices.

                          (a)   (b)   (c)   (d)   (e)
  5 * 4      ⪰oc  20       ✓     ✓     ✓     ✓     ✓
  x^rlx * 0  ⪰oc  0        ×     ✓     ✓     ✓     ✓
  x^sc * 0   ⪰oc  0        ×     ×     ×     ×     ✓
  x^sc       ⪰oc  x^rlx    ×     ×     ×     ×     ✓
  x^rlx      ⪰oc  x^sc     ×     ×     ✓     ×     ✓

As with incremental evaluation of expressions, while it is straightforward to incorporate optimisations into the semantics, the resulting programs may not be atomic-syntax-structured (Defn. (7.1)); hence a programmer interested in serious analysis is well advised to avoid expressions that a compiler may or may not choose to simplify.

8.2. The consume ordering constraint

Because C’s consume (con) constraint imposes no reordering constraint beyond that of data dependencies, we have for brevity not included it in earlier sections. The intent of a con load is to indicate to the compiler not to lose data dependencies during optimisations. Options (b)–(e) for ⪰oc allow rlx variable accesses to be removed, which may also remove a data dependency on some earlier load; this is exactly the situation that a con load is intended to avoid. For instance, the following optimisation should not be allowed.

r := x^con ; y := r * 0  —τ→  r := x^con ; y := 0

The flow-on effect is that now y := 0 is independent of r := x^con and may be reordered before it (as con is equivalent to rlx in calculating ⇚oc). To faithfully implement con a compiler must track data dependencies, and apparently this has never been implemented; as such, all known compilers translate con constraints directly to the stronger acq constraint (which on many architectures results in a more computationally expensive mechanism than necessary). Such syntactic tracking is straightforward, if tedious, to specify in a formal setting: for instance, by tracing data dependencies (via write and read variables) from all con loads, marking them with a special ordering constraint ‘condep’ (consume-dependent), and requiring the definition of ⪰oc to never allow condep constraints to be removed. As always, a programmer is well advised to minimise the use of con constraints in combination with later code that may allow expression optimisations to break data dependencies.

8.3. Examples

We show how the compiler may “optimise away” an ordering constraint and hence open up more behaviours than the programmer may expect. In the following, assume Option (b) above for the definition of ⪰oc. A programmer may choose to enforce order in the mp program (defn. (6.2)) using a data dependency.

x := 1 ; flag := 1 + (x * 0)

Although x := 1 ⇝ flag := 1 + (x * 0), and thus it seems the assignment to flag must occur after the assignment to x, after optimising the second instruction the updates can occur in the reverse order. That is, by Rule 8.2 (within Rule 7.19), flag := 1 + (x * 0) —τ→ flag := 1, and thus we have the following behaviour.

(8.9)  x := 1 ; flag := 1 + (x * 0)  —τ→  x := 1 ; flag := 1  ⟹_{flag:=1}  x := 1  —x:=1→  nil

As a consequence, for reasoning about the original code, one must accept the following refinement.

x := 1 ; flag := 1 + (x * 0)  ⊑  flag := 1 · x := 1

Alternatively a programmer may choose to avoid the dependence on data and instead enforce order using an sc constraint on an unrelated variable in the flag expression. Under Option (e) this would be erroneous, since 1 + (y^sc * 0) ≻opt 1, and thus as above,

x := 1 ; flag := 1 + (y^sc * 0)  ⊑  flag := 1 · x := 1

9. Forwarding

A key aspect of hardware pipelines is that instructions later in the pipeline can read values from earlier instructions (under certain circumstances). At the microarchitectural level, rather than waiting for an earlier instruction to commit and write a value to a register before reading that value from the register, it may be quicker to read an “in-flight” value directly from an earlier instruction before it commits. This behaviour may be implemented by so-called “reservation stations” or related mechanisms (Tomasulo, 1967). Fortunately there is a straightforward way to capture this in a structured program, by allowing later instructions that are reordered to pick up and use values in the text of earlier instructions. For instance, the rules of the earlier section forbid r := x reordering before x := 1, i.e., ¬(x := 1 ⇚c r := x), which is conceptually prevented because reading an earlier value of x (than 1) is prohibited by coherence-per-location (formally, because wv(x := 1) = {x} ⊆ fv(r := x)). However processors will commonly just use the value 1 and assign it to r as well, before any other process has seen the write to x.

In hardware the mechanism of using earlier values in later instructions is called forwarding. Notationally we write α » β to mean the effect of forwarding (assignment) action α to β, which is just simple substitution (ignoring memory ordering constraints). For instance, in the above situation the value 1 assigned to x can be “forwarded” to r, written (x := 1 » r := x) = (r := 1), avoiding the need to access main memory.

Definition 9.1 (Forwarding).

Given an assignment instruction α of the form x^ocs := e, forwarding of α to an expression f (written α » f) is standard replacement of instances of x by e within f, ignoring ordering constraints. This is lifted to forwarding to instructions as below.

α » (y^ocs := f)  =  y^ocs := (α » f)        α » ⦇b⦈  =  ⦇α » b⦈        α » 𝖿  =  𝖿   (a fence is unaffected)

Forwarding to/from commands and traces is similarly straightforward; see Appendix A.2. Note that if ¬(α ⇝ β) then α » β = β, i.e., forwarding is relevant only if there is a data dependence. The following examples show the relatively straightforward application of forwarding.

r₁ := 1 » ⦇r₁ = 1⦈  =  ⦇1 = 1⦈  =  ⦇True⦈  =  τ
(9.1)  r₁ := r₂ » ⦇r₁ = r₂⦈  =  ⦇r₂ = r₂⦈  =  ⦇True⦈  =  τ

To capture the potential for an earlier instruction to affect a later one we generalise reordering to a triple, β′ « α «m β. Whereas earlier we considered whether α ⇚c β directly, we now allow α to affect β via forwarding, and for the resulting instruction (β′) to be considered for the purposes of calculating reordering. More precisely, for instructions α and β,

(9.2)  β′ « α «m β  ≙  α ⇚c β′ ∧ β′ = α » β ∧ β ⪰oc β′

The reordering triple notationally gives the idea of executing β earlier with respect to α, with possible modifications due to forwarding. The new instruction β′ is the result of forwarding α to β, and it must be reorderable with α (note, therefore, that reordering is calculated after applying forwarding). For instance, r := 1 « x := 1 « r := x expresses that r := x can reorder with x := 1, after forwarding is taken into account. Additionally the effect of forwarding must not have significantly altered the ordering constraints, i.e., β ⪰oc α » β (recall the options in Sect. 8.1). For instance, depending on the definition of ⪰oc, it may or may not be the case that r := 1 « x := 1 « r := x^sc, as this depends on whether x^sc ⪰oc 1. The reordering triple lifts to traces and commands straightforwardly; see Appendix A.2.

We use this more general triple in place of binary reordering in Rule 3.18.

Rule 9.3 (Reorder with forwarding).
c₂ —β→ c₂′        β′ « c₁ «m β
――――――――――――――――――――――――――――――
c₁ ;m c₂ —β′→ c₁ ;m c₂′
Now, given command c₁ ;m c₂, and that c₂ can execute step β, the composition can execute the modified β′, where β′ takes into account any forwarding that may occur from instructions in c₁ to β.

Consider the simple statement x := r₁ − r₂, which can be optimised to x := 0 provided r₁ = r₂. That is, by Rule 8.2, x := r₁ − r₂ —⦇r₁ = r₂⦈→ x := 0. Let us consider this statement immediately following the assignment r₁ := r₂. By Defn. (9.1) and by (9.1) we have τ « r₁ := r₂ « ⦇r₁ = r₂⦈. Hence by applying Rule 9.3 this program can take an initial silent step, representing an optimisation of the compiler, to simplify the assignment to x.

(9.4)  r₁ := r₂ ; x := r₁ − r₂  —τ→  r₁ := r₂ ; x := 0

From here the assignment to x can proceed first, despite the data dependence r₁ := r₂ ⇝ x := r₁ − r₂, that is,

r₁ := r₂ ; x := r₁ − r₂  ⊑  x := 0 · r₁ := r₂

9.1. Reduction with forwarding

We update some of the reduction rules from Sect. 4 to include forwarding via Rule 9.3, essentially replacing α ⇚c β with the more general β′ « α «c β. We assume indivis(α) and indivis(β). Note that if ¬(α ⇝ β) then β′ « α «c β ⇔ (α ⇚c β ∧ β′ = β), and hence in many cases the original rules apply.

(9.5)  β′ « α «c β  ⇛  α ; β  ⊑  β′ · α
(9.6)  β′ « α «c β  ⇛  α ; β  ⊑⊒  (α · β) ⊓ (β′ · α)
(9.7)  β′ « c «c β  ⇛  c ; β  ⊑  β′ · c
(9.10)  ¬(α ⇚c α » β)  ⇛  α ; β  ⊑⊒  α · β

Law 9.5 expresses the reordering of β earlier than α but with any forwarding from α to β taken into account in the promoted instruction, while Law 9.6 is the corollary of Law 4.9. Law 9.7 is the generalisation of Law 9.5 to a command on the left. Law 9.10 applies when reordering is not possible even taking into account the effects of forwarding, replacing Law 4.8. The presence of forwarding therefore complicates reduction; however, the derived properties for reasoning (Sect. 5) still apply to any reduced program.

A subtlety of a chain of dependent instructions with forwarding is that associativity can be lost. For instance, consider the program x := 1 ; r₁ := x ; r₂ := x. The behaviours differ depending on how this is bracketed: p₁ ≙ (x := 1 ; r₁ := x) ; r₂ := x or p₂ ≙ x := 1 ; (r₁ := x ; r₂ := x). We have p₁ ⊑ r₂ := 1 · r₁ := 1 · x := 1 by Law 9.7, however p₂ ⋢ r₂ := 1 · r₁ := 1 · x := 1 because ¬(r₁ := x ⇚c r₂ := x). Hence p₁ and p₂ are not equivalent, and so associativity has been broken. (A more complete discussion of associativity and monotonicity is given in (Colvin, 2021b).)

The addition of forwarding also explains why a compiler transformation such as “sequentialisation” (Kang et al., 2017a) is not valid. While it is straightforward that cdcdc\parallel d\mathrel{\sqsubseteq}c\centerdot d, i.e., enforcing a strict order between two parallel processes, it is not the case in general that cdc;dc\parallel d\mathrel{\sqsubseteq}c\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}d, due to forwarding. The simplest case is x:=1r:=xx\mathop{{:}{=}}1\parallel r\mathop{{:}{=}}x, which has exactly two possible traces (interleavings), however x:=1;r:=xr:=1x:=1x\mathop{{:}{=}}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}x\mathrel{\sqsubseteq}r\mathop{{:}{=}}1\centerdot x\mathop{{:}{=}}1 by Law 9.5, where rr receives the value 1 before it is assigned to xx, which is not possible when executing in parallel.
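The distinction can be replayed concretely. The following Python sketch (illustrative only: we encode instructions as functions on state dictionaries, which is our informal device and not part of the formal semantics) shows that the reordered sequential trace passes through an intermediate state that no interleaving of the parallel program reaches.

```python
# Instructions are modelled as functions on states (dictionaries).
x1 = lambda s: {**s, "x": 1}        # x := 1
rx = lambda s: {**s, "r": s["x"]}   # r := x
r1 = lambda s: {**s, "r": 1}        # r := 1, i.e. x:=1 forwarded into r:=x

def states_along(trace, init):
    """All states passed through while executing a trace in order."""
    s, seen = dict(init), [dict(init)]
    for step in trace:
        s = step(s)
        seen.append(dict(s))
    return seen

init = {"x": 0, "r": 0}
# The parallel program x:=1 || r:=x has exactly two interleavings.
par_states = [st for tr in ([x1, rx], [rx, x1]) for st in states_along(tr, init)]
# The sequential program x:=1 ; r:=x additionally admits the reordered
# trace r:=1 . x:=1 (forwarding, Law 9.5).
seq_states = states_along([r1, x1], init)

# The intermediate state r=1, x=0 arises only in the sequential program:
assert {"x": 0, "r": 1} in seq_states
assert {"x": 0, "r": 1} not in par_states
```

The final states agree; it is the intermediate state with r=1r=1 but x=0x=0, visible to a concurrent observer, that distinguishes the two programs.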

9.2. Example

Forwarding admits some perhaps unexpected behaviours, for instance, in the following code, although r:=xr\mathop{{:}{=}}x and x:=1x\mathop{{:}{=}}1 are strictly ordered locally, it is possible for r:=xr\mathop{{:}{=}}x to receive that later value, ostensibly breaking local coherence.

Theorem 9.2.

0_x,y,r(r:=x;x:=1;y:=x)x:=yr=1\langle\!\langle 0_{\_}{x,y,r}\rangle\!\rangle\,(r\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}1\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}y\mathop{{:}{=}}x)\parallel x\mathop{{:}{=}}y\,\langle\!\langle r=1\rangle\!\rangle

Proof.

Call the left and right programs p_1p_{\_}1 and p_2p_{\_}2, respectively. Note that in p_1p_{\_}1 although r:=xr\mathop{{:}{=}}x precedes the assignment x:=1x\mathop{{:}{=}}1 (the only occurrence of the value 1 in the program), and x:=1x\mathop{{:}{=}}1 cannot be reordered before r:=xr\mathop{{:}{=}}x, this theorem states that r:=xr\mathop{{:}{=}}x can read that value. Firstly note that y:=1\guillemetleftr:=x;x:=1\guillemetlefty:=xy\mathop{{:}{=}}1\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}r\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}1\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}y\mathop{{:}{=}}x, that is, y:=xy\mathop{{:}{=}}x can read x:=1x\mathop{{:}{=}}1 and be reordered before the preceding instructions, and hence by Law 9.7 we have

p_1y:=1r:=xx:=1p_{\_}1\mathrel{\sqsubseteq}y\mathop{{:}{=}}1\centerdot r\mathop{{:}{=}}x\centerdot x\mathop{{:}{=}}1

Hence interleaving p_1p_{\_}1 and p_2p_{\_}2 so that x:=yx\mathop{{:}{=}}y in p_2p_{\_}2 becomes the second instruction executed gives the following.

p_1p_2y:=1x:=yr:=xx:=1p_{\_}1\parallel p_{\_}2\mathrel{\sqsubseteq}y\mathop{{:}{=}}1\centerdot x\mathop{{:}{=}}y\centerdot r\mathop{{:}{=}}x\centerdot x\mathop{{:}{=}}1

This reordering and interleaving satisfies {0_x,y,r}y:=1x:=yr:=xx:=1{r=1}\{0_{\_}{x,y,r}\}\,y\mathop{{:}{=}}1\centerdot x\mathop{{:}{=}}y\centerdot r\mathop{{:}{=}}x\centerdot x\mathop{{:}{=}}1\,\{r=1\}, and thus the outcome r=1r=1 is a possible final state of p_1p_2p_{\_}1\parallel p_{\_}2 by Theorem 5.18.

It appears that coherence has been broken (indirectly through a concurrent process). However, the intuition is that the compiler (and/or the processor) decides that the programmer intended y:=1y\mathop{{:}{=}}1 when they wrote y:=xy\mathop{{:}{=}}x, and it is this value that is read by r:=xr\mathop{{:}{=}}x via the second process. To make this clearer, consider changing the trailing assignment y:=xy\mathop{{:}{=}}x in the first process to y:=x+1y\mathop{{:}{=}}x+1. In this case a possible final state is r=2r=2 (but not r=1r=1), meaning that the initial load of xx has read the (arguably independent) write to yy, not the write to xx.
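The proof can be cross-checked by exhaustively enumerating interleavings. The Python sketch below (an informal simulation: the two traces follow the proof of Theorem 9.2, but their encoding as lists of state-transforming functions is ours) confirms that the outcome r=1r=1 is reachable only via the reordered trace.

```python
from itertools import combinations

def interleavings(a, b):
    """All order-preserving merges of traces a and b."""
    n = len(a) + len(b)
    for idx in combinations(range(n), len(a)):
        out, ai, bi = [], 0, 0
        for i in range(n):
            if i in idx:
                out.append(a[ai]); ai += 1
            else:
                out.append(b[bi]); bi += 1
        yield out

def run(trace, init):
    s = dict(init)
    for step in trace:
        s = step(s)
    return s

init = {"x": 0, "y": 0, "r": 0}
right = [lambda s: {**s, "x": s["y"]}]        # x := y (second process)

# Program order of the first process: r:=x ; x:=1 ; y:=x
left_po = [lambda s: {**s, "r": s["x"]},
           lambda s: {**s, "x": 1},
           lambda s: {**s, "y": s["x"]}]
# Reordered trace from the proof: y:=1 . r:=x . x:=1
# (y:=x promoted past x:=1 with the value 1 forwarded into it).
left_re = [lambda s: {**s, "y": 1},
           lambda s: {**s, "r": s["x"]},
           lambda s: {**s, "x": 1}]

finals_po = {run(t, init)["r"] for t in interleavings(left_po, right)}
finals_re = {run(t, init)["r"] for t in interleavings(left_re, right)}

assert 1 not in finals_po   # in program order, r:=x can never read 1
assert 1 in finals_re       # with forwarding/reordering, r=1 is reachable
```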

Forwarding of assignments is considered standard, and behaviours such as the one shown by this example are accepted (in practice a programmer will typically avoid reloading a variable that has already been calculated locally, so this pattern, while certainly valid and sometimes useful, is not widespread). However, the situation becomes more complicated if one wishes to forward guards as well as assignments to later instructions, which we explore in Sect. 10.

9.3. Combining forwarding with optimisation/simplification

To keep the discussion of forwarding relatively simple we separated it from optimisation, but we can generalise defn. (9.2) to incorporate optimisations as well.

(9.11) β\guillemetleftα\guillemetleftmβ\displaystyle\beta^{\prime}\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}\alpha\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}{{\textsc{m}}}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}\beta =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} α\ext@arrow0055\Leftarrowfill@cββ𝗈𝗉𝗍βα\guillemetright\displaystyle\alpha\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}\beta^{\prime}\mathrel{\mathstrut{\wedge}}\beta^{\prime}\overset{{\sf{opt}}}{\succ}{}_{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright}}}\beta

This definition allows any “optimised” version of β\beta to be reordered before α\alpha, provided forwarding is taken into account. It can become the basis for the application of Rule 9.3, and thus the derived reduction rules in Sect. 9.1. For instance, in comparison with the example in Sect. 9, we can reorder, forward and optimise in a single step since, by defn. (9.11), x:=0\guillemetleftr_1:=r_2\guillemetleftx:=r_1r_2x\mathop{{:}{=}}0\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}r_{\_}1\mathop{{:}{=}}r_{\_}2\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}x\mathop{{:}{=}}r_{\_}1-r_{\_}2. The corresponding deduction rules allow us to show, for instance, the following allowed reordering and simplification of code by the compiler.

(9.12) x:=y;z:=xyz:=0x:=yx\mathop{{:}{=}}y\mathrel{\mathchar 24635\relax}z\mathop{{:}{=}}x-y\mathrel{\sqsubseteq}z\mathop{{:}{=}}0\centerdot x\mathop{{:}{=}}y

because τ\guillemetleftx:=y\guillemetleftx=y\tau\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}x\mathop{{:}{=}}y\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}\llparenthesis x=y\rrparenthesis. This answers the question “can z:=xyz\mathop{{:}{=}}x-y be reordered before x:=yx\mathop{{:}{=}}y?”: provided all references are relaxed, z:=0z\mathop{{:}{=}}0 can be executed earlier than the update to xx. Of course, structured reasoning about code that is susceptible to such compiler transformations may be nontrivial as it is not atomic-syntax-structured.
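The refinement (9.12) can be sanity-checked by brute force over a small domain. In the sketch below (an informal check only; the two Python functions mimic the original and transformed instruction sequences) the forwarded expression xyx-y simplifies to yy=0y-y=0, so the transformed program agrees with the original on every initial state.

```python
from itertools import product

def prog_original(x, y, z):
    x = y          # x := y
    z = x - y      # z := x - y  (after forwarding, y - y, i.e. 0)
    return x, y, z

def prog_transformed(x, y, z):
    z = 0          # z := 0, now independent of the update to x
    x = y          # x := y
    return x, y, z

# Exhaustively check that both versions agree on final states.
for x, y, z in product(range(-2, 3), repeat=3):
    assert prog_original(x, y, z) == prog_transformed(x, y, z)
```

The point of the refinement is that, once simplified, z:=0z\mathop{{:}{=}}0 no longer references xx or yy and hence may be reordered before the store to xx.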

10. Read-from-untaken-branch (RFUB)/Self-fulfilling prophecies (SFPs)

In this section we separately consider a problematic situation, debated by the C committee, in which a seemingly reasonable compiler optimisation leads to complex behaviour that is difficult to reason about. We show how the problematic behaviour is disallowed by the semantics we have given so far, and how a small modification – which we call allowing self-fulfilling prophecies (SFPs) – then admits it; we also show that allowing SFPs contradicts other, simpler cases which are expressly forbidden. We believe this provides a firm, accessible basis on which to assess which compiler optimisations should be allowed in the presence of shared-variable concurrency (i.e., C’s atomics). Importantly, the framework gives a step-based, relatively intuitive explanation for the different possible behaviours, and the flexibility to accommodate different decisions.

10.1. Read-from-untaken-branch behaviour

Consider the following program, which, for particular compiler optimisations, exposes a “read from untaken branch” (rfub) behaviour.

(10.1) 𝗋𝖿𝗎𝖻\displaystyle{\sf{rfub}} =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} (r:=y;(𝐢𝐟cr42𝐭𝐡𝐞𝐧b:=𝖳𝗋𝗎𝖾;r:=42);x:=r)y:=x\displaystyle(r\mathop{{:}{=}}y\mathchar 24635\relax\;(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r\neq 42\mathrel{\mathbf{then}}b\mathop{{:}{=}}\mathsf{True}\mathchar 24635\relax\;r\mathop{{:}{=}}42)\mathchar 24635\relax\;x\mathop{{:}{=}}r)\hskip 5.69054pt\parallel\hskip 5.69054pty\mathop{{:}{=}}x

In a plain, sequential execution of 𝗋𝖿𝗎𝖻{\sf{rfub}}, starting with ¬b\neg b and all other variables 0, both rr and xx are 42 in the final state, and yy is either 0 or 42 depending on when the right-hand process interleaves. The 𝖳𝗋𝗎𝖾\mathsf{True} branch of the conditional is always executed.

Theorem 10.1.
{0_x,y,r¬b}\displaystyle\{0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\}\, 𝗋𝖿𝗎𝖻{x=42r=42b(y=0y=42)}\displaystyle{{\sf{rfub}}}^{-}\,\{x=42\mathrel{\mathstrut{\wedge}}r=42\mathrel{\mathstrut{\wedge}}b\mathrel{\mathstrut{\wedge}}(y=0\mathrel{\mathstrut{\vee}}y=42)\}
 and hence ¬0_x,y,r¬b\displaystyle\mbox{ and hence }\quad\neg\langle\!\langle 0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle\, 𝗋𝖿𝗎𝖻x=y=r=42¬b\displaystyle{{\sf{rfub}}}^{-}\,\langle\!\langle x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle
Proof.

Straightforward: the only assignment of 42 is within the conditional, after the read of yy, and hence at that point yy cannot be 42. The condition r42r\neq 42 always holds and the 𝖳𝗋𝗎𝖾\mathsf{True} path is taken, setting rr to 42 and bb to 𝖳𝗋𝗎𝖾\mathsf{True}, and finally setting xx to 42. The final value of yy depends on whether the assignment y:=xy\mathop{{:}{=}}x occurs before or after the assignment to xx. As a corollary, it is not possible to reach a final state where all variables are 42 and ¬b\neg b.
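Theorem 10.1 can be cross-checked by enumerating all interleavings of the in-order left process with y:=xy\mathop{{:}{=}}x. In the informal Python simulation below the guard is elided, since (as argued above) it always holds in a sequential execution; the encoding of instructions as state transformers is ours.

```python
from itertools import combinations

def interleavings(a, b):
    """All order-preserving merges of traces a and b."""
    n = len(a) + len(b)
    for idx in combinations(range(n), len(a)):
        out, ai, bi = [], 0, 0
        for i in range(n):
            if i in idx:
                out.append(a[ai]); ai += 1
            else:
                out.append(b[bi]); bi += 1
        yield out

def run(trace, init):
    s = dict(init)
    for step in trace:
        s = step(s)
    return s

# In-order trace of the left process; the branch r != 42 is always
# taken, since no 42 can reach r before the guard is evaluated.
left = [lambda s: {**s, "r": s["y"]},     # r := y
        lambda s: {**s, "b": True},       # True branch: b := True
        lambda s: {**s, "r": 42},         #              r := 42
        lambda s: {**s, "x": s["r"]}]     # x := r
right = [lambda s: {**s, "y": s["x"]}]    # y := x

init = {"x": 0, "y": 0, "r": 0, "b": False}
finals = [run(t, init) for t in interleavings(left, right)]

assert all(f["x"] == 42 and f["r"] == 42 and f["b"] for f in finals)
assert all(f["y"] in (0, 42) for f in finals)
# In particular, the debated state x=y=r=42 with b False is unreachable.
```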

Given this, in a sequential setting a compiler may choose to optimise the conditional as follows, letting ‘\mathrel{\ooalign{\hss{$\rightarrow$\hss}\cr{$\hookrightarrow$}}}’ stand for a compiler transformation.

(10.2) (𝐢𝐟cr42𝐭𝐡𝐞𝐧b:=𝖳𝗋𝗎𝖾;r:=42)b:=(r42);r:=42(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r\neq 42\mathrel{\mathbf{then}}b\mathop{{:}{=}}\mathsf{True}\mathchar 24635\relax\;r\mathop{{:}{=}}42)\mathrel{\ooalign{\hss{$\rightarrow$\hss}\cr{$\hookrightarrow$}}}b\mathop{{:}{=}}(r\neq 42)\mathchar 24635\relax\;r\mathop{{:}{=}}42

The conditional is eliminated, and now bb will be 𝖳𝗋𝗎𝖾\mathsf{True} if r42r\neq 42 when the branch would have been entered, and r=42r=42 in the final state. More precisely, if we let condcond be the original conditional and condcond^{\prime} be the optimised version, they both satisfy the following, assuming ¬b\neg b initially, where r_initr_{\_}{init} is the initial value of rr.

{¬b}cond{r=42(br_init42)}and{¬b}cond{r=42(br_init42)}\{\neg b\}\,cond\,\{r=42\mathrel{\mathstrut{\wedge}}(b\Leftrightarrow r_{\_}{init}\neq 42)\}\quad and\quad\{\neg b\}\,cond^{\prime}\,\{r=42\mathrel{\mathstrut{\wedge}}(b\Leftrightarrow r_{\_}{init}\neq 42)\}

The transformed code preserves the expected behaviour of 𝗋𝖿𝗎𝖻{\sf{rfub}}.

Theorem 10.2.

Let 𝗋𝖿𝗎𝖻{\sf{rfub}}^{\prime} be 𝗋𝖿𝗎𝖻{\sf{rfub}} using the transformed conditional from (10.2). Then

¬0_x,y,r¬b𝗋𝖿𝗎𝖻x=y=r=42¬b\neg\langle\!\langle 0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle\,{{\sf{rfub}}^{\prime}}^{-}\,\langle\!\langle x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle
Proof.

Straightforward reasoning as in Theorem 10.1.

If we consider reordering according to the c model and semantics we have given so far, the above state is still not reachable.

Theorem 10.3.
¬0_x,y,r¬b𝗋𝖿𝗎𝖻x=y=r=42¬b\neg\langle\!\langle 0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle\,{\sf{rfub}}\,\langle\!\langle x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle
Proof.

More behaviours are possible, but the reasoning is still straightforward in our step-based semantics: the only way for the value 42 to be forwarded to xx is via the 𝖳𝗋𝗎𝖾\mathsf{True} branch, and taking that branch establishes b=𝖳𝗋𝗎𝖾b=\mathsf{True}. Making the paths explicit (defn. (3.7)), the left process of rfub is equal to:

(r:=y;r42;b:=𝖳𝗋𝗎𝖾;r:=42;x:=r)(r:=y;r=42;x:=r)(r\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis r\neq 42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}b\mathop{{:}{=}}\mathsf{True}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r)\hskip 5.69054pt\sqcap\hskip 5.69054pt(r\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis r=42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r)

The first case allows x=y=r=42x=y=r=42, following reasoning from Theorem 10.1, but clearly bb holds in any such final state. In the second case x:=rx\mathop{{:}{=}}r is blocked by r:=yr\mathop{{:}{=}}y (but not by r=42\llparenthesis r=42\rrparenthesis), and since there are no (out-of-thin-air) assignments of 42 to yy in the concurrent process the guard will never be 𝖳𝗋𝗎𝖾\mathsf{True}, and so this branch can never be taken. Although reordering x:=42x\mathop{{:}{=}}42 before the guard is possible in the first branch (due to forwarding/optimisations), the guard r42\llparenthesis r\neq 42\rrparenthesis stays in the program as an “oracle”, preventing inconsistent behaviour. The compiler/hardware can allow the store to go ahead, but it must be checked for validity later. The assignment to xx depends on rr which, in the 𝖥𝖺𝗅𝗌𝖾\mathsf{False} branch, depends on yy, and the only way that xx can receive 42 in that case is if yy receives 42, but this is not possible as the only instance of 42 is in the 𝖳𝗋𝗎𝖾\mathsf{True} branch, which is already ruled out.
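The role of the guard as an “oracle” can be made concrete. In the informal sketch below guards are encoded as steps that invalidate a trace when they fail; the promoted store x:=42x\mathop{{:}{=}}42 is allowed to proceed speculatively, but any interleaving in which the concurrent process feeds 42 back into rr is discarded by the retained guard r42\llparenthesis r\neq 42\rrparenthesis. (The trace encodings follow the two paths above; the Python representation is ours.)

```python
from itertools import combinations

def interleavings(a, b):
    """All order-preserving merges of traces a and b."""
    n = len(a) + len(b)
    for idx in combinations(range(n), len(a)):
        out, ai, bi = [], 0, 0
        for i in range(n):
            if i in idx:
                out.append(a[ai]); ai += 1
            else:
                out.append(b[bi]); bi += 1
        yield out

def run(trace, init):
    """Execute a trace; a failed guard invalidates the whole trace."""
    s = dict(init)
    for step in trace:
        s = step(s)
        if s is None:
            return None
    return s

# True-branch path, with x:=r promoted to x:=42 by forwarding;
# the guard <r != 42> remains in the trace as an oracle.
path_true = [lambda s: {**s, "x": 42},                  # x := 42 (promoted)
             lambda s: {**s, "r": s["y"]},              # r := y
             lambda s: s if s["r"] != 42 else None,     # <r != 42>
             lambda s: {**s, "b": True},                # b := True
             lambda s: {**s, "r": 42}]                  # r := 42
# False-branch path: x:=r is blocked by r:=y, so no promotion.
path_false = [lambda s: {**s, "r": s["y"]},             # r := y
              lambda s: s if s["r"] == 42 else None,    # <r = 42>
              lambda s: {**s, "x": s["r"]}]             # x := r
right = [lambda s: {**s, "y": s["x"]}]                  # y := x

init = {"x": 0, "y": 0, "r": 0, "b": False}
finals = [f for path in (path_true, path_false)
            for t in interleavings(path, right)
            for f in [run(t, init)] if f is not None]

# The all-42 state is reachable, but only with b True;
# the debated state x=y=r=42 with b False never survives the oracle.
assert any(f["x"] == f["y"] == f["r"] == 42 and f["b"] for f in finals)
assert not any(f["x"] == f["y"] == f["r"] == 42 and not f["b"] for f in finals)
```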

However, the compiler transformation makes the state reachable.

Theorem 10.4.

Recalling 𝗋𝖿𝗎𝖻{\sf{rfub}}^{\prime} is 𝗋𝖿𝗎𝖻{\sf{rfub}} using the transformed conditional in (10.2),

0_x,y,r¬b𝗋𝖿𝗎𝖻x=y=r=42¬b\langle\!\langle 0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle\,{\sf{rfub}}^{\prime}\,\langle\!\langle x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle
Proof.

In the left-hand process of 𝗋𝖿𝗎𝖻{\sf{rfub}}^{\prime} the assignment x:=rx\mathop{{:}{=}}r can be reordered to be the first instruction executed, which by forwarding becomes x:=42x\mathop{{:}{=}}42. More formally, by Law 9.7 we get the following.

(10.3) r:=y;b:=(r42);r:=42;x:=rx:=42(r:=y;b:=(r42);r:=42)\displaystyle r\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}b\mathop{{:}{=}}(r\neq 42)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r~{}~{}\mathrel{\sqsubseteq}~{}~{}x\mathop{{:}{=}}42\centerdot(r\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}b\mathop{{:}{=}}(r\neq 42)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)

Now consider the effect of interleaving y:=xy\mathop{{:}{=}}x from the second process immediately after this assignment.

{0_x,y,r¬b}x:=42y:=xr:=yb:=(r42)r:=42{x=y=r=42¬b}\{0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\}\,x\mathop{{:}{=}}42\centerdot y\mathop{{:}{=}}x\centerdot r\mathop{{:}{=}}y\centerdot b\mathop{{:}{=}}(r\neq 42)\centerdot r\mathop{{:}{=}}42\,\{x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\}

Because r:=yr\mathop{{:}{=}}y reads 42 the value assigned to bb is 𝖥𝖺𝗅𝗌𝖾\mathsf{False}, and thus ¬b\neg b holds in the final state. The proof is completed by Theorem 5.18.
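The witnessing trace in the proof can be executed directly. The sketch below is an informal Python encoding of the instruction sequence from the proof of Theorem 10.4, showing that it indeed terminates in the debated state.

```python
def run(trace, init):
    s = dict(init)
    for step in trace:
        s = step(s)
    return s

# Reordered trace from the proof: x:=r is promoted to x:=42 by
# forwarding (Law 9.7), then y:=x from the second process interleaves
# immediately after it.
trace = [lambda s: {**s, "x": 42},               # x := 42 (promoted x := r)
         lambda s: {**s, "y": s["x"]},           # y := x  (second process)
         lambda s: {**s, "r": s["y"]},           # r := y
         lambda s: {**s, "b": s["r"] != 42},     # b := (r != 42)
         lambda s: {**s, "r": 42}]               # r := 42

final = run(trace, {"x": 0, "y": 0, "r": 0, "b": False})
# The debated state: all variables 42, yet b is False.
assert final == {"x": 42, "y": 42, "r": 42, "b": False}
```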

The difficulty with this result is that, since ¬b\neg b in the final state, the 𝖳𝗋𝗎𝖾\mathsf{True} branch of the original conditional from 𝗋𝖿𝗎𝖻{\sf{rfub}} could not have been taken (since it contains b:=𝖳𝗋𝗎𝖾b\mathop{{:}{=}}\mathsf{True}), therefore the 𝖥𝖺𝗅𝗌𝖾\mathsf{False} branch was taken, i.e., the condition r42r\neq 42 fails, and hence r=42r=42. But the only way for r=42r=42 to hold is if the 𝖳𝗋𝗎𝖾\mathsf{True} branch is/was executed, containing r:=42r\mathop{{:}{=}}42, but this has already been ruled out.

Since 𝗋𝖿𝗎𝖻{\sf{rfub}}^{\prime} has a behaviour which 𝗋𝖿𝗎𝖻{\sf{rfub}} does not, it cannot be the case that 𝗋𝖿𝗎𝖻𝗋𝖿𝗎𝖻{\sf{rfub}}\mathrel{\sqsubseteq}{\sf{rfub}}^{\prime}, which suggests the compiler transformation is not valid. Essentially, the transformation eliminates a guard and hence breaks a dependency-cycle check.

The question remains, however, whether the compiler transformation (10.2) is reasonable in the presence of the c memory model, and if so what the implications for the memory model are. We show how we can tweak the framework (specifically, increasing the circumstances under which the concept of forwarding can apply) to make the debated state reachable using the original version of 𝗋𝖿𝗎𝖻{\sf{rfub}} (thus justifying the compiler transformation). However, there are other consequences. We can straightforwardly accommodate either outcome in this framework, and can do so via a clear choice with identifiable consequences for other code: should guards be allowed to simplify expressions?

10.2. Self-fulfilling prophecies (forwarding guards)

In Defn. (9.1) we defined βα\guillemetright{}_{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright}}}\beta, for α\alpha an assignment, to update expressions in β\beta according to the assignment. If α\alpha is a guard (or fence) then βα\guillemetright=β{}_{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright}}}\beta=\beta. However, it is reasonable, in the sequential world at least, to allow guards to assist in transformations.

We introduce an extended version of forwarding, where α_\guillemetright\guillemetrightβ{}_{\_}{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}\beta returns a set of possible outcomes.

(10.4) x_:=e\guillemetright\guillemetrightα\displaystyle{}_{\_}{x\mathop{{:}{=}}e\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}\alpha =\displaystyle= {αx:=e\guillemetright}\displaystyle\{{}_{x\mathop{{:}{=}}e\mbox{\raisebox{-0.5pt}{\guillemetright}}}\alpha\}
(10.5) b_\guillemetright\guillemetrighte\displaystyle{}_{\_}{\llparenthesis b\rrparenthesis\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}\llparenthesis e\rrparenthesis =\displaystyle= {e|b(ee)}\displaystyle\{\llparenthesis e^{\prime}\rrparenthesis|b\Rightarrow(e^{\prime}\Leftrightarrow e)\}
(10.6) b_\guillemetright\guillemetrightx:=e\displaystyle{}_{\_}{\llparenthesis b\rrparenthesis\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}x\mathop{{:}{=}}e =\displaystyle= {x:=e|b(e=e)}\displaystyle\{x\mathop{{:}{=}}e^{\prime}|b\Rightarrow(e^{\prime}=e)\}

If α\alpha is an assignment it returns a singleton set containing just βα\guillemetright{}_{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright}}}\beta (10.4), e.g., r_:=42\guillemetright\guillemetrightx:=r={x:=42}{}_{\_}{r\mathop{{:}{=}}42\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}x\mathop{{:}{=}}r=\{x\mathop{{:}{=}}42\}. However, if α\alpha is a guard b\llparenthesis b\rrparenthesis then bb can be used as context to modify (optimise) β\beta, e.g., r=42_\guillemetright\guillemetrightx:=r={x:=42,}{}_{\_}{\llparenthesis r=42\rrparenthesis\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}x\mathop{{:}{=}}r=\{x\mathop{{:}{=}}42,\ldots\} by (10.5) and (10.6).
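Defn. (10.6) can be read as a semantic side condition: x:=ex\mathop{{:}{=}}e^{\prime} is an admissible forwarding of x:=ex\mathop{{:}{=}}e through a guard b\llparenthesis b\rrparenthesis exactly when bb implies e=ee^{\prime}=e. The small executable check below is illustrative only; the universal quantification over states is approximated by exhausting a finite domain for rr.

```python
def guard_forwards(b, e, e_prime, domain=range(0, 50)):
    """Does the guard <b> permit rewriting x:=e to x:=e_prime?
    Checks defn. (10.6), b => (e_prime = e), exhaustively over
    a small finite domain of values for r (an approximation)."""
    return all(e_prime(r) == e(r) for r in domain if b(r))

# x:=42 is a valid forwarding of x:=r through the guard <r = 42> ...
assert guard_forwards(lambda r: r == 42, lambda r: r, lambda r: 42)
# ... but not through the guard <r != 42>.
assert not guard_forwards(lambda r: r != 42, lambda r: r, lambda r: 42)
```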

We modify the reordering triple defn. (9.2) to accommodate α_\guillemetright\guillemetrightβ{}_{\_}{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}\beta in place of βα\guillemetright{}_{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright}}}\beta.

(10.7) β\guillemetleftα\guillemetleftmβ\displaystyle\beta^{\prime}\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}\alpha\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}{{\textsc{m}}}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}\beta =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} α\ext@arrow0055\Leftarrowfill@cββα_\guillemetright\guillemetrightββ𝗈𝖼β\displaystyle\alpha\mathrel{{\color[rgb]{0.2.,0.5,0.2}\ext@arrow 0055{\Leftarrowfill@}{}{\,{{\textsc{c}}}}}}\beta^{\prime}\mathrel{\mathstrut{\wedge}}\beta^{\prime}\in{}_{\_}{\alpha\mbox{\raisebox{-0.5pt}{\guillemetright\!\guillemetright}}}\beta\mathrel{\mathstrut{\wedge}}\beta\overset{{\sf{oc}}}{\succeq}\beta^{\prime}

Thus we incorporate using the guards in conditionals to justify reordering via Rule 9.3. For instance, following from above, x:=42\guillemetleftr=42\guillemetleftx:=rx\mathop{{:}{=}}42\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}\llparenthesis r=42\rrparenthesis\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}x\mathop{{:}{=}}r.

The use of defn. (10.7) for the reordering triple used in Rule 9.3, and the derived laws such as Law 9.7, allows behaviours that weren’t possible before, for instance,

r=42;x:=r\displaystyle\llparenthesis r=42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r \displaystyle\mathrel{\sqsubseteq} x:=42r=42\displaystyle x\mathop{{:}{=}}42\centerdot\llparenthesis r=42\rrparenthesis

Since we also have

(r42;b:=𝖳𝗋𝗎𝖾;r:=42);x:=r\displaystyle(\llparenthesis r\neq 42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}b\mathop{{:}{=}}\mathsf{True}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r \displaystyle\mathrel{\sqsubseteq} x:=42(r42;b:=𝖳𝗋𝗎𝖾;r:=42)\displaystyle x\mathop{{:}{=}}42\centerdot(\llparenthesis r\neq 42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}b\mathop{{:}{=}}\mathsf{True}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)

(which holds under defn. (9.2) as well as defn. (10.7)) we can show that a trailing assignment can use information from a preceding conditional for simplification.

(10.8) (𝐢𝐟cr42𝐭𝐡𝐞𝐧b:=𝖳𝗋𝗎𝖾;r:=42);x:=rx:=42(𝐢𝐟cr42𝐭𝐡𝐞𝐧b:=𝖳𝗋𝗎𝖾;r:=42)(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r\neq 42\mathrel{\mathbf{then}}b\mathop{{:}{=}}\mathsf{True}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r~{}~{}\mathrel{\sqsubseteq}~{}~{}x\mathop{{:}{=}}42\centerdot(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r\neq 42\mathrel{\mathbf{then}}b\mathop{{:}{=}}\mathsf{True}\mathchar 24635\relax\;r\mathop{{:}{=}}42)

The trailing assignment x:=rx\mathop{{:}{=}}r can be executed first, using the value 42 for rr, since both branches imply that will always be the value assigned to xx. We call this use of a guard to simplify a later assignment a self-fulfilling prophecy, for reasons which will become clear below.

10.3. Behaviour of read-from-untaken-branch with self-fulfilling prophecies

We now return to the behaviour of 𝗋𝖿𝗎𝖻{\sf{rfub}}, this time allowing self-fulfilling prophecies via forwarding (defn. (10.7)). To highlight the impact this has we note refinement steps that are allowed under SFPs (using defn. (10.7) for the antecedent of Rule 9.3) using sfp\overset{{sfp}}{\mathrel{\sqsubseteq}}, and steps that are allowed normally (using defn. (9.2) for the antecedent of Rule 9.3) as \mathrel{\sqsubseteq}. The following theorem stands in contradiction to Theorem 10.3.

Theorem 10.5.

Allowing self-fulfilling prophecies,

(10.9) 0_x,y,r¬b𝗋𝖿𝗎𝖻x=y=r=42¬b\langle\!\langle 0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle\,{\sf{rfub}}\,\langle\!\langle x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\rangle\!\rangle
Proof.

Let 𝗋𝖿𝗎𝖻_1{\sf{rfub}}_{\_}1 (resp. 𝗋𝖿𝗎𝖻_2{\sf{rfub}}_{\_}2) refer to the first (resp. second) process of 𝗋𝖿𝗎𝖻{\sf{rfub}} (defn. (10.1)).

(10.10) 𝗋𝖿𝗎𝖻_1\displaystyle{\sf{rfub}}_{\_}1 =^\displaystyle\mathrel{\mathstrut{\widehat{=}}} r:=y;(𝐢𝐟cr42𝐭𝐡𝐞𝐧b:=𝖳𝗋𝗎𝖾;r:=42);x:=42\displaystyle r\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r\neq 42\mathrel{\mathbf{then}}b\mathop{{:}{=}}\mathsf{True}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}42
(10.11) sfp\displaystyle\overset{{sfp}}{\mathrel{\sqsubseteq}} x:=42r:=y;(𝐢𝐟cr42𝐭𝐡𝐞𝐧b:=𝖳𝗋𝗎𝖾;r:=42)by (10.8)\displaystyle x\mathop{{:}{=}}42\centerdot r\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r\neq 42\mathrel{\mathbf{then}}b\mathop{{:}{=}}\mathsf{True}\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)\quad\mbox{by (\ref{eqn:rfub-if-reorder})}
(10.12) \displaystyle\mathrel{\sqsubseteq} x:=42r:=yr=42by defn. (3.7), Law 4.2\displaystyle x\mathop{{:}{=}}42\centerdot r\mathop{{:}{=}}y\centerdot\llparenthesis r=42\rrparenthesis\quad\mbox{by defn.~{}(\ref{eqndefn:if}), Law~{}\ref{law:chooseL}}

Now we fix the interleaving.

(10.13) 𝗋𝖿𝗎𝖻sfp(x:=42r:=yr=42)y:=xx:=42y:=xr:=yr=42\displaystyle{\sf{rfub}}\hskip 5.69054pt\overset{{sfp}}{\mathrel{\sqsubseteq}}\hskip 5.69054pt(x\mathop{{:}{=}}42\centerdot r\mathop{{:}{=}}y\centerdot\llparenthesis r=42\rrparenthesis)\parallel y\mathop{{:}{=}}x\hskip 5.69054pt\mathrel{\sqsubseteq}\hskip 5.69054ptx\mathop{{:}{=}}42\centerdot y\mathop{{:}{=}}x\centerdot r\mathop{{:}{=}}y\centerdot\llparenthesis r=42\rrparenthesis

This interleaving reaches the specified state.

(10.14) {0_x,y,r¬b}x:=42y:=xr:=yr=42{x=y=r=42¬b}\{0_{\_}{x,y,r}\mathrel{\mathstrut{\wedge}}\neg b\}\,x\mathop{{:}{=}}42\centerdot y\mathop{{:}{=}}x\centerdot r\mathop{{:}{=}}y\centerdot\llparenthesis r=42\rrparenthesis\,\{x=y=r=42\mathrel{\mathstrut{\wedge}}\neg b\}

The proof is completed by Theorem 5.18.

This theorem shows that the debated outcome of rfub, which is possible under a seemingly reasonable compiler transformation, arises naturally if conditionals are allowed to modify future instructions. Indeed, this is essentially what is used to justify the transformation itself: if the 𝖥𝖺𝗅𝗌𝖾\mathsf{False} branch is taken then already r=42r=42, so any later use of rr can be assumed to use the value 42. However, in a concurrent, reordering setting, there may be unexpected consequences.7 Forbidding branches from simplifying later calculations does not prevent all reasonable compiler optimisations, for instance, (𝐢𝐟cb𝐭𝐡𝐞𝐧;r:=42𝐞𝐥𝐬𝐞;r:=42);x:=r\displaystyle(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}\ldots\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42\mathrel{\mathbf{else}}\ldots\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}x\mathop{{:}{=}}r \displaystyle\mathrel{\sqsubseteq} x:=42(𝐢𝐟cb𝐭𝐡𝐞𝐧𝐞𝐥𝐬𝐞);r:=42\displaystyle x\mathop{{:}{=}}42\centerdot(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}b\mathrel{\mathbf{then}}\ldots\mathrel{\mathbf{else}}\ldots)\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}r\mathop{{:}{=}}42 This is because there is a definite assignment to rr in both branches.

We do not intend to take a stand about what is ultimately the best choice for C; rather we show that whichever choice is taken can be linked to clear principles about whether or not guards should be allowed to simplify assignments in a concurrent setting. Notably, either choice can be explained within a step-based semantics.

10.4. Behaviour of out-of-thin-air with self-fulfilling prophecies

Recall the out-of-thin-air example 𝗈𝗈𝗍𝖺_𝖣\mathsf{oota_{\_}D} (defn. (6.20)).

(10.15) 𝗈𝗈𝗍𝖺_𝖣=^r_1:=x;(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=r_1)r_2:=y;(𝐢𝐟cr_2=42𝐭𝐡𝐞𝐧x:=r_2)\mathsf{oota_{\_}D}~{}~{}\mathrel{\mathstrut{\widehat{=}}}~{}~{}r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}r_{\_}1)~{}~{}\parallel~{}~{}r_{\_}2\mathop{{:}{=}}y\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}2=42\mathrel{\mathbf{then}}x\mathop{{:}{=}}r_{\_}2)

Its behaviour, which is intended to be forbidden, is allowed under SFPs, in contrast to Theorem 6.7.

Theorem 10.6.

Under SFPs, 0_x,y,r_1,r_2𝗈𝗈𝗍𝖺_𝖣x=42y=42\langle\!\langle 0_{\_}{x,y,r_{\_}1,r_{\_}2}\rangle\!\rangle\,\mathsf{oota_{\_}D}\,\langle\!\langle x=42\mathrel{\mathstrut{\wedge}}y=42\rangle\!\rangle.

Proof.

Focus on the left-hand process, and in particular the behaviour when the 𝖳𝗋𝗎𝖾\mathsf{True} branch is taken. The key step depends on y:=42\guillemetleftr_1=42\guillemetlefty:=r_1y\mathop{{:}{=}}42\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}\llparenthesis r_{\_}1=42\rrparenthesis\mathbin{\overset{{\color[rgb]{0.2.,0.5,0.2}}}{\mathop{\textstyle{\raisebox{1.0pt}{${\color[rgb]{0.2.,0.5,0.2}\mbox{\guillemetleft}}$}}}}}y\mathop{{:}{=}}r_{\_}1, that is, the guard can be used to simplify the inner assignment.

(10.16) r_1:=x;(𝐢𝐟cr_1=42𝐭𝐡𝐞𝐧y:=r_1𝐞𝐥𝐬𝐞)\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}(\mathrel{\mathbf{if}}^{{{\textsc{c}}}}r_{\_}1=42\mathrel{\mathbf{then}}y\mathop{{:}{=}}r_{\_}1\mathrel{\mathbf{else}}) \displaystyle\mathrel{\sqsubseteq} r_1:=x;r_1=42;y:=r_1\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}\llparenthesis r_{\_}1=42\rrparenthesis\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}y\mathop{{:}{=}}r_{\_}1
(10.17) sfp\displaystyle\overset{{sfp}}{\mathrel{\sqsubseteq}} r_1:=x;y:=42r_1=42\displaystyle r_{\_}1\mathop{{:}{=}}x\mathbin{\mathbin{\raisebox{0.0pt}{$\overset{\raisebox{0.0pt}{${{}}$}}{\mathchar 24635\relax}$}}}y\mathop{{:}{=}}42\centerdot\llparenthesis r_{\_}1=42\rrparenthesis
(10.18) \displaystyle\mathrel{\sqsubseteq} y:=42r_1:=xr_1=42\displaystyle y\mathop{{:}{=}}42\centerdot r_{\_}1\mathop{{:}{=}}x\centerdot\llparenthesis r_{\_}1=42\rrparenthesis

Using symmetric reasoning in the second process and interleaving we have the following behaviour.

(10.19) 𝗈𝗈𝗍𝖺_𝖣sfpx:=42y:=42r_1:=xr_2:=yr_1=42r_2=42\mathsf{oota_{\_}D}\overset{{sfp}}{\mathrel{\sqsubseteq}}x\mathop{{:}{=}}42\centerdot y\mathop{{:}{=}}42\centerdot r_{\_}1\mathop{{:}{=}}x\centerdot r_{\_}2\mathop{{:}{=}}y\centerdot\llparenthesis r_{\_}1=42\rrparenthesis\centerdot\llparenthesis r_{\_}2=42\rrparenthesis

Here both stores have been promoted to the start of execution; their values are then read into the local registers and subsequently satisfy the guards (a “satisfaction cycle” (Batty et al., 2013)). The postcondition is straightforwardly reached.

(10.20)  {0_{x,y,r₁,r₂}}  x := 42 · y := 42 · r₁ := x · r₂ := y · ⦅r₁ = 42⦆ · ⦅r₂ = 42⦆  {x = 42 ∧ y = 42}

The proof is completed by Theorem 5.18.

This demonstrates the problematic nature of allowing SFPs, and unifies the known underlying problem with these two related patterns (𝗈𝗈𝗍𝖺\mathsf{oota} and rfub).
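The derivation above can be related back to source level. Below is a minimal C11 sketch (our own rendering; the function and variable names are illustrative, not from the paper) of the 𝗈𝗈𝗍𝖺_𝖣 litmus test, with relaxed atomics and the writes of 42 guarded exactly as in the discussion. In practice no mainstream compiler or hardware produces the out-of-thin-air outcome r₁ = r₂ = 42, even though ruling it out formally is the crux of the problem.

```c
// Illustrative C11 rendering of the oota_D litmus test (names are ours).
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2;

static void *thread1(void *arg) {
    (void)arg;
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    if (r1 == 42)
        atomic_store_explicit(&y, r1, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    r2 = atomic_load_explicit(&y, memory_order_relaxed);
    if (r2 == 42)
        atomic_store_explicit(&x, r2, memory_order_relaxed);
    return NULL;
}

// Runs the litmus test once; returns 1 iff the out-of-thin-air outcome
// r1 == 42 && r2 == 42 was observed (never seen on real implementations).
int run_oota(void) {
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    r1 = r2 = 0;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return r1 == 42 && r2 == 42;
}
```

Since neither thread ever writes 42 unless it has already read 42, any single run on a real implementation leaves both registers at 0; it is only under speculative forwarding of the guards, as in the derivation, that the satisfaction cycle arises.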

11. Further extensions

In this section we discuss some other extensions to the model that may be of interest in some domains; we emphasise, however, that the definition of the memory model does not need to change: we simply make the language and its execution model richer.

11.1. Power and non-multicopy atomicity

IBM’s multicore Power architecture (Maranget et al., 2012; Sarkar et al., 2011; Mador-Haim et al., 2012) (one of the first commercially available) has processor pipeline reorderings similar to Arm (Flur et al., 2016), but in addition has a cache coherence system that provides weaker guarantees than that of Arm (and x86): (cached) memory operations can appear in different orders to different processes. For instance, although on Arm the instructions x := 1 and y := 1 may be reordered in the pipeline of a particular processor, whichever order they occur in will be the same for every other process in the system. On Power, however, one process may see the modifications in one order and another may see them in the reverse order (assuming no fences are used). This extra layer of complexity is introduced through cache interactions which allow one processor that shares a cache with another to read writes early, before the memory operation is fully committed to central storage (Sarkar et al., 2011).
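The standard discriminator of multicopy atomicity is the IRIW (independent reads of independent writes) litmus test: two writers update x and y, and two readers can disagree on the order of the writes only on non-multicopy-atomic machines such as Power (absent fences). A sketch is below (our own example; names are illustrative); written with default seq_cst atomics, as here, C forbids the disagreeing outcome on every architecture.

```c
// Illustrative IRIW litmus test in C11 (names are ours).
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2, r3, r4;

static void *writer_x(void *a) { (void)a; atomic_store(&x, 1); return NULL; }
static void *writer_y(void *a) { (void)a; atomic_store(&y, 1); return NULL; }

static void *reader_xy(void *a) {        // sees x first, then y
    (void)a;
    r1 = atomic_load(&x);                // seq_cst by default
    r2 = atomic_load(&y);
    return NULL;
}

static void *reader_yx(void *a) {        // sees y first, then x
    (void)a;
    r3 = atomic_load(&y);
    r4 = atomic_load(&x);
    return NULL;
}

// Returns 1 iff the readers disagreed on the order of the two writes:
// reader 1 saw x's write before y's, reader 2 saw y's before x's.
int run_iriw(void) {
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    r1 = r2 = r3 = r4 = 0;
    pthread_t t[4];
    pthread_create(&t[0], NULL, writer_x, NULL);
    pthread_create(&t[1], NULL, writer_y, NULL);
    pthread_create(&t[2], NULL, reader_xy, NULL);
    pthread_create(&t[3], NULL, reader_yx, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}
```

With seq_cst operations the single total order required by C rules the disagreement out; replacing the loads with relaxed or plain accesses is what exposes the non-multicopy-atomic behaviour on Power.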

Programmers targeting implementations on Power architectures must take these extra behaviours into account (C accommodates the lack of multicopy atomicity to be compatible with this hardware). A formalisation of the Power cache system, separate from the processor-level reordering, and compatible with the framework in this paper, is given in (Colvin and Smith, 2018), which is based on an earlier formalisation of the Power memory subsystem given in (Sarkar et al., 2011). The cache subsystem, and its formalisation in (Colvin and Smith, 2018), sits conceptually above the processes within the system, and stores and loads interact with it rather than with the traditional variable-to-value mapping. As such, local reorderings, and the influence of ordering constraints and fences, are still relevant. However under the influence of such a system no code can be assumed to be atomic-syntax-structured (Defn. (7.1)), and thus traditional syntax-based reasoning techniques will not directly apply; however by instituting the system of (Colvin and Smith, 2018) the consequences for behaviours can be determined. We do not devote more time to this feature of the C model in this paper for several reasons.

  • Power is still subject to processor reorderings which are captured by the concepts herein; the memory subsystem is governed by separate mechanisms;

  • Power is the only commercially available hardware that exhibits such behaviours. Arm once allowed them, but no manufacturer ever implemented them on a real chip, and Arm has since removed these behaviours from its model (Pulte et al., 2017). A major reason for omitting the possibility of such weak behaviours was the complexity of the induced axiomatic models (Alglave et al., 2014), which were difficult to specify and reason about. Arm is generally considered one of the faster architectures, and does not appear to suffer a significant penalty due to not having a similarly weak cache coherence system. (Intel’s Itanium architecture, and its mooted memory model (e.g., (Yang et al., 2003)), was also intended to allow these weak behaviours; however it ceased production in 2020.)

  • In practice, Power’s weak behaviours are so difficult to reason about that developers make heavy use of fences to eliminate the effects, and as a result whatever performance gains there may have been are lost.

  • The behaviours that arise are entirely due to that micro-architectural feature and not due to compiler transformations or processor reorderings. Reasoning about C memory model features that affect local code on, say, x86 architectures carries over identically to reasoning about local code on Power.

  • Although, theoretically, under the C memory model a compiler could implement code transformations that mimic the behaviour of Power’s cache system on architectures that do not exhibit them natively, this seems highly unlikely; therefore, only developers targeting Power will ever experience the resulting complexity (which arguably can and must be removed by the insertion of fences).

Outside of such considerations, the C model as we have presented it is relatively straightforward; the behaviour introduced by Power’s cache system is completely separate to pipeline reordering and compiler transformations. It seems unfortunate to complicate the entirety of the C memory model to accommodate one architecture, when its effects can be captured in a separate memory subsystem specification (which can be ignored by developers targetting different architectures). Hence we recommend keeping the extra behaviours induced by this separate mechanism, peculiar to a single currently-in-production architecture, separately specified within the model; we point to (Sarkar et al., 2011; Pichon-Pharabod and Sewell, 2016; Colvin and Smith, 2018) as examples of how this can be done, which fulfil similar roles to time-stamped event structures in (Kang et al., 2017a) and the shared action graph of (Wright et al., 2021).

11.2. Incorporating compiler transformations

Any possible compiler transformation can be straightforwardly incorporated into our framework as an operational rule. For instance, if pattern is some generic code pattern that can be transformed into pattern′ then this can become a rule pattern ⟶^τ pattern′, i.e., a silent transformation that can occur at any time (an example of such a transformation might be to refactor a loop). The downside of including such transformations within the semantics is, of course, that, depending on the structural similarity of pattern and pattern′, one cannot expect to reason about pattern using syntax-directed proof rules (one has broken the principle of atomic-syntax-structured code, Defn. (7.1)). Of course, trace-based approaches can still be applied.

As an example, consider a transformation discussed in Sect. 10.1, which simplifies and removes a conditional. This can be expressed as an operational rule that applies in some circumstances.

Rule 11.1 (Eliminate conditional).
(𝐢𝐟ᶜ r ≠ n 𝐭𝐡𝐞𝐧 b := 𝖳𝗋𝗎𝖾 ; r := n) ⟶^τ (b := (r = n) ; r := n)

As discussed in Sect. 10.1, allowing this transformation has significant effects on behaviours in a wider context. Consider also transformations to eliminate redundant loads or stores, the simplest instrumentation of which are given by the following rules.

Rule 11.2 (Load coalescing).
r₁ := x ; r₂ := x ⟶^τ r₁ := x ; r₂ := r₁
Rule 11.3 (Write coalescing).
x := e₁ ; x := e₂ ⟶^τ x := e₂

These transformations eliminate an interaction with main memory. Rule 11.2 reduces the number of behaviours of the program, since r₂ will always take the same final value as r₁, whereas in the original code it is possible for them to receive different final values. The transformation encoded in Rule 11.3 also reduces the overall behaviours of the system, as a parallel process can never receive the value of e₁ (of course, these transformations could be made dependent on memory ordering constraints using, for instance, the ⪰^𝗈𝖼 relation). We believe it should be outside the scope of the definition of the C memory model, and certainly outside the scope of this work, to consider every possible transformation of every possible compiler; however we have provided a framework in which the consequences of a particular transformation can be relatively straightforwardly assessed, separately from the specification and principles of the memory model itself. In particular this may feed into the development of verified compilers, and the development of a bespoke set of transformations that are valid within concurrent systems.
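The effect of these rules can be seen at source level with a small sketch (our own; the function names are illustrative). Sequentially each transformed function is indistinguishable from the original; the difference only manifests under concurrency, where an observer of stores_original may see x = 1, and the two loads in loads_original may return different values.

```c
/* Sketch of Rules 11.2 and 11.3 as source-level rewrites.
   Function names are ours, for illustration only. */
static int x;

/* Rule 11.3 (write coalescing): x := e1 ; x := e2  ~~>  x := e2.
   Each function returns the final value of x. */
int stores_original(void)  { x = 1; x = 2; return x; }
int stores_coalesced(void) { x = 2; return x; }

/* Rule 11.2 (load coalescing): r1 := x ; r2 := x  ~~>  r1 := x ; r2 := r1.
   Returns r1 * 100 + r2 so both registers are visible to the caller. */
int loads_original(int v)  { x = v; int r1 = x, r2 = x;  return r1 * 100 + r2; }
int loads_coalesced(int v) { x = v; int r1 = x, r2 = r1; return r1 * 100 + r2; }
```

Run sequentially, the pairs agree (both stores leave x = 2; both load pairs return matching registers); the reduction in behaviours described above is visible only to a parallel observer of x.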

12. Related work

The development of the C (and C++) memory model is ongoing and governed by an ISO committee (but heavily influenced by compiler writers (Memarian et al., 2016)), covering the vast range of features of the language itself and including a semi-formal description of the memory model. Boehm and Adve (Boehm and Adve, 2008) were highly influential initially, building on earlier work on memory model specifications (e.g., (Lamport, 1979; Dubois et al., 1986; Shasha and Snir, 1988; Gharachorloo et al., 1990; Jones et al., 1998; Arvind and Maessen, 2006; Fox and Harman, 2000)). Since then many formal approaches have been taken to further understand the implications of the model and tighten its specification, especially the works of Batty et al. (Batty et al., 2011, 2013; Jeffrey et al., 2022) and Vafeiadis et al. (Vafeiadis et al., 2015; Kang et al., 2017b; Lahav et al., 2016, 2017). The model abstracts away from the various hardware memory models that C is designed to run on (e.g., Arm, x86, Power, RISC-V, SPARC), leaving it to the compiler writer to translate the higher-level program code into assembler-level primitives that will provide the behaviour specified by the C model. This behaviour is described with respect to a cross-thread “happens-before” order, which is influenced by so-called “release sequences” of events within the system as a whole. As a formalism this is difficult to reason about, and in particular, it removes the ability to think thread-locally – whether or not enough fences or constraints have been placed in a thread requires an analysis of all other events in the system, even though, with the exception of the Power architecture (see Sect. 11.1), any weak behaviours are due to local compiler transformations or instruction-level parallelisation at the assembler level.
Our observation from myriad discussions on online programming boards is that programmers tend to think in terms of local reorderings, and appealing to cross-thread concepts is not a convenient abstraction. The framework we present here is based on a close abstraction of the instruction-level parallelisation that occurs in real pipelined processors, or the instruction shuffling that a compiler may perform. For simple programs, where enough synchronisations have been inserted to maintain order on the key instructions, reasoning can be shown to reduce to a well-known sequential form; or specific reorderings that are problematic can be elucidated. The underlying semantic model is the standard interleaving of relations on states (mappings from variables to values). We argue this is simpler and more intuitive – as far as instruction reorderings can be considered so – than the current description in the C standard. Part of the complication of the standard arises from handling the complexity of the Power cache coherence system, which cannot be encoded in a thread-local manner; however those complications can be treated separately (Sarkar et al., 2011; Pichon-Pharabod and Sewell, 2016; Colvin and Smith, 2018), and in any case, the Power model also involves instruction-level parallelism governed by the same principles as the other major architectures (Arm, x86, RISC-V, SPARC).

In comparison with other formalisms in the literature (Chakraborty and Vafeiadis, 2019; Ou and Demsky, 2018; Nienhuis et al., 2016), many use axiomatic approaches (as exemplified by (Alglave et al., 2014)), which are necessarily cross-thread specifications based on the happens-before relationship; many use a complex operational semantics; and many combine the two. The Promising semantics (Kang et al., 2017a; Lee et al., 2020) is operational in flavour but has a complex semantics (Chakraborty and Vafeiadis, 2019) involving time-stamped operations and several abstract global data structures for managing events. In these frameworks reasoning about even simple cases is problematic, whereas in our framework the effects of reordering can be analysed by reduction rules at the program level (Sect. 4), and our underlying model is that of the typical relations on states and so admits standard techniques (Sect. 5). A recent advance in reasoning for the C memory model is that of Wright et al. (Wright et al., 2021), but that framework appeals to a shared event graph structure, and does not consider memory fences or sc constraints. Very few of the formalisms surveyed have a simple way of addressing consume (con) constraints (Sect. 8.2); we also argue that our formal approach to understanding pathological behaviours such as out-of-thin-air (Sect. 6.3) and read-from-untaken-branch (Sect. 10.1) provides a clearer framework for understanding and addressing their consequences.

An overarching philosophy for our approach has been that of separation of concerns, meaning that a programmer can consider which mechanisms for enforcing order should suffice for their concurrent code; of course, if their ordering mechanisms are embedded in complex expressions that may be optimised (Sect. 8) or incrementally evaluated (Sect. 7) then the picture becomes more complex, but this can be analysed separately, and without regard to how such features interact with cross-thread, complex, abstract data structures controlling events. The reordering framework we present here is based on earlier work (Colvin, 2021a, b; Colvin and Smith, 2018), which provides a model checker (written in Maude (Clavel et al., 2002)) and machine-checked refinement rules (in Isabelle/HOL (Paulson, 1994; Nipkow et al., 2002)) for the language in Sect. 3.2 (i.e., ignoring the possibility of incremental evaluation and optimisations) with forwarding (Sect. 9). We straightforwardly encoded the definition of the C memory model (Model 2.6) in the model checker and used it on the examples in this paper as well as those provided by the Cerberus project (et. al, 2022). The framework has provided the basis for other analyses involving security (Colvin and Winter, 2020; Smith et al., 2019; Winter et al., 2021) and automated techniques (Coughlin et al., 2021; Singh et al., 2021).

13. Conclusions

We have given a definition of the C memory model which keeps the fundamental concepts involved (fences, ordering constraints, and data dependencies) separated from other aspects of the language such as expression evaluation and optimisations, which are present regardless of whether or not the memory model is considered. Provided programmers keep to a reasonable discipline of programming that aids analysis of concurrent programs, i.e., program statements are generally indivisible (at most one shared variable per expression), our framework facilitates structured reasoning about the code. This involves elucidating the effect of orderings on the code as-written, and then applying existing techniques for establishing the desired properties. We argue that our framework more closely expresses how programmers of concurrent code think about memory model effects than the abstraction given in the current standard (cross-thread happens-before relationships and release sequences). This is largely because the effects that a C programmer needs to consider are compiler reorderings of instructions for efficiency reasons, or architecture-level reorderings in instruction pipelines. The C language is rich in features and any claim to a full semantics of arbitrary C code with concurrency requires a full semantics of C in general, and as far as we are aware this has not been fully realised as yet; but we intend that our approach can be relatively straightforwardly incorporated into such a semantics by virtue of its separation of concerns: the fundamental properties of the memory model are universal and consistent even in the presence of complicating factors.
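As a concrete instance of that discipline, the classic message-passing idiom below (a sketch of our own; names are illustrative) keeps to one shared-variable access per statement, and uses a release store and acquire load to carry the payload: whenever the consumer sees the flag set, C guarantees it also sees the data.

```c
// Message-passing idiom in C11 (our own example; names are illustrative).
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;          // plain (non-atomic) payload
static atomic_int flag;   // synchronisation variable
static int r_flag, r_data;

static void *producer(void *a) {
    (void)a;
    data = 42;                                              // plain write
    atomic_store_explicit(&flag, 1, memory_order_release);  // publish
    return NULL;
}

static void *consumer(void *a) {
    (void)a;
    r_flag = atomic_load_explicit(&flag, memory_order_acquire);
    if (r_flag)
        r_data = data;  // acquire ordering guarantees this sees 42
    return NULL;
}

// Returns 1 iff the run respected message passing: whenever the flag
// was seen set, the payload was seen as 42 (guaranteed by rel/acq).
int run_mp(void) {
    data = 0;
    atomic_store(&flag, 0);
    r_flag = r_data = 0;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return r_flag == 0 || r_data == 42;
}
```

Each statement touches at most one shared variable, so the reasoning is exactly the kind of thread-local, order-then-verify analysis the framework supports.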

We note that the difficulties that arise in the attempt to formalise the C memory model stem from the tension between, on one hand, well-established compiler transformations and the need to support a multitude of hardware-level memory models seamlessly, and, on the other, the well-known intricacies of programming correct shared-variable algorithms (Lau et al., 2019). This will be an ongoing balancing act that involves many competing factors, especially and including efficiency, and, increasingly, safety and security (Liu et al., 2021); if we were to take a position, it would be that sections of code – hopefully relatively small and localised – should be protectable from arbitrary compiler transformations. For instance, a C scoping construct concurrent { … } which is compiled with minimal optimisations or reorderings, regardless of command-line flags such as -O. The C standard may then be able to provide guarantees about executions for such delimited code, while also allowing the programmer flexibility to make use of optimisations where they have determined they are applicable.

References

  • Alglave et al. (2014) Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory. ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (July 2014), 74 pages. https://doi.org/10.1145/2627752
  • Apt and Olderog (2019) Krzysztof R Apt and Ernst-Rüdiger Olderog. 2019. Fifty years of Hoare’s logic. Formal Aspects of Computing 31, 6 (2019), 751–807.
  • Arvind and Maessen (2006) Arvind and Jan-Willem Maessen. 2006. Memory Model = Instruction Reordering + Store Atomicity. In Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA ’06). IEEE Computer Society, USA, 29–40. https://doi.org/10.1109/ISCA.2006.26
  • Batty et al. (2013) Mark Batty, Mike Dodds, and Alexey Gotsman. 2013. Library Abstraction for C/C++ Concurrency. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Rome, Italy) (POPL ’13). Association for Computing Machinery, New York, NY, USA, 235–248. https://doi.org/10.1145/2429069.2429099
  • Batty et al. (2011) Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. 2011. Mathematizing C++ Concurrency. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin, Texas, USA) (POPL ’11). Association for Computing Machinery, New York, NY, USA, 55–66. https://doi.org/10.1145/1926385.1926394
  • Boehm and Adve (2008) Hans-J. Boehm and Sarita V. Adve. 2008. Foundations of the C++ Concurrency Memory Model (PLDI ’08). ACM, 68–78. https://doi.org/10.1145/1375581.1375591
  • Brookes (2007) Stephen Brookes. 2007. A semantics for concurrent separation logic. Theoretical Computer Science 375, 1-3 (2007), 227 – 270. https://doi.org/10.1016/j.tcs.2006.12.034
  • Chakraborty and Vafeiadis (2019) Soham Chakraborty and Viktor Vafeiadis. 2019. Grounding Thin-Air Reads with Event Structures. Proc. ACM Program. Lang. 3, POPL, Article 70 (jan 2019), 28 pages. https://doi.org/10.1145/3290383
  • Clavel et al. (2002) Manuel Clavel, Francisco Duran, Steven Eker, Patrick Lincoln, Narciso Marti-Oliet, José Meseguer, and José F. Quesada. 2002. Maude: specification and programming in rewriting logic. Theoretical Computer Science 285, 2 (2002), 187 – 243. https://doi.org/10.1016/S0304-3975(01)00359-0
  • Coleman and Jones (2007) Joey W. Coleman and Cliff B. Jones. 2007. A Structural Proof of the Soundness of Rely/guarantee Rules. J. Log. Comput. 17, 4 (2007), 807–841.
  • Colvin (2021a) Robert J. Colvin. 2021a. Parallelized Sequential Composition and Hardware Weak Memory Models. In Software Engineering and Formal Methods, Radu Calinescu and Corina S. Păsăreanu (Eds.). Springer, Cham, 201–221.
  • Colvin (2021b) Robert J. Colvin. 2021b. Parallelized sequential composition, pipelines, and hardware weak memory models. CoRR abs/2105.02444 (2021). arXiv:2105.02444 https://arxiv.org/abs/2105.02444
  • Colvin et al. (2017) Robert J. Colvin, Ian J. Hayes, and Larissa A. Meinicke. 2017. Designing a semantic model for a wide-spectrum language with concurrency. Formal Aspects of Computing 29, 5 (01 Sep 2017), 853–875. https://doi.org/10.1007/s00165-017-0416-4
  • Colvin and Smith (2018) Robert J. Colvin and Graeme Smith. 2018. A Wide-Spectrum Language for Verification of Programs on Weak Memory Models. In Formal Methods, Klaus Havelund, Jan Peleska, Bill Roscoe, and Erik de Vink (Eds.). Springer, Cham, 240–257. https://doi.org/10.1007/978-3-319-95582-7_14
  • Colvin and Winter (2020) Robert J. Colvin and Kirsten Winter. 2020. An Abstract Semantics of Speculative Execution for Reasoning About Security Vulnerabilities. In Formal Methods 2019 International Workshops, E. Sekerinski et al. (Eds.). Springer, 323–341.
  • Coughlin et al. (2021) Nicholas Coughlin, Kirsten Winter, and Graeme Smith. 2021. Rely/Guarantee Reasoning for Multicopy Atomic Weak Memory Models. In Formal Methods, M. Huisman, C. Păsăreanu, and N. Zhan (Eds.). Springer, 292–310.
  • Derrick and Smith (2017) John Derrick and Graeme Smith. 2017. An Observational Approach to Defining Linearizability on Weak Memory Models. In Formal Techniques for Distributed Objects, Components, and Systems, Ahmed Bouajjani and Alexandra Silva (Eds.). Springer International Publishing, Cham, 108–123.
  • Derrick et al. (2014) John Derrick, Graeme Smith, and Brijesh Dongol. 2014. Verifying Linearizability on TSO Architectures. In Integrated Formal Methods: 11th International Conference, IFM 2014, Bertinoro, Italy, September 9-11, 2014, Proceedings, Elvira Albert and Emil Sekerinski (Eds.). Springer International Publishing, Cham, 341–356. https://doi.org/10.1007/978-3-319-10181-1_21
  • Dijkstra and Scholten (1990) Edsger W. Dijkstra and Carel S. Scholten. 1990. Predicate Calculus and Program Semantics. Springer-Verlag, Berlin, Heidelberg.
  • Dubois et al. (1986) M. Dubois, C. Scheurich, and F. Briggs. 1986. Memory Access Buffering in Multiprocessors. In Proceedings of the 13th Annual International Symposium on Computer Architecture (Tokyo, Japan) (ISCA ’86). IEEE Computer Society Press, 434–442.
  • et. al (2022) Peter Sewell et. al. 2022. The Cerberus project. https://www.cl.cam.ac.uk/~pes20/cerberus/ Accessed March 2022.
  • Filipović et al. (2010) Ivana Filipović, Peter O’Hearn, Noam Rinetzky, and Hongseok Yang. 2010. Abstraction for concurrent objects. Theoretical Computer Science 411, 51 (2010), 4379 – 4398. https://doi.org/10.1016/j.tcs.2010.09.021 European Symposium on Programming 2009.
  • Fischer and Ladner (1979) Michael J. Fischer and Richard E. Ladner. 1979. Propositional dynamic logic of regular programs. J. Comput. System Sci. 18, 2 (1979), 194 – 211. https://doi.org/10.1016/0022-0000(79)90046-1
  • Flur et al. (2016) Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. 2016. Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA (POPL 2016). ACM, New York, NY, USA, 608–621. https://doi.org/10.1145/2837614.2837615
  • Fox and Harman (2000) A. C. J. Fox and N. A. Harman. 2000. Algebraic Models of Correctness for Microprocessors. Formal Aspects of Computing 12, 4 (2000), 298–312. https://doi.org/10.1007/PL00003936
  • Gharachorloo et al. (1990) Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy. 1990. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors (ISCA 1990). ACM, 15–26. https://doi.org/10.1145/325164.325102
  • Hayes et al. (2012) Ian Hayes, Cliff Jones, and Robert Colvin. 2012. Refining rely-guarantee thinking. Technical Report CS-TR-1334. Newcastle University. http://www.cs.ncl.ac.uk/research/pubs/trs/papers/1334.pdf
  • Hayes et al. (2013) Ian J. Hayes, Alan Burns, Brijesh Dongol, and Cliff B. Jones. 2013. Comparing Degrees of Non-Determinism in Expression Evaluation. Comput. J. 56, 6 (02 2013), 741–755. https://doi.org/10.1093/comjnl/bxt005
  • Hayes et al. (2014) Ian J. Hayes, Cliff B. Jones, and Robert J. Colvin. 2014. Laws and semantics for rely-guarantee refinement. Technical Report CS-TR-1425. Newcastle University.
  • Hayes et al. (2021) Ian J. Hayes, Larissa A. Meinicke, and Patrick A. Meiring. 2021. Deriving Laws for Developing Concurrent Programs in a Rely-Guarantee Style. CoRR abs/2103.15292 (2021). arXiv:2103.15292 https://arxiv.org/abs/2103.15292
  • Hayes et al. (2019) Ian J. Hayes, Larissa A. Meinicke, Kirsten Winter, and Robert J. Colvin. 2019. A synchronous program algebra: a basis for reasoning about shared-memory and event-based concurrency. Formal Aspects of Computing 31, 2 (2019), 133–163. https://doi.org/10.1007/s00165-018-0464-4
  • Herlihy and Shavit (2011) Maurice Herlihy and Nir Shavit. 2011. The Art of Multiprocessor Programming. Morgan Kaufmann.
  • Herlihy and Wing (1990) Maurice P. Herlihy and Jeannette M. Wing. 1990. Linearizability: a correctness condition for concurrent objects. TOPLAS 12, 3 (1990), 463 – 492.
  • Hoare (1972) C. A. R. Hoare. 1972. Towards a Theory of Parallel Programming. In Operating System Techniques. Academic Press, 61–71. Proceedings of Seminar at Queen’s University, Belfast, Northern Ireland, August-September 1971.
  • Hoare (1978) C. A. R. Hoare. 1978. Some Properties of Predicate Transformers. J. ACM 25, 3 (July 1978), 461–480. https://doi.org/10.1145/322077.322088
  • Hoare (1985) C. A. R. Hoare. 1985. Communicating Sequential Processes. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
  • Hoare (2002) C. A. R. Hoare. 2002. Towards a Theory of Parallel Programming. In The Origin of Concurrent Programming: From Semaphores to Remote Procedure Calls, Per Brinch Hansen (Ed.). Springer New York, New York, NY, 231–244. https://doi.org/10.1007/978-1-4757-3472-0_6
  • Jeffrey et al. (2022) Alan Jeffrey, James Riely, Mark Batty, Simon Cooksey, Ilya Kaysin, and Anton Podkopaev. 2022. The Leaky Semicolon: Compositional Semantic Dependencies for Relaxed-Memory Concurrency. Proc. ACM Program. Lang. 6, POPL, Article 54 (jan 2022), 30 pages. https://doi.org/10.1145/3498716
  • Jones (1983a) Cliff B. Jones. 1983a. Specification and Design of (Parallel) Programs. In IFIP Congress. 321–332.
  • Jones (1983b) Cliff B. Jones. 1983b. Tentative steps toward a development method for interfering programs. ACM Trans. Program. Lang. Syst. 5 (October 1983), 596–619. Issue 4. https://doi.org/10.1145/69575.69577
  • Jones et al. (1998) Robert B. Jones, Jens U. Skakkebæk, and David L. Dill. 1998. Reducing Manual Abstraction in Formal Verification of Out-of-Order Execution. In Formal Methods in Computer-Aided Design, Ganesh Gopalakrishnan and Phillip Windley (Eds.). Springer, 2–17.
  • Kang et al. (2017a) Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017a. A Promising Semantics for Relaxed-memory Concurrency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (Paris, France) (POPL 2017). ACM, New York, NY, USA, 175–189. https://doi.org/10.1145/3009837.3009850
  • Kang et al. (2017b) Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017b. A Promising Semantics for Relaxed-memory Concurrency (POPL 2017). ACM, 175–189. https://doi.org/10.1145/3009837.3009850
  • Kozen (2000) Dexter Kozen. 2000. On Hoare Logic and Kleene Algebra with Tests. ACM Trans. Comput. Logic 1, 1 (July 2000), 60–76. https://doi.org/10.1145/343369.343378
  • Lahav et al. (2016) Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. 2016. Taming Release-Acquire Consistency (POPL 2016). Association for Computing Machinery, 649–662. https://doi.org/10.1145/2837614.2837643
  • Lahav et al. (2017) Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. 2017. Repairing Sequential Consistency in C/C++11. SIGPLAN Not. 52, 6 (June 2017), 618–632. https://doi.org/10.1145/3140587.3062352
  • Lamport (1979) Leslie Lamport. 1979. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans. Comput. C-28, 9 (1979), 690–691.
  • Lau et al. (2019) Stella Lau, Victor B. F. Gomes, Kayvan Memarian, Jean Pichon-Pharabod, and Peter Sewell. 2019. Cerberus-BMC: A Principled Reference Semantics and Exploration Tool for Concurrent and Sequential C. In Computer Aided Verification, I. Dillig and S. Tasiran (Eds.). Springer, 387–397.

Appendix A Lifting from actions to traces and commands

Throughout the paper we have used several functions and definitions based on actions, which for the most part are straightforwardly lifted to commands and traces. For completeness we give these below.

A.1. Extracting variables

Any function ${\sf fn}(.)$ that simply collects syntax elements into a set (e.g., ${\sf fv}(.)$, ${\sf wv}(.)$, ${\sf rv}(.)$, $\lceil.\rceil$, $|.|$) and is defined over instructions can be straightforwardly lifted to commands, actions and traces via the following generic pattern.

(A.1) ${\sf fn}(\mathbf{nil}) = \emptyset$

(A.2) ${\sf fn}(\vec{\alpha}) = \bigcup_{\alpha \in \vec{\alpha}} {\sf fn}(\alpha)$

(A.3) ${\sf fn}(c_1 \sqcap c_2) = {\sf fn}(c_1) \cup {\sf fn}(c_2)$

(A.4) ${\sf fn}(c_1 \mathbin{;^{\textsc{m}}} c_2) = {\sf fn}(c_1) \cup {\sf fn}(c_2)$

(A.5) ${\sf fn}(c^{\textsc{m}}) = {\sf fn}(c)$

(A.6) ${\sf fn}(\langle\rangle) = \emptyset$

(A.7) ${\sf fn}(\alpha \frown t) = {\sf fn}(\alpha) \cup {\sf fn}(t)$
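The generic lifting pattern (A.1)–(A.7) can be illustrated with a small executable sketch. The command constructors (`nil`, `action`, `seq`, `choice`, `iter`) and the representation of an action as a list of instruction records are assumptions for illustration only, not the paper's definitions.

```python
def fn_action(action):
    # An action is a list of instructions; collect the variables of each.
    return set().union(*[instr["vars"] for instr in action]) if action else set()

def fn_cmd(cmd):
    kind = cmd[0]
    if kind == "nil":                 # fn(nil) = {}
        return set()
    if kind == "action":              # fn(vec-alpha) = union over its instructions
        return fn_action(cmd[1])
    if kind in ("seq", "choice"):     # fn(c1 ; c2) = fn(c1 |~| c2) = fn(c1) u fn(c2)
        return fn_cmd(cmd[1]) | fn_cmd(cmd[2])
    if kind == "iter":                # fn of an iterated command is fn of its body
        return fn_cmd(cmd[1])
    raise ValueError(kind)

def fn_trace(trace):
    # fn(<>) = {};  fn(alpha ^ t) = fn(alpha) u fn(t)
    return set().union(*map(fn_action, trace)) if trace else set()
```

For example, the command $\langle x := y\rangle ; (\mathbf{nil} \sqcap \langle z := 0\rangle)$ yields the union of the variables of each reachable action.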

A.2. Lifting forwarding/reordering triples

Forwarding (see Sect. 9) an action $\alpha$ into an action $\beta$ is defined below. Assume $x \neq y$.

${}_{\alpha\gg}(y^{ocs} := f) = y^{ocs} := ({}_{\alpha\gg} f) \qquad {}_{\alpha\gg}\llparenthesis b \rrparenthesis = \llparenthesis {}_{\alpha\gg} b \rrparenthesis \qquad {}_{\alpha\gg}\mathsf{f} = \mathsf{f}$

${}_{\alpha\gg} v = v \qquad {}_{\alpha\gg}(\ominus f) = \ominus({}_{\alpha\gg} f) \qquad {}_{\alpha\gg}(e_1 \oplus e_2) = ({}_{\alpha\gg} e_1) \oplus ({}_{\alpha\gg} e_2)$

${}_{x^{ocs_1} := e \,\gg}\; y^{ocs_2} = y^{ocs_2} \qquad\qquad {}_{x^{ocs_1} := e \,\gg}\; x^{ocs_2} = e$

Given that $\alpha$ is an assignment $x^{ocs} := e$, the forwarded action ${}_{\alpha\gg}\beta$ essentially replaces references to $x$ (with any constraints) in $\beta$ by $e$.
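Forwarding is essentially substitution into later instructions, which the following hypothetical sketch makes concrete. The tuple encoding of expressions and instructions (`assign`, `guard`, `fence`) is an assumption for illustration, not the paper's syntax.

```python
def fwd_expr(x, e, expr):
    """Substitute expression e for reads of variable x in expr."""
    if isinstance(expr, tuple):            # compound expression: recurse into args
        op, *args = expr
        return (op, *[fwd_expr(x, e, a) for a in args])
    if expr == x:                          # a read of x becomes e
        return e
    return expr                            # constants and other variables unchanged

def fwd_instr(x, e, instr):
    """Forward the assignment x := e into a later instruction."""
    kind = instr[0]
    if kind == "assign":                   # substitute in the right-hand side only
        _, y, f = instr
        return ("assign", y, fwd_expr(x, e, f))
    if kind == "guard":                    # substitute in the guard condition
        return ("guard", fwd_expr(x, e, instr[1]))
    return instr                           # fences are unaffected by forwarding
```

For example, forwarding $x := y + 1$ into the guard $\llparenthesis x = 2 \rrparenthesis$ yields $\llparenthesis y + 1 = 2 \rrparenthesis$.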

The reordering relation can be lifted from actions to commands straightforwardly as below.

(A.8) $\beta' \ll \mathbf{nil} \mathbin{\ll^{\textsc{m}}} \beta \;\mathrel{\widehat{=}}\; \beta' = \beta$

(A.9) $\beta' \ll \alpha \mathbin{\ll^{\textsc{m}}} \beta \;\mathrel{\widehat{=}}\; \alpha \mathrel{\overset{\textsc{c}}{\Longleftarrow}} \beta' \wedge \beta' = {}_{\alpha\gg}\beta$

(A.10) $\beta'' \ll c_1 ; c_2 \mathbin{\ll^{\textsc{m}}} \beta \;\mathrel{\widehat{=}}\; \exists \beta' \bullet \beta'' \ll c_1 \mathbin{\ll^{\textsc{m}}} \beta' \wedge \beta' \ll c_2 \mathbin{\ll^{\textsc{m}}} \beta$

(A.11) $\beta' \ll c_1 \sqcap c_2 \mathbin{\ll^{\textsc{m}}} \beta \;\mathrel{\widehat{=}}\; \beta' \ll c_1 \mathbin{\ll^{\textsc{m}}} \beta \wedge \beta' \ll c_2 \mathbin{\ll^{\textsc{m}}} \beta$

(A.12) $\beta' \ll c^{\textsc{m}} \mathbin{\ll^{\textsc{m}}} \beta \;\mathrel{\widehat{=}}\; \forall i \in \mathbb{N} \bullet \beta' \ll c^{i}_{\textsc{m}} \mathbin{\ll^{\textsc{m}}} \beta$
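The lifted relation can be read as a recursive check over command structure, as in the following hypothetical sketch. Here `reorder_action` stands in for the action-level relation of (A.9): it returns the forwarded $\beta'$ when $\beta$ may reorder before $\alpha$, or `None` when it may not. The command encoding is an assumption for illustration.

```python
def reorder_cmd(cmd, beta, reorder_action):
    """Return beta' with beta' << cmd << beta, or None if reordering is blocked."""
    kind = cmd[0]
    if kind == "nil":                       # (A.8): beta passes over nil unchanged
        return beta
    if kind == "action":                    # (A.9): delegate to the action relation
        return reorder_action(cmd[1], beta)
    if kind == "seq":                       # (A.10): over c2 first, then over c1
        b1 = reorder_cmd(cmd[2], beta, reorder_action)
        return None if b1 is None else reorder_cmd(cmd[1], b1, reorder_action)
    if kind == "choice":                    # (A.11): same beta' over both branches
        b1 = reorder_cmd(cmd[1], beta, reorder_action)
        b2 = reorder_cmd(cmd[2], beta, reorder_action)
        return b1 if b1 is not None and b1 == b2 else None
    raise ValueError(kind)
```

Note that for choice both branches must admit the same forwarded $\beta'$, matching the conjunction in (A.11).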

Appendix B Mixing incremental and non-incremental evaluation

In Sect. 7 we gave a semantics for evaluating instructions incrementally, as opposed to treating instructions as indivisible in Sect. 3.2. In this section, for completeness, we show how to mix both possibilities within the syntax of IMP+pseq.

We define an instruction ι\iota to be one of the three basic types of assignment, guard and fence, and an action α\alpha to be a list of instructions (written ι\vec{\iota}). Actions are the basic type of a step in the operational semantics. We define a “specification instruction” to pair a basic instruction ι\iota with a designation as to whether it is divisible or indivisible, i.e., whether it is to be executed incrementally or as a single indivisible step. Finally, within the syntax of IMP+pseq, instead of a list of actions α\vec{\alpha} as the base type (defn. (3.1)), we allow a statement ss, which is a list of specification instructions.

$\iota \in Instr \qquad \alpha \in Action \qquad \iota^{+} \in SpecInstr \qquad s \in Statement$

$\iota \;::=\; x := e \;\mid\; \llparenthesis e \rrparenthesis \;\mid\; \mathsf{f}$

$\alpha \;\mathrel{\widehat{=}}\; \vec{\iota}$

$\iota^{+} \;::=\; \iota \times (\mathsf{divisible} \mid \mathsf{indivisible})$

$s \;::=\; \vec{\iota^{+}}$

This gives a significant amount of flexibility in describing the execution mode of composite actions. For instance, $\langle(\llparenthesis x = y \rrparenthesis, \mathsf{indivis}),\ (x := y + z, \mathsf{divis})\rangle$ is a statement that tests whether $x = y$ in the current state as a single step, and then incrementally evaluates $y + z$ before assigning the result to $x$. Of course, this level of flexibility is not necessary in the majority of cases, and syntactic sugar can cover the commonly occurring ones: in particular, letting a singleton list of specification instructions be written as a single specification instruction, and conventions for distinguishing between divisible and indivisible versions of instructions.

We lift defn. (7.16) for indivisible instructions to the new types.

$\mathsf{indivis}((\iota, \mathsf{indivisible})) = \mathsf{True}$

$\mathsf{indivis}((\iota, \mathsf{divisible})) = \mathsf{indivis}(\iota)$

$\mathsf{indivis}(s) \;\mathrel{\widehat{=}}\; \forall \iota^{+} \in s \bullet \mathsf{indivis}(\iota^{+})$

Any specification instruction tagged 𝗂𝗇𝖽𝗂𝗏𝗂𝗌\mathsf{indivis} is indivisible, while a specification instruction tagged 𝖽𝗂𝗏𝗂𝗌\mathsf{divis} is divisible if its instruction is, but is otherwise indivisible.
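The lifted predicate is direct to express, as in this hypothetical sketch. Modeling a plain instruction as indivisible once its expressions are fully evaluated (a flag here) is an assumption standing in for defn. (7.16).

```python
def indivis_base(instr):
    # Assumption: a plain instruction is indivisible once fully evaluated,
    # modeled here by an "evaluated" flag (stand-in for defn. (7.16)).
    return instr.get("evaluated", False)

def indivis_spec(spec_instr):
    instr, tag = spec_instr
    if tag == "indivisible":
        return True                     # tagged indivisible: always one step
    return indivis_base(instr)          # tagged divisible: defer to instruction

def indivis_stmt(stmt):
    # A statement is indivisible when every specification instruction is.
    return all(indivis_spec(si) for si in stmt)
```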

The relevant operational semantics for specification instructions and statements is as follows.

(B.3) $\dfrac{\iota \xrightarrow{\;\alpha\;} \iota'}{(\iota, \mathsf{divisible}) \xrightarrow{\;\alpha\;} (\iota', \mathsf{divisible})}$

(B.8) $\dfrac{\mathsf{indivis}(s_1) \qquad \iota^{+} \xrightarrow{\;\alpha\;} \iota^{+\prime}}{s_1 \frown \iota^{+} \frown s_2 \xrightarrow{\;\alpha\;} s_1 \frown \iota^{+\prime} \frown s_2} \qquad\qquad \dfrac{\mathsf{indivis}(s)}{s \xrightarrow{\;strip(s)\;} \mathbf{nil}}$

Rule B.3 states that any instruction tagged divisible can take an incremental execution step according to the evaluation rules in Sect. 7. This is used to build Rule B.8 for a statement $s$, where specification instructions within $s$ are executed incrementally from left to right: the first divisible specification instruction ($\iota^{+}$) in $s$ may take a step, which becomes a step of the statement. When all instructions within $s$ are indivisible, a final, single, indivisible step is taken. This action is formed by simply stripping the $\mathsf{indivis}/\mathsf{divis}$ tags from the specification instructions, i.e., $strip((\iota, \_)) \mathrel{\widehat{=}} \iota$, which is lifted to statements by applying $strip$ to each element, i.e., $strip(s) \mathrel{\widehat{=}} map(strip, s)$.
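The behaviour of Rule B.8 can be sketched as a step function over statements. As before, the representation of specification instructions as `(instr, tag)` pairs with an `"evaluated"` flag, and `step_instr` standing in for the incremental-evaluation rules of Sect. 7, are assumptions for illustration.

```python
def step_stmt(stmt, step_instr):
    """Return ('step', stmt') for an internal evaluation step,
    or ('action', alpha) when the whole statement executes indivisibly."""
    for i, (instr, tag) in enumerate(stmt):
        # Find the leftmost specification instruction that is still divisible;
        # everything before it is indivisible, as Rule B.8 requires.
        if tag == "divisible" and not instr.get("evaluated", False):
            stepped = step_instr(instr)
            return ("step", stmt[:i] + [(stepped, tag)] + stmt[i + 1:])
    # All indivisible: emit one action, stripping the divis/indivis tags.
    return ("action", [instr for instr, _ in stmt])
```

Repeatedly applying `step_stmt` evaluates the divisible instructions one step at a time, and the final call produces the single stripped action.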

A compare and swap command (defn. (3.9)) can be redefined to incrementally evaluate its arguments before executing an indivisible (“atomic”) test-and-set step.

(B.9) $\mathbf{cas}(x, e, e') \;\mathrel{\widehat{=}}\; \langle(\llparenthesis x = e \rrparenthesis, \mathsf{divis}),\ (x := e', \mathsf{divis})\rangle \;\sqcap\; (\llparenthesis x \neq e \rrparenthesis, \mathsf{divis})$

In the successful case, first $e$ is evaluated to a value $v$, then $e'$ is evaluated to a value $v'$, and finally the action $\langle \llparenthesis x = v \rrparenthesis,\ x := v' \rangle$ is executed (the $\mathsf{divis}$ tags having been stripped). This means that $e$ and $e'$ can be incrementally evaluated, while the final test/update remains atomic.
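The structure of (B.9) can be built directly in the encoding used above; this is a hypothetical sketch of the definition's shape only, with the same tuple representation assumed for guards and assignments.

```python
def cas(x, e, e2):
    """Build the incremental compare-and-swap statement of (B.9):
    a nondeterministic choice between a success branch (guard + update,
    both tagged divisible) and a failure branch (a single divisible guard)."""
    success = [(("guard", ("==", x, e)), "divis"),
               (("assign", x, e2), "divis")]
    fail = [(("guard", ("!=", x, e)), "divis")]
    return ("choice", success, fail)
```

Once both expressions in the success branch are fully evaluated, the branch executes as the single action $\langle \llparenthesis x = v \rrparenthesis,\ x := v' \rangle$, preserving the atomic test-and-set.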