This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

stix@largesymbols"0E stix@largesymbols"0F

Improving Dynamic Code Analysis by Code Abstraction

Isabella Mastroeni Department of Computer Science, University of Verona (Italy) [email protected] Department of Environmental Sciences, Informatics and Statistics,
Ca’ Foscari University of Venice (Italy)
   Vincenzo Arceri Department of Environmental Sciences, Informatics and Statistics,
Ca’ Foscari University of Venice (Italy) [email protected]
Abstract

In this paper, our aim is to propose a model for code abstraction, based on abstract interpretation, allowing us to improve the precision of a recently proposed static analysis by abstract interpretation of dynamic languages. The problem we tackle here is that the analysis may add some spurious code to the string-to-execute abstract value and this code may need some abstract representations in order to make it analyzable. This is precisely what we propose here, where we drive the code abstraction by the analysis we have to perform.

1 Introduction

The possibility of dynamically building code instructions as the result of text manipulation is a key aspect in dynamic programming languages. In this scenario, programs can turn text, which can be built at run-time, into executable code [26]. These features are often used in code protection and tamper-resistant applications, employing camouflage for escaping attack or detection [22], in malware, in mobile code, in web servers, in code compression, and in code optimization, e.g., in Just-in-Time (JIT) compilers, employing optimized run-time code generation.
While the use of dynamic code generation may simplify considerably the art and performance of programming, this practice is also highly dangerous, making the code prone to unexpected behaviors and malicious exploits of its dynamic vulnerabilities, such as code/object-injection attacks for privilege escalation, database corruption, and malware propagation. It is clear that more advanced and secure functionalities based on string-to-code statements could be permitted if we better master how to safely generate, analyze, debug, and deploy programs that dynamically generate and manipulate code.

There are lots of good reasons to analyze programs building strings that can be later executed as code. An interesting example is code obfuscation. Recently, several techniques have been proposed for JavaScript code obfuscation111https://www.daftlogic.com/projects-online-javascript-obfuscator.htm,
http://www.danstools.com/javascript-obfuscate/,
http://javascript2img.com/,
https://javascriptobfuscator.herokuapp.com/,
https://javascriptobfuscator.com/
, meaning that also client-side code protection is becoming an increasingly important problem to be tackled by the research community and by practitioners. Hence, it is not always possible to simply ignore eval without accepting to lose the possibility of analyzing the rest of the program [4].

The Context: Analyzing Dynamic Code.

A major problem in presence of dynamic code generation is that static analysis becomes extremely hard if not impossible. This happens because program data structures, such as the control-flow graph and the system of recursive equations associated with the program in question, are themselves dynamically mutating objects. Recently [4], the problem of analyzing dynamic code has been tackled by treating code as any other dynamic structure that can be statically analyzed by abstract interpretation, and to treat the abstract interpreter as any other program function that can be recursively called. In particular, in [4], we provide a static analyzer architecture for a core dynamic language, containing non-removable eval statements, that still has some limitation in terms of precision but provides the necessary ground for studying more precise solutions to the problem. In particular,

  • \bullet

    We have designed an automata-based string abstract domain [5] for analyzing string values during execution. Automata (FA) provide the perfect choice for abstracting strings that may be executed by eval since they allow us to over-approximate the set of possible values of string variables by keeping enough information for both analyzing properties of string variables that are never executed by an eval during computation and for extracting the potential executable sub-language.

  • \bullet

    In order to statically analyze the code potentially executed by an eval, we have designed a systematic process for extracting from the (abstract) argument of eval (i.e., from the FA collection of its potential arguments) an over-approximation of executable code that this collection contains. Clearly, this approximation must keep a form that the analyzer can interpret.

  • \bullet

    We designed a static analyzer for dynamic languages performing a recursive call of the interpreter on the (over-approximated) code that eval may execute.

The Problem: Improve Precision Analysis by Abstracting Code.

This analysis provides a first step towards the analysis of dynamic languages but still has some important precision loss [4]. In particular, there are particular forms of FA (which occur when the string is dynamically generated by loops) avoiding the possibility of generating a control flow graph (𝖢𝖥𝖦\mathsf{CFG}) able to approximate the code executed by an eval. For instance, when the FA accepts a language such as {x=(5)n;|n>0}{{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}=(5)}}}}^{n}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers;}}}}\left|\begin{array}[]{l}n>0\end{array}\right.\!\right\}, the analysis in [4] cannot extract, from the FA, the 𝖢𝖥𝖦\mathsf{CFG} approximating the eval argument. In order to better explain the problem, consider the code in Fig. 1, where the value of i is statically unknown. In Fig. 1, we draw the automaton AA representing the abstract value of str before the eval execution. The problem is that AA has a cycle not involving a whole statement [4]. This situation makes the analyzer unable to build a 𝖢𝖥𝖦\mathsf{CFG} over-approximating the code potentially executed since, intuitively, such a 𝖢𝖥𝖦\mathsf{CFG} should be infinite. Indeed, only an infinite 𝖢𝖥𝖦\mathsf{CFG} could capture all the possible assignments described by the FA, namely all the assignments of any possible number formed only by 𝟻{\tt 5} to the variable 𝚡{\tt x} (i.e., x=5;,x=55;,x=555;\dots).

str = "x=5"; while (i < 3) { str = str + "5"; i = i + 1; } str = str + ";"; eval(str);

x=55;

Figure 1: AA s.t. (A)={𝚡=𝟻n;|n>0}\mathcal{L}(A)=\{\mathtt{x=5}^{n}\mathtt{;}\leavevmode\nobreak\ |\leavevmode\nobreak\ n>0\}, where 𝟻n\mathtt{5}^{n} means 𝟻\mathtt{5} repeated nn times.

In order to make it possible to overcome this limitation, at least for a set of potential eval patterns, we propose to define a form of abstract 𝖢𝖥𝖦\mathsf{CFG} able to finitely represent a potential infinite set of 𝖢𝖥𝖦\mathsf{CFG}s, e.g., we look for a 𝖢𝖥𝖦\mathsf{CFG} representing x=5.
Unfortunately, things are not so easy as it may seem, since this abstract code representation has to be built in such a way that the analyzer may still be able to interpret it.

Contribution.

The main contributions for tackling the problem above are:

  • \bullet

    We first define the notion of abstract 𝖢𝖥𝖦\mathsf{CFG}, based on the idea of making it possible to still perform a given analysis. The idea is to leave the control structure unchanged while approximating the edge labels (the statements to execute) to sets of labels, i.e., those sharing a fixed abstract property.

  • \bullet

    We show how completeness of code abstraction w.r.t. the semantic observation models the possibility, for the static analyzer, of interpreting also the abstract code, and we show how we can make any code abstraction complete.

  • \bullet

    We provide a systematic approach, based on the one proposed in [4], allowing us to analyze also the eval patterns described above, for which, instead, the analysis in [4] loses precision.

2 The Core Language: Imp

The language is quite standard (see Fig. 2222We use nn to denote the semantic value corresponding to the syntactic symbol n.), and each statement is annotated with a label Lab\ell\in\mbox{\sl Lab$$} (not part of the syntax) corresponding to the statement program point333 We suppose that there exists a function that, taken a well-written program, can label it with a fresh label for each program point..

Expe\displaystyle\mbox{Exp}\ni\mbox{\tt e} ::=a𝗌\displaystyle::=\;\mbox{\tt a}\mid{\sf s}
AExpa\displaystyle\mbox{AExp}\ni\mbox{\tt a} ::=xna+aaaaa{{\displaystyle::=\;\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\mid\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{n}}}}}}\mid\mbox{\tt a}+\mbox{\tt a}\mid\mbox{\tt a}-\mbox{\tt a}\mid\mbox{\tt a}*\mbox{\tt a}
BExpb\displaystyle\mbox{BExp}\ni\mbox{\tt b} ::=x𝚝𝚛𝚞𝚎𝚏𝚊𝚕𝚜𝚎e=ee>ee<ebb¬𝚋{\displaystyle::=\;\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\mid{\tt true}\mid{\tt false}\mid\mbox{\tt e}=\mbox{\tt e}\mid\mbox{\tt e}>\mbox{\tt e}\mid\mbox{\tt e}<\mbox{\tt e}\mid\mbox{\tt b}\wedge\mbox{\tt b}\mid\neg{\tt b}
SExp𝗌\displaystyle\operatorname{SExp}\ni{\sf s} ::=x"σ"concat(𝗌,𝗌)substr(𝗌,a,a){{\displaystyle::=\;\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\mid\mbox{\tt"$\sigma$"}\mid\mbox{\tt concat(${\sf s}$,${\sf s}$)}\mid\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{substr}}}}}}({\sf s},\mbox{\tt a},\mbox{\tt a})\
Commc\displaystyle\operatorname{Comm}\ni\mbox{\tt c} ::=skip12x:=e12c1;c23if1(𝚋){c2}3{c4}56{{\displaystyle::=\;{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}\mbox{\bf skip}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}\mid{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}\mid{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}\mbox{\tt c};{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}\mbox{\tt c}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{3}}}}\mid{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{if}}}}}}\leavevmode\nobreak\ ({\tt b})\leavevmode\nobreak\ \{{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}\mbox{\tt c}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{3}}}}\}\leavevmode\nobreak\ \{{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{4}}}}\mbox{\tt c}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{5}}}}\}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{6}}}}
while1(𝚋){c2}34eval1(𝗌)2{{\displaystyle\mid\ {{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{while}}}}}}\leavevmode\nobreak\ ({\tt b})\leavevmode\nobreak\ \{{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}\mbox{\tt c}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{3}}}}\}{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{4}}}}\mid{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s}){{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}
Imp𝙿\displaystyle\textsf{Imp}\ni\tt P ::=cι;2 where 𝖨𝖽x (Identifiers),n,σΣ{\displaystyle::={{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{\iota}}}}\mbox{\tt c};{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}\qquad\mbox{ where }\mathsf{Id}\ni\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\mbox{ (Identifiers)},n\in\mathbb{Z},\sigma\in\Sigma^{*}
Figure 2: Syntax of Imp

In order to analyze a program PImp{\mbox{\sf P}}\in\textsf{Imp}, we need to model it by building a corresponding control flow graph [28] (𝖢𝖥𝖦\mathsf{CFG} for short), which embeds the control structure in the graph structure and leaves in the edges (or equivalently on the nodes) only the access to states, i.e., manipulation of the states (assignments) and guards. The approach we use is quite standard, and we follow [28] for the construction of the control flow graph. For technical details see [4], here we show the construction on the example in Fig. 3, where ii denotes the node corresponding to the program point _i\ell_{\_}i.

1{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}x := 0; 2{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}while (x<5) {3{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{3}}}}x := x + 14{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{4}}}}}; 5{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{5}}}}x:=76{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{6}}}}

(a)
Refer to caption
(b)
Figure 3: Example of 𝖢𝖥𝖦\mathsf{CFG}: (a) Fragment of code and (b) corresponding 𝖢𝖥𝖦\mathsf{CFG}.

Note that, by construction [4], the language of the 𝖢𝖥𝖦\mathsf{CFG} edge labels is an intermediate language slightly different from the Imp grammar. Edge labels correspond to a primitive statement (i.e., an assignment or eval) or a boolean guard, namely they form the language Ψ\Psi generated by the grammar 𝚕::=x:=e𝚋eval(𝗌){{\mathtt{l}::=\;\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}}\mid{\tt b}\mid\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s}).

Concrete Semantics.

The concrete semantics of our language Imp is intuitive and it is fully reported in [3]. Since our aim is to analyze Imp programs by analyzing their 𝖢𝖥𝖦\mathsf{CFG}s, we focus here only on the interpretation of 𝖢𝖥𝖦\mathsf{CFG}’s labels [28]. In particular, we have to specify the semantics associated with each possible edge of the 𝖢𝖥𝖦\mathsf{CFG}. In other words, we have to formalize how each statement transforms a current state, which is represented as a store, namely as an association between identifiers and values. It is well known that static program analysis works by computing (abstract) collecting semantics, namely, for each program point \ell and for each variable x, it computes the set of values that the variable x can have in any computation at the program point \ell. Hence, we define (collecting) memories 𝕞\mathbb{m}, associating with each variable a set of values. The basic values of Imp are integers, booleans and strings, hence we define the set of memories as 𝕄=defVar(()𝖡𝗈𝗈𝗅(Σ))\mathbb{M}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\textsf{Var}\rightarrow(\wp(\mathbb{Z})\cup\mathsf{Bool}\cup\wp(\Sigma^{*})), ranged over the meta-variable 𝕞\mathbb{m}, where 𝖡𝗈𝗈𝗅=({𝚏𝚊𝚕𝚜𝚎,𝚝𝚛𝚞𝚎})\mathsf{Bool}=\wp(\{{\tt false},{\tt true}\}). Let us denote by 𝕍\mathbb{V} this domain of collections of values ()𝖡𝗈𝗈𝗅(Σ)\wp(\mathbb{Z})\cup\mathsf{Bool}\cup\wp(\Sigma^{*}). The update of memory 𝕞\mathbb{m} for a variable x with set of values vv is denoted 𝕞[x/v]{\mathbb{m}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/v]. The partial order \sqsubseteq between memories is defined as 𝕞_1𝕞_2x𝖨𝖽.𝕞_1(x)𝕞_2(x){{{\mathbb{m}_{\_}1\sqsubseteq\mathbb{m}_{\_}2\Leftrightarrow\forall\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\in\mathsf{Id}.\>\mathbb{m}_{\_}1(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\subseteq\mathbb{m}_{\_}2(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}). Finally, lub and glb of memories are computed point-wise, i.e., 𝕞_1𝕞_2=defλx.𝕞_1(x)𝕞_2(x){{{\mathbb{m}_{\_}1\sqcup\mathbb{m}_{\_}2\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\lambda\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}.\>\mathbb{m}_{\_}1(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\cup\mathbb{m}_{\_}2(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}) and 𝕞_1𝕞_2=defλx.𝕞_1(x)𝕞_2(x){{{\mathbb{m}_{\_}1\sqcap\mathbb{m}_{\_}2\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\lambda\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}.\>\mathbb{m}_{\_}1(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\cap\mathbb{m}_{\_}2(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}).
The collecting (input/output) semantics of statements cΨ\mbox{\tt c}\in\Psi is defined as the function c:𝕄𝕄{\llbracket\mbox{\tt c}\rrbracket}:\mathbb{M}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\mathrel{\mathop{\hskip 1.0pt\longrightarrow\hskip 1.0pt}\limits^{\,{}_{\mbox{\tiny}}}}$}}\mathbb{M}. We denote by {\llparenthesis\hskip 0.86108pt\cdot\hskip 0.86108pt\rrparenthesis} the collecting semantics of expressions, defined as additive lift444Let f:SSf:S\rightarrow S be a generic function, by additive lift we mean its extension to sets of elements, i.e., XS\forall X\subseteq S we define f(X)=def{f(x)|xS}f(X)\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ f(x)\left|\begin{array}[]{l}x\in S\end{array}\right.\!\right\}. If f:S(S)f:S\rightarrow\wp(S), then its lift to sets of memories is f(X)=def{f(x)|xS}f(X)\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\bigcup\left\{\leavevmode\nobreak\ f(x)\left|\begin{array}[]{l}x\in S\end{array}\right.\!\right\} to sets of memories of the standard expression semantics. We abuse notation by denoting as {\llbracket\cdot\rrbracket} also its additive lift to sets of statements.

x:=e𝕞=𝕞[x/e𝕞]b𝕞=𝕞{𝕞|b𝕞=𝚝𝚛𝚞𝚎}eval(𝗌)𝕞=𝗌𝕞Imp𝕞{{{\begin{array}[]{rcl}{\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}}\rrbracket}\mathbb{m}&=&\mathbb{m}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/{\llparenthesis\hskip 0.86108pt\mbox{\tt e}\hskip 0.86108pt\rrparenthesis}\mathbb{m}]\qquad{\llbracket\mbox{\tt b}\rrbracket}\mathbb{m}=\mathbb{m}\sqcap\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}\left|\begin{array}[]{l}{\llparenthesis\hskip 0.86108pt\mbox{\tt b}\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\tt true}\end{array}\right.\!\right\}\\ {\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s})\rrbracket}\mathbb{m}&=&{\llbracket{\llparenthesis\hskip 0.86108pt{\sf s}\hskip 0.86108pt\rrparenthesis}\mathbb{m}\Cap\textsf{Imp}\rrbracket}\mathbb{m}\end{array}

where \Cap is the intersection in the set of Imp programs. By computing the traces of application of this transfer function, starting from any possible input memory, we precisely compute the maximal trace semantics [23].

Static Analysis on 𝖢𝖥𝖦\mathsf{CFG}: Semantic Abstraction.

It is well known that when we perform static analysis on a 𝖢𝖥𝖦\mathsf{CFG}, we interpret, on the corresponding abstract domain, all the edges, and more specifically all the labels (in Ψ\Psi) [28]. This is also a quite standard approach, but we recall it here for fixing the notation used. We suppose to abstract values on the coalesced sum [3] of the 𝖲𝗂𝗀𝗇\mathsf{Sign} abstract domain for integers, of the concrete domain for booleans and of the (deterministic) finite state automata abstract domain for strings [3]555A string static analyzer using finite state automata abstract domain has been developed and it is available in [3].. Let us consider an abstraction ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V})666For the sake of simplicity here we abuse notation by considering a unique ρ\rho which is indeed the coalesced sum of three abstractions, one for integers, one for booleans and one for strings. of the values manipulated by our language, we denote by 𝕄ρ:Varρ(𝕍)\mathbb{M}^{\rho}:\textsf{Var}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\mathrel{\mathop{\hskip 1.0pt\longrightarrow\hskip 1.0pt}\limits^{\,{}_{\mbox{\tiny}}}}$}}\rho(\mathbb{V}) the set of (collecting) memories, where sets of values are abstracted by ρ\rho, ranged over 𝕞ρ\mathbb{m}^{\rho}. In the following, we abuse notation by applying ρ\rho to memories in 𝕄\mathbb{M}, simply by defining ρ(𝕞)𝕄ρ\rho(\mathbb{m})\in\mathbb{M}^{\rho} as ρ(𝕞):xVarρ(𝕞(x)){{\rho(\mathbb{m}):\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\in\textsf{Var}\mapsto\rho(\mathbb{m}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}))777For the sake of simplicity of presentation and implementation, we have considered here non-relational abstractions of data, anyway we believe that it is possible to easy extend our work to relational abstractions.. In this way, we can see abstract memories as sets of concrete memories, and therefore as particular collecting memories, i.e., 𝕄ρ𝕄\mathbb{M}^{\rho}\subseteq\mathbb{M}. Finally, we can define the abstract edge effect ρ{\llbracket\cdot\rrbracket}^{\rho} [28] telling us how to abstractly interpret each edge of the 𝖢𝖥𝖦\mathsf{CFG}:

x:=eρ𝕞ρ=𝕞ρ[x/ρ(eρ𝕞ρ)]bρ𝕞ρ=𝕞ρρ({𝕞|𝚝𝚛𝚞𝚎bρ𝕞ρ})eval(𝗌)ρ𝕞ρ=𝗌ρ𝕞ρImpρ𝕞ρ{{{\begin{array}[]{rcl}{\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}}\rrbracket}^{\rho}\mathbb{m}^{\rho}&=&\mathbb{m}^{\rho}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/\rho({\llparenthesis\hskip 0.86108pt\mbox{\sf e}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}^{\rho})]\qquad\qquad{\llbracket\mbox{\tt b}\rrbracket}^{\rho}\mathbb{m}^{\rho}=\mathbb{m}^{\rho}\sqcap\rho(\sqcup\left\{\leavevmode\nobreak\ \mathbb{m}\left|\begin{array}[]{l}{\tt true}\in{\llparenthesis\hskip 0.86108pt\mbox{\tt b}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}^{\rho}\end{array}\right.\!\right\})\\ {\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s})\rrbracket}^{\rho}\mathbb{m}^{\rho}&=&{\llbracket{\llparenthesis\hskip 0.86108pt{\sf s}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}^{\rho}\Cap\textsf{Imp}\rrbracket}^{\rho}\mathbb{m}^{\rho}\end{array}

where ρ=defρ\comp\compρ{\llparenthesis\hskip 0.86108pt\cdot\hskip 0.86108pt\rrparenthesis}^{\rho}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\rho\comp{\llparenthesis\hskip 0.86108pt\cdot\hskip 0.86108pt\rrparenthesis}\comp\rho.The semantics of a path in the 𝖢𝖥𝖦\mathsf{CFG} is the composition of the interpretation of each edge, and the interpretation of an edge is the interpretation, given above, of its label [28].

This is clearly, what happens when the 𝖢𝖥𝖦\mathsf{CFG} is not abstracted, namely when the edge labels are single statements. Finally, since we deal with potential abstract 𝖢𝖥𝖦\mathsf{CFG}, we have to say how we execute them, potentially on an abstract semantics. The idea is simple, since we move from executing single statements to executing sets of statements, we simply take as execution of the abstract 𝖢𝖥𝖦\mathsf{CFG} the additive lift of the single statements executions. Since the semantics is always additive888A function is said to be additive if it commutes with least upper bound., in order to guarantee that everything works, also the semantic abstraction ρ\rho must be additive. Hence, in the following of the paper we always require ρ\rho to be additive.

3 Semantic-driven Code Abstraction

In this section, we study how we can model a syntactic abstraction of the 𝖢𝖥𝖦\mathsf{CFG} and which is its relation with the semantic abstraction, i.e., the code analysis.

Modeling Code Abstraction.

Following the standard approach for abstracting objects, we should abstract each 𝖢𝖥𝖦\mathsf{CFG} in a set of 𝖢𝖥𝖦\mathsf{CFG}s sharing an invariant property, i.e., an equivalence class of 𝖢𝖥𝖦\mathsf{CFG}s. In particular, since we aim at abstracting code (𝖢𝖥𝖦\mathsf{CFG}) without changing the analysis performed on the code, we choose to abstract 𝖢𝖥𝖦\mathsf{CFG} by abstracting edge labels, and by leaving unchanged the control structure of the 𝖢𝖥𝖦\mathsf{CFG}. In other words, an abstract 𝖢𝖥𝖦\mathsf{CFG}, denoted 𝖢𝖥𝖦#\mathsf{CFG}^{\#}, is a pair Nodes,Edges#\langle\mbox{\sl Nodes},\mbox{\sl Edges}^{\#}\rangle, where we leave the nodes unchanged, while the edge labels are abstracted to sets of labels. Formally, Edges#Nodes×(Ψ)×Nodes\mbox{\sl Edges}^{\#}\subseteq\mbox{\sl Nodes}\times\wp(\Psi)\times\mbox{\sl Nodes}, where Ψ\Psi is the 𝖢𝖥𝖦\mathsf{CFG} label language.

Given ηuco((Ψ))\eta\in\mbox{\it uco}(\wp(\Psi)), 𝙶η=defNodes(𝙶),Edgesη(𝙶)\mathtt{G}^{\eta}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\langle\mbox{\sl Nodes}(\mathtt{G}),\mbox{\sl Edges}^{\eta}(\mathtt{G})\rangle is the 𝖢𝖥𝖦#\mathsf{CFG}^{\#} built from a 𝖢𝖥𝖦\mathsf{CFG} 𝙶\mathtt{G} in terms of η\eta, where Edgesη(𝙶)Nodes(𝙶)×η((Ψ))×Nodes(𝙶)\mbox{\sl Edges}^{\eta}(\mathtt{G})\subseteq\mbox{\sl Nodes}(\mathtt{G})\times\eta(\wp(\Psi))\times\mbox{\sl Nodes}(\mathtt{G}).

Refer to caption
Figure 4: 𝖢𝖥𝖦\mathsf{CFG} abstracted by signs.

As an example, consider the 𝖢𝖥𝖦\mathsf{CFG} in Fig. 3, in Fig. 4 we have the 𝖢𝖥𝖦#\mathsf{CFG}^{\#} where numerical expressions are abstracted by {m|m𝖲𝗂𝗀𝗇(n)}{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{m}}}}}}\left|\begin{array}[]{l}m\in\mathsf{Sign}(n)\end{array}\right.\!\right\}999We use nn to denote the semantic value corresponding to the syntactic symbol n. (where 𝖲𝗂𝗀𝗇\mathsf{Sign} is the well-known sign abstraction 𝖲𝗂𝗀𝗇uco(())\mathsf{Sign}\in uco(\wp(\mathbb{Z})) such as 𝖲𝗂𝗀𝗇(())={,+,,{0},}\mathsf{Sign}(\wp(\mathbb{Z}))=\{\top,\mathbb{Z}^{+},\mathbb{Z}^{-},\{0\},\varnothing\}). For instance, x:=x+1 is abstracted in x:=x++{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{x}}+}}}}\mathbb{Z}^{+} where x++=def{x+n|n+}{{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}+}}}}\mathbb{Z}^{+}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}+{\@listingGroup{ltx_lst_identifier}{n}}}}}}\left|\begin{array}[]{l}n\in\mathbb{Z}^{+}\end{array}\right.\!\right\}, being 𝖲𝗂𝗀𝗇(1)=+\mathsf{Sign}(1)=\mathbb{Z}^{+}.

Abstracting Code vs Abstracting Semantics.

As previously noted, we aim at characterizing code abstractions, for dynamically generated code, for which the given analysis works precisely. Formally, let us consider the following equation:

𝕞ρ𝕄ρ𝕄.φΨ.η(φ)𝕞ρ=η(φ)ρ𝕞ρ\forall\mathbb{m}^{\rho}\in\mathbb{M}^{\rho}\subseteq\mathbb{M}.\>\forall\varphi\in\Psi.\>{\llbracket\eta(\varphi)\rrbracket}\mathbb{m}^{\rho}={\llbracket\eta(\varphi)\rrbracket}^{\rho}\mathbb{m}^{\rho}\vspace{-.2cm} (1)

If this equality does not hold it means that the abstract semantic interpretation ρ{\llbracket\cdot\rrbracket}^{\rho} merges predicates distinguished by η\eta. Namely, when the program is observed by means of its (abstract) semantics the actual abstraction of predicates is not precisely η\eta, but it is η\eta affected in some way by ρ{\llbracket\cdot\rrbracket}^{\rho}. By changing the point of view, we have that, in this case, the analysis cannot precisely interpret the abstract code, since η\eta abstracts the code by distinguishing information that ρ\rho cannot distinguish.
As an example, consider the sign domain above, when η(x:=5)={x:=n|1n5}{{\eta(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=5}}}})=\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{n}}}}}}\left|\begin{array}[]{l}1\leq n\leq 5\end{array}\right.\!\right\} the equation does not hold since the concrete semantics of this set does not take any positive value for x. While, if η(x:=5)={x:=n|n+{0}}{{\eta(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=5}}}})=\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{n}}}}}}\left|\begin{array}[]{l}n\in\mathbb{Z}^{+}\cup\{0\}\end{array}\right.\!\right\}, then Eq. 1 holds since its concrete semantics is precisely the set of non-negative values. It is worth noting that Eq. 1 is a forward completeness [16] of the code abstraction w.r.t. the semantic interpretation, meaning that the semantic abstraction does not add imprecision to the code one.
In order to investigate the relation existing between the code abstraction η\eta and the semantic abstraction ρ\rho, we observe that, whenever we have a semantic abstraction ρ\rho, we have a natural code abstraction induced by ρ\rho. Namely, by only observing (abstract) information about the computation, we cannot distinguish statements with the same (abstract) semantics, independently from what any possible code abstraction does. For instance, if we analyze parity of program variables, we are unable to distinguish x:=2 from x:=4, independently from how a potential code abstraction η\eta is defined on x:=2. The first step consists in defining a code abstraction for expressions in terms of semantic one. Consider ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V}), we define ρ^(e)\widehat{\rho}(\mbox{\sf e}) inductively on the expressions structure

ρ^(a):{ρ^(a_1opa_2)=def{aopa′′|aρ^(a_1),a′′ρ^(a_2)}=defρ^(a_1)opρ^(a_2)ρ^(x)=defx,ρ^(n)=def{m|mρ({n})}ρ^(b):{ρ^(b_1bopb_2)=defρ^(b_1)bopρ^(b_2),ρ^(¬b)=def¬ρ^(b)ρ^(x)=defx,ρ^(𝚝𝚛𝚞𝚎)=def{t|tρ(𝚝𝚛𝚞𝚎)},ρ^(𝚏𝚊𝚕𝚜𝚎)=def{t|tρ(𝚏𝚊𝚕𝚜𝚎)}ρ^(𝗌):{ρ^(concat(𝗌_1,𝗌_2))=defconcat(ρ^(𝗌_1),ρ^(𝗌_2)),ρ^(substr(𝗌,a_1,a_2))=defsubstr(ρ^(𝗌),ρ^(a_1),ρ^(a_2))ρ^(x)=defx,ρ^("σ")=def{"δ"|δρ(σ)}{{{{{{{{{{{{\begin{array}[]{ll}\widehat{\rho}(\mbox{\tt a})&:\left\{\begin{array}[]{ll}\widehat{\rho}(\mbox{\tt a}_{\_}1\mbox{\tt op}\>\mbox{\tt a}_{\_}2)\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \mbox{\tt a}^{\prime}\mbox{\tt op}\>\mbox{\tt a}^{\prime\prime}\left|\begin{array}[]{l}\mbox{\tt a}^{\prime}\in\widehat{\rho}(\mbox{\tt a}_{\_}1),\mbox{\tt a}^{\prime\prime}\in\widehat{\rho}(\mbox{\tt a}_{\_}2)\end{array}\right.\!\right\}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\widehat{\rho}(\mbox{\tt a}_{\_}1)\mbox{\tt op}\>\widehat{\rho}(\mbox{\tt a}_{\_}2)\\ \widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}},\qquad\widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{n}}}}}})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{m}}}}}}\left|\begin{array}[]{l}m\in\rho(\{n\})\end{array}\right.\!\right\}\\ \end{array}\right.\\ \widehat{\rho}(\mbox{\tt b})&:\left\{\begin{array}[]{ll}\widehat{\rho}(\mbox{\tt b}_{\_}1\mbox{\tt bop}\>\mbox{\tt b}_{\_}2)\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\widehat{\rho}(\mbox{\tt b}_{\_}1)\mbox{\tt bop}\>\widehat{\rho}(\mbox{\tt b}_{\_}2),\qquad\widehat{\rho}(\neg\mbox{\tt b})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\neg\widehat{\rho}(\mbox{\tt b})\\ \widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}},\qquad\widehat{\rho}({\tt true})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{t}}}}}}\left|\begin{array}[]{l}t\in\rho({\tt true})\end{array}\right.\!\right\},\qquad\widehat{\rho}({\tt false})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{t}}}}}}\left|\begin{array}[]{l}t\in\rho({\tt false})\end{array}\right.\!\right\}\end{array}\right.\\ \widehat{\rho}({\sf s})&:\left\{\begin{array}[]{ll}\widehat{\rho}(\mbox{\tt concat(${\sf s}_{\_}1$,${\sf s}_{\_}2$)})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\mbox{\tt concat($\widehat{\rho}({\sf s}_{\_}1)$,$\widehat{\rho}({\sf s}_{\_}2)$)},\\ \widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{substr}}}}}}({\sf s},\mbox{\tt a}_{\_}1,\mbox{\tt a}_{\_}2))\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{substr}}}}}}(\widehat{\rho}({\sf s}),\widehat{\rho}(\mbox{\tt a}_{\_}1),\widehat{\rho}(\mbox{\tt a}_{\_}2))\\ \widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}},\qquad\widehat{\rho}(\mbox{\tt"$\sigma$"})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \mbox{\tt"$\delta$"}\left|\begin{array}[]{l}\delta\in\rho(\sigma)\end{array}\right.\!\right\}\end{array}\right.\end{array}

At this point, we can characterize the 𝖢𝖥𝖦\mathsf{CFG} labels abstraction Υ¯[ρ]:(Ψ)(Ψ)\overline{\Upsilon}[\rho]:\wp(\Psi)\longrightarrow\wp(\Psi), as the additive lift of the function

Υ¯[ρ](x:=e)=defx:=ρ^(e)=def{x:=e|eρ^(e)}Υ¯[ρ](b)=defρ^(b)Υ¯[ρ](eval(𝗌))=defeval(ρ^(𝗌)){{{\begin{array}[]{rl}\overline{\Upsilon}[\rho](\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\mbox{\sf e})&\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}x:=\widehat{\rho}(\mbox{\tt e})\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ x:=\mbox{\tt e}^{\prime}\left|\begin{array}[]{l}\mbox{\tt e}^{\prime}\in\widehat{\rho}(\mbox{\tt e})\end{array}\right.\!\right\}\\ \overline{\Upsilon}[\rho](\mbox{\sf b})&\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\widehat{\rho}(\mbox{\sf b})\qquad\qquad\overline{\Upsilon}[\rho](\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s}))\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}(\widehat{\rho}({\sf s}))\end{array}

where eval(ρ^(𝗌)){\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}(\widehat{\rho}({\sf s})) is treated as the implicit representation of all the statements that it can execute, namely it represents the (potentially infinite) set {𝚌|𝚌𝕞𝗌ρImpρ𝕞}\left\{\leavevmode\nobreak\ \mathtt{c}\left|\begin{array}[]{l}{\llbracket\mathtt{c}\rrbracket}\mathbb{m}\sqsubseteq{\llbracket{\llparenthesis\hskip 0.86108pt{\sf s}\hskip 0.86108pt\rrparenthesis}^{\rho}\Cap\textsf{Imp}\rrbracket}^{\rho}\mathbb{m}\end{array}\right.\!\right\}.

The following result is immediate by construction.

Proposition 3.1

Given ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V}), then Υ¯[ρ]uco((Ψ))\overline{\Upsilon}[\rho]\in\mbox{\it uco}(\wp(\Psi)) and it is additive.

Finally, in order to show that this code abstraction can be used to force satisfiability of Eq. 1, we have first to characterize the meaning of interpreting an edge label abstracted by Υ¯[ρ]\overline{\Upsilon}[\rho]:

x:=ρ^(e)𝕞={x:=e𝕞|eρ^(e)}ρ^(b)𝕞={b𝕞|bρ^(b)}eval(ρ^(𝗌))𝕞={𝚌𝕞|𝚌𝕞𝗌ρImpρ𝕞}{{{\begin{array}[]{rcl}{\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\widehat{\rho}(\mbox{\sf e})\rrbracket}\mathbb{m}&=&\bigsqcup\left\{\leavevmode\nobreak\ {\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}}^{\prime}\rrbracket}\mathbb{m}\left|\begin{array}[]{l}\mbox{\sf e}^{\prime}\in\widehat{\rho}(\mbox{\sf e})\end{array}\right.\!\right\}\qquad\qquad{\llbracket\widehat{\rho}(\mbox{\tt b})\rrbracket}\mathbb{m}=\bigsqcup\left\{\leavevmode\nobreak\ {\llbracket\mbox{\sf b}^{\prime}\rrbracket}\mathbb{m}\left|\begin{array}[]{l}\mbox{\sf b}^{\prime}\in\widehat{\rho}(\mbox{\sf b})\end{array}\right.\!\right\}\\ {\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}(\widehat{\rho}({\sf s}))\rrbracket}\mathbb{m}&=&\bigsqcup\left\{\leavevmode\nobreak\ {\llbracket\mathtt{c}\rrbracket}\mathbb{m}\left|\begin{array}[]{l}{\llbracket\mathtt{c}\rrbracket}\mathbb{m}\sqsubseteq{\llbracket{\llparenthesis\hskip 0.86108pt{\sf s}\hskip 0.86108pt\rrparenthesis}^{\rho}\Cap\textsf{Imp}\rrbracket}^{\rho}\mathbb{m}\end{array}\right.\!\right\}\end{array}

Then we have the following results

Lemma 3.2

Given ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V}) additive, then e.𝕞𝕄ρ.ρ^(e)𝕞=eρ𝕞\forall\mbox{\sf e}.\>\forall\mathbb{m}\in\mathbb{M}^{\rho}.\>{\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e})\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\llparenthesis\hskip 0.86108pt\mbox{\sf e}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m} (trivially implying eρ^(e)𝕞𝕄ρ.e𝕞eρ𝕞\mbox{\sf e}^{\prime}\in\widehat{\rho}(\mbox{\sf e})\ \Leftrightarrow\ \forall\mathbb{m}\in\mathbb{M}^{\rho}.\>{\llparenthesis\hskip 0.86108pt\mbox{\sf e}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}\subseteq{\llparenthesis\hskip 0.86108pt\mbox{\sf e}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}) and Φ(Ψ).𝕞𝕄ρ.Υ¯[ρ](Φ)𝕞=Φρ𝕞\forall\Phi\in\wp(\Psi).\>\forall\mathbb{m}\in\mathbb{M}^{\rho}.\>{\llbracket\overline{\Upsilon}[\rho](\Phi)\rrbracket}\mathbb{m}={\llbracket\Phi\rrbracket}^{\rho}\mathbb{m}.

Proof 3.3.

Let us prove first the property for expressions by induction on the syntactic structure of e.

  • e=n{\mbox{\sf e}=\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{n}}}}}}: ρ^(e)𝕞=ρ^(n)𝕞=defρ(n){{\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e})\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{n}}}}}})\hskip 0.86108pt\rrparenthesis}\mathbb{m}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\rho(n), while eρ𝕞=nρ𝕞=ρ(n){{\llparenthesis\hskip 0.86108pt\mbox{\sf e}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}={\llparenthesis\hskip 0.86108pt\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{n}}}}}}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}=\rho(n) (where n𝕞=n{{\llparenthesis\hskip 0.86108pt\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{n}}}}}}\hskip 0.86108pt\rrparenthesis}\mathbb{m}=n);

  • e=x{\mbox{\sf e}=\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}: ρ^(e)𝕞=ρ^(x)𝕞=defx𝕞=𝕞(x){{{{\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e})\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})\hskip 0.86108pt\rrparenthesis}\mathbb{m}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}{\llparenthesis\hskip 0.86108pt\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\hskip 0.86108pt\rrparenthesis}\mathbb{m}=\mathbb{m}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}), while eρ𝕞=xρ𝕞=ρ(𝕞(x))=𝕞(x){{{{\llparenthesis\hskip 0.86108pt\mbox{\sf e}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}={\llparenthesis\hskip 0.86108pt\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}=\rho(\mathbb{m}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}))=\mathbb{m}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}) (since 𝕞𝕄ρ\mathbb{m}\in\mathbb{M}^{\rho});

  • e=e_1ope_2{\mbox{\sf e}=\mbox{\sf e}_{\_}1\>\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{op}}}}}}\>\mbox{\sf e}_{\_}2: Suppose op any arithmetic or boolean operator.
    ρ^(e)𝕞=ρ^(e_1ope_2)𝕞=defρ^(e_1)opρ^(e_2)𝕞=ρ^(e_1)𝕞opρ^(e_2)𝕞=e_1ρ𝕞ope_2ρ𝕞{{{{{\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e})\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e}_{\_}1\>\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{op}}}}}}\>\mbox{\sf e}_{\_}2)\hskip 0.86108pt\rrparenthesis}\mathbb{m}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}{\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e}_{\_}1)\>\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{op}}}}}}\>\widehat{\rho}(\mbox{\sf e}_{\_}2)\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e}_{\_}1)\hskip 0.86108pt\rrparenthesis}\mathbb{m}\>\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{op}}}}}}\>{\llparenthesis\hskip 0.86108pt\widehat{\rho}(\mbox{\sf e}_{\_}2)\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\llparenthesis\hskip 0.86108pt\mbox{\sf e}_{\_}1\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}\>\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{op}}}}}}\>{\llparenthesis\hskip 0.86108pt\mbox{\sf e}_{\_}2\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m} by inductive hypothesis. But this is precisely e_1ope_2ρ𝕞{{\llparenthesis\hskip 0.86108pt\mbox{\sf e}_{\_}1\>\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{op}}}}}}\>\mbox{\sf e}_{\_}2\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m} since op is computed on the semantics as additive lift to sets.

  • Analogously, we can prove all the other cases.

Now, let us prove the fact for 𝖢𝖥𝖦\mathsf{CFG} single edge labels, again by induction on the syntactic structure. Note that, being ρ\rho additive then also ρ{\llbracket\cdot\rrbracket}^{\rho} is additive, being also the concrete semantics additive on sets of statements.

  • Υ¯[ρ](x:=e)𝕞=x:=ρ^(e)𝕞={x:=e𝕞|eρ^(e)}={𝕞[x/e𝕞]|eρ^(e)}=𝕞[x/{e𝕞|eρ^(e)}]=𝕞[x/{e𝕞|e𝕞eρ𝕞}]=𝕞[x/eρ𝕞]=x:=eρ𝕞{{{{{{{{\begin{array}[]{lll}{\llbracket\overline{\Upsilon}[\rho](\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}})\rrbracket}\mathbb{m}&=&{\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\widehat{\rho}(\mbox{\sf e})\rrbracket}\mathbb{m}\\ &=&\bigsqcup\left\{\leavevmode\nobreak\ {\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{e}}}}}}^{\prime}\rrbracket}\mathbb{m}\left|\begin{array}[]{l}\mbox{\sf e}^{\prime}\in\widehat{\rho}(\mbox{\sf e})\end{array}\right.\!\right\}\\ &=&\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/{\llparenthesis\hskip 0.86108pt\mbox{\sf e}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}]\left|\begin{array}[]{l}\mbox{\sf e}^{\prime}\in\widehat{\rho}(\mbox{\sf e})\end{array}\right.\!\right\}\\ &=&\mathbb{m}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/\bigcup\left\{\leavevmode\nobreak\ {\llparenthesis\hskip 0.86108pt\mbox{\sf e}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}\left|\begin{array}[]{l}\mbox{\sf e}^{\prime}\in\widehat{\rho}(\mbox{\sf e})\end{array}\right.\!\right\}]\\ &=&\mathbb{m}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/\bigcup\left\{\leavevmode\nobreak\ {\llparenthesis\hskip 0.86108pt\mbox{\sf e}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}\left|\begin{array}[]{l}{\llparenthesis\hskip 0.86108pt\mbox{\sf e}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}\subseteq{\llparenthesis\hskip 0.86108pt\mbox{\sf e}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}\end{array}\right.\!\right\}]\\ &=&\mathbb{m}[\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}/{\llparenthesis\hskip 0.86108pt\mbox{\sf e}^{\prime}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}]={\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\mbox{\sf e}\rrbracket}^{\rho}\mathbb{m}\\ \end{array}
  • Υ¯[ρ](b)𝕞=ρ^(b)𝕞={b𝕞|bρ^(b)}={𝕞{𝕞|b𝕞=𝚝𝚛𝚞𝚎}]|bρ^(b)}=𝕞{𝕞|b𝕞=𝚝𝚛𝚞𝚎,bρ^(b)}=𝕞{𝕞|b𝕞=𝚝𝚛𝚞𝚎,b𝕞bρ𝕞}=𝕞{𝕞|𝚝𝚛𝚞𝚎bρ𝕞}=bρ𝕞\begin{array}[]{lll}{\llbracket\overline{\Upsilon}[\rho](\mbox{\sf b})\rrbracket}\mathbb{m}&=&{\llbracket\widehat{\rho}(\mbox{\sf b})\rrbracket}\mathbb{m}\\ &=&\bigsqcup\left\{\leavevmode\nobreak\ {\llbracket\mbox{\sf b}^{\prime}\rrbracket}\mathbb{m}\left|\begin{array}[]{l}\mbox{\sf b}^{\prime}\in\widehat{\rho}(\mbox{\sf b})\end{array}\right.\!\right\}\\ &=&\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}\sqcap\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}\left|\begin{array}[]{l}{\llparenthesis\hskip 0.86108pt\mbox{\tt b}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\tt true}\end{array}\right.\!\right\}]\left|\begin{array}[]{l}\mbox{\sf b}^{\prime}\in\widehat{\rho}(\mbox{\sf b})\end{array}\right.\!\right\}\\ &=&\mathbb{m}\sqcap\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}\left|\begin{array}[]{l}{\llparenthesis\hskip 0.86108pt\mbox{\tt b}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\tt true},\ \mbox{\sf b}^{\prime}\in\widehat{\rho}(\mbox{\sf b})\end{array}\right.\!\right\}\\ &=&\mathbb{m}\sqcap\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}\left|\begin{array}[]{l}{\llparenthesis\hskip 0.86108pt\mbox{\tt b}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}={\tt true},\ {\llparenthesis\hskip 0.86108pt\mbox{\sf b}^{\prime}\hskip 0.86108pt\rrparenthesis}\mathbb{m}\subseteq{\llparenthesis\hskip 0.86108pt\mbox{\sf b}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}\end{array}\right.\!\right\}\\ &=&\mathbb{m}\sqcap\bigsqcup\left\{\leavevmode\nobreak\ \mathbb{m}\left|\begin{array}[]{l}{\tt true}\in{\llparenthesis\hskip 0.86108pt\mbox{\sf b}\hskip 0.86108pt\rrparenthesis}^{\rho}\mathbb{m}\end{array}\right.\!\right\}={\llbracket\mbox{\sf b}\rrbracket}^{\rho}\mathbb{m}\\ \end{array}
  • Υ¯[ρ](eval(𝗌))𝕞=eval(ρ^(𝗌))𝕞={𝚌𝕞|𝚌𝕞𝗌ρImpρ𝕞}By additivity of ρ=𝗌ρImpρ𝕞=eval(𝗌)ρ𝕞{{{\begin{array}[]{lll}{\llbracket\overline{\Upsilon}[\rho](\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s}))\rrbracket}\mathbb{m}&=&{\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}(\widehat{\rho}({\sf s}))\rrbracket}\mathbb{m}\\ &=&\bigsqcup\left\{\leavevmode\nobreak\ {\llbracket\mathtt{c}\rrbracket}\mathbb{m}\left|\begin{array}[]{l}{\llbracket\mathtt{c}\rrbracket}\mathbb{m}\sqsubseteq{\llbracket{\llparenthesis\hskip 0.86108pt{\sf s}\hskip 0.86108pt\rrparenthesis}^{\rho}\Cap\textsf{Imp}\rrbracket}^{\rho}\mathbb{m}\end{array}\right.\!\right\}\\ \mbox{By additivity of ${\llbracket\cdot\rrbracket}^{\rho}$}&=&{\llbracket{\llparenthesis\hskip 0.86108pt{\sf s}\hskip 0.86108pt\rrparenthesis}^{\rho}\Cap\textsf{Imp}\rrbracket}^{\rho}\mathbb{m}={\llbracket\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{eval}}}}}}({\sf s})\rrbracket}^{\rho}\mathbb{m}\\ \end{array}

Finally, for each set of labels Φ\Phi, we have that Υ¯[ρ](Φ)𝕞=_φΦΥ¯[ρ](φ)𝕞=_φΦφρ𝕞=Φρ𝕞{\llbracket\overline{\Upsilon}[\rho](\Phi)\rrbracket}\mathbb{m}=\bigsqcup_{\_}{\varphi\in\Phi}{\llbracket\overline{\Upsilon}[\rho](\varphi)\rrbracket}\mathbb{m}=\bigsqcup_{\_}{\varphi\in\Phi}{\llbracket\varphi\rrbracket}^{\rho}\mathbb{m}={\llbracket\Phi\rrbracket}^{\rho}\mathbb{m}, since all the involved functions are additive by definition or by construction.

Then we have that:

Theorem 3.4.

Let ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V}) additive, and ηuco((Ψ))\eta\in\mbox{\it uco}(\wp(\Psi)). Then η¯_=defΥ¯[ρ]η\overline{\eta}_{\_}\uparrow\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\overline{\Upsilon}[\rho]\circ\eta satisfies Eq. 1.

Proof 3.5.

It is worth noting that, we trivially have by abstraction that φΨ.η_(φ)η_(φ)ρ\forall\varphi\in\Psi.\>{\llbracket\eta_{\_}\uparrow(\varphi)\rrbracket}\subseteq{\llbracket\eta_{\_}\uparrow(\varphi)\rrbracket}^{\rho}. Let us prove the other implication: φΨ\forall\varphi\in\Psi

η_(φ)=Υ¯[ρ]η(φ)=Υ¯[ρ]Υ¯[ρ]η(φ)[By properties of uco]=Υ¯[ρ]η(φ)ρ[By Lemma. 3.2]=η_(φ)ρ\begin{array}[]{llr}{\llbracket\eta_{\_}\uparrow(\varphi)\rrbracket}&={\llbracket\overline{\Upsilon}[\rho]\circ\eta(\varphi)\rrbracket}&\\ &={\llbracket\overline{\Upsilon}[\rho]\circ\overline{\Upsilon}[\rho]\circ\eta(\varphi)\rrbracket}&\qquad[\mbox{By properties of uco}]\\ &={\llbracket\overline{\Upsilon}[\rho]\circ\eta(\varphi)\rrbracket}^{\rho}&[\mbox{By Lemma.\leavevmode\nobreak\ \ref{lemmaImp}}]\\ &={\llbracket\eta_{\_}\uparrow(\varphi)\rrbracket}^{\rho}\end{array}

This result tells us that by taking a code abstraction more abstract than (or equal to) Υ¯[ρ]\overline{\Upsilon}[\rho], we guarantee that the abstract interpretation ρ\rho can be performed on the abstracted program (Eq. 1). We have so far proved that it is always possible to force Eq. 1, in order to make it possible to continue the analysis (observing ρ\rho) also on the abstracted code. In the following we show how this framework can be integrated with the existing analysis of dynamic code [4] in order to improve its precision.

4 An Improved Dynamic Code Analysis

In this section we show how the constructive code abstraction characterization, provided in the previous section, can be used for representing the code approximation which soundly captures the potential code executed by a string-to-code statement. As we will show, without abstracting code, we cannot capture situations where the collecting semantics on strings generates sets of statements that cannot be represented by using the concrete syntax. Nevertheless, we must also observe that the analyzer cannot change dynamically with the generated code, hence the abstraction must be driven by the semantic property analyzed. This means that, without using the proposed framework, the analysis would surely be less precise in those situations where code abstraction becomes a necessity.

Let us summarize how we propose to exploit the framework:

  • \bullet

    Consider a fixed semantic abstraction ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V}) and a corresponding static analyzer, designed in such a way that it can interpret also code abstracted by Υ¯[ρ]\overline{\Upsilon}[\rho].

  • \bullet

    Analyze the program, and when an eval is met, extract the language of its argument. If the language is infinite (under specific conditions that we will discuss) build the abstract 𝖢𝖥𝖦\mathsf{CFG} approximating it and extract the corresponding code abstraction η\eta. In general, this code abstraction η\eta is not more abstract than Υ¯[ρ]\overline{\Upsilon}[\rho] (the code abstraction already embedded in the static analyzer, depending only on ρ\rho);

  • \bullet

    Build Υ¯[ρ]η\overline{\Upsilon}[\rho]\circ\eta in order to make also the generated code (approximated by the generated abstract 𝖢𝖥𝖦\mathsf{CFG}) analyzable by the static analysis for ρ\rho.

Analyzing Dynamic Code.

Let ρ\rho be a static analysis performing in particular ρSuco((𝕊))\rho_{\mbox{\tiny\sl S}}\in uco(\wp(\mathbb{S})) on strings, where 𝕊=𝒦\mathbb{S}=\mathcal{K}^{*} denotes strings over a finite alphabet 𝒦\mathcal{K}. Note that, our analyzer has to work on any (abstract) 𝖢𝖥𝖦\mathsf{CFG} that can be dynamically generated, hence it has to be designed with this purpose in mind. In particular, as we will show, we will generate only abstract 𝖢𝖥𝖦\mathsf{CFG}s with a code abstraction η\eta complete w.r.t. ρ\rho. This means, by construction, that η\eta must be more abstract than Υ¯[ρ]\overline{\Upsilon}[\rho], which means that each set of elements in η\eta corresponds to a subset of the elements (abstract predicates) of Υ¯[ρ]\overline{\Upsilon}[\rho]. Hence, in order to guarantee to interpret predicates in any η\eta complete, it is sufficient to design the analyzer soundly interpreting any abstract predicate in Υ¯[ρ]\overline{\Upsilon}[\rho]. For instance, Υ¯[𝖲𝗂𝗀𝗇]\overline{\Upsilon}[\mathsf{Sign}] is the abstraction containing all the predicates, involving integers, of the form x:=S, x<S, etc, with S𝖲𝗂𝗀𝗇{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{S}}}}}}\in\mathsf{Sign}, e.g., an abstract predicate is x:=+{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\mathbb{Z}^{+}, and the analyzer for 𝖲𝗂𝗀𝗇\mathsf{Sign} should be able to interpret also such abstract predicates.
Let x be the input string parameter of an eval statement, we denote by 𝒮ρS(x){\mathcal{S}^{\rho_{\mbox{\tiny\sl S}}}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}}) the abstract value for x computed by the analysis on ρS\rho_{\mbox{\tiny\sl S}}. For example, suppose that the collection of values for the string x before the eval is {a:=0,a:=1}{{\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{a}}:=0}}}},\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{a}}:=1}}}}\}. By defining ρS\rho_{\mbox{\tiny\sl S}} as the kk-bounded string set abstract domain [2], with k=2k=2, 𝒮ρS(x)={a:=0,a:=1}{{{\mathcal{S}^{\rho_{\mbox{\tiny\sl S}}}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})=\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{a}}:=0}}}},\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{a}}:=1}}}}\}, while by using the prefix abstract domain 𝒫¯\overline{\mathcal{PR}}  [9], 𝒮𝒫¯(x)={a:=s|s𝕊}{{\mathcal{S}^{\mbox{\tiny$\overline{\mathcal{PR}}$}}(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}}}}})=\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{a}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in\mathbb{S}\end{array}\right.\!\right\}. When the abstracted string and the abstraction is clear from the context, we simply denote this set by 𝒮\mathcal{S} and we assume (for the sake of simplicity) that any string in 𝒮\mathcal{S} is an executable language statement101010Note that, this assumption corresponds to a decidable condition, hence it is possible to check it and to implement ad hoc solutions when it does not hold.. In the following, we abuse notation by denoting 𝒮\mathcal{S} also the automaton recognizing the language.
Consider for example, the program reported in Fig. 5(a), a program building and manipulating the string str at run-time, which is, afterwards, interpreted as executable code, being the input parameter of the string-to-code statement eval. Since the value of N is unknown at compile-time, we cannot predict the precise number of iterations of the while-loop. In this case, a suitable string abstract analysis would approximate the value of str, before the eval execution, to an abstract value corresponding to an over-approximation of the possible values for str, which may be also, due to abstraction, an infinite set of strings, and therefore an infinite set of possible programs.

1{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{1}}}}str := "x:=5"; 2{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{2}}}}i := 0; 3{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{3}}}}while (i < N) { 4{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{4}}}}str := str + "5"; 5{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{5}}}}i:=i+1;6{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{6}}}} } 7{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{7}}}}str := "if(x<5){" + str + "}else{x:=1};"; 8{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{8}}}}eval(str)9{{}^{{\color[rgb]{.6,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{.6,0,0}\ell_{9}}}}

(a)
Refer to caption
(b)
Figure 5: (a) Dynamically-generating code sample. (b) 𝖢𝖥𝖦\mathsf{CFG} associated to str labeled with abstract expressions.

For instance, in the example, if we abstract strings into the regular expression abstract domain [8] (or equivalently into the finite state automata abstract domain [3]), the value of str after the while loop will be the abstract value 𝚡:=𝟻(𝟻);\mathtt{x:=5(5)^{*};} corresponding to an infinite set of programs, i.e., x:=5;, x:=55, x:=555;…. In this case, the common practice for analyzing eval is simply to give up with the analysis, for example by halting the analysis throwing an exception [17] or forbidding its usage [18].

Let ρ𝒞𝒮\rho_{\scriptscriptstyle\mathcal{CS}} be the abstract domain for all the possible values (integers, strings and booleans) [4]. Note that, Υ¯[ρ𝒞𝒮]\overline{\Upsilon}[\rho_{\scriptscriptstyle\mathcal{CS}}] contains, for integers, predicates like the ones in the abstract 𝖢𝖥𝖦\mathsf{CFG} in Fig. 4.
The analysis ρ𝒞𝒮\rho_{\scriptscriptstyle\mathcal{CS}} at point _3\ell_{\_}3, due to widening111111Widening is a fix-point accelerator used in infinite domains with infinite ascending chains, namely where the semantic fix-point computation may diverge. In this case we use a widening on automata defined in [8] applied in the analysis of the while loop [3], abstracts the value of str in the infinite language {x:=s|s(5)+}{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\} (namely x is assigned to any value represented by a finite sequence of 55). Hence, at point _8\ell_{\_}8 the analysis abstracts str to the strings set 𝒮_𝚜𝚝𝚛={if(x<5){x:=s}else{x:=1}|s(5)+}{\mathcal{S}_{\_}{\mathtt{str}}=\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{if}}({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)\textbraceleft{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}\textbraceright{\@listingGroup{ltx_lst_identifier}{else}}\textbraceleft{\@listingGroup{ltx_lst_identifier}{x}}:=1\textbraceright}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\} meaning that, the true-branch of the string that may be transformed by eval may be either x:=5, or x:=55, or x:=555,{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=555}}}},\dots. The automaton corresponding to the abstract value of str is reported in Fig. 6, and it denotes an infinite language, i.e., an infinite set of possible statements. Unfortunately, this is a problem for the analysis provided in [4], where the language containing all the possible strings would be returned, losing any precision.

Refer to caption
Figure 6: Finite state automaton corresponding to the abstract value of str.

Generating the Code: From Automata to 𝖢𝖥𝖦\mathsf{CFG}s.

At this point, we have the (potentially infinite) language of the eval argument (and hence an automaton 𝒮\mathcal{S}), and the goal is to generate a 𝖢𝖥𝖦\mathsf{CFG} modeling an over-approximation of the executable code contained in the language of the automaton 𝒮\mathcal{S}. The idea is to generate a 𝖢𝖥𝖦\mathsf{CFG} from a language of strings, i.e., from an automaton, by performing a parsing on the paths of the automaton. Indeed, we have defined and implemented an algorithm121212In the following, we only discuss the main parts of the algorithm for space limitations., reported in Alg. 1, performing an abstract parser on automata that, given an automaton 𝒮\mathcal{S}, returns the 𝖢𝖥𝖦\mathsf{CFG} 𝒫\mathcal{P} that over-approximates, for each s𝒮s\in\mathcal{S} (executable), the concrete execution of eval.

The idea of Alg. 1 is to perform a depth-first search on the automaton and, when a language statement is recognized, to generate an edge in the 𝖢𝖥𝖦\mathsf{CFG}. This phase is handled by lines 3-13 of Alg. 1, building the set of nodes Nodes and the set of edges Edges of the resulting 𝖢𝖥𝖦\mathsf{CFG} 𝒫\mathcal{P}. The set WW contains the states of the finite state automaton for which we still have to generate edges in the 𝖢𝖥𝖦\mathsf{CFG} and it is initialized, at line 2, with the initial state q_0q_{\_}0. At this point, Alg. 1 looks for language statements readable from any path of the input automaton starting from a state qq, taken from WW, by means of the module ReduceStmts\mathrm{ReduceStmts} (line 5). In particular, ReduceStmts\mathrm{ReduceStmts} returns a set of triples (q,c,q′′)(q^{\prime},\mbox{\tt c},q^{\prime\prime}), where each returned triple means that from qQq^{\prime}\in Q to q′′Qq^{\prime\prime}\in Q a language statement c has been recognized.

Input: 𝒮=(Q,𝒦,δ,q_0,F)\mathcal{S}=(Q,\mathcal{K},\delta,q_{\_}0,F)
Output: 𝖢𝖥𝖦\mathsf{CFG} 𝒫\mathcal{P} over-approximating executable strings of 𝒮\mathcal{S}
1
2𝒮=ReduceCycles(𝒮)\mathcal{S}=\mathrm{ReduceCycles}(\mathcal{S});
3 Nodes\mbox{\sl Nodes}\leftarrow\varnothing; Edges\mbox{\sl Edges}\leftarrow\varnothing; W{q_0}W\leftarrow\{q_{\_}0\}; visitedvisited\leftarrow\varnothing;
4while WW\neq\varnothing do
5       select and remove qq from WW;
6       stmtsReduceStmts(𝒮,q)stmts\leftarrow\mathrm{ReduceStmts}(\mathcal{S},q);
7       foreach (q,c,q′′)stmts(q^{\prime},\mbox{\tt c},q^{\prime\prime})\in stmts do
8             NodesNodes{𝗅𝖺𝖻(q),𝗅𝖺𝖻(q′′)}\mbox{\sl Nodes}\leftarrow\mbox{\sl Nodes}^{\prime}\cup\{\mathsf{lab}(q^{\prime}),\mathsf{lab}(q^{\prime\prime})\};
9            
10            EdgesEdges{(𝗅𝖺𝖻(q),c,𝗅𝖺𝖻(q′′))}\mbox{\sl Edges}\leftarrow\mbox{\sl Edges}\cup\{(\mathsf{lab}(q^{\prime}),\mbox{\tt c},\mathsf{lab}(q^{\prime\prime}))\};
11            
12            visitedvisited{q}visited\leftarrow visited\cup\{q^{\prime}\};
13             WW{q′′}W\leftarrow W\cup\{q^{\prime\prime}\};
14             WWvisitedW\leftarrow W\smallsetminus visited;
15            
16       end foreach
17      
18 end while
19return 𝒫=Nodes,Edges\mathcal{P}=\langle\mbox{\sl Nodes},\mbox{\sl Edges}\rangle;
Algorithm 1

The set returned by ReduceStmts\mathrm{ReduceStmts} corresponds to the set of statements of 𝒫\mathcal{P} readable from the state qq, hence they are added to Edges, substituting the reached states with the corresponding labels by means of the function 𝗅𝖺𝖻\mathsf{lab} (lines 7-8). At this point, we need to look for the statements that can be read from q′′q^{\prime\prime}, hence, q′′q^{\prime\prime} is added to WW in order to be eventually processed at the next iterations of the while loop at lines 3-13. When there are no more states of 𝒮\mathcal{S} to be processed, namely when WW is empty, the 𝖢𝖥𝖦\mathsf{CFG} 𝒫=Nodes,Edges\mathcal{P}=\langle\mbox{\sl Nodes},\mbox{\sl Edges}\rangle is returned (line 14), with entry label 𝗅𝖺𝖻(q_0)\mathsf{lab}(q_{\_}0) and exit labels the ones associated with the states in FF.

Problems arise when the automaton contains cycles (namely, when the automaton denotes an infinite language). In this case, Alg. 1 first transforms, at line 1, the input automaton, over the alphabet 𝒦\mathcal{K}, in an automaton without cycles, over the alphabet 𝒦(𝒦)\mathcal{K}\cup\wp(\mathcal{K}^{*}), by means of the module ReduceCycles\mathrm{ReduceCycles}. Given an input automaton 𝒮\mathcal{S}, we retrieve the cycles of 𝒮\mathcal{S} using the well-known Tarjan’s algorithm [27] for identifying cycles. Then, for each detected cycle of 𝒮\mathcal{S}, we check whether the string read by the cycle is a whole statement 𝗋\mathsf{r} or not. In the first case, we substitute the cycle of the string 𝗋\mathsf{r} in the automaton, i.e., 𝗋\mathsf{r}^{*}, with the automaton reading the string corresponding to the statement while(true){ 𝗋\mathsf{r} } over the alphabet 𝒦\mathcal{K}. Otherwise, if the cycle does not read a whole statement, the idea is to collapse the cycle in a single transition, labeled with the regular expression corresponding to what is read in the cycle, i.e., denoting a set of string on 𝒦\mathcal{K} ((𝒦)\wp(\mathcal{K}^{*})). Hence the resulting automaton is on the alphabet 𝒦(𝒦)\mathcal{K}\cup\wp(\mathcal{K}^{*}). In Fig. 7 we report an example of application of ReduceCycles\mathrm{ReduceCycles} algorithm.

Refer to caption
(a)
Refer to caption
(b)
Figure 7: (a) Finite state automaton with cycle. (b) Result of ReduceCycles\mathrm{ReduceCycles}.

As example note that, by applying Alg. 1 to the automaton for 𝒮_𝚜𝚝𝚛\mathcal{S}_{\_}{\mathtt{str}} in Fig. 6, we generate the 𝖢𝖥𝖦\mathsf{CFG} 𝒫_𝚜𝚝𝚛\mathcal{P}_{\_}{\mathtt{str}}, depicted in Fig. 5(b). It is worth noting that the 𝖢𝖥𝖦\mathsf{CFG} obtained so far may contain abstract expressions on edges, hence edges may represent an infinite collection of statements. At this point, we need to approximate these edges for making it possible to analyze the 𝖢𝖥𝖦\mathsf{CFG}.

Making the Code Analyzable: Abstracting the 𝖢𝖥𝖦\mathsf{CFG}.

Let us recall that we have to perform the analysis ρ\rho also on the resulting code, in order to continue the static analysis. Hence, as observed before, we have to combine the code abstraction corresponding to the generated (abstract) 𝖢𝖥𝖦\mathsf{CFG} with the code abstraction induced by the semantic abstraction ρ\rho, i.e., Υ¯[ρ]\overline{\Upsilon}[\rho], which models, as code abstraction, the analysis.
First of all, we have to formally characterize the abstraction η\eta induced by the construction of the 𝖢𝖥𝖦\mathsf{CFG} given above, namely we characterize how the construction abstracts together different predicates. Let us build a code abstraction starting from the 𝖢𝖥𝖦\mathsf{CFG} 𝒫=Nodes,Edges\mathcal{P}=\langle\mbox{\sl Nodes},\mbox{\sl Edges}\rangle built in Alg. 1: In particular, let Merge=def{{φΨ|,φ,′′Edges}|,′′Nodes}Ψ\mbox{\sl Merge}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\left\{\leavevmode\nobreak\ \left\{\leavevmode\nobreak\ \varphi\in\Psi\left|\begin{array}[]{l}\langle\ell^{\prime},\varphi,\ell^{\prime\prime}\rangle\in\mbox{\sl Edges}\end{array}\right.\!\right\}\left|\begin{array}[]{l}\ell^{\prime},\ell^{\prime\prime}\in\mbox{\sl Nodes}\end{array}\right.\!\right\}\subseteq\Psi be the set of collections of predicates between any pair of states in the 𝖢𝖥𝖦\mathsf{CFG}, we define

η𝒫((Ψ))=def({XMerge|YMerge{X}.XY=})uco((Ψ))\eta^{\mathcal{P}}(\wp(\Psi))\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\wp(\left\{\leavevmode\nobreak\ X\in\mbox{\sl Merge}\left|\begin{array}[]{l}\forall Y\in\mbox{\sl Merge}\smallsetminus\{X\}.\>X\cap Y=\varnothing\end{array}\right.\!\right\})\in\mbox{\it uco}(\wp(\Psi)) (2)

Note that, this abstraction, being characterized starting from the 𝖢𝖥𝖦\mathsf{CFG} is defined only in terms of a finite subset of Ψ\Psi, namely on the predicates in the given 𝖢𝖥𝖦\mathsf{CFG}, i.e., Ψ𝒫=defΨ{φ|,φ,′′Edges}\Psi^{\mathcal{P}}\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\stackrel{{\scriptstyle\mbox{\tiny def}}}{{\;=\;}}$}}\Psi\cap\left\{\leavevmode\nobreak\ \varphi\left|\begin{array}[]{l}\langle\ell^{\prime},\varphi,\ell^{\prime\prime}\rangle\in\mbox{\sl Edges}\end{array}\right.\!\right\}.
In the example, Ψ𝒫_𝚜𝚝𝚛((Ψ))={{x:=s|s(5)+},{x:=1},{(x<5)},{¬(x<5)}}{{{{\Psi^{\mathcal{P}_{\_}{\mathtt{str}}(\wp(\Psi))}=\{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\},\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=1}}}}\},\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)}}}}\},\{\neg\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)}}}}\}\}, hence we have that η𝒫_𝚜𝚝𝚛=(Ψ𝒫_𝚜𝚝𝚛)\eta^{\mathcal{P}_{\_}{\mathtt{str}}}=\wp(\Psi^{\mathcal{P}_{\_}{\mathtt{str}}}), being Ψ𝒫_𝚜𝚝𝚛\Psi^{\mathcal{P}_{\_}{\mathtt{str}}} already a partition. In Fig. 8(a) this abstraction is partially depicted.

\top{x:=1}{x:=s|s(5)+}{{\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=1}}}}\}\cup\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\}\vee-closure{¬(x<5)}{\{\neg\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)}}}}\}{(x<5)}{\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)}}}}\}{x:=s|s(5)+}{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\}{x:=1}{\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=1}}}}\}\bot
(a)
\topx:=+(x<+){{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\mathbb{Z}^{+}\vee(\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}\textless}}}}\mathbb{Z}^{+})x<+¬(x<+){{{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}\textless}}}}\mathbb{Z}^{+}\vee\neg\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless}}}}\mathbb{Z}^{+}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers)}}}}x:=+¬(x<+){{{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\mathbb{Z}^{+}\vee\neg\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless}}}}\mathbb{Z}^{+}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers)}}}}¬(x<+){{\neg\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless}}}}\mathbb{Z}^{+}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers)}}}}(x<+){{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless}}}}\mathbb{Z}^{+}\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers)}}}}x:=+{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=}}}}\mathbb{Z}^{+}\bot
(b)
Figure 8: (a) Code abstraction η𝒫_𝚜𝚝𝚛\eta^{\mathcal{P}_{\_}{\mathtt{str}}} w.r.t. the 𝖢𝖥𝖦\mathsf{CFG} reported in Fig. 5(b), (b) Code abstraction Υ¯[ρ𝒞𝒮]𝒫_𝚜𝚝𝚛\overline{\Upsilon}[\mathsf{\rho_{\scriptscriptstyle\mathcal{CS}}}]^{\mathcal{P}_{\_}{\mathtt{str}}}

Finally, we need to satisfy Eq. 1 (completeness) between the code abstraction η𝒫\eta^{\mathcal{P}}, built so far, and the static analysis, modeled as a semantic abstraction ρ\rho, performing ρS\rho_{\mbox{\tiny\sl S}} (introduced above) on strings. Clearly we have no guarantee that η𝒫\eta^{\mathcal{P}} satisfies Eq. 1, hence, we have to (further) abstract the 𝖢𝖥𝖦\mathsf{CFG} in order to guarantee completeness w.r.t. the performed static analysis, namely in order to make it possible to perform the given static analysis on the code in the generated 𝖢𝖥𝖦\mathsf{CFG}. As observed in the previous section, in order to force completeness, we have to combine the desired abstraction η𝒫\eta^{\mathcal{P}} on predicates, with the code abstraction Υ¯[ρ]\overline{\Upsilon}[\rho]. Formally, in order to allow this operation, since η𝒫\eta^{\mathcal{P}} is defined on Ψ𝒫\Psi^{\mathcal{P}}, we have to restrict also Υ¯[ρ]\overline{\Upsilon}[\rho] on Ψ𝒫\Psi^{\mathcal{P}} (denoted Υ¯[ρ]𝒫\overline{\Upsilon}[\rho]^{\mathcal{P}}). This abstraction is obtained by intersecting the meaning of each one of its elements (i.e., its concretization) with the set of predicates in the 𝖢𝖥𝖦\mathsf{CFG}. In the running example, we have to compute Υ¯[ρ𝒞𝒮]𝒫_𝚜𝚝𝚛\overline{\Upsilon}[\mathsf{\rho_{\scriptscriptstyle\mathcal{CS}}}]^{\mathcal{P}_{\_}{\mathtt{str}}}, which is the code abstraction induced by the 𝖲𝗂𝗀𝗇\mathsf{Sign} on the predicates in 𝒫_𝚜𝚝𝚛\mathcal{P}_{\_}{\mathtt{str}}. For instance, all the predicates in {x:=s|s(5)+}{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\} and the predicate x:=1 cannot be distinguished when integers are abstracted by observing only their signs, hence the resulting abstraction is depicted in Fig. 8(b), where the abstract predicate x:=+\mathbb{Z}^{+} corresponds, in the concrete, to the set of predicates {x:=s|s(5)+}{x:=1}{{\left\{\leavevmode\nobreak\ \mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:={\@listingGroup{ltx_lst_identifier}{s}}}}}}\left|\begin{array}[]{l}s\in(5)^{+}\end{array}\right.\!\right\}\cup\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers{\@listingGroup{ltx_lst_identifier}{x}}:=1}}}}\}, while x<+\mathbb{Z}^{+} and ¬\neg(x<+\mathbb{Z}^{+}) correspond, respectively, to {(x<5)}{\{\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)}}}}\} and to {¬(x<5)}{\{\neg\mbox{\leavevmode\lstinline{{\lst@@@set@language\lst@@@set@numbers\lst@@@set@frame\lst@@@set@rulecolor\lst@@@set@language\lst@@@set@numbers({\@listingGroup{ltx_lst_identifier}{x}}\textless 5)}}}}\} (all the other elements corresponds to \bot).

Finally, we aim at building a code abstraction which can be interpreted by the initial abstract interpreter ρ\rho, namely, that satisfies Eq. 1. By Th. 3.4 such an abstraction is η¯_𝒫=Υ¯[ρ]𝒫η𝒫\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\overline{\eta}_{\_}\uparrow^{\mathcal{P}}=\overline{\Upsilon}[\rho]^{\mathcal{P}}$}}\circ\eta^{\mathcal{P}}.

Corollary 4.1.

Let ρuco(𝕍)\rho\in\mbox{\it uco}(\mathbb{V}) be additive. Then the code abstraction η¯_𝒫=Υ¯[ρ]𝒫η𝒫uco(Ψ𝒫)\overline{\eta}_{\_}\uparrow^{\mathcal{P}}=\overline{\Upsilon}[\rho]^{\mathcal{P}}\circ\eta^{\mathcal{P}}\in\mbox{\it uco}(\Psi^{\mathcal{P}}) is complete w.r.t. the semantic abstraction ρ\rho, i.e., it satisfies Eq. 1.

Refer to caption
Figure 9: Abstract 𝖢𝖥𝖦\mathsf{CFG} generated by abstracting 𝒫_𝚜𝚝𝚛\mathcal{P}_{\_}{\mathtt{str}} by means of η¯_𝒫_𝚜𝚝𝚛\overline{\eta}_{\_}\uparrow^{\mathcal{P}_{\_}{\mathtt{str}}}

Hence, in our example, the code abstraction η¯_𝒫_𝚜𝚝𝚛=Υ¯[ρ𝒞𝒮]𝒫_𝚜𝚝𝚛η𝒫_𝚜𝚝𝚛\mbox{\raisebox{0.0pt}[4.30554pt][4.30554pt]{$\overline{\eta}_{\_}\uparrow^{\mathcal{P}_{\_}{\mathtt{str}}}=\overline{\Upsilon}[\mathsf{\rho_{\scriptscriptstyle\mathcal{CS}}}]^{\mathcal{P}_{\_}{\mathtt{str}}}$}}\circ\eta^{\mathcal{P}_{\_}{\mathtt{str}}} satisfies Eq. 1. In particular, we can observe that η¯_𝒫_𝚜𝚝𝚛=Υ¯[ρ𝒞𝒮]𝒫_𝚜𝚝𝚛\overline{\eta}_{\_}\uparrow^{\mathcal{P}_{\_}{\mathtt{str}}}=\overline{\Upsilon}[\mathsf{\rho_{\scriptscriptstyle\mathcal{CS}}}]^{\mathcal{P}_{\_}{\mathtt{str}}}. Finally, we have to abstract the 𝖢𝖥𝖦\mathsf{CFG} 𝒫\mathcal{P}, previously generated, by applying η¯_𝒫\overline{\eta}_{\_}\uparrow^{\mathcal{P}} to each edge of the 𝖢𝖥𝖦\mathsf{CFG}. In our example, the so far resulting abstract 𝖢𝖥𝖦\mathsf{CFG} is reported in Fig. 9, where the abstract 𝖢𝖥𝖦\mathsf{CFG} generated by abstracting 𝒫_𝚜𝚝𝚛\mathcal{P}_{\_}{\mathtt{str}} by means of η¯_𝒫_𝚜𝚝𝚛\overline{\eta}_{\_}\uparrow^{\mathcal{P}_{\_}{\mathtt{str}}} is depicted.

A Taste of Implementation.

A static analyzer based on finite state automata is available at [3]. Moreover, we have implemented Alg. 1 in order to validate our approach131313Available at
https://github.com/SPY-Lab/java-fsm-library/tree/abstract-parser
. The implementation of a static analysis of abstract 𝖢𝖥𝖦\mathsf{CFG}s is in an early stage development and it is left as future work. Nevertheless, it is able to parse executable automata and to abstract them into abstract 𝖢𝖥𝖦\mathsf{CFG}s, as we have previously described. In order to make these abstract 𝖢𝖥𝖦\mathsf{CFG}s effectively analyzable, we are currently extending the static analyzer, and the underlying abstract interpreter, to parse, and thus analyze, also abstract predicates.

5 Conclusion

We conclude by highlighting the value, in the context of static analysis, of the framework presented in this paper. What we propose here is a precision improvement of [4], an analysis that attacks an extremely hard problem in static program analysis by abstract interpretation, since the standard static analysis assumption (i.e., the program code we want to analyze must be static) is broken when we have to deal with string-to-code statements. In [4], we have shown that even without this assumption, it is still possible for static analysis to semantically analyze dynamically mutating code in a meaningful and sound way. It has been the very first proof of concept for a sound static analysis for self-modifying code based on bounded reflection for a high-level script-like programming language. In this paper, we improve this approach by characterizing code transformations that do not lose precision w.r.t. a fixed abstract semantics/analysis of the code. The idea we develop consists of embedding the property to analyze in the code transformation in order to make the property analysis work also on the transformed code (as it happens in dynamic code analysis). Hence, the main contribution is to make even more precise the first truly dynamic static analyzer, which has the feature to keep the analysis going on, even when code is dynamically built.
Clearly, the framework improved here is still at an early stage and surely there is much work to do, not only for the presented algorithm and the implementation, which has clearly to be further developed but also for making the approach more precise and general. As far as the algorithm is concerned we have not explicitly provided soundness and completeness proofs or discussions. In particular, completeness holds under decidable hypotheses (the input automaton has to recognize only executable strings), here only briefly treated, and therefore these aspects need further formal development.
On the other hand, a direction for improving precision can be that of integrating the proposed static analysis in a hybrid solution, by using, for instance, taint analysis (or other dynamic analyses) for driving when to apply static analysis, or considering more advanced forms of automata-based domains for abstracting strings, such as the one reported in [24]. Finally, we have considered only eval as a string-to-code statement, while there are other ways, for dynamically executing code built out of strings, that should be investigated. However, we strongly believe that the same approach used for eval, could be easily applied to any other string-to-code statement. Moreover, we believe that this framework could be instantiated in order to deal with other forms of code transformations, maybe by considering more general 𝖢𝖥𝖦\mathsf{CFG} abstractions.

From a more theoretical point of view, interesting future works consist of exploiting the proposed approach for analyzing code in order to investigate, on dynamic languages, several application contexts where static analysis by abstract interpretations has been exploited. First of all, we could trace (abstract) flows of information during execution [15, 21, 19, 20, 13, 12, 11] in order to tackle different security issues, such as the detection of (abstract) code injections [7, 6] or the formal characterization of dynamic code obfuscators and of their potency [10, 14]. Moreover, the ability to analyze malware code could be exploited for extracting code properties which could be used for analyzing code similarity [25], a technique useful for instance to identify or at least classify malicious code.

References

  • [1]
  • [2] Roberto Amadini, Graeme Gange, François Gauthier, Alexander Jordan, Peter Schachte, Harald Søndergaard, Peter J. Stuckey & Chenyi Zhang (2018): Reference Abstract Domains and Applications to String Analysis. Fundam. Informaticae 158(4), pp. 297–326, 10.3233/FI-2018-1650.
  • [3] Vincenzo Arceri & Isabella Mastroeni (2019): An Automata-based Abstract Semantics for String Manipulation Languages. In Alexei Lisitsa & Andrei P. Nemytykh, editors: Proceedings Seventh International Workshop on Verification and Program Transformation, VPT@Programming 2019, Genova, Italy, 2nd April 2019, EPTCS 299, pp. 19–33, 10.4204/EPTCS.299.5.
  • [4] Vincenzo Arceri & Isabella Mastroeni (2021): Analyzing Dynamic Code: A Sound Abstract Interpreter for Evil Eval. ACM Trans. Priv. Secur. 24(2), pp. 10:1–10:38, 10.1145/3426470.
  • [5] Vincenzo Arceri, Isabella Mastroeni & Sunyi Xu (2020): Static Analysis for ECMAScript String Manipulation Programs. Appl. Sci. 10, p. 3525, 10.3390/app10103525.
  • [6] Musard Balliu & Isabella Mastroeni (2010): A Weakest Precondition Approach to Robustness. Trans. Comput. Sci. 10, pp. 261–297, 10.1007/978-3-642-17499-5_11.
  • [7] Samuele Buro & Isabella Mastroeni (2018): Abstract Code Injection - A Semantic Approach Based on Abstract Non-Interference. In Isil Dillig & Jens Palsberg, editors: Verification, Model Checking, and Abstract Interpretation - 19th International Conference, VMCAI 2018, Los Angeles, CA, USA, January 7-9, 2018, Proceedings, Lecture Notes in Computer Science 10747, Springer, pp. 116–137, 10.1007/978-3-319-73721-8_6.
  • [8] Tae-Hyoung Choi, Oukseh Lee, Hyunha Kim & Kyung-Goo Doh (2006): A Practical String Analyzer by the Widening Approach. In Naoki Kobayashi, editor: Programming Languages and Systems, 4th Asian Symposium, APLAS 2006, Sydney, Australia, November 8-10, 2006, Proceedings, Lecture Notes in Computer Science 4279, Springer, pp. 374–388, 10.1007/11924661_23.
  • [9] Giulia Costantini, Pietro Ferrara & Agostino Cortesi (2015): A suite of abstract domains for static analysis of string values. Softw. Pract. Exp. 45(2), pp. 245–287, 10.1002/spe.2218.
  • [10] Roberto Giacobazzi, Neil D. Jones & Isabella Mastroeni (2012): Obfuscation by partial evaluation of distorted interpreters. In Oleg Kiselyov & Simon J. Thompson, editors: Proceedings of the ACM SIGPLAN 2012 Workshop on Partial Evaluation and Program Manipulation, PEPM 2012, Philadelphia, Pennsylvania, USA, January 23-24, 2012, ACM, pp. 63–72, 10.1145/2103746.2103761.
  • [11] Roberto Giacobazzi & Isabella Mastroeni (2004): Proving Abstract Non-interference. In Jerzy Marcinkowski & Andrzej Tarlecki, editors: Computer Science Logic, 18th International Workshop, CSL 2004, 13th Annual Conference of the EACSL, Karpacz, Poland, September 20-24, 2004, Proceedings, Lecture Notes in Computer Science 3210, Springer, pp. 280–294, 10.1007/978-3-540-30124-0_23.
  • [12] Roberto Giacobazzi & Isabella Mastroeni (2010): Adjoining classified and unclassified information by abstract interpretation. J. Comput. Secur. 18(5), pp. 751–797, 10.3233/JCS-2009-0382.
  • [13] Roberto Giacobazzi & Isabella Mastroeni (2010): A Proof System for Abstract Non-interference. J. Log. Comput. 20(2), pp. 449–479, 10.1093/logcom/exp053.
  • [14] Roberto Giacobazzi & Isabella Mastroeni (2012): Making Abstract Interpretation Incomplete: Modeling the Potency of Obfuscation. In Antoine Miné & David Schmidt, editors: Static Analysis - 19th International Symposium, SAS 2012, Deauville, France, September 11-13, 2012. Proceedings, Lecture Notes in Computer Science 7460, Springer, pp. 129–145, 10.1007/978-3-642-33125-1_11.
  • [15] Roberto Giacobazzi & Isabella Mastroeni (2018): Abstract Non-Interference: A Unifying Framework for Weakening Information-flow. ACM Trans. Priv. Secur. 21(2), pp. 9:1–9:31, 10.1145/3175660.
  • [16] Roberto Giacobazzi & Elisa Quintarelli (2001): Incompleteness, Counterexamples, and Refinements in Abstract Model-Checking. In Patrick Cousot, editor: Static Analysis, 8th International Symposium, SAS 2001, Paris, France, July 16-18, 2001, Proceedings, Lecture Notes in Computer Science 2126, Springer, pp. 356–373, 10.1007/3-540-47764-0_20.
  • [17] Simon Holm Jensen, Peter A. Jonsson & Anders Møller (2012): Remedying the eval that men do. In Mats Per Erik Heimdahl & Zhendong Su, editors: International Symposium on Software Testing and Analysis, ISSTA 2012, Minneapolis, MN, USA, July 15-20, 2012, ACM, pp. 34–44, 10.1145/2338965.2336758.
  • [18] Vineeth Kashyap, Kyle Dewey, Ethan A. Kuefner, John Wagner, Kevin Gibbons, John Sarracino, Ben Wiedermann & Ben Hardekopf (2014): JSAI: a static analysis platform for JavaScript. In Shing-Chi Cheung, Alessandro Orso & Margaret-Anne D. Storey, editors: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (FSE-22), Hong Kong, China, November 16 - 22, 2014, ACM, pp. 121–132, 10.1145/2635868.2635904.
  • [19] Isabella Mastroeni (2013): Abstract interpretation-based approaches to Security - A Survey on Abstract Non-Interference and its Challenging Applications. In Anindya Banerjee, Olivier Danvy, Kyung-Goo Doh & John Hatcliff, editors: Semantics, Abstract Interpretation, and Reasoning about Programs: Essays Dedicated to David A. Schmidt on the Occasion of his Sixtieth Birthday, Manhattan, Kansas, USA, 19-20th September 2013, EPTCS 129, pp. 41–65, 10.4204/EPTCS.129.4.
  • [20] Isabella Mastroeni & Durica Nikolic (2010): Abstract Program Slicing: From Theory towards an Implementation. In Jin Song Dong & Huibiao Zhu, editors: Formal Methods and Software Engineering - 12th International Conference on Formal Engineering Methods, ICFEM 2010, Shanghai, China, November 17-19, 2010. Proceedings, Lecture Notes in Computer Science 6447, Springer, pp. 452–467, 10.1007/978-3-642-16901-4_30.
  • [21] Isabella Mastroeni & Damiano Zanardini (2017): Abstract Program Slicing: An Abstract Interpretation-Based Approach to Program Slicing. ACM Trans. Comput. Log. 18(1), pp. 7:1–7:58, 10.1145/3029052.
  • [22] Nikos Mavrogiannopoulos, Nessim Kisserli & Bart Preneel (2011): A taxonomy of self-modifying code for obfuscation. Comput. Secur. 30(8), pp. 679–691, 10.1016/j.cose.2011.08.007.
  • [23] Antoine Miné (2013): Static analysis by abstract interpretation of concurrent programs. (Analyse statique par interprétation abstraite de programmes concurrents). Available at https://tel.archives-ouvertes.fr/tel-00903447.
  • [24] Luca Negrini, Vincenzo Arceri, Pietro Ferrara & Agostino Cortesi (2021): Twinning Automata and Regular Expressions for String Static Analysis. In Fritz Henglein, Sharon Shoham & Yakir Vizel, editors: Verification, Model Checking, and Abstract Interpretation - 22nd International Conference, VMCAI 2021, Copenhagen, Denmark, January 17-19, 2021, Proceedings, Lecture Notes in Computer Science 12597, Springer, pp. 267–290, 10.1007/978-3-030-67067-2_13.
  • [25] Mila Dalla Preda, Roberto Giacobazzi, Arun Lakhotia & Isabella Mastroeni (2015): Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of executables. In Sriram K. Rajamani & David Walker, editors: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, ACM, pp. 329–341, 10.1145/2676726.2676986.
  • [26] Gregor Richards, Christian Hammer, Brian Burg & Jan Vitek (2011): The Eval That Men Do - A Large-Scale Study of the Use of Eval in JavaScript Applications. In Mira Mezini, editor: ECOOP 2011 - Object-Oriented Programming - 25th European Conference, Lancaster, UK, July 25-29, 2011 Proceedings, Lecture Notes in Computer Science 6813, Springer, pp. 52–78, 10.1007/978-3-642-22655-7_4.
  • [27] Robert Endre Tarjan (1972): Depth-First Search and Linear Graph Algorithms. SIAM J. Comput. 1(2), pp. 146–160, 10.1137/0201010.
  • [28] Reinhard Wilhelm, Helmut Seidl & Sebastian Hack (2013): Compiler Design - Syntactic and Semantic Analysis. Springer, 10.1007/978-3-642-17540-4.