Fast and Efficient Bit-Level Precision Tuning
Abstract
In this article, we introduce a new technique for precision tuning. This problem consists of finding the least data types for numerical values such that the result of the computation satisfies some accuracy requirement. State of the art techniques for precision tuning use a try and fail approach. They change the data types of some variables of the program and evaluate the accuracy of the result. Depending on what is obtained, they change more or less data types and repeat the process. Our technique is radically different. Based on semantic equations, we generate an Integer Linear Problem (ILP) from the program source code. Basically, this is done by reasoning on the most significant bit and the number of significant bits of the values which are integer quantities. The integer solution to this problem, computed in polynomial time by a (real) linear programming solver, gives the optimal data types at the bit level. A finer set of semantic equations is also proposed which does not reduce directly to an ILP problem. So we use policy iteration to find the solution. Both techniques have been implemented and we show that our results encompass the results of state of the art tools.
Keywords:
Static analysis, computer arithmetic, integer linear problems, numerical accuracy, policy iteration.

1 Introduction
Let us consider a program P computing some numerical result, typically but not necessarily in the IEEE754 floating-point arithmetic [1]. Precision tuning then consists of finding the smallest data types for all the variables and expressions of P such that the result has some desired accuracy. These last years, much attention has been paid to this problem [6, 8, 10, 11, 12, 18]. Indeed, precision tuning makes it possible to save memory and, by way of consequence, it has a positive impact on the footprint of programs concerning energy consumption, bandwidth usage, computation time, etc.
A common point of all the techniques cited previously is that they follow a try and fail strategy. Roughly speaking, one chooses a subset S of the variables of P, assigns to them smaller data types (e.g. binary32 instead of binary64 [1]) and evaluates the accuracy of the tuned program P'. If the accuracy of the result returned by P' is satisfying, then new variables are included in S or even smaller data types are assigned to certain variables already in S (e.g. binary16). Otherwise, if the accuracy of the result of P' is not satisfying, then some variables are removed from S. This process is applied repeatedly, until a stable state is found. Existing techniques differ in their way of evaluating the accuracy of programs, done by dynamic analysis [10, 11, 12, 18] or by static analysis [6, 8] of P and P'. They may also differ in the algorithm used to define S, delta debugging being the most widespread method [18].
Anyway, all these techniques suffer from the same combinatorial limitation: if P has n variables and if the method tries k different data types, then the search space contains k^n configurations. They scale neither in the number n of variables (even if heuristics such as delta debugging [18] or branch and bound [6] reduce the search space at the price of optimality) nor in the number k of data types which can be tried. In particular, bit-level precision tuning, which consists of finding the minimal number of bits needed for each variable to reach the desired accuracy, independently of a limited number of data types, is not an option.
So the method introduced in this article for the precision tuning of programs is radically different. Here, no try and fail method is employed. Instead, the accuracy of the arithmetic expressions assigned to variables is determined by semantic equations, as a function of the accuracy of the operands. By reasoning on the number of significant bits of the variables of P and knowing the weight of their most significant bit thanks to a range analysis performed before the tuning phase, we are able to reduce the problem to an Integer Linear Problem (ILP) which can be optimally solved in one shot by a classical linear programming solver (no iteration). Concerning the number of variables, the method scales up to the solver limitations and the solutions are naturally found at the bit level, making the parameter k irrelevant. An important point is that the optimal solution to the continuous linear programming relaxation of our ILP is a vector of integers, as demonstrated in Section 4.2. By consequence, we may use a linear solver among real numbers, whose complexity is polynomial [19] (contrarily to linear solvers among integers, for which the problem is NP-complete [16]). This makes our precision tuning method solvable in polynomial time, contrarily to the existing exponential methods.
Next, we go one step further by introducing a second set of semantic equations. These new equations make it possible to tune the precision even further by being less pessimistic on the propagation of carries in arithmetic operations. However, the problem then no longer reduces to an ILP (min and max operators are needed). We therefore use policy iteration (PI) [7] to find the solution efficiently.
Both methods have been implemented inside a tool for precision tuning named XXX (in this article, XXX hides the actual name of our tool and missing references refer to our previous work, for anonymity) [5]. Formerly, XXX expressed the precision tuning problem as a set of first order logical propositions among relations between linear integer expressions. An SMT solver (Z3 in practice [15]) was used repeatedly to find the existence of a solution with a certain weight expressing the number of significant bits (nsb) of the variables [5]. In the present article, we compare experimentally our new methods to the SMT-based method previously used by XXX [5] and to the Precimonious tool [10, 18]. These experiments, on programs coming from mathematical libraries or from other applicative domains such as IoT [3], show that the technique introduced in this article for precision tuning clearly encompasses the state of the art techniques.
The rest of this article is organized as follows. In the next section, we provide a motivating example. In Section 3, we present some essential background on the functions needed for the constraint generation and we detail the sets of constraints for both the ILP and PI methods. Section 4 presents the proofs of correctness. We end up in Section 5 by showing that our new techniques exhibit very good results in practice, before concluding in Section 6.
2 Motivating Example
A motivating example to better explain our method is given by the code snippet of Figure 1. In this example, we aim at modeling the movement of a simple pendulum without damping. Let l be the length of this pendulum, m its mass and g the gravitational constant. We denote by θ the tilt angle in radians, as shown in Figure 1 (initially θ = π/4). The equation describing the movement of the pendulum is given in Equation (1).
1 g = 9.81; l = 0.5;
2 y1 = 0.785398; y2 = 0.785398;
3 h = 0.1; t = 0.0;
4 while (t<10.0) {
5 y1new = y1 + y2 * h ;
6 aux1 = sin(y1) ;
7 aux2 = aux1 * h * g / l;
8 y2new = y2 - aux2;
9 t = t + h;
10 y1 = y1new; y2 = y2new;
11 };
12 require_nsb(y2,20);
1 g = 9.81; l = 0.5;
2 y1 = 0.785398;
3 y2 = 0.785398;
4 h = 0.1; t = 0.0;
5 while (t < 10.0) {
6 y1new = y1 + y2 * h;
7 aux1 = sin(y1);
8 aux2 = aux1 * h
9 * g / l;
10 y2new = y2 - aux2;
11 t = t + h;
12 y1 = y1new;
13 y2 = y2new;
14 };
15 require_nsb(y2,20);
1 g|20| = 9.81|20|; l|20| = 0.5|20|;
2 y1|29| = 0.785398|29|;
3 y2|21| = 0.785398|21|;
4 h|21| = 0.1|21|; t|21| = 0.0|21|;
5 while (t<10.0) {
6 y1new|20| = y1|21| +|20| y2|21|
7 *|22| h|21|;
8 aux1|20| = sin(y1|29|)|20|;
9 aux2|20| = aux1|19| *|20| h|18|
10 *|19| g|17| /|18|l|17|;
11 y2new|20| = y2|21| -|20| aux2|18|;
12 t|20| = t|21| +|20| h|17|;
13 y1|20| = y1new|20|;
14 y2|20| = y2new|20|;
15 };
16 require_nsb(y2,20);
\[ m\,l\,\ddot{\theta} + m\,g\,\sin(\theta) = 0 \tag{1} \]
Equation (1) being a second order differential equation, we need to transform it into a system of two first order differential equations for resolution: we obtain ẏ₁ = y₂ and ẏ₂ = -(g/l)·sin(y₁). By applying Euler's method to these last equations, we obtain Equation (2), implemented in Figure 1.
\[ y_1^{t+h} = y_1^{t} + h\, y_2^{t} \qquad\quad y_2^{t+h} = y_2^{t} - h\,\frac{g}{l}\,\sin\big(y_1^{t}\big) \tag{2} \]
The key point of our technique is to generate a set of constraints for each statement of our imperative language, introduced further in Section 3. For our example, we suppose that all variables, before the XXX analysis, are in double precision (source program in the top left corner of Figure 1) and that a range determination is performed by dynamic analysis on the program variables (we plan to use a static analyzer in the future). XXX assigns to each node of the program's syntactic tree a unique control point in order to determine easily the number of significant bits of the result, as shown in the bottom right corner of Figure 1. Some notations can be stressed about the structure of the XXX source code. For instance, the annotation g^ℓ denotes that g has the unique control point ℓ. As well, the statement require_nsb(y2,20) informs the tool that the user wants to get only 20 significant bits on variable y2 (we consider that a result has p significant bits if the relative error between the exact and approximated results is less than 2^{-p}). Finally, the minimal precision needed for the inputs and intermediary results satisfying the user assertion is observed in the bottom right corner of Figure 1. In this code, if we consider for instance Line 6, then y1new|20| means that the variable y1new needs 20 significant bits at this point. Similarly, y1 and y2 need 21 bits each and the addition requires 20 bits.
The originality of our method is that we reduce the precision tuning problem to an ILP. For example, taking again Line 5 of the pendulum code (y1new = y1 + y2 * h), we generate six constraints, as shown in Equation (3) (this is detailed further in Section 3).
\[
\begin{array}{ll}
\mathrm{nsb}(\ell_{y1}) \geq \mathrm{nsb}(\ell_{+}) + \mathrm{ufp}(\ell_{y1}) - \mathrm{ufp}(\ell_{+}) + \xi_{+} &
\mathrm{nsb}(\ell_{\times}) \geq \mathrm{nsb}(\ell_{+}) + \mathrm{ufp}(\ell_{\times}) - \mathrm{ufp}(\ell_{+}) + \xi_{+} \\[2pt]
\mathrm{nsb}(\ell_{y2}) \geq \mathrm{nsb}(\ell_{\times}) + \mathrm{ufp}(\ell_{y2}) - \mathrm{ufp}(\ell_{\times}) + \xi_{\times} &
\mathrm{nsb}(\ell_{h}) \geq \mathrm{nsb}(\ell_{\times}) + \mathrm{ufp}(\ell_{h}) - \mathrm{ufp}(\ell_{\times}) + \xi_{\times} \\[2pt]
\mathrm{nsb}(\ell_{+}) \geq \mathrm{nsb}(\ell_{y1new}) &
\xi_{+} = 1, \quad \xi_{\times} = 1
\end{array}
\tag{3}
\]

where ℓ_{y1}, ℓ_{y2}, ℓ_{h}, ℓ_{×}, ℓ_{+} and ℓ_{y1new} denote the control points of the sub-expressions of Line 5.
The first two constraints are for the addition. Basically, nsb stands for the number of significant bits, as described in Section 3.1. It represents the difference between the unit in the first place (ufp, see Section 3.1) of the result of the sum and the ufp of its error, denoted ufp_e: we have nsb = ufp - ufp_e. As mentioned previously, the ufp values are computed by a prior range analysis. Then, at constraint generation time, they are constants. For our example, this is the case of the quantity ufp(ℓ_{+}) = ufp(y1 + y2·h) occurring in the first constraints. The next two constraints are for the multiplication, the fifth constraint is for the assignment and the last ones fix the constant functions ξ_{+} and ξ_{×} for the addition and the multiplication, respectively (see the next paragraphs and Section 3 for more details). Note that XXX generates such constraints for all the statements of the program.
For a user requirement of 20 bits on the variable y2, as shown in the original program of the top right corner of Figure 1 (all variables are in double precision initially), XXX succeeds in tuning the majority of the variables of the pendulum program to single precision, with a total number of bits at the bit level equal to 274 (originally, the program used 689 bits). The new mixed precision formats obtained for Line 5 are y1new|20| = y1|21| +|20| y2|22| *|22| h|22|.
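To give the flavor of the generated problem, the sketch below encodes constraints of the shape of Equation (3) in Python with the PuLP library and solves them with its default solver. This is an illustration only: the control point names and ufp constants are assumptions made for the example, not values computed by XXX (which relies on GLPK), and all binary operations share the same pessimistic carry constant ξ = 1.

```python
# Illustrative sketch: constraints in the shape of Equation (3) solved as a
# linear program. Control points, ufp values and the carry bits are assumed.
import pulp

prob = pulp.LpProblem("precision_tuning", pulp.LpMinimize)

pts = ["y1", "y2", "h", "mult", "add", "y1new"]          # control points of Line 5
nsb = {p: pulp.LpVariable(f"nsb_{p}", lowBound=0) for p in pts}  # continuous:
# Section 4.2 shows that the relaxation already has an integral optimal solution.

ufp = {"y1": -1, "y2": -1, "h": -4, "mult": -4, "add": -1}  # from a range analysis
xi = 1                                                    # pessimistic carry bit

prob += pulp.lpSum(nsb.values())                          # minimize the total nsb
prob += nsb["y1"]   >= nsb["add"]  + ufp["y1"]   - ufp["add"]  + xi  # Rule (Add)
prob += nsb["mult"] >= nsb["add"]  + ufp["mult"] - ufp["add"]  + xi
prob += nsb["y2"]   >= nsb["mult"] + ufp["y2"]   - ufp["mult"] + xi  # Rule (Mult),
prob += nsb["h"]    >= nsb["mult"] + ufp["h"]    - ufp["mult"] + xi  # same shape here
prob += nsb["add"]  >= nsb["y1new"]                       # assignment
prob += nsb["y1new"] >= 20                                # require_nsb propagated

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: v.value() for p, v in nsb.items()})
```

Solving this toy instance returns an integral assignment of nsb values, which is the behavior exploited by XXX at full program scale.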
Let us now focus on the term ξ (here, the one of the addition). In our ILP, we always assume that ξ is a constant function equal to 1. This corresponds to a carry bit which can be propagated up to the ufps of the operands' errors and increments them by 1, which is correct but pessimistic. In large codes, this over-approximation becomes very costly when many computations are chained, and the over-estimation of the errors would be considerable, especially since in many cases adding this carry bit is useless because the operands and their errors do not overlap. It is then crucial to use the most precise function ξ. Unfortunately, when we model this optimization, the problem is no longer linear (min and max operators arise) and we have to use the PI technique [7] to solve it (see Section 3.3).
In this case, by analyzing Line 5 of our program, we have to add the new constraints of Equation (4) (along with the former ones). In fact, policy iteration makes it possible to break the min in the function ξ by choosing either the first max, the second max or the constant 1. Next, it becomes possible to solve the corresponding ILP. If no fixed point is reached, XXX iterates until a solution is found, as shown in Section 3.3. By applying this optimization, the new formats of Line 5 are given as y1new|20| = y1|21| +|20| y2|21| *|22| h|21|. By comparing with the formats obtained above, a gain of precision of 1 bit is observed on variables y2 and h (total of 272 bits at the bit level for the optimized program). The program in the bottom right corner of Figure 1 illustrates the new optimized formats obtained by using the policy iteration technique.
\[
\xi_{+} = \min\Big( \max\big(0,\ \mathrm{ufp}_e(\ell_{\times}) - \mathrm{ulp}_e(\ell_{y1}) + 1\big),\ \max\big(0,\ \mathrm{ufp}_e(\ell_{y1}) - \mathrm{ulp}_e(\ell_{\times}) + 1\big),\ 1 \Big)
\tag{4}
\]

and similarly for ξ_{×} with the operands of the multiplication.
Figure 2: Numerical errors of the arithmetic expressions, Equations (5) to (11):

\[
\begin{array}{lr}
\varepsilon(c^{\#p}) \leq 2^{\mathrm{ufp}(c)-p+1} & (5)\\[2pt]
\varepsilon(x +^{\ell} y) \leq \varepsilon(x) + \varepsilon(y) + 2^{\mathrm{ufp}(x+y)-p_{\ell}+1} & (6)\\[2pt]
\varepsilon(x -^{\ell} y) \leq \varepsilon(x) + \varepsilon(y) + 2^{\mathrm{ufp}(x-y)-p_{\ell}+1} & (7)\\[2pt]
\varepsilon(x \times^{\ell} y) \leq |x|\,\varepsilon(y) + |y|\,\varepsilon(x) + \varepsilon(x)\,\varepsilon(y) + 2^{\mathrm{ufp}(x\times y)-p_{\ell}+1} & (8)\\[2pt]
\varepsilon(x \div^{\ell} y) \leq \frac{\varepsilon(x)}{|y|} + \frac{|x|\,\varepsilon(y)}{y^{2}} + 2^{\mathrm{ufp}(x\div y)-p_{\ell}+1} & (9)\\[2pt]
\varepsilon(\phi^{\ell}(x)) \leq 2^{\mathrm{ufp}(\phi(x))-(\mathrm{nsb}(x)-\zeta)+1} & (10)\\[2pt]
\varepsilon(\sqrt{x}^{\,\ell}) \leq \frac{\varepsilon(x)}{2\sqrt{x}} + 2^{\mathrm{ufp}(\sqrt{x})-p_{\ell}+1} & (11)
\end{array}
\]
3 Constraints Generation for Bit-Level Precision Tuning
In this section, we start by providing essential definitions for the understanding of the rest of the article. We also define a simple imperative language (see Figure 3) from which we generate semantic equations in order to determine the least precision needed for the program's numerical values. Then, we focus on the difference between the two sets of constraints obtained when using the simple ILP formulation and the more complex PI formulation, which optimizes the carry bit that can propagate throughout computations. The operational semantics of the language, as well as the theorem proving that the solution of the system of constraints gives the desired nsb when running programs, are detailed further in Section 4.
3.1 Elements of Computer Arithmetic
Our technique is independent of a particular computer arithmetic. In fact, we manipulate numbers for which we know their unit in the first place (ufp) and their number of significant bits (nsb), defined as follows.
- Unit in the First Place. The unit in the first place of a real number x (possibly encoded up to some rounding mode by a floating-point or a fixpoint number) is given in Equation (12). This function, which is independent of the representation of x, will be used further in this section to describe the error propagation across the computations.

\[ \mathrm{ufp}(x) = \min \{ i \in \mathbb{Z} \ :\ 2^{i+1} > |x| \} = \lfloor \log_2(|x|) \rfloor \tag{12} \]

- Number of Significant Bits. Intuitively, nsb(x) is the number of significant bits of x. Formally, following Parker [17], if nsb(x) = p, for x ≠ 0, then the error ε(x) on x is less than 2^{ufp(x)-p+1}. If x = 0 then nsb(x) = 0. For example, if the exact binary value x = 1.0101 is approximated by either x̂ = 1.0100 or x̂ = 1.0110, then nsb(x̂) = 4.
In the following, we also use nsb_e(x) and ufp_e(x) to denote the nsb and the ufp of the error on x, i.e. nsb_e(x) = nsb(ε(x)) and ufp_e(x) = ufp(ε(x)).
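To make these definitions concrete, here is a minimal Python sketch of ufp and nsb. It illustrates the definitions above and is not part of XXX; the cap used when the error is exactly zero is an assumption of the sketch.

```python
import math

def ufp(x: float) -> int:
    """Unit in the first place of x != 0: floor(log2(|x|)) (Equation (12))."""
    return math.floor(math.log2(abs(x)))

def nsb(exact: float, approx: float) -> int:
    """Largest p such that |exact - approx| < 2**(ufp(approx) - p + 1)."""
    if approx == 0.0:
        return 0                      # convention of Section 3.1
    err = abs(exact - approx)
    if err == 0.0:
        return 53                     # sketch-only cap (binary64 significand)
    # err < 2**(ufp(approx) - p + 1)  iff  p <= ufp(approx) - ufp(err)
    return ufp(approx) - ufp(err)

# The example above: 1.0101 in binary is 1.3125, approximated by 1.0110 = 1.375.
assert nsb(1.3125, 1.375) == 4
```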
In this article, we consider a finite precision arithmetic, independently of any particular representation (IEEE754 [1], POSIT, etc.). Nevertheless, the representation being finite, roundoff errors may arise when representing values or performing elementary operations. These errors are defined in Figure 2. First of all, by definition, using the ufp function of Equation (12), for a number x with p significant bits, the roundoff error ε(x) is bounded as shown in Equation (13).
\[ \varepsilon(x) < 2^{\mathrm{ufp}(x) - p + 1} \tag{13} \]
Let p_ℓ be the precision of the operation performed at control point ℓ. For example, the precision is p_ℓ = 53 bits for the IEEE754 binary64 format. In fact, p_ℓ is used to compute the truncation error of the operations. Equations (5) to (11) of Figure 2 define the numerical errors of the arithmetic expressions of our language (presented in Section 3). For constants occurring in the code, the initial precision must be given by the user and we write c^{#p} a constant c with p significant bits at control point ℓ. Then, following Equation (13), ε(c^{#p}) is defined in Equation (5) of Figure 2: the ufp of the constant is ufp(c) and consequently the error is bounded by 2^{ufp(c)-p+1}. In Equations (6) to (9), we propagate the errors on the operands and we add the roundoff error due to the elementary operation itself. For an elementary function φ, we assume that ζ bits are lost, as shown in Equation (10) (more details are given in Section 3). The last equation is for the square root function. This function being computable exactly, no more error than for an elementary operation is introduced.
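For instance, under Equation (5), the constant 9.81 of the pendulum program, written with p = 20 significant bits, carries an initial error below 2^{ufp(9.81)-20+1} = 2^{-16}; with the ufp helper sketched above:

```python
bound = 2.0 ** (ufp(9.81) - 20 + 1)   # ufp(9.81) = 3, hence bound = 2**-16
```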
3.2 Integer Linear Problem Formulation
First, we define in Figure 3 the simple imperative language in which our input programs are written.
∗ ∈ {+, -, ×, ÷}        φ ∈ {sin, cos, tan, arctan, log, exp}

Expr ∋ e ::= c^{#p,ℓ} | x^ℓ | e₁ ∗^ℓ e₂ | φ^ℓ(e) | sqrt^ℓ(e)

Cmd ∋ c ::= x^ℓ = e | c₁ ; c₂ | if^ℓ e then c₁ else c₂ | while^ℓ e do c | require_nsb(x,n)^ℓ
We denote by Id the set of identifiers and by Lab the set of control points of the program, as a means to assign to each element e and c of our language a unique control point ℓ ∈ Lab. First, in c^{#p,ℓ}, p indicates the number of significant bits of the constant c in the source code. The parameter p is computed by our tool XXX when solving the constraints. Next, the statement require_nsb(x,n)^ℓ indicates the number of significant bits n that the variable x must have at control point ℓ. The rest of the grammar is standard.
As we have mentioned, we are able to reduce the problem of determining the lowest precision of the variables and intermediary values of programs to an Integer Linear Problem (ILP) by reasoning on their unit in the first place (ufp) and their number of significant bits (nsb). In addition, we assign to each control point ℓ an integer variable nsb(ℓ) corresponding to the nsb of the arithmetic expression at this point. The values nsb(ℓ) are determined by solving the ILP generated by the rules of Figure 4.
Let us now focus on the rules of Figure 4, where ρ is an environment which relates each identifier to its last assignment: assuming that x^ℓ = e is the last assignment of x, the environment ρ maps x to ℓ. Then, E[e]ρ generates the set of constraints for an expression e in the environment ρ. We now formally define these constraints for each element of our language. No constraint is generated for a constant, as mentioned in Rule (Const) of Figure 4. For Rule (Id) of a variable x^ℓ, we require that the nsb at control point ℓ is less than the nsb at the control point ρ(x) of the last assignment of x. For a binary operator ∗ ∈ {+, -, ×, ÷}, we first generate the sets of constraints E[e₁]ρ and E[e₂]ρ for the operands at control points ℓ₁ and ℓ₂. Considering Rule (Add), the result of the addition of two numbers is stored at control point ℓ. Recall that, a range determination being performed before the accuracy analysis, ufp(ℓ), ufp(ℓ₁) and ufp(ℓ₂) are known at constraint generation time.
Now, before going further in the explanation of the constraint generation for binary operations, we introduce the function ξ which computes the carry bit that can occur throughout an addition (a similar reasoning is done for the other elementary operations). In the present ILP of Figure 4, we over-approximate this function by ξ(ℓ₁, ℓ₂) = 1 for all ℓ₁ and ℓ₂, thus assuming the worst case, i.e. that a carry bit is added at each operation. We will optimize ξ in Section 3.3, but the problem will then not remain an ILP any longer. To wrap up, for the addition (Rule (Add)), the nsb of the exact result is the number of bits between ufp(x + y) and the ufp of the error, which is:
\[ \mathrm{ufp}_e(x + y) \leq \max\big( \mathrm{ufp}(x) - \mathrm{nsb}(x),\ \mathrm{ufp}(y) - \mathrm{nsb}(y) \big) + \xi(\ell_1, \ell_2) \tag{14} \]
Hence, the error on the addition computed in precision p_ℓ is
\[ \varepsilon(x +^{\ell} y) \leq 2^{\mathrm{ufp}(x) - \mathrm{nsb}(x) + 1} + 2^{\mathrm{ufp}(y) - \mathrm{nsb}(y) + 1} + 2^{\mathrm{ufp}(x+y) - p_{\ell} + 1} \tag{15} \]
Equation (14) is obtained from Equation (6) by taking the ufp of the propagated errors and adding the carry bit ξ(ℓ₁, ℓ₂) for their sum. Since p_ℓ ≥ nsb(ℓ), we may get rid of the last term of the error in Equation (15), and the two constraints generated for Rule (Add) are derived from Equation (16).
\[ \mathrm{nsb}(\ell_1) \geq \mathrm{nsb}(\ell) + \mathrm{ufp}(\ell_1) - \mathrm{ufp}(\ell) + \xi(\ell_1,\ell_2) \qquad \mathrm{nsb}(\ell_2) \geq \mathrm{nsb}(\ell) + \mathrm{ufp}(\ell_2) - \mathrm{ufp}(\ell) + \xi(\ell_1,\ell_2) \tag{16} \]
For example, let us consider the piece of code hereafter, where the prior range analysis gives ufp(ℓ₁) = 3, ufp(ℓ₂) = 2 and ufp(ℓ) = 3: z^ℓ = x^{ℓ₁} + y^{ℓ₂}; require_nsb(z,10). Wrapping up our constraints, we have nsb(ℓ₁) ≥ nsb(ℓ) + 3 - 3 + 1 and nsb(ℓ₂) ≥ nsb(ℓ) + 2 - 3 + 1. Since nsb(ℓ) = 10 and ξ(ℓ₁, ℓ₂) = 1, we have nsb(ℓ₁) ≥ 11 and, consequently, nsb(ℓ₂) ≥ 10. Rule (Sub) for the subtraction is obtained similarly to the addition case. For Rule (Mult) of the multiplication (and, in the same manner, Rule (Div)), the reasoning mimics the one of the addition. Let x and y be two floating-point numbers and z the result of their product, z = x × y. We denote by ε_x, ε_y and ε_z the errors on x, y and z, respectively. The error of this multiplication is ε_z = x·ε_y + y·ε_x + ε_x·ε_y. These terms are bounded as shown in Equation (17).
\[ |x\,\varepsilon_y| < 2^{\mathrm{ufp}(x) + \mathrm{ufp}(y) - \mathrm{nsb}(y) + 2} \qquad |y\,\varepsilon_x| < 2^{\mathrm{ufp}(x) + \mathrm{ufp}(y) - \mathrm{nsb}(x) + 2} \qquad |\varepsilon_x\,\varepsilon_y| < 2^{\mathrm{ufp}(x) + \mathrm{ufp}(y) - \mathrm{nsb}(x) - \mathrm{nsb}(y) + 2} \tag{17} \]
By getting rid of the last term of the error, which is strictly smaller than the former two ones, assuming that nsb(x) ≥ 0 and nsb(y) ≥ 0 and, finally, by reasoning on the exponents, we obtain the constraints of Rule (Mult).
Although the square root (Rule (Sqrt)) is included, e.g., in the IEEE754 standard, this is not the case of the other elementary functions such as the natural logarithm, the exponential function and the hyperbolic and trigonometric functions, gathered in Rule (Math). Moreover, each implementation of these functions has its own accuracy, which we would have to know in order to model the propagation of errors in our analyses. To cope with this limitation, we consider that each elementary function introduces a loss of precision of ζ bits, where ζ is a parameter of the analysis and, consequently, of our tool XXX.
The rules for commands are rather classical. We use control points to distinguish the different assignments of the same variable and also to implement joins in conditions and loops. Given a command c and an environment ρ, C[c]ρ returns a pair made of a set of constraints and of a new environment ρ'. The function C[c]ρ is defined by induction on the structure of commands in Figures 4 and 5. For conditionals, we generate the constraints for the then and else branches plus additional constraints to join the results of both branches. For loops, we relate the number of significant bits of the variables at the end of the body to the nsb of the same variables at the beginning of the loop.
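As an illustration of E[e]ρ, the following Python sketch generates constraints of the shape of Figure 4 for expressions. The AST encoding, the symbolic constraint strings and the reuse of the additive shape for every binary operator are simplifications assumed here; XXX uses the specific constants of Rules (Mult) and (Div).

```python
# Sketch of E[e]rho: walk an expression AST and emit one constraint per rule.
# An expression is a tuple whose last component is its control point; `ufp`
# maps control points to the constants of the prior range analysis.
def gen_expr(e, rho, ufp, out):
    kind = e[0]
    if kind == "const":                 # Rule (Const): no constraint
        pass
    elif kind == "id":                  # Rule (Id): ("id", x, label)
        _, x, lab = e
        out.append(f"nsb({rho[x]}) >= nsb({lab})")
    else:                               # Rules (Add), (Sub), (Mult), (Div)
        _, e1, e2, lab = e
        gen_expr(e1, rho, ufp, out)
        gen_expr(e2, rho, ufp, out)
        xi = 1                          # pessimistic carry (ILP formulation)
        for l_op in (e1[-1], e2[-1]):
            k = ufp[l_op] - ufp[lab] + xi
            out.append(f"nsb({l_op}) >= nsb({lab}) + {k}")
    return out

# y1 + (y2 * h) at Line 5 of the pendulum, with assumed labels and ufps:
expr = ("add", ("id", "y1", "l1"),
               ("mul", ("id", "y2", "l2"), ("id", "h", "l3"), "l4"), "l5")
rho = {"y1": "a1", "y2": "a2", "h": "a3"}
ufp = {"l1": -1, "l2": -1, "l3": -4, "l4": -4, "l5": -1}
print(gen_expr(expr, rho, ufp, []))
```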
3.3 Policy Iteration for Optimized Carry Bit Propagation
The policy iteration algorithm is used to solve nonlinear fixpoint equations in which the function is written as the infimum of a family of functions for which a fixpoint can be computed easily. The infimum formulation makes the function non-differentiable in the classical sense. The algorithm proposed in [7] to solve smallest fixpoint equations in static analysis requires the function to be order-preserving, in order to ensure the decrease of the intermediate solutions provided by the algorithm. In this article, because of the nature of the semantics, we propose a policy iteration algorithm for a non order-preserving function.
More precisely, let F be a map from a complete lattice L to itself such that F = inf_{π∈Π} f^π. Classical policy iteration solves the fixpoint equation x = F(x) by generating a sequence (x_k)_k such that F(x_k) = f^{π_k}(x_k) and x_{k+1} is the smallest fixpoint of f^{π_k}. The set Π is called the set of policies and f^π a policy map (associated to the policy π). The set of policy maps has to satisfy the selection property, meaning that for all x ∈ L, there exists π ∈ Π such that F(x) = f^π(x). This is exactly the same as saying that, for each x, the minimization problem min_{π∈Π} f^π(x) has an optimal solution. If Π is finite and F is order-preserving, policy iteration converges in finite time to a fixpoint of F. The number of iterations is bounded from above by the number of policies. Indeed, a policy cannot be selected twice in the running of the algorithm. This is implied by the fact that the smallest fixpoint of a policy map is computed. In this article, we adapt policy iteration to the problem of precision tuning. The function here is constructed from the inequalities depicted in Figure 4 and Figure 5. We thus naturally have constraints of the form x ≥ F(x). We will give details about the construction of F in Proposition 1. Consequently, we are interested in solving:
\[ \min \Big\{ \sum_{\ell \in \mathrm{Lab}} x(\ell) \ :\ x \in \mathbb{N}^{\mathrm{Lab}},\ x \geq F(x) \Big\} \tag{18} \]
Let c be a program. We will write C₁(c) the system of inequalities depicted in Figure 4 and C₂(c) the system of inequalities presented in Figure 5. Note that the final system of inequalities is C(c) = C₁(c) ∪ C₂(c), meaning that we add new constraints to C₁(c). If the system C₁(c) is used alone, ξ is the constant function equal to 1. Otherwise, ξ is defined by the formula at the end of Figure 5.
Proposition 1
The following results hold:
1. Let ξ be the constant function equal to 1. The system C₁(c) can be rewritten as x ≥ F(x), where F maps ℝ^Lab to itself and has coordinates which are the maximum of a finite family of affine order-preserving functions.

2. Let ξ be the function defined at the end of Figure 5. The system C₁(c) ∪ C₂(c) can be rewritten as x ≥ G(x), where G maps ℝ^Lab to itself and all its coordinates are the min-max of a finite family of affine functions.
From Proposition 1, when the optimized ξ is used, we can write G as G = min_{π∈Π} G^π, where each G^π is, coordinate-wise, the maximum of a finite family of affine functions, and thus use a modified policy iteration algorithm. A policy here selects, for each coordinate where a min occurs (i.e. for each application of ξ), one argument of this min. A choice is thus a vector of 0s and 1s. A policy map G^π is a function from ℝ^Lab to itself whose coordinates are maxima of affine functions: if a coordinate of G depends on ξ, then the policy fixes the selected argument of the min; otherwise, the coordinate is already the maximum of affine functions and a choice is not required.
Proposition 2 (Algorithm correctness)
The sequence (x_k) generated by Algorithm 1 is of finite length and satisfies a strict decrease before convergence: Σ_ℓ x_{k+1}(ℓ) < Σ_ℓ x_k(ℓ) whenever x_{k+1} ≠ x_k. The number of terms is smaller than the number of policies.
Proof
Let x_k be a term of the sequence and let π_{k+1} be a policy such that G(x_k) = G^{π_{k+1}}(x_k). Then G^{π_{k+1}}(x_k) = G(x_k) ≤ x_k by definition of x_k. Moreover, x_{k+1} is an optimal solution of min{Σ_ℓ x(ℓ) : x ≥ G^{π_{k+1}}(x)}. It follows that x_k is feasible for the minimisation problem for which x_{k+1} is an optimal solution. We conclude that Σ_ℓ x_{k+1}(ℓ) ≤ Σ_ℓ x_k(ℓ), and the algorithm terminates if the equality holds or continues as the criterion strictly decreases. Finally, from the strict decrease, a policy cannot be selected twice without terminating the algorithm. In conclusion, the number of iterations is smaller than the number of policies.
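A minimal sketch of this iteration is given below; `solve_lp` and `xi_exact` are assumed helpers (building and solving the linear program of Figure 4 for a fixed policy, and evaluating the min/max definition of ξ on a candidate solution, respectively), not functions of XXX.

```python
# Policy iteration sketch: fix the carry bits, solve an LP, improve the policy.
from typing import Callable, Dict, Hashable

def policy_iteration(
    ops: list,                                   # operations carrying a xi choice
    solve_lp: Callable[[Dict[Hashable, int]], Dict[Hashable, float]],
    xi_exact: Callable[[Dict[Hashable, float], Hashable], int],
) -> Dict[Hashable, float]:
    policy = {op: 1 for op in ops}               # start from the pessimistic carries
    best = solve_lp(policy)
    while True:
        # Re-evaluate each carry bit on the current solution (policy improvement).
        new_policy = {op: xi_exact(best, op) for op in ops}
        if new_policy == policy:
            return best                          # the policy is stable: stop
        candidate = solve_lp(new_policy)
        if sum(candidate.values()) >= sum(best.values()):
            return best                          # no strict decrease: converged
        policy, best = new_policy, candidate
```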
Figure 5 displays the new rules that we add to the global system of constraints in the case where we optimize the carry bit of the elementary operations. Before introducing the optimized function ξ, the unit in the last place (ulp) of a real number x is defined in Equation (19).
\[ \mathrm{ulp}(x) = \mathrm{ufp}(x) - \mathrm{nsb}(x) + 1 \tag{19} \]
Indeed, during an operation between two numbers, the function ξ is computed as follows: if the ulp of one of the two operands is greater than the ufp of the other one (or conversely), then the two numbers are not aligned, no carry bit can be propagated through the operation and ξ = 0 (otherwise ξ = 1). This idea is implemented in Figure 5, in which ξ is formulated by means of min and max operators. Winning one bit may seem a ridiculous optimization at first sight. However, when many operations are done in a program which has to compute with some tens of significant bits, this is far from being negligible. Formally, let x and y be the operands of some operation whose result is z. The errors on x, y and z are ε_x, ε_y and ε_z, respectively. Note that the behaviour of ξ is also valid in the case of the propagation of the carry bit on the errors. Consequently, using the definitions of ufp_e and nsb_e introduced in Section 3.1, we have ulp_e(x) = ufp_e(x) - nsb_e(x) + 1, where ufp_e(x) and ulp_e(x) are the unit in the first and in the last place of the error on x, respectively (same reasoning for y and z). The optimized function ξ of Figure 5 derives from Equation (20).
\[ \xi(x, y) = \begin{cases} 0 & \text{if } \mathrm{ulp}_e(x) > \mathrm{ufp}_e(y) \ \text{ or }\ \mathrm{ulp}_e(y) > \mathrm{ufp}_e(x) \\ 1 & \text{otherwise} \end{cases} \tag{20} \]
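In code, this case analysis is straightforward; the sketch below is an illustration of Equation (20) on integer ufp_e/nsb_e values, not the min/max encoding actually sent to the solver.

```python
def xi(ufp_e_x: int, nsb_e_x: int, ufp_e_y: int, nsb_e_y: int) -> int:
    """Optimized carry bit of Equation (20): 0 when the operand errors
    do not overlap (one error ends above where the other one starts)."""
    ulp_e_x = ufp_e_x - nsb_e_x + 1     # Equation (19) applied to the errors
    ulp_e_y = ufp_e_y - nsb_e_y + 1
    if ulp_e_x > ufp_e_y or ulp_e_y > ufp_e_x:
        return 0                         # errors not aligned: no carry
    return 1                             # worst case: a carry may propagate
```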
As shown by Equation (20), to compute the ufp of the errors on the operands, we need to estimate the number of bits nsb_e of the error of each operand, which explains the new rules and constraints given in Figure 5. Hence, these constraints are complementary to the rules of Figure 4, already explained in Section 3.2, in which the only difference is that we activate the new function ξ instead of its over-approximation of Figure 4. Let us concentrate on the rules of Figure 5. The function E'[e]ρ generates the new set of constraints for an expression e in the environment ρ. For Rule (Const), the number of significant bits of the error is given by the precision of the constant, whereas we impose that the nsb_e of a variable at control point ℓ is less than the nsb_e of the last assignment of this variable in ρ, as shown in Rule (Id) of Figure 5. Considering Rule (Add), we start by generating the new sets of constraints E'[e₁]ρ and E'[e₂]ρ on the operands at control points ℓ₁ and ℓ₂. Then, we require that the nsb_e of both operands is not smaller than the nsb_e at the control point ℓ where the result of the addition is stored. Additionally, the number of significant bits of the error on the result at control point ℓ is computed as shown hereafter.
By breaking the min and max operators, we obtain the constraints on nsb_e of Rule (Add). For the subtraction, the constraints generated are similar to the addition case. Considering now Rule (Mult), as we have seen in Section 3.2, ε_z = x·ε_y + y·ε_x + ε_x·ε_y, where z is the result of the product of x and y. By reasoning on the ufp of the error, we bound ufp_e(z) by the maximum of the ufps of the three terms above. By getting rid of the smallest term ε_x·ε_y and, finally, by simplifying the resulting inequality, we find the constraints of Rule (Mult) in Figure 5 (same for Rule (Div)). For Rule (Math), we generate the constraints on the expression e and we require that the nsb_e of the result stored at control point ℓ is greater than the nsb_e of the expression at control point ℓ₁. For Rule (Sqrt), we assume that the nsb_e of the result is unbounded. Concerning the commands, we define the set C'[c]ρ, which plays the same role as the function C[c]ρ defined in Figure 4. The reasoning on the commands also remains similar, except that this time we reason on the number of bits nsb_e of the errors. The only difference is in Rule (Req), where the set of constraints is empty. Let us recall that the constraints of Figure 5 are added to the former constraints of Figure 4 and are sent to the solver (GLPK in practice).
4 Correctness
In this section, we present proofs of correctness concerning the soundness of the analysis (Section 4.1) and the integer nature of the solutions (Section 4.2).
4.1 Soundness of the Constraint System
Let ≡ denote syntactic equivalence and let e be an expression. We write Const(e) the set of constants occurring in the expression e. For example, Const(x + 0.1 × y) = {0.1}. Also, we denote by σ a function mapping the labels of an expression to a nsb. The notation σ ⊨ C means that σ is the minimal solution to the ILP C. We write ∅ the empty environment (ρ = ∅).
The small step operational semantics of our language is displayed in Figure 6. It is standard, the only originality being to indicate explicitly the nsb of constants. For the result of an elementary operation, this nsb is computed as a function of the nsb of the operands. Lemma 1 below states the soundness of the constraints for one step of the semantics.
Lemma 1
Given an expression e, if σ ⊨ E[e]∅ and e → e′, then for all constants c^{#p,ℓ} occurring in e′ we have p ≥ σ(ℓ).
Proof
By case examination of the rules of Figure 4. Hereafter, we focus on the most interesting case, the addition of two constants. Recall that ξ(ℓ₁, ℓ₂) = 1 for any control points ℓ₁ and ℓ₂. Assuming that e ≡ c₁^{#p₁,ℓ₁} + c₂^{#p₂,ℓ₂}, then by following the reduction rule of Figure 6, we have e → c^{#p,ℓ} with c = c₁ + c₂ and p computed as in Figure 6. On the other side, by following the set of constraints of Rule (Add) in Figure 4, we have constraints on σ(ℓ₁) and σ(ℓ₂) which may be rewritten as Equation (16). Since, obviously, p₁ ≥ σ(ℓ₁) and p₂ ≥ σ(ℓ₂), and since the solver finds the minimal solution to the ILP, it remains to show that the error on c is less than 2^{ufp(c)-σ(ℓ)+1}, which corresponds to the assertion of Equation (15). Consequently, as required, p ≥ σ(ℓ) for this case.
Theorem 4.1
Given an expression e, if σ ⊨ E[e]∅ and if e →* c^{#p,ℓ}, then we have p ≥ σ(ℓ).
Proof
By induction on the length of the reduction path, using Lemma 1 at each step.
4.2 ILP Nature of the Problem
In this section, we give insights about the complexity of the problem. The computation relies on integer linear programming, which is known to belong to the class of NP-hard problems. A lower bound of the optimal value of a minimization problem can be furnished by its continuous linear programming relaxation, obtained by removing the integrality constraint. Recall that a (classical) linear program can be solved in polynomial time. Then, we can solve our problem in polynomial time if we can show that the continuous linear programming relaxation of our ILP has a unique optimal solution with integral coordinates. Proposition 3 presents a situation where a linear program has a unique optimal solution which is a vector of integers.
Proposition 3
Let F : ℝ^d → ℝ^d be an order-preserving function such that F(ℤ^d) ⊆ ℤ^d. Suppose that the set S = {x ∈ ℝ^d : F(x) ≤ x} is non-empty and bounded from below. Let φ : ℝ^d → ℝ be a strictly monotone function, i.e. φ(x) < φ(y) whenever x ≤ y and x ≠ y. Then, the minimization problem:

\[ \min \{ \varphi(x) \ :\ x \in \mathbb{R}^d,\ F(x) \leq x \} \]
has a unique optimal solution, which is integral.
Proof
Let S = {x ∈ ℝ^d : F(x) ≤ x} and y = inf S. It suffices to prove that y ∈ S ∩ ℤ^d. Indeed, as φ is strictly monotone, φ(y) < φ(x) for all x ∈ S such that x ≠ y, since y ≤ x; the optimal solution is thus y. First, y ∈ S: for all x ∈ S, y ≤ x and, F being order-preserving, F(y) ≤ F(x) ≤ x; taking the infimum over x ∈ S yields F(y) ≤ y. Now consider ⌊y⌋, the componentwise floor of y. As ⌊y⌋ ≤ y and F is order-preserving, F(⌊y⌋) ≤ F(y) ≤ y. Moreover, F(⌊y⌋) ∈ ℤ^d, and an integral vector smaller than y is also smaller than ⌊y⌋, hence F(⌊y⌋) ≤ ⌊y⌋, i.e. ⌊y⌋ ∈ S. It follows that y = inf S ≤ ⌊y⌋ ≤ y. We conclude that y = ⌊y⌋ ∈ ℤ^d.
Theorem 4.2

The continuous linear programming relaxation of the problem of Equation (18) with the system C₁(c) (i.e. with ξ = 1) has a unique optimal solution, which is integral. Consequently, the precision tuning problem of Section 3.2 is solvable in polynomial time.
Proof
The system C₁(c) can be rewritten as x ≥ F(x), where F is order-preserving, maps integer vectors to integer vectors and has coordinates which are maxima of order-preserving affine functions (Proposition 1); the feasible set is non-empty and bounded from below (nsb values are non-negative) and the objective Σ_ℓ x(ℓ) is strictly monotone, so Proposition 3 applies. For the second system, in practice, we also get integral solutions to the continuous linear programming relaxation of our problem of Equation (18). However, because of the lack of monotonicity of the functions for the rules (ADD) and (SUB), we cannot exploit Proposition 3 to prove the polynomial-time solvability.
5 Experimental Results
Table 1: Precision tuning results of XXX, ILP formulation vs PI formulation, for five decreasing error thresholds (TH) per program.

| Program | TH | BL (ILP) | IEEE (ILP) | ILP-time | BL (PI) | IEEE (PI) | PI-time | H | S | D | LD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| arclength | | 61% | 43% | 0.9s | 62% | 45% | 1.5s | 8 | 88 | 25 | 0 |
| | | 50% | 21% | 0.9s | 51% | 21% | 1.4s | 2 | 45 | 74 | 0 |
| | | 37% | 3% | 0.8s | 38% | 4% | 1.6s | 2 | 6 | 113 | 0 |
| | | 24% | -1% | 1.0s | 25% | -1% | 1.7s | 2 | 0 | 116 | 3 |
| | | 12% | -17% | 0.3s | 14% | -8% | 1.5s | 2 | 0 | 109 | 10 |
| simpson | | 64% | 45% | 0.1s | 67% | 56% | 0.5s | 6 | 42 | 1 | 0 |
| | | 53% | 30% | 0.2s | 56% | 31% | 0.5s | 1 | 27 | 21 | 0 |
| | | 40% | 4% | 0.1s | 43% | 7% | 0.3s | 1 | 5 | 43 | 0 |
| | | 27% | 1% | 0.1s | 28% | 1% | 0.4s | 1 | 0 | 48 | 0 |
| | | 16% | 1% | 0.1s | 16% | 1% | 0.3s | 0 | 1 | 48 | 0 |
| accelerometer | | 73% | 61% | 0.2s | 76% | 62% | 1.0s | 53 | 69 | 0 | 0 |
| | | 62% | 55% | 0.2s | 65% | 55% | 1.0s | 2 | 102 | 0 | 0 |
| | | 49% | 15% | 0.2s | 52% | 18% | 1.0s | 2 | 33 | 69 | 0 |
| | | 36% | 1% | 0.2s | 39% | 1% | 1.0s | 2 | 0 | 102 | 0 |
| | | 25% | 1% | 0.2s | 28% | 1% | 1.0s | 2 | 0 | 102 | 0 |
| rotation | | 78% | 66% | 0.08s | 79% | 68% | 1.3s | 46 | 38 | 0 | 0 |
| | | 67% | 53% | 0.08s | 68% | 56% | 0.5s | 12 | 70 | 2 | 0 |
| | | 53% | 29% | 0.07s | 54% | 29% | 0.4s | 0 | 46 | 38 | 0 |
| | | 40% | 0% | 0.1s | 41% | 0% | 0.5s | 0 | 0 | 84 | 0 |
| | | 29% | 0% | 0.09s | 30% | 0% | 0.5s | 0 | 0 | 48 | 0 |
| lowPassFilter | | 68% | 46% | 1.8s | 69% | 46% | 10.7s | 260 | 581 | 0 | 0 |
| | | 57% | 38% | 1.8s | 58% | 45% | 11.0s | 258 | 580 | 3 | 0 |
| | | 44% | -7% | 2.0s | 45% | -7% | 11.4s | 258 | 2 | 581 | 0 |
| | | 31% | -7% | 1.7s | 32% | -7% | 10.9s | 258 | 0 | 583 | 0 |
| | | 20% | -7% | 1.8s | 21% | -7% | 11.3s | 258 | 0 | 583 | 0 |
| pendulum | | 71% | 54% | 0.15s | 71% | 54% | 0.4s | 0 | 13 | 0 | 0 |
| | | 60% | 50% | 0.2s | 60% | 50% | 0.5s | 0 | 12 | 1 | 0 |
| | | 47% | 0% | 0.12s | 47% | 0% | 0.4s | 0 | 0 | 13 | 0 |
| | | 33% | 0% | 0.16s | 34% | 0% | 0.5s | 0 | 0 | 13 | 0 |
| | | 22% | 0% | 0.11s | 22% | 0% | 0.4s | 0 | 0 | 13 | 0 |
In this section, we evaluate the performance of our tool XXX implementing the techniques of Section 3. Recall that XXX reduces the problem of precision tuning to an ILP by generating the constraints defined in Section 3.2, which can be solved by a linear solver. We use GLPK (https://www.gnu.org/software/glpk/) in practice, with continuous variables, since we know from Section 4.2 that this is possible. Indeed, GLPK's column kind is set to CV (continuous variable) and not to IV (integer variable) as for integer problems. Alternatively, a second optimized set of constraints may be generated by applying the PI technique introduced in Section 3.3.
We have evaluated XXX on several numerical programs. Two of them were used as benchmarks for precision tuning in prior work [18] and come from the GNU Scientific Library (GSL): arclength, a program first introduced in [2], and simpson, an implementation of the widely used Simpson's rule for numerical integration [13]. In addition, we have experimented the techniques introduced in this article on three programs used as benchmarks for the former version of XXX [3, 4, 5]. The program rotation performs a matrix-vector product to rotate a vector around one axis by a given angle [5]. The second program, accelerometer, is an application measuring an inclination angle with an accelerometer [3]. Finally, the lowPassFilter program [4] is taken from a pedometer application [14] that counts the number of footsteps of its user. The last two programs come from the IoT field. We also take the pendulum program already introduced in Section 2.
The experiments shown in Table 1 present the precision tuning results returned by XXX for five error thresholds per program. The error threshold represents the required accuracy of the result; for instance, the result is required to be correct up to 4 digits for an error threshold of 10^{-4}. Obviously, XXX counting numbers of significant bits, these thresholds are converted into binary. In Table 1, we denote by "TH" the error threshold given by the user. "BL" is the percentage of optimization at the bit level. "IEEE" denotes the percentage of optimized variables in IEEE754 formats (binary16, binary32, etc.; the nsb obtained at the bit level is approximated by the immediately greater number of bits corresponding to an IEEE754 format). "ILP-time" is the total time of the XXX analysis in the case of the ILP formulation, and "PI-time" represents the time spent by XXX to find the right policy and to solve the precision tuning problem. "H", "S", "D" and "LD" denote respectively the number of variables obtained in half, float, double and long-double precision when using the PI formulation, which clearly displays better results.
Table 2: Number of bits saved and analysis time in seconds for XXX ILP, XXX SMT and Precimonious, for four error thresholds per program (in parentheses: the number of variables optimized by each tool).

| Program | Tool | Threshold 1 | Threshold 2 | Threshold 3 | Threshold 4 |
|---|---|---|---|---|---|
arclength | XXX ILP (28) | 2464b. - 1.8s. | 2144b. - 1.5s. | 1792b. - 1.7s. | 1728b. - 1.8s. |
XXX SMT (22) | 1488b. - 4.7s. | 1472b. - 3.04s. | 864b. - 3.09s. | 384b. - 2.9s. | |
Precimonious (9) | 576b. - 146.4s. | 576b. - 156.0s. | 576b. - 145.8s. | 576b. - 215.0s. | |
simpson | XXX ILP (14) | 1344b. - 0.4s. | 1152b. - 0.5s. | 896b. - 0.4s. | 896b. - 0.4s. |
XXX SMT (11) | 896b. - 2.9s. | 896b. - 1.9s. | 704b. - 1.7s. | 704b. - 1.8s. | |
Precimonious (10) | 704b. - 208.1s. | 704b. - 213.7s. | 704b. - 207.5s. | 704b. - 200.3s. | |
rotation | XXX ILP (25) | 2624b. - 0.47s. | 2464b. - 0.47s. | 2048b. - 0.54s. | 1600b. - 0.48s. |
XXX SMT (22) | 1584b. - 1.85s. | 2208b. - 1.7s. | 1776b. - 1.6s. | 1600b. - 1.7s. | |
Precimonious (27) | 2400b. - 9.53s. | 2592b. - 12.2s. | 2464b. - 10.7s. | 2464b. - 7.4s. | |
accel. | XXX ILP (18) | 1776b. - 1.05s. | 1728b. - 1.05s. | 1248b. - 1.04s. | 1152b. - 1.03s. |
XXX SMT (15) | 1488b. - 2.6s. | 1440b. - 2.6s. | 1056 - 2.4s. | 960b. - 2.4s. | |
Precimonious (0) | - | - | - | - |
Let us focus on the first "TH", "BL", "IEEE" and "ILP-time" columns of Table 1. These columns have been measured for the case of the ILP formulation, which is pessimistic on the propagation of the carry bit, as described in Section 3.2. We recall that all the variables of our programs are in double precision before analysis. For the arclength program, the final percentage of bits after optimization reaches 61% at the bit level while it achieves 43% in IEEE formats (100% corresponds to all variables in double precision, i.e. 121 variables for the original arclength program, which used 7744 bits). This is obtained in only 0.9 seconds by applying the ILP formulation. As another option, when we refine the solution by applying the policy iteration method (from the sixth column), XXX displays better results, attaining 62% at the bit level and 45% for the IEEE formats. Although XXX needs more analysis time to find and iterate between policies, this time remains negligible, not exceeding 1.5 seconds. For a total of 121 variables of the original arclength program, XXX succeeds in tuning 8 variables to half precision (H), the majority, with 88 variables, passes to single precision (S), whereas 25 variables remain in double precision (D) for the largest error threshold. We remark that our second method also displays better results for the other user error thresholds. For the simpson, accelerometer, rotation and lowPassFilter programs, the improvement is also more important when using the PI technique than when using the ILP formulation. For instance, for the simpson program at the middle error threshold, only one variable passes to half precision and 5 variables turn to single precision while 43 remain in double precision, with a percentage of 43% of the total number of bits at the bit level using the policy iteration method. Concerning the pendulum code, the two techniques return the same percentages at the bit level and in IEEE754 formats for the majority of error thresholds (except one threshold, for which XXX reaches 34% at the bit level when using the PI method).
Now, let us stress the negative percentages obtained in Table 1, especially for the arclength program (-1% and -17% in the IEEE column) and for the lowPassFilter program (-7%) for the three smallest error thresholds. In fact, XXX is able to return new formats for any precision required by the user, without additional cost and without increasing the complexity, even if it fails to achieve a significant improvement on the program output. To be specific, taking again the arclength program, for the smallest error threshold, XXX fulfills the requirement by informing the user that this precision is achievable only if 10 variables pass to long-double precision (LD), which is more than in the original program, whose variables are all in double precision. By doing so, the percentage of IEEE formats for the ILP and PI formulations reaches -17% and -8%, respectively. The same reasoning applies to the lowPassFilter program, which spends more time, nearly 11 seconds, with the policy iteration technique to find the optimized formats (total of 841 variables). Note that, in these cases, other tools like Precimonious [18] fail to propose formats.
Table 2 shows a comparison between the new version of XXX combining both the ILP and PI formulations, the former version of XXX that uses the Z3 SMT solver coupled with a binary search to find an optimal solution [5], and the prior state-of-the-art tool Precimonious [18]. The results of the mixed precision tuning are shown for the arclength, simpson, rotation and accelerometer programs. Since XXX and Precimonious implement two different techniques, we have adjusted the criteria of comparison on several points. First, we mention that XXX optimizes many more variables than Precimonious. While this disadvantages XXX, we only consider in the experiments of Table 2 the variables optimized by Precimonious to estimate the quality of the optimization. Second, let us note that the error thresholds are expressed in base 2 in XXX and in base 10 in Precimonious. For the relevance of the comparisons, all the thresholds are expressed in base 10 in Tables 1 and 2. In practice, XXX uses the base 2 threshold immediately lower than the required base 10 threshold.
In Table 2, we indicate in bold the tool that exhibits the best results for each error threshold and each program. Starting with the arclength program, XXX ILP displays better results than the other tools by optimizing 28 variables. For the first error threshold, 2464 bits are saved by XXX ILP in 1.8 seconds, while XXX SMT saves only 1488 bits in more time (4.7 seconds). Precimonious is the slowest tool on this example, taking more than two minutes to save 576 bits for only 9 optimized variables. For the simpson program, XXX ILP also does better than both other tools. However, for the rotation program, XXX ILP saves more bits than the other tools only for the first error threshold, while Precimonious does well on this program for the rest of the error thresholds. Finally, Precimonious fails to tune the accelerometer program (0 variables), while XXX ILP is faster (about 1 second) and saves many more bits than XXX SMT for the four given error thresholds.
6 Conclusion and Perspectives
In this article, we have introduced a new technique for precision tuning, totally different from the existing ones. Instead of changing more or less randomly the data types of the numerical variables and running the programs to see what happens, we propose a semantical modelling of the propagation of the numerical errors throughout the code. This yields a system of constraints whose minimal solution gives the best tuning of the program, furthermore, in polynomial time. Two variants of this system are proposed. The first one corresponds to a pure ILP. The second one, which optimizes the propagation of carries in the elementary operations, can be solved using policy iteration [7]. Proofs of correctness concerning the soundness of the analysis and the integer nature of the solutions have been presented in Section 4, and experimental results showing the efficiency of our method have been introduced in Section 5.
Compared to other approaches, the strength of our method is to find directly the minimal number of bits needed at each control point to get a certain accuracy on the results. Consequently, it does not depend on a fixed set of data types (e.g. the IEEE754 formats) and its complexity does not increase as the number of data types increases. The information provided may also be used to generate computations in fixpoint arithmetic with an accuracy guarantee on the results. Concerning scalability, we generate a number of constraints and variables linear in the size of the analyzed program. The only limitation is the size of the problem accepted by the solver. Note that the number of variables could be reduced by assigning the same precision to a whole piece of code (for example an arithmetic expression, a line of code, a function, etc.). Code synthesis for fixpoint arithmetic and assigning the same precision to pieces of code are perspectives we aim to explore in the short term.
In the longer term, other developments of the present work are planned. First, we wish to adapt the techniques developed in this article to the special case of deep neural networks, for which it is important to save memory usage and computational resources. Second, we aim at using our precision tuning method to guide lossy compression techniques for floating-point datasets [9]. In this case, the bit-level accuracy inferred by our method would determine the compression rate of the lossy technique.
References
- [1] ANSI/IEEE: IEEE Standard for Binary Floating-point Arithmetic, std 754-2008 edn. (2008)
- [2] Bailey, D.: Resolving numerical anomalies in scientific computation (03 2008)
- [3] Ben Khalifa, D., Martel, M.: Precision tuning and internet of things. In: International Conference on Internet of Things, Embedded Systems and Communications, IINTEC 2019. pp. 80–85. IEEE (2019)
- [4] Ben Khalifa, D., Martel, M.: Precision tuning of an accelerometer-based pedometer algorithm for iot devices. In: International Conference on Internet of Things and Intelligence System, IOTAIS 2020. pp. 113–119. IEEE (2020)
- [5] Ben Khalifa, D., Martel, M., Adjé, A.: POP: A tuning assistant for mixed-precision floating-point computations. In: Formal Techniques for Safety-Critical Systems - 7th International Workshop, FTSCS 2019. Communications in Computer and Information Science, vol. 1165, pp. 77–94. Springer (2019)
- [6] Chiang, W., Baranowski, M., Briggs, I., Solovyev, A., Gopalakrishnan, G., Rakamaric, Z.: Rigorous floating-point mixed-precision tuning. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL. pp. 300–315. ACM (2017)
- [7] Costan, A., Gaubert, S., Goubault, E., Martel, M., Putot, S.: A policy iteration algorithm for computing fixed points in static analysis of programs. In: Computer Aided Verification, 17th International Conference, CAV 2005, Edinburgh, Scotland, UK, July 6-10, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3576, pp. 462–475. Springer (2005)
- [8] Darulova, E., Horn, E., Sharma, S.: Sound mixed-precision optimization with rewriting. In: Proceedings of the 9th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS. pp. 208–219. IEEE Computer Society / ACM (2018)
- [9] Diffenderfer, J., Fox, A., Hittinger, J.A.F., Sanders, G., Lindstrom, P.G.: Error analysis of ZFP compression for floating-point data. SIAM J. Sci. Comput. 41(3), A1867–A1898 (2019)
- [10] Guo, H., Rubio-González, C.: Exploiting community structure for floating-point precision tuning. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2018. pp. 333–343. ACM (2018)
- [11] Kotipalli, P.V., Singh, R., Wood, P., Laguna, I., Bagchi, S.: AMPT-GA: automatic mixed precision floating point tuning for GPU applications. In: Proceedings of the ACM International Conference on Supercomputing, ICS. pp. 160–170. ACM (2019)
- [12] Lam, M.O., Hollingsworth, J.K., de Supinski, B.R., LeGendre, M.P.: Automatically adapting programs for mixed-precision floating-point computation. In: International Conference on Supercomputing, ICS’13. pp. 369–378. ACM (2013)
- [13] McKeeman, W.M.: Algorithm 145: Adaptive numerical integration by Simpson's rule. Commun. ACM 5(12), 604 (1962)
- [14] Morris, D., Saponas, T., Guillory, A., Kelner, I.: Recofit: Using a wearable sensor to find, recognize, and count repetitive exercises. Conference on Human Factors in Computing Systems - Proceedings (2014)
- [15] de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: Tools and Algorithms for the Construction and Analysis of Systems. LNCS, vol. 4963, pp. 337–340. Springer (2008)
- [16] Papadimitriou, C.H.: On the complexity of integer programming. Journal of the ACM (JACM) 28(4), 765–768 (1981)
- [17] Parker, D.S.: Monte Carlo arithmetic: exploiting randomness in floating-point arithmetic. Tech. Rep. CSD-970002, University of California (Los Angeles) (1997)
- [18] Rubio-González, C., Nguyen, C., Nguyen, H.D., Demmel, J., Kahan, W., Sen, K., Bailey, D.H., Iancu, C., Hough, D.: Precimonious: tuning assistant for floating-point precision. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC’13. pp. 27:1–27:12. ACM (2013)
- [19] Schrijver, A.: Theory of linear and integer programming. John Wiley & Sons (1998)
7 Appendix
We need a lemma on some algebraic operations under which the set of functions written as the min-max of a finite family of affine functions is stable. The functions are defined on ℝ^d.
Lemma 1
The following statements hold:
• The sum of two min-max of finite families of affine functions is a min-max of a finite family of affine functions.

• The maximum of two min-max of finite families of affine functions is a min-max of a finite family of affine functions.
Proof
Let F and G be two min-max of finite families of affine functions and H = F + G. We have F = min_i max_j f_{ij} and G = min_k max_l g_{kl}. Let x ∈ ℝ^d. There exist i, k such that F(x) = max_j f_{ij}(x) and G(x) = max_l g_{kl}(x). We also have, for all i and k, F(x) ≤ max_j f_{ij}(x) and G(x) ≤ max_l g_{kl}(x). We conclude that H(x) = min_{i,k} max_{j,l} (f_{ij} + g_{kl})(x) for all x. We use the same argument for the max.
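In symbols, writing F = min_i max_j f_{ij} and G = min_k max_l g_{kl}, the two closure properties of the lemma read:

\[
F + G \;=\; \min_{i,k}\, \max_{j,l}\, \big( f_{ij} + g_{kl} \big)
\qquad\quad
\max(F, G) \;=\; \min_{i,k}\, \max\Big( \max_{j} f_{ij},\ \max_{l} g_{kl} \Big)
\]

The second identity uses the distributivity of max over finite min.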
Proposition 4
The following results hold:
1. Let ξ be the constant function equal to 1. The system C₁(c) can be rewritten as x ≥ F(x), where F maps ℝ^Lab to itself and has coordinates which are the maximum of a finite family of affine order-preserving functions.

2. Let ξ be the function defined at the end of Figure 5. The system C₁(c) ∪ C₂(c) can be rewritten as x ≥ G(x), where G maps ℝ^Lab to itself and all its coordinates are the min-max of a finite family of affine functions.
Proof
We only give details about the system C₁(c) (Figure 4). The proof is by induction on the rules. We write F = (F_ℓ)_{ℓ∈Lab}; this family of coordinates is constructed inductively in the proof.

For the rule (CONST), there is nothing to do. For the rule (ID), the constraint nsb(ρ(x)) ≥ nsb(ℓ) adds the affine function x ↦ x(ℓ) to the coordinate F_{ρ(x)}: if this coordinate is already defined, we take the maximum with the old definition; otherwise, we define F_{ρ(x)} as this function. As this function is order-preserving and the maximum of one affine function, F_{ρ(x)} is the maximum of a finite family of order-preserving affine functions, since max preserves order-preservation.

For the rules (ADD), (SUB), (MULT), (DIV), (MATH) and (ASSIGN), by induction, it suffices to focus on the new set of inequalities. If a coordinate F_ℓ is already defined, we redefine it as the maximum of the old definition and the right-hand side r of the new inequality, i.e. F_ℓ := max(F_ℓ, r). Otherwise, we define F_ℓ := r. In the latter rules, the right-hand sides r are order-preserving affine functions. It follows that every F_ℓ is the maximum of a finite family of order-preserving affine functions.

The result follows by induction for the rule (SEQ).

The rules (COND) and (WHILE) are treated as the rules (ADD), (SUB), (MULT), (DIV), (MATH) and (ASSIGN), by induction and by considering the new set of inequalities.

The last rule (REQ) constructs F_ℓ either as the constant function equal to n at label ℓ or, if F_ℓ is already defined, as the maximum of the old definition of F_ℓ and n.

The proof for the system C₁(c) ∪ C₂(c) uses the same arguments and Lemma 1.
The proof for the system uses the same arguments and Lemma 1.