# Computer Algebraic Approach to Verification and Debugging of Galois Field Multipliers 

Tiankai Su ${ }^{1}$, Atif Yasin ${ }^{1}$, Cunxi $\mathrm{Yu}^{1,2}$, Maciej Ciesielski ${ }^{1}$<br>ECE Department, University of Massachusetts, Amherst, USA ${ }^{1}$ LSI, EPFL, CH ${ }^{2}$<br>tiankaisu@umass.edu, ayasin@umass.edu, cunxi.yu@epfl.ch, ciesiel@umass.edu


#### Abstract

The paper presents a novel method to verify and debug gate-level arithmetic circuits implemented in Galois Field arithmetic. The method is based on forward reduction of the specification polynomials of the circuit in $G F\left(2^{m}\right)$ using $G F(2)$ models of its logic gates. We define a forward variable order " $F O>$ " and the rules of forward reduction that enable verification, bug detection, and automatic bug correction in the circuit. By analyzing the remainder generated by forward reduction, the method can determine whether the circuit is buggy, and finds the location and the type of the bug. The experiments performed on Mastrovito and Montgomery multipliers show that our debugging method is independent of the location of the bug(s) and the debugging time is comparable to the time needed to verify the bug-free circuit. Keywords- Galois Field, Arithmetic circuits, Formal verification, Computer Algebra, Logic Debugging.


## I. Introduction

Galois Field (GF) arithmetic has numerous applications in digital communication, cryptography and security engineering, and formal verification of such circuits is of prime importance. A large body of work has been published on formal verification of arithmetic circuits, both integer arithmetic [1][2][3][4] and GF arithmetic circuits [5][6]. The most successful verification techniques for verifying arithmetic circuits are based on computer algebra [7]. In this approach, the verification problem is solved by proving that the implementation satisfies the specification, where the specification and the implementation are represented by polynomial rings, $F[x]$. A typical approach is based on polynomial reduction using Gröbner Basis, which transforms the verification problem to membership testing of the specification polynomial in the ideals [1][5]. An alternative approach to verify gatelevel arithmetic circuits uses algebraic rewriting of the polynomial at the primary outputs (POs) into a polynomial at the primary inputs (PIs). The method, termed backward rewriting, is performed in a reverse topological order [4][6] and is characterized by a good runtime performance due to a large number of monomial cancellations during rewriting [6][8].

However, very little work has been done on debugging of arithmetic circuits. One such work reasons about the bugs in integer circuits by analyzing the remainder polynomial generated by backward rewriting [9][10]. However, it is limited by an uncontrollably large size of the remainder generated in case of a buggy circuit. Another approach combines backward rewriting with forward rewriting, carried from PIs to POs, and compares the two results to identify and correct the bugs [11]. This method also suffers from the polynomial size explosion in a buggy circuit and the complexity of forward rewriting.

In this paper we propose a novel approach to debugging of GF arithmetic circuits based on forward rewriting. This technique does not suffer from the polynomial size explosion encountered by other methods and allows one to identify and automatically correct the bugs in GF circuits. We limit our attention to bugs that are caused by using a wrong logic gate (e..g, using an AND gate instead of an XOR), referred to as gate replacement, which is the most common source of errors. Specifically, we make the following novel contributions:

- Use forward variable order " $F O>$ " which enables performing verification and debugging at the same time.
- The method does not suffer from memory explosion problem.
- It can handle multiple dependent bugs.
- The performance of bug detection is not related to the location of the bug; the time it takes to verify a buggy circuit is about the same as verifying a bug-free circuit.


## II. BACKGROUND

## A. Galois Fields

Galois field (GF) is a number system with a finite number of elements and basic arithmetic operations of addition, multiplication, and division. Of particular interest in hardware design for cryptography systems are binary extension fields, denoted $\operatorname{GF}\left(2^{m}\right)\left(\right.$ or $\left.\mathbb{F}_{2^{m}}\right)$, which are finite fields with $2^{m}$ elements represented by polynomial rings $F[x]$. The addition of finite field elements is the addition of polynomials, with coefficients computed in GF(2). Multiplication of field elements in $\operatorname{GF}\left(2^{m}\right)$ is performed modulo irreducible polynomial $P(x)$ of degree $m$ with coefficients in GF(2). Extension fields are used in many cryptography applications, such as Advanced Encryption Standard (AES) and Elliptic-curve cryptography (ECC).

## B. Computer Algebra Approach

Algebraic approach used in this work relies on polynomial representation of the circuit specification and its logic gate components. Input Signature is the polynomial in terms of the PI variables that represents the expected function of the circuit, denoted $S i g_{i n}$. For example, input signature for an $n$-bit binary adder is Sig $_{\text {in }}=\sum_{i=0}^{n-1} 2^{i}\left(a_{i}+b_{i}\right)$. Output Signature is a binary encoded polynomial expressed in the PO variables, denoted by $\mathrm{Sig}_{\text {out }}$. Output signature of an unsigned arithmetic circuit with $m$ output bits $\left\{z_{i}\right\}$ is $S i g_{\text {out }}=\sum_{i=0}^{m-1} 2^{i} z_{i}$. Backward rewriting is the process that transforms the output signature Sig out $^{\text {into a unique input signature }}$ $S i g_{i n}$ using algebraic (polynomial) models of the logic gates of the circuit [4][6]. This work models Boolean operators using algebraic models of $G F(2)$. For example, the pseudo-Boolean model of XOR in integer arithmetic, $\operatorname{XOR}(a, b)=a+b-2 a b$, is reduced in $\operatorname{GF}(2)$ to $(a+b+2 a b) \bmod 2=(a+b) \bmod 2$. The following algebraic models are used to describe basic logic gates in $G F\left(2^{m}\right)$ [1][6]:

$$
\begin{align*}
& \neg a=1+a \bmod 2 \\
& a \wedge b=a \cdot b \bmod 2  \tag{1}\\
& a \vee b=a+b+a \cdot b \bmod 2 \\
& a \oplus b=a+b \bmod 2
\end{align*}
$$

To address the debugging problem of gate-level GF circuits we define two orders by which monomials are listed in each polynomial: Definition 1: $B O>$ order: A backward, reverse-topological order, such that every output signal of a gate is always greater than its input signal. The set of polynomials representing a logic cone ordered according to $B O>$ is called a $B O$ base.
Definition 2: FO > order: A forward, topological order, such that every output signal of a gate is always less than its input signal. The set of polynomials ordered according to $F O>$ is called an $\underline{F O}$ base.

For example, the polynomials for an OR gate $z$ in these orders are:
$B O>$ order: $z+a+b+a \cdot b$,
$F O>$ order: $a+b+a \cdot b+z$.
The $B O>$ order has been used successfully in verification works of [5][8][6], but, as demonstrated in this paper, it is ineffective in debugging of GF circuits. The approach described in this paper is based on an $F O>$ order.

## C. GF Multiplier Principles

Galois Field multiplication is performed modulo an irreducible polynomial $P(x)$, a polynomial that cannot be factored into nontrivial polynomials over the given field [12][13]. The inputs and outputs of $\operatorname{GF}\left(2^{k}\right)$ multiplication are $k$-bit binary numbers. An example of a 4-bit $\mathrm{GF}\left(2^{4}\right)$ multiplication, with irreducible polynomial $P(x)=x^{4}+x^{3}+1$, is shown in Figure 1.

|  |  |  |  | $\begin{aligned} & a_{3} \\ & b_{3} \end{aligned}$ | $\begin{aligned} & a_{2} \\ & b_{2} \end{aligned}$ | $\begin{aligned} & a_{1} \\ & b_{1} \end{aligned}$ | $\begin{aligned} & a_{0} \\ & b_{0} \end{aligned}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | $a_{3} b_{3}$ | $\begin{aligned} & a_{3} b_{2} \\ & a_{2} b_{3} \end{aligned}$ | $\begin{aligned} & a_{3} b_{1} \\ & a_{2} b_{2} \\ & a_{1} b_{3} \end{aligned}$ | $\begin{aligned} & a_{3} b_{0} \\ & a_{2} b_{1} \\ & a_{1} b_{2} \\ & a_{0} b_{3} \end{aligned}$ | $\begin{aligned} & a_{2} b_{0} \\ & a_{1} b_{1} \\ & a_{0} b_{2} \end{aligned}$ | $\begin{aligned} & a_{1} b_{0} \\ & a_{0} b_{1} \end{aligned}$ | $a_{0} b_{0}$ |
| $s_{6}$ |  | $s_{5}$ | $s_{4}$ | $s_{3}$ | $s_{2}$ | $s_{1}$ | $s_{0}$ |
| $s_{q}=\bigoplus a_{i} b_{j}, \forall i+j=q, 0 \leq q \leq 6$ |  |  |  |  |  |  |  |
| $s_{3}$ | $s_{2}$ | $s_{1}$ | $s_{0}$ |  | $z_{0}=$ | $\oplus s_{4}$ | $s_{5} \oplus s_{6}$ |
| $s_{4}$ | 0 | 0 | $s_{4}$ |  |  | $\oplus s_{5}$ |  |
| $s_{5}$ | 0 | $s_{5}$ | $s_{5}$ |  |  | $\oplus s_{6}$ |  |
| $s_{6}$ | $s_{6}$ | $s_{6}$ | $s_{6}$ |  |  |  | $s_{5} \oplus s_{6}$ |
| $z_{3}$ | $z_{2}$ | $z_{1}$ | $z_{0}$ |  |  |  |  |

Fig. 1: Multiplication in $\mathrm{GF}\left(2^{4}\right): Z \bmod P(x)=A \cdot B \bmod P(x)$, where $P(x)=x^{4}+x^{3}+1$.

The GF multiplication is performed in a straightforward way by: 1) generating and adding the partial products; and 2 ) reducing the result over $\operatorname{GF}\left(2^{m}\right)$ with $P(x)$. The partial products are generated in the same way as in the integer multiplication, using AND operations. However the sum of the partial products (denoted $s_{q}$ in Figure 1) is obtained using a series of XORs, since additions in finite field are implemented as XOR operations. This multiplication structure is called Mastrovito multiplier [14]. Note that in GF arithmetic there is no carry propagation between the columns of the result bits. Hence, each bit can be computed separately as a linear (XOR) sum of the product terms in the respetive column. An alternative method for performing fast modular multiplication is the Montgomery multiplication [15]. It works by transforming two integer inputs, $A$, $B$, into Montgomery forms, $A R \bmod N$ and $B R \bmod N$, for some constant $R$, and computing the product $A B R \bmod N$. The multiplication result $A \cdot B$ is then obtained by transforming the result from the Montgomery form, as $A \cdot B=A \cdot B \cdot R^{-1} \bmod P(x)$. Since the Montgomery form generator is a GF multiplication with a constant input $R$, each of the four components can be considered as a simplified Mastrovito multiplication [16].

## III. Bug Identification

This section describes an algorithm that utilizes the property of GF multipliers to identify $\operatorname{bug}(\mathrm{s})$ in the circuit. It consists of two major parts: 1) Remainder Generation and 2) Bug Analysis.

Algorithm I describes forward rewriting; it imposes the $F O>$ order on the polynomials and reduces the specification by the $F O$ base. Each polynomial representing a logic gate in the $F O$ base has the form: $\{h e a d\}, z_{i}$. The $\{h e a d\}$ is the set of head monomials representing the gate inputs; the tail monomial $z_{i}$ is the output of the gate. The degree of the monomial is the sum of the degrees of all its variables. For example, polynomial of an XOR gate is $a+b+z_{i}$, where the head set is $\{a, b\}$, each of degree one, and the tail is $z_{i}$.

```
Algorithm 1 Remainder Generation via Forward Rewriting
Input: Gate-level netlist of the GF circuit
Input: Expected InputSignature \(_{i}\) of each output bit
Output: Residual polynomials \(\left\{\right.\) Remainder \(\left._{i}\right\}\)
Output: Non-zero elements of \(F O_{i}\) base
    \(\mathcal{P O}=\left\{z_{0}, z_{1}, \ldots, z_{n-1}\right\}\) : primary output bits of the GF circuit
    for \(i \leftarrow 0\) to \(n-1\) do
        Extract cone \(P O_{i}\) from the gate-level netlist and generate the \(F O_{i}\) base
        Spec \(_{i} \leftarrow\) InputSignatrue \(_{i}-z_{i}\)
        while Spec \(_{i} \neq 0\) do
            Success \(_{\text {in }} \leftarrow 0\)
            for each polynomial \(P_{j}\) in \(F O_{i}\) base do
                for each monomial \(M_{k}\) in \(P_{i}\) do
                if \(M_{k}\) divides \(\operatorname{Spec}_{i}\) and \(\operatorname{Deg}\left(M_{k}\right)=\operatorname{Deg}\left(\right.\) Spec \(\left._{i}\right)\) and \(M_{k}\)
    is not the last monomial in \(P_{j}\) then
                                    Reduce Spec \(_{i}\) by \(P_{j}\); Set \(P_{j}\) in \(F O_{i}\) to 0 ; Success \({ }_{i n} \leftarrow 1\)
                                    end if
                    end for
            end for
            if Success \(_{i n}==0\) then
                            Move the leading term of Spec \(_{i}\) into Remainder \({ }_{i}\)
            end if
        end while
    end for
    return \(\left\{\right.\) Remainder \(_{i}\), non-empty \(F O_{i}\) base \(\}\)
```

The specification of the circuit is defined as $S i g_{i n}-S i g_{o u t}$, the difference between the correct (expected) world-level input signature $S_{i g_{i n}}$ and the output signature $S i g_{\text {out }}$, which is the binary encoded outputs. Since in a GF circuit the output bits are independent from each other, we can define $S p e c_{i}$, the bit-level logic of output $z_{i}$, as $\operatorname{Sig}_{i n}(i)-z_{i}$. The input signature $\operatorname{Sig}_{i n}(i)$ is known, computed using the irreducible polynomial $P(x)$. The verification goal is then to reduce each specification $S p e c_{i}$ by its $F \overline{O_{i}}$ base. If the result is 0 , we conclude that the logic cone associated with bit $i$ is bug-free. Otherwise, a non-zero Remainder ${ }_{i}$ will indicate the presence of a bug (or bugs) in that cone.
The first step of Algorithm I is to extract all the cones from the circuit and derive the corresponding Spec $_{i}$ (lines 3-6). Recall that the $F O_{i}$ base is the set of polynomials with an $F O>$ order describing the input-output relationship for cone $i$. While creating logic cones for each output, the common logic is duplicated, effectively making each cone fanout-free ${ }^{1}$. This means that every intermediate signal will appear only once in the head monomials of $F O_{i}$ base.
The next step is to reduce each $S p p e c_{i}$ by its corresponding $F O_{i}$ base. In lines 8-10, the Algorithm scans the polynomials in the $F O_{i}$ base to check if the leading term $l t_{i}$ of $S p e c_{i}$ is divisible by any of the head monomials with the same degree as $l t_{i}$. This reduction is different than in the polynomial reduction based on the $B O>$ order, where only a leading monomial is used in the division. Allowing the use of any of the head monomials is essential in identifying the bugs.

As an example, consider the reduction of Spec $_{i}=a+b+R$, where $R$ represents the remaining set of monomials, by polynomial $f=a+b+z$. The result of such a reduction should be Spec $_{i}=z+R$. However, if the monomial $a$ is missing in $S p e c_{i}$ due to a bug (when the gate with output $a$ is false), i.e., when $\operatorname{Spec}_{i}=b+R$, the head monomial $b$ in $f$ still can be used to divide $S p e c_{i}$. The result of such a reduction will be Spec $_{i}=a+z+R$. The monomial $a$ in the reduced $S p e c_{i}$ will never be eliminated by other polynomials in $F O_{i}$ base, since there would be no other gate with signal $a$ as input. When examining the content of the final remainder, it will be clear that the gate with output $a$ is the source of the bug.

In summary, if there is a polynomial $P_{i}$ in $F O_{i}$ base that contains a head monomial that divides $S p e c_{i}$, it will be used to reduce $S p e c_{i}$. The polynomial $P_{i}$ will then be removed from the base (each base

[^0]polynomial can only be used once during the reduction). However, if the leading term of $S p e c_{i}$ is not divisible by any of the head monomials in the $F O_{i}$ base, it will be moved into Remainder ${ }_{i}$. This process will be repeated until $S p e c_{i}$ becomes empty (lines 13- $^{\text {13 }}$ 18); or until it cannot be reduced anymore, in which case the content of the Remainder will be used to identify the bug.


Fig. 2: A two-bit Mastrovito GF multiplier.
Example 1: Consider a bug-free two-bit Mastrovito multiplier in Figure 2. The circuit can be separated into two cones: $z_{0}$ and $z_{1}$. The $F O_{1}$ base of cone $z_{1}$ includes polynomials: $p_{1}=a_{0} b_{1}+c_{1}$; $p_{2}=$ $a_{1} b_{0}+c_{2} ; p_{3}=a_{1} b_{1}+c_{3} ; p_{4}=c_{1}+c_{2}+r_{0}$ and $p_{5}=c_{3}+r_{0}+z_{1}$, each written in $F O>$ order. According to the definition, Spec $_{1}$ is equal to the input signature of $z_{1}$ minus $z_{1}$, the output signature. The expected input signature of the circuit is $F=\left(a_{0}+X a_{1}\right) \cdot\left(b_{0}+\right.$ $\left.X b_{1}\right)=a_{0} b_{0}+X\left(a_{0} b_{1}+a_{1} b_{0}\right)+X^{2} a_{1} b_{1}$. After $G F(2)$ reduction with the irreducible polynomial $P(X)=X^{2}+X+1$, we obtain $F=X\left(a_{0} b_{1}+a_{1} b_{0}+a_{1} b_{1}\right)+\left(a_{0} b_{0}+a_{1} b_{1}\right)$. Hence, for output $z_{1}$, associated with $X$, we have Spec $_{1}=\left(a_{0} b_{1}+a_{1} b_{0}+a_{1} b_{1}\right)+z_{1}(\bmod$ 2). Similar expression can be derived for Spec $_{0}$ of output $z_{0}$.

The next step is to reduce the specification $S p e c_{1}$ by the polynomial base of $F O_{1}$. This process is shown in Figure 3. We can see that $S p e c_{1}$ is eventually reduced to 0 . That is, Remainder ${ }_{1}=0$ and all polynomials in $F O_{1}$ have been used, which indicates that cone $z_{1}$ is bug-free. We proceed similarly with bit $z_{0}$ to determine that its logic cone is also bug-free.

Example 2: Let us now consider the case when there is a bug in the two-bit multiplier in Figure 2. Let the bug be caused by replacing the XOR gate $r_{0}$ with an AND gate in cone $z_{1}$. That is, polynomial $p_{4}=c_{1}+c_{2}+r_{0}$ is replaced by $p_{4}=c_{1} c_{2}+r_{0}$ in the $F O_{1}$ base, while $S p e c_{1}$ remains the same.

The process of reducing $S p e c_{1}$ by the new $F O_{1}$ base is shown in Figure 3. Note that in the 4th iteration of the reduction, the polynomial $c_{1}+c_{2}$ in $S p e c_{1}$ is not divisible by any of the head monomials in $F O_{1}$, so it will be moved into Remainder ${ }_{1}$. The monomial $r_{0}$ in the 5th iteration also be moved to Remainder ${ }_{1}$.

| Polynomials in $F O_{1}$ base of a bug-free cone | $S_{\text {pec }}^{1}=a_{0} b_{1}+a_{1} b_{0}+a_{1} b_{1}+z_{1}$ |
| :---: | :---: |
| $p_{1}=a_{0} b_{1}+c_{1}$ | Spec $_{1} / p_{1}=c_{1}+a_{1} b_{0}+a_{1} b_{1}+z_{1}$ |
| $p_{2}=a_{1} b_{0}+c_{2}$ | Spec $_{1} / p_{2}=c_{1}+c_{2}+a_{1} b_{1}+z_{1}$ |
| $p_{3}=a_{1} b_{1}+c_{3}$ | Spec $_{1} / p_{3}=c_{1}+c_{2}+c_{3}+z_{1}$ |
| $p_{4}=c_{1}+c_{2}+r_{0}$ | Spec $_{1} / p_{4}=2 c_{2}+c_{3}+r_{0}+z_{1} \bmod 2$ |
| $p_{5}=c_{3}+r_{0}+z_{1}$ | Spec $_{1} / p_{5}=2 r_{0}+2 z_{1}=0 \bmod 2$ |
| Polynomials in $F O_{1}$ base of a buggy cone | Spec $_{1}=a_{0} b_{1}+a_{1} b_{0}+a_{1} b_{1}+z_{1}$ |
| $p_{1}=a_{0} b_{1}+c_{1}$ | Spec $_{1} / p_{1}=c_{1}+a_{1} b_{0}+a_{1} b_{1}+z_{1}$ |
| $p_{2}=a_{1} b_{0}+c_{2}$ | Spec $_{1} / p_{2}=c_{1}+c_{2}+a_{1} b_{1}+z_{1}$ |
| $p_{3}=a_{1} b_{1}+c_{3}$ | Spec $_{1} / p_{3}=c_{1}+c_{2}+c_{3}+z_{1}$ |
| $p_{4}=c_{1} c_{2}+r_{0}$ | Spec $_{1} / F O_{1}=c_{1}+c_{2}+c_{3}+z_{1}$ |
| $p_{5}=c_{3}+r_{0}+z_{1}$ | Spec $_{1} / p_{5}=c_{1}+c_{2}+r_{0}+2 z_{1} \bmod 2$ |

Fig. 3: Generating Remainder with Forward Rewriting of bug-free and buggy logic cone of output bit $z_{1}$. Remainder $=0$ for bug-free cone, and Remainder $=c_{1}+c_{2}+r_{0}$ for a buggy cone.

As a result, we obtain a non-zero Remainder $_{1}=c_{1}+c_{2}+r_{0}$, and a non-empty $F O_{1}$ base: $p_{4}=c_{1} c_{2}+r_{0}$. This is a clear manifestation of a bug; namely, the XOR gate $\left(c_{1}+c_{2}+r_{0}\right)$ has been replaced by an AND gate ( $c_{1} c_{2}+r_{0}$ ), while during the reduction, polynomial
$c_{1}+c_{2}$ should have been canceled by $r_{0}$. In a similar fashion we can identify other types of errors caused by gate replacement.

TABLE I: Bug Analysis

| Bug Type | Correct gate | False gate | Remainder | Non-empty base |
| :---: | :---: | :---: | :---: | :---: |
| 1 | XOR | AND | $a+b+z$ | $a \cdot b+z$ |
| 2 | XOR | OR | $a \cdot b$ | - |
| 3 | AND | XOR | $a \cdot b+z$ | $a+b+z$ |
| 4 | AND | OR | $a+b$ | - |
| 5 | OR | XOR | $a \cdot b$ | - |
| 6 | OR | AND | $a+b$ | - |

Table I shows some common cases of erroneous gate-replacement in GF circuit for 2-input logic gates: AND, OR, and XOR. Similar relations can be derived for other gates as needed. In the table, signals $a$ and $b$ represent the inputs, and $z$ is the output of the false gate. By analyzing the Remainder $_{i}$ and the non-empty $F O_{i}$ base generated by the algorithm, we can readily determine the type of the error and locate the bug. This is possible because the remainder contains the names of the input and output signals of the false gate.
The bug of type $2,4,5,6$ are all associated with the OR gate. Since the polynomial of the OR gate is $f=a+b+a b+z$, the Spec polynomial can be reduced by $f$, regardless whether the inputs are in the sum form $(a+b)$ or the product form $(a \cdot b)$. This means that the forward reduction will always "go through" the bug, and leave the residual polynomial with their inputs ( $a+b$ or $a \cdot b$ ) in the remainder. As a result, they do not have non-empty base and the Remainder contains only the input variables of the false gate, making it easy to identify the bug. On the other hand, for bug types 1 and 3, both the input signals $(a, b)$ and the output signal $(z)$ of the false gate appear in the Remainder. Furthermore, a non-empty base indicates that the polynomial propagating through the circuit during forward rewriting cannot "go through" the bug. As we can see in the next section, multiple of these bugs will affect each other.
At this point the reader should fully appreciate the difference between forward rewriting (using $F O>$ order) and backward rewriting (using $B O>$ order). For a bug-free circuit both approaches will produce a zero remainder, indicating that the circuit implements the correct function. However, for the buggy circuit the remainder will be different. By construction in $B O>$ order, the remainder will contain only the primary inputs, but not the input and output signals of the false gate. Hence it does not provide sufficient information about the type of the bug and its location. For example, assuming the same bug as in Example 2, the backward rewriting would produce $\operatorname{Rem}_{1}=a_{0} b_{0} a_{1} b_{1}+a_{1} b_{1}$, with no indication as to the source of the bug. Furthermore, a single bug can make the remainder very huge, making the analysis of the source of the bug difficult and the debugging process very hard. In contrast, in forward reduction, the remainder contains the signal name of the faulty gate and the location of the bug does not affect the size of the remainder.

## IV. Multiple Bugs Analysis

The debugging method using forward reduction described in the previous section can be extended to multiple bugs. In general, arithmetic bugs can be divided into independent and dependent bugs. Independent bugs are those that will not affect each other; typically they appear in different cones. Since each cone is verified separately, each independent bug can be treated as a single bug in its own cone.
Dependent bugs can be classified into four cases, shown in Figure 4. The blank circle in the figure represents the correct gate. The shaded circle represents a false gate, i.e., the gate that was erroneously replaced by another gate. It can be of any type discussed in Table I, and even extended to other gates. Assume that in a correct implementation the gate $e_{1}$ is an AND gate and the remaining gates are XOR. With this, the specification of the cone with output $z$ is Spec $=d_{1} d_{2}+e_{2}+e_{3}+e_{4}+z$.

Case 1 : non-connected bugs Case 2 : series bugs

Case 3 : parallel bugs

Fig. 4: Different cases for dependent bugs.
The remainder in the case of multiple bugs depends not only on the cases shown in Figure 4, but also on the type of the bugs, listed in Table I (c.f. Section III). Recall that intermediate input variables appear exactly once in each fanout-free cone. For this reason, the remainder for the circuit with bugs of type $2,4,5,6$ of Table I (i.e., those that have only false inputs left in the remainder) will be composed of disjoint sets of complete input pairs. As a result, the bugs of these types will not affect each other, regardless how they are connected (c.f. Case 1, 2, 3, 4 in Figure 4). One just needs to identify the complete input pairs in the remainder to detect the bugs. On the other hand, as mentioned in Section III, for bug types 1 and 3 of Table I, both the input signals and the output signal of the false gate will appear in the Remainder. For this reason, when multiple bugs appear together, they will affect each other.

Example 3: Consider Case 1 in Figure 4, where the bugs at $e_{1}$ and $f_{2}$ are not connected directly. Assume that the AND gate $e_{1}$ is replaced by an XOR, and the XOR gate $f_{2}$ is replaced by an AND. The remainder computed by Algorithm I, Rem $=d_{1} d_{2}+e_{1}+e_{3}+$ $e_{4}+f_{2}$ can be partitioned into two disjoint groups, based on the input and output variables, such that each group forms a complete input-output pair: $\left(d_{1} d_{2}+e_{1}\right)$ and $\left(e_{3}+e_{4}+f_{2}\right)$. In this case, it is always possible to achieve such a partition, since the two bugs are not connected directly. Specifically, the analysis of the non-zero base tells us that $d_{1}, d_{2}, e_{3}, e_{4}$ are the input signals (the head monomials), and $e_{1}, f_{2}$ are output signal (the tail monomials). Therefore, both bugs can be detected, one for $e_{1}$ gate and the other for $f_{2}$ gate.

Analysis of the other cases, where the false signals interact with each other, is more complex, as illustrated by the following example.

Example 4: Consider Case 2 in Figure 4, when the XOR gates $f_{1}$ and $z$ are replaced by the AND gates. Here we cannot find a complete input-output pair in Remainder $=e_{1}+e_{2}+f_{2}+z$, because signal $f_{1}$ is not only the false output of gate $f_{1}$ but also the false input of gate $z$. In this case we need to search for incomplete pairs in order to properly identify the bugs. For Cases 3 and 4, with the same Remainder $=e_{1}+e_{2}+e_{3}+e_{4}+z$, the input-output pairs cannot be found. The result depends on how $z$ can be expressed as a combination of $f_{1}, f_{2}$ that partitions the remainder into groups. The detailed discussion of these cases is beyond the scope of this paper.

## V. Results and Conclusions

The debugging technique described in this paper was implemented in Python and interfaced with the computer algebra tool, Singular [17], to affect the polynomial reduction. The experiments were performed on an Intel® Core ${ }^{\text {TM }}$ CPU i5-3470 @ $3.20 \mathrm{GHz} \times 4$ with 15.6 GB memory, using Mastrovito multipliers up to 256 bits as
benchmarks [14]. The more complex and challenging Montgomery multipliers [16] have only been partially tested and not reported here.
TABLE II: Results of Mastrovito multipliers with single bug per cone.

| Largest cone <br> Operand size | \# polys | Runtime for <br> bug-free <br> cone (sec) | Avg. runtime <br> for buggy <br> cone (sec) | Max runtime <br> for buggy <br> cone (sec) |
| :---: | :---: | :---: | :---: | :---: |
| $z_{5}, 16$-bit | 167 | 0.36 | 0.36 | 0.37 |
| $z_{21}, 64$-bit | 539 | 6.2 | 6.3 | 6.3 |
| $z_{63}, 128$-bit | 1167 | 19.3 | 19.4 | 19.5 |
| $z_{10}, 256$-bit | 2033 | 46.3 | 46.5 | 46.6 |

Table II shows the verification results for a single bug inserted randomly in the circuit and illustrates the fact that the location of the bug does not affect the verification performance. To ensure that the location of the bug is the only variable factor in this experiment, for each circuit we extracted the largest output logic cone and inserted a single bug in it. The experiment was repeated for each cone 20 times, each time randomly changing the location of the bug, so they can be anywhere in the circuit and of any type shown in Table I. The average time of the experiments was computed and the worst case runtime it took to locate the bug recorded. As we can see in the Table, the longest and average times are similar, which means that the time to locate and correct the bug does not depend on its location in the circuit. Furthermore, the time to verify the bug-free circuit is almost the same as the debugging of a single bug. In other approaches, the difference between these two times can be significant [1] [18].
TABLE III: Results of Mastrovito multipliers with multiple bugs.

| Operand <br> size | Runtime for <br> bug-free <br> circuit (sec) | Result of [19] | Number <br> of bugs | Avg. runtime for <br> multiple bugs <br> circuit (sec) |
| :---: | :---: | :---: | :---: | :---: |
| 8 | 0.33 | 0.09 | 16 | 0.37 |
| 16 | 0.84 | 0.42 | 24 | 0.92 |
| 32 | 3.89 | 0.83 | 32 | 4.23 |
| 64 | 30.39 | 28.90 | 40 | 31.30 |
| 128 | 283.72 | 924.3 | 48 | 286.94 |
| 163 | 667.38 | $3,546.0$ | 56 | 676.07 |
| 256 | $2,111.43$ | $6,728.0$ | 64 | $2,135.92$ |

Table III shows the debugging results for Mastrovito multipliers with multiple bugs. It gives the time it takes to verify the bug-free circuit; makes comparison with [19]; shows the number of bugs and the time to debug multiple bugs. The data of Table III, in conjunction with that in Table II, shows that the runtime for verifying the entire circuit is much less than the runtime of verifying a single cone multiplied by the number of cones. This is because the verification of each cone is performed in parallel since the cones are independent from each other.

Column 3 of Table III shows the performance of [19] that used backward reduction with the $B O>$ order and Groebner basis. As we can see, our results are superior for circuit sizes above 64 bits. This suggests that $F O>$ order can be a better choice for GF verification. The major advantage of the $F O>$ order, however, is the performance of debugging, as shown in columns 2 and 5 of Table III. We randomly inserted bugs (dependent or independent) in circuit. The runtime difference between the bug-free circuit and a buggy circuit is negligible. Even in the largest case of 256 -bit Mastrovito multiplier, with 64 bugs inserted, the runtime difference is insignificant.
In summary, our verification scheme based on forward reduction algorithm offers an effective method for identifying and removing bugs in GF circuits. Unlike in other methods, its performance does not depend on the location of the bug, and the time to locate the bug is comparable to verifying a bug-free circuit. Future work will extend the method to handle other GF circuits and more advanced multipliers, such as Montgomery. We will also consider other types of bugs, such as miswiring, missing gates, and others.

Acknowledgment: This work has been funded by the National Science Foundation, grant CCF-1617708.

## References

[1] J. Lv, P. Kalla, and F. Enescu, "Efficient Groebner Basis Reductions for Formal Verification of Galois Field Arithmatic Circuits," IEEE Trans. on CAD, vol. 32, no. 9, pp. 1409-1420, September 2013.
[2] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G. Greuel, "STABLE: A new QF-BV SMT solver for hard verification problems combining Boolean reasoning with computer algebra," in DATE, 2011, pp. 155-160.
[3] A. Sayed-Ahmed, D. Große, U. Kühne, M. Soeken, and R. Drechsler, "Formal verification of integer multipliers by combining Groebner basis with logic reduction," in DATE'16, 2016, pp. 1-6.
[4] M. Ciesielski, C. Yu, W. Brown, D. Liu, and A. Rossi, "Verification of Gate-level Arithmetic Circuits by Function Extraction," in 52nd DAC. ACM, 2015, pp. 52-57.
[5] T. Pruss, P. Kalla, and F. Enescu, "Efficient Symbolic Computation for Word-level Abstraction from Combinational Circuits for Verification Over Finite Fields," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. PP, no. 99, p. 1, November 2015.
[6] C. Yu and M. J. Ciesielski, "Efficient parallel verification of Galois field multipliers," ASP-DAC'17, 2017.
[7] D. Cox, J. Little, and D. O’Shea, Ideals, Varieties, and Algorithms. Springer, 1997.
[8] C. Yu, W. Brown, D. Liu, A. Rossi, and M. J. Ciesielski, "Formal verification of arithmetic circuits using function extraction," IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 35, no. 12, pp. 2131-2142, 2016.
[9] F. Farahmandi and P. Mishra, "Automated Test Generation for Debugging Arithmetic Circuits," in Proceedings of the conference on Design, automation and test in Europe (DATE). EDA Consortium, 2016.
[10] Farimah Farahmandi and Prabhat Mishra, "Automated debugging of arithmetic circuits using incremental Groebner basis reduction," in ICCD 2017, Boston, MA, USA. (to appear), 2017.
[11] S. Ghandali, C. Yu, D. Liu, B. Walter, and M. Ciesielski, "Logic Debugging of Arithmetic Circuits," in IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2015, pp. 113-118.
[12] C. Paar and J. Pelzl, Understanding cryptography: a textbook for students and practitioners. Springer Science \& Business Media, 2009.
[13] T. Nagell, Introduction to Number Theory. Almqvist \& Wiksell Stockholm, 1951.

14] B. Sunar and Ç. K. Koç, "Mastrovito multiplier for all trinomials," Computers, IEEE Transactions on, vol. 48, no. 5, pp. 522-527, 1999.
[15] P. L. Montgomery, "Modular multiplication without trial division," Mathematics of Computation, vol. 44, no. 170, pp. 519-521, 1985.
[16] C. K. Koc and T. Acar, "Montgomery multiplication in GF(2k)," Designs, Codes and Cryptography, vol. 14, no. 1, pp. 57-69, 1998.
[17] W. Decker, G.-M. Greuel, G. Pfister, and H. Schönemann, "Singular 3-1-6 A Computer Algebra System for Polynomial Computations," Tech. Rep., 2012, http://www.singular.uni-kl.de
[18] T. Su, C. Yu, A. Yasin, and M. Ciesielski, "Formal verification of truncated multipliers using algebraic approach and re-synthesis," in 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2017, pp. 415-420.
[19] T. Pruss, P. Kalla, and F. Enescu, "Equivalence Verification of Large Galois Field Arithmetic Circuits using Word-Level Abstraction via Gröbner Bases," in $D A C^{\prime} 14,2014$, pp. 1-6.


[^0]:    ${ }^{1}$ This is true for most structures such as Mastrovito multipliers, but may not be true for more complex ones, such as Montgomery; we currently limit our attention to those structures that can be made reconvergent fanout free.

