# Accurate Estimation of Soft Error Rate (SER) in VLSI Circuits

Atul Maheshwari, Israel Koren and Wayne Burleson

Department of Electrical and Computer Engineering University of Massachusetts, Amherst, MA 01003, USA

#### Abstract

Trends in CMOS technology have resulted in circuits with a higher soft error rate (SER), making it imperative to accurately estimate the SER of VLSI circuits. In this paper a comparative study is presented between the $Q_{crit}$ method and the simulation method for estimating the circuit-level SER. It is shown that for small circuits with uniformly distributed output values (e.g., flip-flop, binary counter), both methods provide similar estimates for SER. However, for other circuits the $Q_{crit}$-based method provides SER estimates significantly different from the results of the simulation method. Errors of up to 34% have been observed for a microprocessor scoreboard circuit. This is due to the fact that the $Q_{crit}$ method assumes that each node in the circuit is equally likely to be 0 or 1. The $Q_{crit}$ method can also overlook logical masking within the circuit. Finally, a feasible method based on Monte-Carlo simulation is presented to estimate chip-level SER in terms of Failure in Time (FIT) rate.

## 1 Introduction

Reliable operation of VLSI circuits is necessary to avoid catastrophic consequences, especially for systems operating under adverse environmental conditions. Information in electronic circuits is stored and communicated via a collection of electric charges. Any event which upsets the stored or communicated charge can cause errors in the circuit output. These errors are called transient faults, soft errors (SE) or single event upsets (SEU). The event causing the upset can be an energetic nuclear particle or an electrical source. The nuclear particles which create these upsetting events are either due to cosmic rays which bombard the earth constantly from space or radioactive atoms which exist in trace amounts in all materials due to atomic decay. Atmospheric nuclear particles include $\alpha$-particles, protons and neutrons. In this paper we limit our study to $\alpha$-particles and neutrons.

Traditionally, memories have been the main victims of soft errors because small transistor sizes are used to increase memory density, resulting in lower capacitance, and hence higher soft error rate (SER) [1, 2]. However, memories can be protected by error detecting/correcting codes. With scaling it has been observed that unprotected combinational logic is becoming more vulnerable to transient faults [3, 4].

SER estimation of combinational logic is a challenging task as it depends on several factors, including the time of the particle strike, the node being hit and the particle energy. Moreover, in many cases the error does not reach a latch, is masked by other electrical signals, or does not result in a system failure. Several SER estimation methods for VLSI circuits have been proposed. These include performing an analysis based on the critical charge ($Q_{crit}$ method) or performing simulations using current injection waveforms (simulation method).

In this paper a comparison between the two methods of estimating SER is presented. It is then shown that both of these methods can be used to estimate SER in terms of FIT rate. Finally, it is shown that for several circuits, the SER estimate provided by the $Q_{crit}$ method can be significantly inaccurate. The paper is organized as follows: After a brief introduction, fault models for $\alpha$-particle and neutron strikes used in the SER estimation are described in Section 2. A brief description of the two methods used for estimating SER is provided in Section 3. A comparison between the two methods and a chip-level SER estimation method based on simulations is presented in Section 4. Section 5 describes the similarities and differences between the two methods using example circuits. The paper is concluded in Section 6.

## 2 Fault model

As shown in Figure 1 an  $\alpha$ -particle strike on a diffusion node can be modeled as a current-source. An approximate analytical model of a current transient is proposed in [10]. The model includes parameters which represent the maximum current, the collection time constant of the junction, and the time constant for initially establishing the ion track. Several other models have also been proposed [11, 12].



Figure 1. (a)  $\alpha$ -particle hit on a PMOS transistor (b) The hit modeled as a current source

The particle strike model presented in [10] requires significant information about the process. A simpler model for an $\alpha$-particle strike in a $0.18\,\mu m$ technology is shown in Figure 2. In this model, a piecewise linear (PWL) description of the injected current is used. The major pulse is 50 ps wide, followed by a slow linear decay (150 ps wide). Figure 3 shows the current injected for neutrons. A neutron strike results in a pulse with higher peak current but smaller width compared to the current pulse injected due to an $\alpha$-particle strike. The neutron current pulse does not have the decay characteristics of the $\alpha$-particle current waveform. These fault models for $\alpha$-particles and neutrons are used for SER estimation in this paper.



Figure 2. PWL model for  $\alpha$ -particle strike

Figure 3. PWL model for neutron strike
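As a rough illustration of the PWL fault model, the sketch below builds a waveform with a 50 ps main pulse followed by a 150 ps linear decay and integrates it to obtain the injected charge. The peak current, the current at which the tail begins, the triangular shape of the main pulse, and the names `alpha_pwl`, `injected_charge` and `pwl_source_line` are assumptions made for illustration; the actual breakpoints are those of Figure 2.

```python
# Hypothetical PWL alpha-particle pulse. Only the 50 ps / 150 ps widths come
# from the text; the current values and the pulse shape are assumed.

def alpha_pwl(i_peak, i_tail, t_pulse=50e-12, t_decay=150e-12):
    """Return (time, current) breakpoints of an assumed PWL pulse.

    The main pulse rises linearly to i_peak at t_pulse / 2, falls to i_tail
    at t_pulse, and then decays linearly to zero over t_decay.
    """
    return [
        (0.0, 0.0),
        (t_pulse / 2, i_peak),
        (t_pulse, i_tail),
        (t_pulse + t_decay, 0.0),
    ]


def injected_charge(points):
    """Integrate a PWL current with the trapezoidal rule (exact for PWL)."""
    charge = 0.0
    for (t0, i0), (t1, i1) in zip(points, points[1:]):
        charge += 0.5 * (i0 + i1) * (t1 - t0)
    return charge


def pwl_source_line(name, node, points):
    """Format the waveform as a SPICE-style PWL current source.

    The node order (ground first) is an assumption; the direction of the
    injected current depends on which transistor type is hit.
    """
    pairs = " ".join(f"{t:.4g} {i:.4g}" for t, i in points)
    return f"{name} 0 {node} PWL({pairs})"


if __name__ == "__main__":
    wave = alpha_pwl(i_peak=1e-3, i_tail=0.2e-3)
    print(pwl_source_line("Ihit", "n1", wave))
    print(f"injected charge = {injected_charge(wave) * 1e15:.1f} fC")
```

Sweeping `i_peak` scales the injected charge linearly, which is one simple way to generate the different charge levels needed by the estimation methods of Section 3.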

## 3 SER estimation methods

#### 3.1 $Q_{crit}$ method

The soft error rate of a circuit is a statistical quantity and can be defined as the average rate of upsets for all collected charges, all injection times and all nodes in the circuit. SER can be expressed as [5]:

$$SER = \Phi_{\alpha} \sum_{n=1}^{N} A_n \sum_{i=1}^{k} prob(Q_{i,n}) \, \Delta Q \sum_{j=t_0^{inj}}^{M} \frac{upset_{j,i,n}}{M} \tag{1}$$

where $\Phi_{\alpha}$ denotes the $\alpha$-particle flux, $A_n$ is the active area of node $n$, $N$ is the number of nodes in the circuit, $prob(Q_{i,n})$ is the probability that a charge $Q_i$ is collected per $\alpha$-particle at node $n$, $k$ is the number of charge levels (corresponding to charges $Q_0, Q_1, \ldots, Q_i, \ldots, Q_k$), $\Delta Q$ is the charge level increment, $M$ is the number of time injections (corresponding to times $t_0^{inj}, t_1^{inj}, \ldots, t_j^{inj}, \ldots, t_{M-1}^{inj}, T_{cycle}$), $upset_{j,i,n}$ equals 1 if and only if node $n$ is upset by charge $Q_i$ at time $t_j^{inj}$, and $T_{cycle}$ is the cycle time.

The concept of a charge threshold necessary to alter the state of a memory cell is well defined and is called the critical charge  $Q_{crit}$ . If a charge is injected into the node of a nonmemory device,  $Q_{crit}$  is defined by the state of the next memory cell or latching element located further downstream.  $Q_{crit}$  is the minimum charge that needs to be injected into a node of a circuit to result in a detectable upset. Equation (1) can be expressed in terms of the critical charge as [4]

$$SER_{circuit} = \Phi_{\alpha} \sum_{n=1}^{N} A_n \sum_{j=t_0^{inj}}^{M} \delta_{n,j} \sum_{Q=Qcrit_{n,j}}^{Q_{max}} prob(Q) \, \Delta Q \tag{2}$$

where  $\delta_{n,j}$  is the ratio of the charge injection interval to the cycle time,  $Qcrit_{n,j}$  denotes the critical charge of node n at charge injection time  $t_j^{inj}$  and  $Q_{max}$  is the maximum charge collection possible for the technology under consideration.

To compute the SER of a circuit, one needs:

- the charge-collection probability of the technology under consideration
- the waveform of the particle-induced injected current
- the critical charge $Q_{crit}$ of each node in the circuit as a function of the charge injection time.

The circuit-level SER estimation begins with estimating the $Q_{crit}$ of every node in the circuit at each charge injection time using SPICE-based circuit simulation. Once these are estimated, the SER of the circuit can be computed from the charge-collection probability of the technology, as shown in equation (2).
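Once the per-node critical charges are available, equation (2) reduces to a triple sum. The following is a minimal sketch of that evaluation, assuming equally spaced injection times (so that $\delta_{n,j} = 1/M$) and a left-endpoint rule for the inner sum over charge levels. The function name `ser_qcrit` and the uniform charge-collection probability used in the example are placeholders, not values from this paper.

```python
def ser_qcrit(flux, areas, qcrit, prob, q_max, dq):
    """Evaluate equation (2) numerically.

    flux  : particle flux (Phi)
    areas : areas[n] is the active area A_n of node n
    qcrit : qcrit[n][j] is the critical charge of node n at injection time j
    prob  : callable, prob(Q) is the charge-collection probability density
    q_max : maximum charge that can be collected
    dq    : charge level increment (Delta Q)
    """
    total = 0.0
    for a_n, qcrit_n in zip(areas, qcrit):
        delta = 1.0 / len(qcrit_n)  # assumed: equally spaced injection times
        node_sum = 0.0
        for qc in qcrit_n:
            # Number of charge levels between Qcrit and Q_max (none if Qcrit >= Q_max)
            steps = int(round((q_max - qc) / dq)) if qc < q_max else 0
            tail = sum(prob(qc + s * dq) * dq for s in range(steps))
            node_sum += delta * tail
        total += a_n * node_sum
    return flux * total


if __name__ == "__main__":
    q_max = 10.0
    uniform = lambda q: 1.0 / q_max  # placeholder distribution, not measured data
    # One node of area 2, with Qcrit = 5 in one time slot and Qcrit = Q_max in the other
    print(ser_qcrit(1.0, [2.0], [[5.0, 10.0]], uniform, q_max, dq=1.0))
```

A time slot whose critical charge reaches $Q_{max}$ contributes nothing, which is how timing windows during which a node cannot be upset enter the estimate.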

#### 3.2 Simulation method

The simulation method is based on the inject-and-evaluate paradigm. Faults are injected into the circuit in the form of an artificial current source, and the circuit is simulated to check for errors. A simulation-based method to estimate the probability of error (POF) is presented in [7, 8, 9].

The POF of the circuit is given by

$$POF_{circuit} = \sum_{i=1}^{n} w_i \, POF_i, \quad \text{where} \quad w_i = \frac{A_i}{\sum_{j=1}^{n} A_j} \tag{3}$$

Here  $A_i$  is the active area of the node *i*.  $POF_i$  is given by

$$POF_i = \frac{1}{k} \sum_{e=1}^{k} E_{i,e}, \quad \text{where} \quad k = p \cdot q \cdot r \tag{4}$$

$E_{i,e}$, the outcome of the $e$-th fault injection experiment on node $i$, is given by

$$E_{i,e} = \begin{cases} 1 & \text{if the injection into node } i \text{ results in a fault getting latched} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

and

- $p$ is the number of input or state combinations,
- $q$ is the number of particle injection levels considered,
- $r$ is the number of time instances at which faults are injected,
- $n$ is the number of nodes in the circuit.

The POF of a circuit is thus a measure of the conditional probability of error given that a particle hits the circuit.  $POF_{circuit}$  is a weighted sum of the POFs of all the nodes in the circuit, similar to the SER method as seen in equation (2).
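Equations (3) and (4) amount to averaging the experiment outcomes per node and then taking an area-weighted sum. A minimal sketch follows; the function names and the example areas and outcomes are made up for illustration.

```python
def node_pof(outcomes):
    """Equation (4): fraction of the k injection experiments that were latched.

    outcomes is a sequence of 0/1 values, one per experiment (equation (5)).
    """
    return sum(outcomes) / len(outcomes)


def circuit_pof(areas, node_pofs):
    """Equation (3): area-weighted sum of the node POFs."""
    total_area = sum(areas)
    return sum((a / total_area) * p for a, p in zip(areas, node_pofs))


if __name__ == "__main__":
    # Two hypothetical nodes: the larger node is upset more often.
    areas = [1.0, 3.0]
    pofs = [node_pof([1, 0, 0, 0]), node_pof([1, 1, 0, 0])]
    print(circuit_pof(areas, pofs))
```

Because the weights sum to one, `circuit_pof` always lies between the smallest and largest node POF.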

## 4 Comparison of the two methods

The simulation method considers all the factors that affect SER and hence is quite accurate. However, exhaustive simulation is computationally expensive and becomes prohibitive for larger circuits. By simulating only a randomly selected subset of the fault injection experiments (Monte-Carlo method), an accurate estimate of SER can be obtained at a small fraction of that cost [13]. The simulation method typically assumes a uniform distribution of charge collection from 0 to $Q_{max}$. However, a non-uniform charge collection distribution can be used to achieve a more accurate estimate of the SER.
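The difference between exhaustive and Monte-Carlo fault injection can be sketched as follows. The `toy_experiment` function is a stand-in for a circuit simulation and is purely hypothetical, as are the charge levels, time steps and latching window; in practice each call would be a SPICE run with an injected current source.

```python
import itertools
import random


def exhaustive_pof(experiment, vectors, charges, times):
    """POF of one node from all p * q * r experiments (equation (4))."""
    space = list(itertools.product(vectors, charges, times))
    return sum(experiment(v, q, t) for v, q, t in space) / len(space)


def monte_carlo_pof(experiment, vectors, charges, times, samples, rng):
    """Estimate the same POF from uniformly drawn experiments."""
    hits = 0
    for _ in range(samples):
        v = rng.choice(vectors)
        q = rng.choice(charges)
        t = rng.choice(times)
        hits += experiment(v, q, t)
    return hits / samples


def toy_experiment(vector, charge, time):
    """Stand-in for a circuit simulation (NOT a real fault model).

    The error is latched only if the charge is large enough, the strike
    falls in an assumed latching window, and the input vector does not
    logically mask it.
    """
    return int(charge >= 60 and 0.4 <= time < 0.6 and vector % 2 == 1)


if __name__ == "__main__":
    vectors = list(range(8))              # p = 8 input combinations
    charges = list(range(0, 100, 10))     # q = 10 charge levels
    times = [i / 10 for i in range(10)]   # r = 10 injection times
    exact = exhaustive_pof(toy_experiment, vectors, charges, times)
    approx = monte_carlo_pof(toy_experiment, vectors, charges, times,
                             samples=200, rng=random.Random(0))
    print(f"exhaustive (800 runs): {exact:.4f}")
    print(f"Monte-Carlo (200 runs): {approx:.4f}")
```

The standard error of the Monte-Carlo estimate shrinks as $1/\sqrt{samples}$ regardless of the size of the experiment space, which is why the sampled approach scales to circuits where exhaustive injection does not.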

The  $Q_{crit}$  method assumes that the SER is independent of the input vector. It also assumes that each node is equally likely to be at 0 or 1. This can lead to significant errors in circuits that have non-uniform distributions for node values. Moreover, the  $Q_{crit}$  method does not take logical masking into account. In many cases it is possible that the error caused by a particle strike does not reach the latch as it gets logically masked. Logical masking depends on the circuit structure and on the inputs to the circuit.
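Logical masking can be made concrete with a small gate-level example. The sketch below uses a hypothetical three-input circuit, `out = (a AND b) OR c`, flips the internal node `n1 = a AND b`, and counts the input vectors for which the flip reaches the output. The circuit is invented for illustration and is not one of the circuits studied in this paper.

```python
from itertools import product


def circuit(a, b, c, flip_n1=False):
    """Hypothetical circuit out = (a AND b) OR c, with optional fault on n1."""
    n1 = a & b
    if flip_n1:
        n1 ^= 1  # model the upset as a bit flip on the internal node
    return n1 | c


def propagation_fraction():
    """Fraction of input vectors for which a flip on n1 changes the output."""
    vectors = list(product((0, 1), repeat=3))
    propagated = sum(
        circuit(*v) != circuit(*v, flip_n1=True) for v in vectors
    )
    return propagated / len(vectors)


if __name__ == "__main__":
    print(propagation_fraction())
```

Whenever `c` is 1 the OR gate hides the flipped node, so an estimate that ignores the input vectors counts upsets that can never be observed.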

A chip-level SER estimation approach based on critical charge is described in [6]. Chip-level SER can be estimated from circuit SER as

$$SER_{chip}^{Q_{crit}} = \sum_{circuits} SER_{circuit} \cdot TD \cdot LD \tag{6}$$

where  $SER_{circuit}$  is the circuit level SER, TD and LD are the timing and logic derating factors, respectively. Timing derating is defined as the fraction of time a circuit is susceptible to errors that will be able to propagate. Logic derating is the probability that the error caused by the circuit impacts the behavior of the chip. Determining logic derating is quite difficult and needs significant chip level simulations.

The simulation method can also be used to estimate chip-level SER. The circuit-level SER estimation using the POF of each node can be calculated as:

$$SER_{circuit} = \Phi_{\alpha} \sum_{n \in nodes} A_n \cdot POF_n \tag{7}$$

and the chip-level SER can be obtained as:

$$SER_{chip}^{sim} = \sum_{circuits} SER_{circuit} \cdot LD \tag{8}$$

The SER measures given by equations (6) and (8) are in terms of Failure in Time (FIT). Chip-level SER estimate using the simulation method is more accurate as the estimate of  $SER_{circuit}$  for each circuit is more accurate.
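Equations (6) to (8) can be sketched in a few lines. The function names and all numeric values below are hypothetical. The FIT conversion relies only on the standard definition of one FIT as one failure per $10^9$ device-hours, and it assumes the flux is expressed so that the circuit SER comes out in failures per hour.

```python
def circuit_ser(flux, areas, node_pofs):
    """Equation (7): SER of one circuit from its node areas and node POFs."""
    return flux * sum(a * p for a, p in zip(areas, node_pofs))


def chip_ser_sim(circuits):
    """Equation (8): circuits is an iterable of (SER_circuit, LD) pairs."""
    return sum(ser * ld for ser, ld in circuits)


def chip_ser_qcrit(circuits):
    """Equation (6): circuits is an iterable of (SER_circuit, TD, LD) triples."""
    return sum(ser * td * ld for ser, td, ld in circuits)


def to_fit(failures_per_hour):
    """Convert a failure rate in failures/hour to FIT (failures per 1e9 hours)."""
    return failures_per_hour * 1e9


if __name__ == "__main__":
    # Hypothetical flux, areas, POFs and derating factors.
    ser_a = circuit_ser(2.0, [1.0, 3.0], [0.25, 0.5])
    print(chip_ser_sim([(ser_a, 0.5), (1.0, 0.25)]))
    print(chip_ser_qcrit([(4.0, 0.5, 0.5)]))
```

Equation (8) carries no separate timing derating factor because the simulated POF already depends on the injection time.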

## 5 Example circuits

In this section the two methods of estimating SER are compared using some example circuits.

#### 5.1 Master Slave Flip Flop (MSFF)

A simple master slave flip flop is shown in Figure 4. For estimating SER using the  $Q_{crit}$  method, simulations are performed to estimate the  $Q_{crit}$  of each node as a function of time. Figure 5 shows the behavior of  $Q_{crit}$  with time for an internal clock node and a data storage node in the MSFF. Similar behavior is determined for other nodes. Using equation (2), the SER for  $\alpha$ -particles and neutrons is estimated. Figure 6 shows that the two methods agree on the SER of the circuit.

#### 5.2 Binary counter

A 4-bit binary counter is shown in Figure 7. The SER estimation using the two methods is shown in Figure 8. As seen, the two methods provide a similar estimate for the SER. The binary counter is a very balanced design and most of the nodes are equally likely to be at 0 or 1. The 4-bit binary counter used in this example consists of about 80 nodes and 100 transistors.



Figure 4. A Master Slave Flip Flop (MSFF)



Figure 5.  $Q_{crit}$  as a function of time

Figure 6. SER estimation comparison for MSFF

#### 5.3 Decoder

Figure 9 shows the schematic for a 3-8 decoder. The decoder has about 200 nodes and 230 transistors. The SER estimates for this circuit, depicted in Figure 10, show a significant difference (20%) between the two methods. The $Q_{crit}$ method assumes that each node is equally likely to be a 0 or a 1. In the case of the decoder, node "out0" is more likely to be 0 than 1. The simulation method, on the other hand, considers all possible primary inputs and does not assume a uniform probability distribution for a circuit node, and hence provides a more accurate estimate.
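The skew in the decoder's output values can be checked by enumeration. The sketch below uses a behavioural 3-to-8 decoder with active-high outputs, which is an assumption about the circuit in Figure 9, and computes the probability that each output is 1 when the three primary inputs are uniformly distributed.

```python
from itertools import product


def decoder_3to8(a2, a1, a0):
    """Behavioural 3-to-8 decoder (assumed active-high): one output is 1."""
    index = (a2 << 2) | (a1 << 1) | a0
    return [int(i == index) for i in range(8)]


def output_one_probabilities():
    """P(out_i = 1) under uniformly distributed primary inputs."""
    counts = [0] * 8
    for bits in product((0, 1), repeat=3):
        for i, value in enumerate(decoder_3to8(*bits)):
            counts[i] += value
    return [c / 8 for c in counts]


if __name__ == "__main__":
    print(output_one_probabilities())
```

Each output is 1 for exactly one of the eight input vectors, so a method that treats every node as equally likely to hold 0 or 1 misrepresents how often each state is exposed to a strike.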

#### 5.4 Alpha 21164 scoreboard

A truncated implementation of the Alpha 21164 scoreboard [14] is shown in Figure 11. The scoreboard circuit has about 60 nodes and 125 transistors. As shown, the output of a multiplexer and the output of a dynamic OR gate are the inputs to the conflict detecting domino stage. The output of the conflict detecting domino stage is latched to give the final output. As shown in Figure 12, the two methods give very different estimates of SER. The SER obtained by the  $Q_{crit}$  method is significantly higher than that of the simulation method. This is because, depending on the inputs, the errors in the MUX or the OR gate will get logically masked. Since the  $Q_{crit}$  method does not consider input vectors, it gives a 34% higher value of SER.



Figure 7. A 4-bit binary counter



Figure 9. A 3-8 decoder



Figure 11. Alpha 21164 scoreboard



Figure 8. SER estimation comparison for binary counter







Figure 12. SER estimation comparison for the scoreboard

## 6 Conclusion

Estimating the SER of VLSI circuits is a challenging and important task. In this paper we have presented a comparative study between the $Q_{crit}$ method and the simulation method for estimating the circuit-level SER. The $Q_{crit}$ method assumes that each node in the circuit is equally likely to be 0 or 1 and can overlook logical masking within the circuit, resulting in significant errors (up to 34%). A feasible method based on Monte-Carlo simulations was also presented in this paper for estimating chip-level SER in terms of Failure in Time (FIT) rate. As shown in Table 1, the Monte-Carlo (MC) simulation-based method provides a reasonably accurate estimate of SER with run times much smaller than those of the $Q_{crit}$-based method.

| Circuit    | Nodes (N) | Method      | SER-$\alpha$ (SER/$\alpha$-flux) | SER-neutron (SER/neutron-flux) | Runtime (hours) |
|------------|-----------|-------------|----------------------------------|--------------------------------|-----------------|
| Counter    | 80    | $Q_{crit}$  | 45.93                     | 29.34              | 4.25    |
|            |       | Simulations | 47.58                     | 31.11              | 82.5    |
|            |       | Sim - MC    | 49.24                     | 29.71              | 1.5     |
| Decoder    | 200   | $Q_{crit}$  | 149.3                     | 98.4               | 11.5    |
|            |       | Simulations | 192.1                     | 134.6              | 48.67   |
|            |       | Sim - MC    | 185.8                     | 121.1              | 1.10    |
| Scoreboard | 60    | $Q_{crit}$  | 328.3                     | 241.4              | 3.5     |
|            |       | Simulations | 267.5                     | 198.4              | 108     |
|            |       | Sim - MC    | 245.8                     | 181.1              | 2.5     |

Table 1. Comparison of SER estimation methods for various circuits
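The relative differences and run-time ratios implied by Table 1 can be recomputed directly from the tabulated values. The sketch below does so for the $\alpha$-particle column, comparing the $Q_{crit}$ estimate against both the exhaustive and the Monte-Carlo simulation results. The dictionary layout and function names are chosen for illustration.

```python
# Values copied from Table 1: (SER-alpha, SER-neutron, runtime in hours)
TABLE_1 = {
    "Counter": {
        "qcrit": (45.93, 29.34, 4.25),
        "sim": (47.58, 31.11, 82.5),
        "mc": (49.24, 29.71, 1.5),
    },
    "Decoder": {
        "qcrit": (149.3, 98.4, 11.5),
        "sim": (192.1, 134.6, 48.67),
        "mc": (185.8, 121.1, 1.10),
    },
    "Scoreboard": {
        "qcrit": (328.3, 241.4, 3.5),
        "sim": (267.5, 198.4, 108),
        "mc": (245.8, 181.1, 2.5),
    },
}


def relative_difference(value, reference):
    """Signed difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


def summarize(table):
    """Per circuit: Qcrit vs. sim, Qcrit vs. MC (alpha), and sim/MC runtime."""
    rows = {}
    for name, methods in table.items():
        qcrit, sim, mc = methods["qcrit"], methods["sim"], methods["mc"]
        rows[name] = (
            relative_difference(qcrit[0], sim[0]),
            relative_difference(qcrit[0], mc[0]),
            sim[2] / mc[2],
        )
    return rows


if __name__ == "__main__":
    for name, (vs_sim, vs_mc, speedup) in summarize(TABLE_1).items():
        print(f"{name:10s} Qcrit vs sim {vs_sim:+6.1f}%  "
              f"Qcrit vs MC {vs_mc:+6.1f}%  sim/MC runtime {speedup:5.1f}x")
```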

# References

- [1] T. C. May and M. H. Woods, "Alpha-particle-induced soft errors in dynamic memories," *IEEE Transactions on Electron Devices*, Jan 1979, vol. 26, no. 1, pp. 2-9.
- [2] J. F. Ziegler et al., "IBM Experiments in soft fails in computer electronics (1978-1994)," IBM Journal of Research and Development, 1996, Vol. 40, pp. 3-17.
- [3] P. Shivakumar et al., "Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic," Proc. of Dependable Systems and Networks, 2002, pp. 23-26.
- [4] N. Seifert et al., "Impact of Scaling on Soft-Error Rates in Commercial Microprocessors," *IEEE Transactions on Nuclear Science*, Vol. 49, No. 6, Dec. 2002, pp. 3100-3106.
- [5] N. Seifert et al., "Frequency Dependence of Soft Error Rates for Sub-Micron CMOS Technologies," Proc. of International Electron Devices Meeting, 2001, pp. 14.4.1-14.4.4.
- [6] H. T. Nguyen and Y. Yagil, "A Systematic Approach to SER Estimation and Solutions," *Proc. of International Reliability Physics Symposium*, 2003, pp. 60-70.

- [7] M. Singh, R. Rachala, and I. Koren, "Transient fault sensitivity analysis of analog-to-digital converters (ADCs)," Proc. of IEEE Annual Workshop on VLSI, 2001, pp. 140-145.
- [8] M. Singh and I. Koren, "Fault Sensitivity Analysis and Reliability Enhancement of Analog-to-Digital Converters," *IEEE Transactions on VLSI circuits*, Nov. 2003, pp. 839-852.
- [9] A. Maheshwari, W. Burleson and R. Tessier, "Trading-off Transient Fault-tolerance and Power Consumption in Deep Submicron (DSM) VLSI Circuits," *IEEE Transactions on VLSI circuits*, vol. 12, no. 3, March 2004, pp. 299-311.
- [10] G. C. Messenger, "Collection of charge on junction nodes from ion tracks," IEEE Transactions on Nuclear Science, 1982, pp. 2024-2031.
- [11] G. Laguna and R. Treece, "VLSI modeling and design for radiation environments," Proceedings of IEEE International Conference on Computer Design, 1986, pp. 380-384.
- [12] L. B. Freeman, "Critical charge calculations for a bipolar SRAM array," *IBM Journal of Research and Development*, vol. 40, No. 1., Jan. 1996, pp. 119-129.
- [13] A. Maheshwari, I. Koren and W. Burleson, "Techniques for Transient Fault Sensitivity Analysis and Reduction in VLSI Circuits," Proc. International Symposium on Defect and Fault Tolerance, 2003, pp. 597-604.
- [14] B. Benschneider et al., "A 300 MHz 64-b quad issue CMOS RISC Microprocessor," Journal of Solid State Circuits, vol. 30, no. 11, Nov. 1995, pp. 1203-1214.