## ANALYZING THE CONNECTIVITY AND BANDWIDTH OF MULTI-PROCESSORS WITH MULTI-STAGE INTERCONNECTION NETWORKS 1 University of Massachusetts, Amherst, MA 01003 Dept. of Electrical and Computer Engineering Israel Koren 2 and Zahava Koren #### Abstract the presence of a few faulty elements, we need to predict the performance of result of manufacturing defects or failures that occur while the system is some of the system elements to become faulty. processors, memory modules and interconnection switches, we must expect the degraded system. already in operation. When implementing multi-processing systems consisting of a large number of If the system is allowed to continue its operation in These faults may be the network. Finally, we compare the two systems through some numerical redundant system and for a system with redundancy in its interconnection and connectivity. We then derive expressions for these measures for a nona multi-stage interconnection network in the presence of faulty elements. In this paper we analyze the performance of multi-processor systems with We propose the use of two measures for performance, namely, bandwidth #### I. Introduction network can be a crossbar network, a multiple bus network or a multi-stage memory modules through an interconnection network. This interconnection shared-memory multi-processors where all processors can access a set of cessors. One important class of these multi-processing systems includes the tion of multi-processing systems consisting of a very large number of proaided design tools like silicon compilers, enable the design and implementainterconnection network. Recent advances in VLSI technology and development of new computer- <sup>&</sup>lt;sup>1</sup>This work was supported in part by NSF under contract DCR-85-09423 <sup>2</sup>On leave from the Technion, Haifa 32000, Israel repair and/or replacement can take place. would still like to use the system at a degraded rate of performance until a the faulty elements can not be immediately repaired or replaced and we failures that occur while the system is already in operation. In many cases processors, memory modules or interconnection switches) are expected to become faulty. These faults may be the result of manufacturing defects or When implementing a complex multi-processor, some of its elements (like relatively short down-time period may be intolerable. too long. An example might be a real-time computing system where even a be replaced but the mean time to repair or replacement is considered to be be present (e.g., [4]). A different situation is when the faulty elements can large area VLSI chips or even wafers where some defective elements may situations. In one case faulty elements can be neither repaired nor replaced An example might be a multi-processor integrated on a small number of performs in the presence of faults. This question may arise in different An important question then is how well the gracefully degrading system stage network. memory multi-processors, namely, those interconnected through a multi-In this paper we attempt to answer the above question for one type of shared- suggested in recent publications (e.g., [1], [3], [5], [6] and [8]). A survey of these and other schemes is presented in [2]. tolerance into the architecture of these interconnection networks have been unreachable from certain processors. Several schemes for introducing faultfore, a single fault in any internal switch or link will render some memories unique path between any processor and any memory module and thereinherently very sensitive to failures of any kind. They usually provide a ternative to the expensive crossbar networks. However, these networks are Multi-stage interconnection networks were proposed as a cost-effective al- that the bandwidth of their proposed fault-tolerant multi-stage network is the performance of the network. For example, it has been shown in [8] comparable even to that of a crossbar network. disjoint paths provided by these schemes may in some cases also increase ing elements and their interconnections (e.g., [5], [6] and [8]). The multiple destination pair so that all single faults and many multiple faults can be adding an extra stage of switches (e.g., [1]), or by augmenting the switchtolerated. This is achieved by augmenting an existing topology either by Most of the proposed schemes provide redundant paths between every source/ Section V. Final conclusions are presented in Section VI. a network with built-in redundancy, namely, the Extra Stage Cube network several basic assumptions and notations. In Section III the non-redundant objective functions for measuring the performance of these systems. In the [1]. These networks are then compared through some numerical examples in network is analyzed. A similar analysis is then repeated in Section IV for next section we present the proposed performance measures and introduce (with and without redundancy) in the presence of faults. We also propose In this paper we examine the performance of multi-stage multi-processors ## II. Preliminaries and Notations to the above mentioned simpler case. For the sake of clarity and brevity however, we restrict our discussion here number of processors and finally, the network is built from $a \times b$ switches. not necessarily a power of 2, the number of memories is different from the analysis can be generalized to the case where the number of processors is a multi-stage interconnection network designed out of $2 \times 2$ switches. Our Consider N processors (where $N=2^k$ ) connected to N memories through faults in the interconnection network. we have to consider faulty processors and faulty memories in addition to to be fault-free. Clearly, this assumption is invalid in our environment and the interconnection network while the processors and memories are assumed in most previous studies it has been assumed that faults can occur only in some networks with internal redundancy have been previously analyzed but per stage and/or use more complex switches. Non-redundant networks and Redundant networks have a larger number of stages and/or more switches k = log N stages, each containing N/2 switches as illustrated in Fig. An $N \times N$ interconnection network with no redundancy is constructed of that switch faults are covered as well. is the link fault model (e.g., [2]), however, we allow multiple link faults so of a faulty (fault-free) link. Our fault model for the interconnection network the probability of a faulty (fault-free) memory and by $q_l(p_l)$ the probability defects at t=0 or became faulty later on. Similarly, we denote by $q_m$ $(p_m)$ t=0 then $q_r$ is the probability that manufacturing defects have occurred in the processor. If t > 0 then $q_r$ is the probability that the processor either had instant t and let $p_r = 1 - q_r$ denote the probability of a fault-free processor. If Let q denote the probability that a processor is faulty at some given time Figure 1: An 8 × 8 non-redundant interconnection network. simultaneously. processors) that can be transmitted through the interconnection network the interconnection network, and the number of memory requests (from the of fault-free processors and memories, the number of fault-free paths within capabilities of the gracefully degrading multi-stage system, e.g., the number and links can fail. The performance measure to be used should capture the for multi-stage networks in this environment where processors, memories We are interested in comparing the performance of alternative architectures connection even if the memories involved are distinct. A processor-memory connection can be blocked by a previously established nection networks paths are shared by two or more processor-memory pairs. effect of blocking which results from the fact that in multi-stage interconthat a request blocked at any stage is lost. The bandwidth measures the each processor generates, with probability $p_a$ , a request during a cycle, and of requests for the shared memory which are accepted per cycle, given that work is the bandwidth [7]. The bandwidth is defined as the expected number A commonly used measure for the performance of an interconnection net- and still connected to some fault-free memories. System connectivity can be the and hence, it provides only limited insight regarding the connectivity of the multi-processing system, i.e., the number of fault-free processors which are The bandwidth measure is concerned only with the interconnection network amount of traffic in the network (i.e., the probability of request is therefore, insufficient for our purposes. It is strongly affected by Pa) used in addition to the bandwidth measure. availability. In what follows we present two measures for connectivity to be basis for measuring the processing power of the system or its computational a complete fault-free $N_r \times N_m$ interconnection network exists. defined for memories. Note that this tuple does not necessarily imply that are connected to at least one fault-free memory module and $N_m$ is similarly $(N_r, N_m)$ where $N_r$ denotes the average number of fault-free processors which dication on how many distinct processors and memories are still connected fault-free. A shortcoming of this objective function is that it provides no in-We propose therefore, as an additional measure for connectivity, the tuple ational processor-to-memory paths where both processor and memory are One proposed measure for connectivity is C - the average number of oper- stage multi-processor in the presence of faulty elements. The bandwidth and the two connectivity measures, namely, C and the tuple $(N_r, N_m)$ , can together adequately characterize the capabilities of a multi- # III. A Non-redundant Interconnection Network appears in [7]. In what follows we generalize it to an environment in which faults may be present. as previously defined. An analysis of the bandwidth for a fault-free network derive expressions for its bandwidth and the different connectivity measures In this section we analyze a non-redundant interconnection network and such a request has been accepted at the input links to the same switch. request on an output link of a switch is calculated from the probability that iteratively, following a path leading to this memory , i.e, the probability of a non-faulty and has a request at its input. This last probability is calculated number of memories N by the probability that a given memory module is ories. Therefore, the network bandwidth can be obtained by multiplying the ory requests are independent and uniformly distributed among the N mem-We adopt here the simplifying assumption that the destinations of the mem- so that stage 0 is the last stage and its output links are connected to the memories, stage (k-1) is the first one and its inputs are connected to the in state 1. We assign numbers to the k = log N stages in a descending order The probability of a request on a link is thus the probability that this link is not) a request for the memory. A faulty link is considered to be in state 0. To simplify our discussion we say that a link is in state 1 (0) if it has (has an expression for the probability that it is at state 1, i.e., $P\{X^{(i)}=1\}$ . same. Hence, it is sufficient to consider only a single output link and derive probability that an incoming request will be routed to any output link is the that memory requests are uniformly distributed among the memories, the $X^{(i)}, Y^{(i)}$ . Its input links are the outputs of (two different) switches in stage (i+1) and are denoted by $X^{(i+1)}$ and $Y^{(i+1)}$ . Based on our assumption processors (see Fig. 1). Consider a switch in stage i and denote its outputs through any of the two input links, the state probability $P\{X^{(i)}=1\}$ of the Since a request for a memory module can reach the output link of a switch given output link has to be calculated from the joint probabilities of these $$P\{(X^{(i+1)}, Y^{(i+1)}) = (u, v)\}; u, v = 0, 1.$$ to be equivalent, the incoming requests are routed to any of the two output the two inputs and the output of a switch are, destinations of the incoming requests. Since memory modules are assumed account the status (faulty or fault-free) of the (physical) input links and the links with probability 0.5. Consequently, the transition probabilities between This calculation is performed using transition probabilities which take into $$P\{X^{(i)} = 1 / (X^{(i+1)}, Y^{(i+1)}) = (0, 0)\} = 0$$ $$P\{X^{(i)} = 1 / (X^{(i+1)}, Y^{(i+1)}) = (0, 1)\} = \frac{1}{2} p_{i}$$ $$P\{X^{(i)} = 1 / (X^{(i+1)}, Y^{(i+1)}) = (1, 0)\} = \frac{1}{2} p_{i}$$ $$P\{X^{(i)} = 1 / (X^{(i+1)}, Y^{(i+1)}) = (1, 1)\} = p_{i} - \frac{1}{4} p_{i}^{2}$$ (3.1) links are considered as input link faults at the next stage. Note that only input link faults are taken into account. Faults at the output The state probability $P\{X^{(i)}=1\}$ of the output link is given by $$P\{X^{(i)} = 1\} = \frac{1}{2} p_i \left[ P\{(X^{(i+1)}, Y^{(i+1)}) = (0, 1)\} + P\{(X^{(i+1)}, Y^{(i+1)}) = (1, 0)\} \right]$$ $$+ p_i \left(1 - \frac{1}{4} p_i\right) \times P\{(X^{(i+1)}, Y^{(i+1)}) = (1, 1)\}$$ (3.2) For the non-redundant network the inputs into each switch are independent $$P\{(X^{(i+1)}, Y^{(i+1)}) = (u, v)\} = P\{X^{(i+1)} = u\} \times P\{Y^{(i+1)} = v\}; \quad u, v = 0, 1$$ (3.3) Using (3.3) and the following equation $$P\{Y^{(i+1)} = 0\} = P\{X^{(i+1)} = 0\} = 1 - P\{X^{(i+1)} = 1\}$$ we obtain from (3.2) after some algebraic manipulations, $$P\{X^{(i)}=1\}=p_l\times P\{X^{(i+1)}=1\}-\frac{1}{4}\ p_l^2\times (P\{X^{(i+1)}=1\})^2 \qquad (3.4)$$ $2 \times 2$ switches are assumed. This expression is identical to the one derived in [7] if fault-free (i.e., $p_i = 1$ ) probabilities, starting from the processors outputs up to the memory inputs. This simple recursion formula enables us to calculate the successive state For the processors output links we have $$P\{X^{(k)} = 1\} = p_a \ p_r \tag{3.5}$$ Recursively, we calculate $P\{X^{(0)}=1\}$ . To compute the bandwidth note that the memory and its input link can be faulty as well, hence, $$BW = N \times P\{X^{(0)} = 1\} \times p_m p_l$$ (3.6) processor, respectively. nected to at least one memory, and of memories connected to at least one memory pairs, and by $N_r$ and $N_m$ - the average number of processors connectivity, as measured by C - the average number of connected processor-The next part of this section is devoted to the analysis of the network con- tween a processor and a memory and consequently, the calculation of Cmemory pairs by the probability of a fault-free path. The latter probability is straightforward. C is obtained by multiplying the number of processor-In a non-redundant interconnection network there is exactly one path be- $$p_r p_l^{k+1} p_m \tag{3.7}$$ where (k+1) is the number of links along the path. Therefore, $$C = N^2 \times p_r \ p_l^{k+1} \ p_m \tag{3.8}$$ connected to processor 0. $\Phi_r$ can now be expressed in terms of the events calculate the probability $\Phi_r$ . the same processor, we must utilize the inclusion and exclusion principle to by multiplying N by $\Phi_r$ . Due to the overlapping of the paths leading to free and is connected to at least one fault-free memory. Nr is obtained $E_j$ as the probability that at least one of the events $E_j$ occurs, Let $\Phi_r$ be the probability that a given processor (say processor 0) is fault-Define $E_j$ as the event in which memory j is $$\Phi_r = p(E_1 \cup E_2 \cup ... \cup E_N) \tag{3.9}$$ The inclusion and exclusion formula states that $$p(\cup E_j) = \sum_{i=1}^{N} (-1)^{(i-1)} W(i)$$ (3.10) where W(i) is the sum over all $\binom{N}{i}$ subsets $\{j_1, j_2, ..., j_i\}$ of size i, of the probability that all paths in the subset are operational, namely, $$W(i) = \sum_{\{j_1, \dots, j_i\}} P(E_{j_1} \cap E_{j_2} \cap \dots \cap E_{j_i})$$ (3.11) once. Let d denote the number of distinct links in the subset, then subset have in common, since each link must be taken into account exactly not only on i but also on the number of links that the paths in the given For a subset of i paths to be operational, all links in the subset must be fault-free. Hence, the required probability $P(E_{j_1} \cap E_{j_2} \cap ... \cap E_{j_i})$ depends $$P(E_{j_1} \cap E_{j_2} \cap ... \cap E_{j_i}) = p_m^i \ p_r \ p_i^d \tag{3.12}$$ $$W(i) = p_m^i p_r \sum_d S_{i,d} p_i^d$$ links. Note that d can be expressed as the sum $d_0 + d_1 + ... + d_k$ , where $d_n$ is the number of distinct links in the subset at level n. Also note that for any subset of size i, $d_k = 1$ and $d_0 = i$ . Using combinatorial arguments which are omitted here for the sake of brevity, we obtain the following equation for where $S_{i,d}$ is the number of subsets of size i which consist of exactly d distinct $$S_{i,d} = \sum_{d_0 + d_1 + \dots + d_k = d} {2 \choose d_{k-1}} 2^{d_1 + \dots + d_{k-2} + 2d_{k-1} - i} \prod_{n=1}^{k-1} {d_n \choose d_{n-1} - d_n}$$ (3.13) As an example, for the $8 \times 8$ interconnection network in Fig. 1 we obtain, $$W(1) = p_{m} p_{l} p_{r} 8p_{l}^{2}$$ $$W(2) = (p_{m} p_{l})^{2} p_{r} p_{l} (4p_{l}^{2} + 8p_{l}^{3} + 16p_{l}^{4})$$ $$W(3) = (p_{m} p_{l})^{3} p_{r} p_{l} (8p_{l}^{3} + 16p_{l}^{4} + 32p_{l}^{5})$$ $$W(4) = (p_{m} p_{l})^{4} p_{r} p_{l} (2p_{l}^{3} + 4p_{l}^{4} + 48p_{l}^{5} + 16p_{l}^{6})$$ $$W(5) = (p_{m} p_{l})^{5} p_{r} p_{l} (24p_{l}^{5} + 32p_{l}^{6})$$ $$W(6) = (p_{m} p_{l})^{6} p_{r} p_{l} (4p_{l}^{5} + 24p_{l}^{6})$$ $$W(7) = (p_{m} p_{l})^{7} p_{r} p_{l} 8p_{l}^{6}$$ $$W(8) = (p_{m} p_{l})^{6} p_{r} p_{l} p_{l}^{6}$$ Substituting W(1),...,W(N) into (3.10) and then multiplying by N yields $N_r$ . $N_m$ is obtained similarly by interchanging $p_r$ and $p_m$ . # IV. The Extra Stage Interconnection Network dency among the links. dence between the two inputs to any switch, enabling the straightforward calculation of the joint probabilities in (3.2). The incorporation of redunand a different analysis is required, depending on the network's topology. more) paths connecting any given processor-memory pair, introduces dependancy into the multi-stage interconnection network, resulting in two (or The analysis of the non-redundant network was simplified by the indepen-Equation (3.3) is no longer valid in the general case on one hand and the internal stages on the other hand. All internal stages (stage 0) require a different treatment. will be analyzed in one way while the first stage (stage k) and the last stage network we have therefore, to distinguish between the first and last stages upon the failure of a single link. When calculating the bandwidth of the to avoid the disconnection of a fault-free processor (or a fault-free memory) at the input and output stages, respectively. The purpose of these circuits is is further complicated by the existence of multiplexers and demultiplexers $k+1 = \log N + 1$ stages and is depicted in Fig. 2. The analysis of this network in this section the Extra Stage Cube Network (ESC) [1] which includes As an example for an interconnection network with redundancy we analyze the analysis mathematically intractable. joint probabilities of eight links (and so on for larger ESC networks) making of stage 2. These probabilities might in turn, require the knowledge of the example, to calculate the joint probability of output links 0 and 1 of stage the four links at level (i+2) through which all incoming requests pass. For To calculate the state probability $P\{X^{(i)} = 1\}$ for an internal stage we must utilize the joint probabilities $P\{(X^{(i+1)}, Y^{(i+1)}) = (u, v)\}; u, v = 0, 1$ . Since the input links $X^{(i+1)}$ and $Y^{(i+1)}$ are dependent (there is at least one their joint probability has to be calculated from the joint probabilities of processor which may send its memory requests through either one of them), 2 we need the joint probabilities of the output links 0, 1, 2 and 3 links 0 and 1 of stage 2 in Fig. 2 are dependent since processor 0 (and 1) leading to two switches are dependent at any stage. For example, output to any given memory and therefore, no more than two links out of every four However, each processor connected to the ESC has only two alternative paths Figure 2: An 8 × 8 Extra Stage Cube network (ESC). 2 and 3 are dependent. The pair 0,1 is however, independent of the pair 2,3. can send requests to memory 0 through either one of them. Similarly, links In general, for the internal stages in the ESC network, we have $$P\{(X^{(i)}, Y^{(i)}) = (u, v)\}$$ $$= \sum_{(s_0, \dots, s_3) = (1111)} P\{(X^{(i+1)}, Y^{(i+1)}, Z^{(i+1)}, W^{(i+1)}) = (s_0, s_1, s_2, s_3)\}$$ $$\times P\{(X^{(i)}, Y^{(i)}) = (u, v) / (X^{(i+1)}, Y^{(i+1)}, Z^{(i+1)}, W^{(i+1)}) = (s_0, s_1, s_2, s_3)\}$$ $$= \sum_{(s_0, \dots, s_3) = (1111)} P\{(X^{(i+1)}, Y^{(i+1)}) = (s_0, s_1)\} \times P\{(Z^{(i+1)}, W^{(i+1)}) = (s_2, s_3)\}$$ $$\times P\{X^{(i)} = u/(X^{(i+1)}, Z^{(i+1)}) = (s_0, s_2)\} \times P\{Y^{(i)} = v/(Y^{(i+1)}, W^{(i+1)}) = (s_1, s_3)\}$$ $$u, v = 0, 1$$ the calculation of the joint probabilities of stage k slightly more complicated their states, similarly to (3.3). However, the existence of multiplexers makes between the two processors enables us to calculate the joint probabilities of the first stage (stage k). Once we reach the first stage, the independence can be calculated recursively beginning at the last stage (stage 0) down to Consequently, only joint probabilities of two links are required and these that the joint probabilities for the two outputs of any switch in the first than in the non-redundant network. By observing the ESC network we see $$P\{(X^{(k)}, Y^{(k)}) = (0,0)\} = (P\{X^{(k+1)} = 0\})^2$$ $$+q_i^3 \left[ 2 P\{X^{(k+1)} = 0\} \times P\{X^{(k+1)} = 1\} + q_i(P\{X^{(k+1)} = 1\})^2 \right]$$ $$P\{(X^{(k)}, Y^{(k)}) = (0,1)\} = (1-q_i^3) P\{X^{(k+1)} = 1\} \times P\{X^{(k+1)} = 0\}$$ $$+(2p_iq_i^3 + p_i^2q_i^2) (P\{X^{(k+1)} = 1\})^2$$ $$P\{(X^{(k)}, Y^{(k)}) = (1,0)\} = P\{(X^{(k)}, Y^{(k)}) = (0,1)\}$$ $$P\{(X^{(k)}, Y^{(k)}) = (1,1)\} = p_i^2(2-p_i)^2 (P\{X^{(k+1)} = 1\})^2$$ $$P\{(X^{(k)}, Y^{(k)}) = (1,1)\} = p_i^2(2-p_i)^2 (P\{X^{(k+1)} = 1\})^2$$ where $P\{X^{(k+1)} = 1\}$ is given by (3.5) and $P\{X^{(k+1)} = 0\} = 1 - P\{X^{(k+1)} = 1\}$ . ing transition probabilities: For the last stage (stage 0) which includes demultiplexers we use the follow- $$P\{X^{(0)} = 1 / (X^{(1)}, Y^{(1)}) = (0, 0)\} = 0$$ $$P\{X^{(0)} = 1 / (X^{(1)}, Y^{(1)}) = (0, 1)\} = \frac{1}{2} p_{l}$$ $$P\{X^{(0)} = 1 / (X^{(1)}, Y^{(1)}) = (1, 0)\} = \frac{1}{2} (1 - q_{l}^{2})$$ $$P\{X^{(0)} = 1 / (X^{(1)}, Y^{(1)}) = (1, 1)\} = \frac{5}{4} p_{l} - \frac{1}{2} p_{l}^{2}$$ $$(4.3)$$ The bandwidth is then calculated from $$BW = N \times P\{X^{(0)} = 1\} \times p_m \ p_l \tag{4.4}$$ between a given processor-memory pair exists. Each processor-memory pair in the ESC network is connected by two disjoint paths (except for both memory pairs (i.e., $N^2$ ) times the probability that at least one fault-free path The measure C can be expressed as the product of the number of processor-In what follows we calculate the two measures for network connectivity $$P\{At \ least \ one \ path \ is \ fault-free\} = P\{First \ path \ is \ fault-free\}$$ $$+P\{Second \ path \ is \ fault-free\} - P\{Both \ paths \ are \ fault-free\} \ (4.5)$$ $$= p_r(1-q_l^2)p_l^4(1-q_l^2)p_m + p_rp_l^6p_m - p_rp_l^{10}p_m = p_rp_mp_l^6(5-4p_l+p_l^2-p_l^4)$$ Multiplying by $N^2$ yields $C$ . $\Phi_r$ as the probability that a given processor (say processor 0) is connected to $N_r$ and $N_m$ are calculated following the same steps as in Section 3. Define from processor 0 is fault-free. Recalling that each processor is connected to all memories through 2N paths, equations (3.9) (3.10) become, at least one memory, and $E_j$ as the event in which the j-th path emanating $$\Phi_r = P\{E_1 \cup E_2 \cup ... \cup E_{2N}\} = \sum_{i=1}^{2N} (-1)^{i-1} W(i)$$ (4.6) expressed as $d_0 + d_1 + ... + d_{k+1}$ where $d_{k+1} = 1$ and $d_0$ is the number of memories that the paths in the subset lead to. Given d and $d_o$ , where W(i) is defined in (3.11). To obtain the probability that a given subset of paths $\{j_1, ..., j_i\}$ is fault-free, note that in the ESC network each path has (k+2) links, hence the number of distinct links in a subset of paths can be $$P(E_{j_1} \cap ... \cap E_{j_i}) = p_r \ p_m^{d_0} \ p_i^{d-d_0-1} \ (1-q_i^2)^{d_0+1} \tag{4.7}$$ This equation differs from (3.12) because of the existence of multiplexers and demultiplexers in the ESC network. Denote by $S_{i,d,d_0}$ the number of subsets of size i which consist of exactly d distinct links, out of which $d_0$ are $$W(i) = p_r \sum_{d,d_0} S_{i,d,d_0} \ p_m^{d_0} \ p_l^{d-d_0-1} \left(1 - q_l^2\right)^{d_0+1} \tag{4.8}$$ Using combinatorial arguments which are omitted here, we obtain $$S_{i,d,d_0} = \begin{pmatrix} 2 \\ d_k \end{pmatrix} 2^{d_1 + \dots + d_{k-1} + 2d_k - i} \prod_{n=1}^k \begin{pmatrix} d_n \\ d_{n-1} - d_n \end{pmatrix} \times \frac{weight(d_0/d_1, \dots, d_k)}{\sum_{\delta} weight(\delta/d_1, \dots, d_k)}$$ where $weight(\delta/d_1, \dots, d_k) = \sum_{x_1, x_2} \begin{pmatrix} d_1 - d_2 \\ x_2 \end{pmatrix} \begin{pmatrix} 2d_2 - d_1 \\ x_1 \end{pmatrix}$ $$\times \begin{pmatrix} d_1 - d_2 - x_2 \\ 2d_1 - i - x_1 - 2x_2 \end{pmatrix} 2^{2d_1 - i - x_1 - 2x_2} \begin{pmatrix} 2d_2 - \delta - x_1 \\ 2d_2 - \delta - x_1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix}^{x_2}$$ (4.10) Substituting W(1),...,W(2N) into (4.6) and then multiplying by N yields $N_r$ . $N_m$ is obtained similarly by interchanging $p_r$ and $p_m$ . ## V. Numerical Results previously analyzed networks. The bandwidth of the two systems of size In this section we present some numerical comparisons between the two to the sharing of the redundant paths by other processor-memory pairs does not increase its bandwidth over that of a non-redundant network, due between every pair of processor-memory that the ESC network provides, if there are no faulty elements (case (1) in Fig. 3). The additional path this comparison is that both networks have exactly the same bandwidth are depicted in Fig. 3. A very important conclusion that can be drawn from for three different sets of values of the probabilities $p_i$ , $p_r$ and $p_m$ . The results $16 \times 16$ has been calculated as a function of $p_a$ (the probability of request), as is evident from Fig. 3, the advantage of the ESC network over the nonin the ESC network reduce the effect of faulty links on the bandwidth. in bandwidth in the presence of faulty elements. Here, the redundant paths redundant one increases as the probability $p_l$ of a fault-free link decreases. The situation is different when faulty elements are present in the system (cases (ii) and (iii) in Fig. 3). The ESC network shows a smaller reduction a function of $p_l$ for the two systems (for three cases similar to those in Fig. 3). This figure shows that the ESC network is less sensitive to link faults non-redundant one and both systems produced the same $N_r$ curves. the latter were removed, the ESC network showed no advantage over the from the effect of the additional multiplexers and demultiplexers. connectivity. We have also tried to separate the effect of the extra stage than the non-redundant one when $N_r$ is employed as a measure for system Fig. 4 depicts the average number of fault-free connected processors $N_r$ as larger number of fault-free memories. fault-free processor in the ESC network is connected (on the average) to a the ESC network than in the non-redundant system, but in addition, each conclude that not only is the number of connected processors $N_r$ larger in redundancy over the non-redundant one. Combining Figures 4 and 5 we can same three cases. Here again we can see the advantage of the network with in Fig. 4, this connectivity measure is shown as a function of pt for the of fault-free connected processor-memory pairs is depicted in Fig. The other measure for system connectivity, i.e., C - the average number ## VI. Conclusions in this paper. The first is a non-redundant network and the second is the connection network in the presence of faulty elements has been analyzed The performance of two multi-processor systems with a multi-stage inter-Extra Stage Cube network which was selected as an example for a network system have been suggested as measures and used to analyze and compare the performance of these two systems. with redundancy. The bandwidth and connectivity of the multi-processing the performance of these architectures in the presence of faulty elements. It stage networks. Such an analysis will allow a more accurate comparison of can be applied to other schemes for incorporating redundancy into multican also suggest ways to develop new fault-tolerant architectures. The approach to performance analysis which was introduced in this paper, ### VII. References - Ξ G. B. Adams and H. J. Siegel, "The Extra Stage Cube: Tolerant Interconnection Network for Supersystems," IEEE Trans. on Computers, Vol. C-31, May 1982, pp. 443-454. A Fault- - <u>2</u> G. B. Adams, D. P. Agrawal and H. J. Siegel, "A Survey and Computer, Vol. 20, June 1987, pp. 14-27. parison of Fault-Tolerant Multistage Interconnection Networks," Com- - ઢ M. Jeng and H. J. Siegel, "A Fault-Tolerant Multistage Interconnection Network for Multiprocessor Systems using Dynamic Redundancy," Proc. of the 1986 Symp. on Distributed Computing Systems, pp. 70-77. - <u>4</u> I. Koren and D.K. Pradhan, "Modeling the Effect of Redundancy on Vol. C-36, March 1987, pp.344-355. Yield and Performance of VLSI Systems," IEEE Trans. on Computers, - 5 V. P. Kumar and S. M. Reddy, "Augmented Shuffle-Exchange Multistage Interconnection Networks," Computer, Vol. 20, June 1987, pp. - 6 K. Padmanabhan and D. H. Lawrie, "A Class of Redundant Path Mul-C-32, Dec. 1983, pp. 1099-1108. tistage Interconnection Networks," IEEE Trans. on Computers, Vol. - 三 J. H. Patel, "Performance of Processor-Memory Interconnection for Multiprocessors," IEEE Trans. on Computers, Vol. C-30, Oct. 1981, - **©** N. Tzeng, P. Yew and C. Zhu, "A Fault-Tolerant Scheme for Multistage Arch., June 1985, pp. 368-375. Interconnection Networks," Proc. of the 12-th Annual Symp. on Comp. Figure 3: The bandwidth of two $16 \times 16$ networks as a function of $p_a$ for (i) $p_l = p_r = p_m = 1$ , (ii) $p_l = 0.95$ , $p_r = 0.85$ , $p_m = 0.9$ and (iii) $p_l = 0.9$ , $p_r = 0.75$ , $p_m = 0.8$ . $16 \times 16$ networks as a function of $p_l$ for (i) $p_r = p_m = 1$ , (ii) $p_r = 0.85$ , $p_m = 0.9$ and (iii) $p_r = 0.75$ , $p_m = 0.8$ . Figure 4: The average number of connected fault-free processors $N_r$ for two Figure 5: (ii) $p_r = 0.85$ , $p_m = 0.9$ and (iii) $p_r = 0.75$ , $p_m = 0.8$ . pairs for two 16 × 16 networks as a function of $p_l$ for (i) $p_r = p_m = 1$ , The average number of connected fault-free processor-memory