Uniprocessor Checkpointing

Checkpointing

Two common forms of time redundancy in processors consist of repeating an instruction or a section of a program that has failed. These techniques require relatively few hardware and software resources and have proved to be very efficient against transient faults.

Checkpointing consists of saving the process state at intervals. If the process fails, it can be restarted from the last checkpoint rather than from the very beginning. The issue is where to place the checkpoints along the execution path so that some measure of performance is optimized.
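The idea can be illustrated with a minimal Python sketch. It is not the applet's code: the function name run_with_checkpoints, the dictionary-based state, and the single injected fault (fail_at) are all assumptions made for illustration. It saves the state every m instructions and, when the fault occurs, resumes from the last saved state.

```python
import copy

def run_with_checkpoints(instructions, m, fail_at=None):
    """Run callables over a state dict, checkpointing every m instructions.

    fail_at is the index of an instruction that fails exactly once
    (a hypothetical transient fault, injected for illustration).
    Returns the final state and the number of execution attempts.
    """
    state = {}
    checkpoint = (0, copy.deepcopy(state))  # (instruction index, saved state)
    attempts = 0
    pending_fault = fail_at
    i = 0
    while i < len(instructions):
        if i % m == 0:
            # Save the process state at the start of each segment.
            checkpoint = (i, copy.deepcopy(state))
        attempts += 1
        if i == pending_fault:
            # Fault detected: roll back to the last checkpoint.
            pending_fault = None
            i, saved = checkpoint
            state = copy.deepcopy(saved)
            continue
        instructions[i](state)
        i += 1
    return state, attempts

def inc(state):
    # Toy instruction: increment a counter held in the state.
    state["n"] = state.get("n", 0) + 1

final_state, attempts = run_with_checkpoints([inc] * 10, m=4, fail_at=6)
print(final_state["n"], attempts)
```

With a checkpoint every 4 instructions and a fault at instruction 6, only instructions 4 and 5 are repeated in addition to the failed attempt, instead of the whole program.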

Running the Applet

The user needs to fill in the text boxes shown in orange.

The first text box is "Probability that fault is permanent". This is denoted S in the formulas below. A failure may be either transient or permanent, so
Prob(failure is transient) + Prob(failure is permanent) = 1.

The second is "Average Number of Instructions per program". This is denoted 'L'.

The third is "Required setup time, when a permanent failure happens". This is denoted 'delta1'.

The fourth is "Required Diagnosis and repair time when a permanent failure happens". This is denoted 'delta2'.

The second scroll pane is also an input field. For each instruction type it holds the failure rate and the recovery rate, together with the time and frequency values. The key point here is that the frequencies of all instructions must sum to 1.

Finally, the outputs:
The outputs appear in this order: P^{(C)}, P^{(RB1)}, P^{(RB2)}, P^{(PF)}, tau1 and tau2, all of which are formulated below.

Some other variables used in the formulas are:

W      --> Mean time spent on one instruction. It is found by summing each instruction's time value multiplied by its frequency.
M      --> Number of instructions between two checkpoints. In this Java applet it is limited to at most 10. In practice it may be 150 or more.
delta1 --> Required setup time, which is a user input.
delta2 --> Required diagnosis and repair time, which is a user input as well.
f      --> Frequency of a specific type of instruction.
t      --> Time cost of a specific type of instruction.
tau1   --> Required time for a run with one rollback.
tau2   --> Required time for a run with two rollbacks.
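As a small illustration of how W follows from the f and t values, here is a hedged Python sketch. The function name mean_instruction_time and the sample numbers are made up for the example; it also checks the requirement that the frequencies sum to 1.

```python
import math

def mean_instruction_time(freqs, times):
    """Return W = f_1*t_1 + ... + f_N*t_N for N instruction types."""
    if len(freqs) != len(times):
        raise ValueError("need one frequency per time value")
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("instruction frequencies must sum to 1")
    return sum(f * t for f, t in zip(freqs, times))

# Three hypothetical instruction types.
W = mean_instruction_time([0.5, 0.3, 0.2], [1.0, 2.0, 4.0])
print(W)
```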

The aim of this applet is to find probability measures for a process that may consist of at most 10 different types of instructions. We define the following mutually exclusive events occurring during the execution of an instruction:

H^{(C)} - The instruction is completed successfully when first executed.
H^{(RB1)} - The instruction fails and the failure is identified; the instruction is completed successfully after the program rolls back to the last checkpoint.
H^{(RB2)} - The instruction is rolled back but fails a second time, and the program rolls back to the same checkpoint for the second time.
H^{(PF)} - The instruction fails and the failure is identified, but the program rollback fails, resulting in a program failure after which the program is reloaded and restarted.

We denote by P^{(C)}, P^{(RB1)}, P^{(RB2)} and P^{(PF)} the probabilities of the events H^{(C)}, H^{(RB1)}, H^{(RB2)} and H^{(PF)}, respectively.

These probabilities should satisfy: P^{(C)} + P^{(RB1)} + P^{(RB2)} + P^{(PF)} = 1.

We denote P₀(lambda, t) = e^{-lambda * t}
We denote P₀₀(lambda, µ, t) = µ / (µ + lambda) + (lambda / (µ + lambda)) * e^{-(lambda + µ) * t}
We denote P¯₀₀(lambda, µ, t₁, t) = P₀₀(lambda, µ, t) - e^{-lambda * t₁} * P₀₀(lambda, µ, t - t₁)
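These three helper functions translate directly into code. The sketch below is a Python illustration with made-up function names (p0, p00, p00_bar) and made-up sample rates. One assumption is labeled explicitly: the exponent of P₀₀ is taken as -(lambda + µ) * t, the decaying form of the standard two-state failure/recovery model, so that P₀₀ stays between 0 and 1.

```python
import math

def p0(lam, t):
    """P0(lambda, t): probability of no failure during time t."""
    return math.exp(-lam * t)

def p00(lam, mu, t):
    """P00(lambda, mu, t) for failure rate lambda and recovery rate mu.

    Assumption: the exponent is -(lambda + mu) * t (standard two-state form).
    """
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def p00_bar(lam, mu, t1, t):
    """P00-bar(lambda, mu, t1, t) = P00(t) - e^(-lambda*t1) * P00(t - t1)."""
    return p00(lam, mu, t) - math.exp(-lam * t1) * p00(lam, mu, t - t1)

# Hypothetical rates and time, chosen only to exercise the functions.
lam, mu, t = 0.01, 0.5, 2.0
print(p0(lam, t), p00(lam, mu, t), p00_bar(lam, mu, t, t))
```

When t₁ = t, as in the rollback formulas, the last function reduces to P₀₀(lambda, µ, t) - e^{-lambda * t}, because P₀₀ equals 1 at time 0.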

Let P_i^{(C)}, P_i^{(RB1)}, P_i^{(RB2)} and P_i^{(PF)} denote the conditional probabilities of the above events, given that the instruction is of type i. Once these conditional probabilities are calculated, the unconditional probabilities are obtained by averaging over i with respect to f₁, ..., f_N:

                       P^{(J)} = f₁ * P₁^{(J)} + f₂ * P₂^{(J)} + ... + f_N * P_N^{(J)}   where J = C, RB1, RB2, PF
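The averaging step is a frequency-weighted sum. A minimal Python sketch follows; the function name unconditional and the numeric values are illustrative assumptions, not applet output.

```python
def unconditional(freqs, cond_probs):
    """Average per-type probabilities P_i^(J) with weights f_i."""
    if len(freqs) != len(cond_probs):
        raise ValueError("need one conditional probability per instruction type")
    return sum(f * p for f, p in zip(freqs, cond_probs))

# Two hypothetical instruction types with made-up conditional probabilities.
freqs = [0.6, 0.4]
p_c_by_type = [0.9, 0.8]  # P_1^(C), P_2^(C)
print(unconditional(freqs, p_c_by_type))
```

The same function applies unchanged to each of the four events C, RB1, RB2 and PF.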

We proceed now to calculate these conditional probabilities. First:

P_i^{(C)} = P₀(lambda_i, T_i) = e^{-lambda_i * T_i}
P_i^{(RB1)} = P¯₀₀((1-S) * lambda_i, µ_i, t_i, t_i) * P₀(S * lambda_i, T_i) * (P_i^{(C)} / M) * ((1 - P^{(C)})^M / (1 - P^{(C)}))
P_i^{(RB2)} = P¯₀₀((1-S) * lambda_i, µ_i, t_i, t_i) * P₀(S * lambda_i, T_i) * P_i^{(RB1)}
P_i^{(PF)} = 1 - P_i^{(C)} - P_i^{(RB1)} - P_i^{(RB2)}

We denote T¯ = f₁ * T₁ + f₂ * T₂ + ... + f_N * T_N.

The mean time required to successfully execute an instruction with one rollback is:

tau1 = T¯ + P^{(RB1)} * (delta1 + ((M+1)/2) * T¯) + P^{(PF)} * (delta1 + ((M+1)/2) * T¯ + delta2 + (L+1) * (W/2))

The mean time required to successfully execute an instruction with two rollbacks is:

tau2 = T¯ + P^{(RB1)} * (delta1 + ((M+1)/2) * T¯) + P^{(RB2)} * (2 * delta1 + (M+1) * T¯) + P^{(PF)} * (2 * delta1 + (M+1) * T¯ + delta2 + (L+1) * (W/2))
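The two mean-time formulas can be evaluated with a short Python sketch. It is only an illustration: the function names tau1 and tau2 follow the text, the parameter t_mean stands for the frequency-weighted mean of the T_i values, and every number below is made up rather than taken from the applet.

```python
def tau1(t_mean, w, m, l, delta1, delta2, p_rb1, p_pf):
    """Mean time per instruction when one rollback is used."""
    rollback = delta1 + (m + 1) / 2 * t_mean   # setup plus mean re-execution
    reload_cost = delta2 + (l + 1) * (w / 2)   # repair plus mean restart work
    return t_mean + p_rb1 * rollback + p_pf * (rollback + reload_cost)

def tau2(t_mean, w, m, l, delta1, delta2, p_rb1, p_rb2, p_pf):
    """Mean time per instruction when two rollbacks are used."""
    one = delta1 + (m + 1) / 2 * t_mean        # cost of a single rollback
    two = 2 * delta1 + (m + 1) * t_mean        # cost of two rollbacks
    reload_cost = delta2 + (l + 1) * (w / 2)
    return t_mean + p_rb1 * one + p_rb2 * two + p_pf * (two + reload_cost)

# Hypothetical inputs: M = 9, L = 99, delta1 = 2, delta2 = 10.
print(tau1(1.0, 1.0, 9, 99, 2.0, 10.0, p_rb1=0.1, p_pf=0.001))
print(tau2(1.0, 1.0, 9, 99, 2.0, 10.0, p_rb1=0.1, p_rb2=0.01, p_pf=0.001))
```

Evaluating both functions over a range of M values is one way to see how the checkpoint spacing trades rollback cost against the chance of a full program restart.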