Multithreaded Simulation with M-Sim

In this lab we will experiment with M-Sim’s multithreading capability. Recall that M-Sim can execute multiple threads concurrently according to the simultaneous multithreading (SMT) model, an increasingly popular technique for raising the throughput of superscalar processors. A superscalar processor executes multiple instructions from a single thread concurrently; SMT extends this so that the processor can execute instructions from several threads in the same cycle. The threads share several of the key datapath components. For example, with a shared issue queue, instructions can be issued whenever at least one thread has an instruction ready, rather than depending on a single thread to keep the issue slots busy. This helps workloads made up of several independent threads, such as chat applications or other cases where multiple asynchronous communications occur at the same time. Many modern microprocessors implement SMT, including Intel’s Pentium 4, the latest MIPS architecture, and IBM’s POWER5. See [1] for more information.

Part 1: Single-threaded Execution vs. Multi-threaded Execution: Single Instance vs. Multiple Instances

In this section, we will run single-threaded, 2-threaded SMT, and 4-threaded SMT simulations for each of the 5 benchmarks from the previous exercise. That is, we will run one instance of the benchmark, then two instances of the same benchmark as two threads, then four instances as four threads. As an example sequence, run the following commands for the anagram benchmark and record the Throughput IPC (the very last statistic in the output; we won’t bother saving to a ‘results’ file) in Table 1.

~/msim/msim_v2.0/$ ./sim-outorder anagram-your_name.arg

~/msim/msim_v2.0/$ ./sim-outorder anagram-your_name.arg anagram-your_name.arg

IMPORTANT NOTE: You may run out of registers when running multithreaded simulations. M-Sim sets the total number of physical registers with the "-rf:size num_phy_regs" option. The num_phy_regs value covers both the architectural registers (num_arch_regs for each thread) and the renaming registers (num_rename_regs). Register renaming only works if some renaming registers are left over, that is, num_rename_regs = num_phy_regs - num_arch_regs*thread_num must be greater than 0. With 4 threads and the default num_phy_regs of 128, no renaming registers remain: num_rename_regs = 128 - 32*4 = 0.
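
The register arithmetic in the note above can be checked with a short script. This is a minimal sketch that assumes 32 architectural registers per thread, as in the note; the function name rename_regs is hypothetical and is not part of M-Sim.

```shell
#!/bin/sh
# Architectural registers per thread, taken from the note above (assumption).
ARCH_REGS_PER_THREAD=32

# rename_regs NUM_PHY_REGS THREADS
# Prints num_phy_regs - num_arch_regs * thread_num.
rename_regs() {
    echo $(( $1 - ARCH_REGS_PER_THREAD * $2 ))
}

# Default register file (128) for 1, 2, and 4 threads.
for threads in 1 2 4; do
    echo "rf:size=128 threads=$threads rename_regs=$(rename_regs 128 "$threads")"
done

# Enlarged register file for the 4-threaded case.
echo "rf:size=512 threads=4 rename_regs=$(rename_regs 512 4)"
```

A result of 0 or less means the chosen -rf:size value is too small for that thread count.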

For the benchmarks we’ve been using, 512 will be enough in every case, though you should perform the calculation above to verify this yourself. As an example, for the 4-threaded anagram example, execute:

~/msim/msim_v2.0/$ ./sim-outorder -rf:size 512 anagram-your_name.arg anagram-your_name.arg anagram-your_name.arg anagram-your_name.arg
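
Typing the same argument file four times is error-prone, so the command lines for each thread count can be generated instead. The sketch below only prints the commands; it does not run the simulator. The helper name smt_cmd is hypothetical, and it uses only the sim-outorder invocation and the -rf:size option shown in this lab.

```shell
#!/bin/sh
# smt_cmd ARG_FILE THREADS [RF_SIZE]
# Prints a sim-outorder command that runs THREADS instances of ARG_FILE.
# If RF_SIZE is given, "-rf:size RF_SIZE" is added before the argument files.
smt_cmd() {
    cmd="./sim-outorder"
    if [ -n "${3:-}" ]; then
        cmd="$cmd -rf:size $3"
    fi
    i=0
    while [ "$i" -lt "$2" ]; do
        cmd="$cmd $1"
        i=$((i + 1))
    done
    echo "$cmd"
}

# The three runs needed for one row of Table 1 (anagram as the example).
smt_cmd anagram-your_name.arg 1
smt_cmd anagram-your_name.arg 2
smt_cmd anagram-your_name.arg 4 512
```

Each printed line can be pasted at the prompt in ~/msim/msim_v2.0/, substituting the argument file of the benchmark being measured.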

Now fill out the following table:

Benchmark	Single Thread	2-threaded SMT	4-threaded SMT
anagram.alpha
go.alpha
compress95.alpha
cc1.alpha
perl.alpha

Table 1: Multi-instance SMT

Now answer the following questions:

1) From the information in Table 1, what is the general trend in Throughput IPC as multi-instance SMT is performed? Why do you think this might be?

2) Can you imagine a program for which 4-threaded SMT might outperform 2-threaded SMT? Do you think these types of programs are the majority?

3) Given your observations, why do you think the Intel Pentium 4 and IBM POWER5 processors use 2-threaded SMT?

Part 2: Single-threaded Execution vs. Multi-threaded Execution: Concurrent Programs of Different Types

In this section, we’ll perform an SMT simulation using two different programs. Compatibility issues can make some combinations of programs crash. Among the benchmarks we’ve been working with, perl and go (in that order) have been shown to run together.

Execute the following command:

~/msim/msim_v2.0/$ ./sim-outorder perl-your_name.arg go-your_name.arg
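
Table 2 has three rows, and only the first is covered by the command above. As a sketch, the loop below prints the command for each pair; the two same-program rows are ordinary 2-threaded runs like those in Part 1. The helper name pair_cmd is hypothetical, and the argument file names assume the same <benchmark>-your_name.arg pattern used throughout this lab.

```shell
#!/bin/sh
# pair_cmd FIRST SECOND
# Prints the two-thread sim-outorder command for benchmarks FIRST and SECOND.
pair_cmd() {
    echo "./sim-outorder $1-your_name.arg $2-your_name.arg"
}

# One command per row of Table 2, in table order.
pair_cmd perl go
pair_cmd go go
pair_cmd perl perl
```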

Now fill out Table 2:

Benchmark	2-threaded SMT
perl.alpha & go.alpha
go.alpha & go.alpha
perl.alpha & perl.alpha

Table 2: Multi-program SMT

Now answer the following question:

1) What does the information in Table 2 demonstrate about resource contention among threads?

References:

[1] Wikipedia, “Simultaneous multithreading,” http://en.wikipedia.org/wiki/Simultaneous_multithreading