Pipeline Structure Modeling with M-Sim

 

In this lab we shall experiment with some of the pipeline structure models M-Sim makes available for superscalar microprocessors.  Recall that one of M-Sim’s primary extensions to SimpleScalar is explicit, cycle-accurate modeling of key pipeline structures.  In particular, M-Sim models the Reorder Buffer (ROB), the Issue Queue (IQ), the Load/Store Queue (LSQ), and separate integer and floating point register files.  The ROB ensures that program instructions commit in order, despite out-of-order execution.  It is modeled as a FIFO buffer: entries are allocated at the tail and freed at the head when their instructions finally commit.  The IQ holds instructions until they are selected for issue, and so controls entry into the execution stages of the pipeline.  It is modeled as an array of entries, each in one of two states (free or allocated).  The LSQ handles memory disambiguation, which protects against RAW, WAW, and WAR hazards through memory.  The integer and floating point register files contain physical register arrays whose entries are stateful (free, allocated, allocated and written back, or architectural).
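The FIFO behavior of the ROB described above can be illustrated with a small sketch.  This is a toy model for intuition only, not M-Sim's implementation; the class name, method names, and the commit width parameter are assumptions made for the example.

```python
from collections import deque


class ReorderBuffer:
    """Toy FIFO ROB: allocate at the tail, free from the head at commit."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # left end is the head, right end is the tail

    def allocate(self, inst):
        """Allocate a tail entry; return False if the ROB is full (stall)."""
        if len(self.entries) >= self.size:
            return False
        self.entries.append({"inst": inst, "done": False})
        return True

    def complete(self, inst):
        """Mark an instruction as executed; this may happen out of order."""
        for entry in self.entries:
            if entry["inst"] == inst:
                entry["done"] = True
                return
        raise KeyError(inst)

    def commit(self, width):
        """Free up to `width` completed entries from the head, in order."""
        committed = []
        while (self.entries and len(committed) < width
               and self.entries[0]["done"]):
            committed.append(self.entries.popleft()["inst"])
        return committed


rob = ReorderBuffer(size=4)
for name in ["i0", "i1", "i2"]:
    rob.allocate(name)

rob.complete("i2")          # finishes first, but is not at the head
print(rob.commit(width=4))  # []
rob.complete("i0")
print(rob.commit(width=4))  # ['i0']
rob.complete("i1")
print(rob.commit(width=4))  # ['i1', 'i2']
```

Note that i2 completes first but cannot leave the buffer until i0 and i1 have committed.  A small ROB therefore fills quickly behind a slow instruction at the head, which is the effect explored in Part 2.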

 

With regard to the structures of interest, an instruction's path through the SMT pipeline consists of issue queue selection, followed by register file access, the start of execution, entry into a per-thread load/store queue (for loads and stores), write back to the register files after execution, and finally commit from a per-thread ROB.

 

Part 1: Pipeline Structure Modeling: Issue BW

 

In this part, we shall see how M-Sim handles competition among threads for issue bandwidth (BW).  By varying the issue BW, we will see how Throughput IPC is affected in the multi-program SMT experiment from the previous lab (perl and go).  The default issue BW is 4 instructions per cycle.

 

~/msim/msim_v2.0/$ ./sim-outorder -issue:width 4 perl-your_name.arg go-your_name.arg
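To cover every issue BW in Table 1 without retyping the command, the command lines can be generated with a short script.  This is a minimal sketch: the flag and argument file names are taken from the command above, while the function name `sweep_commands` and its parameters are hypothetical.  The script only prints the commands; it does not run the simulator.

```python
import shlex

# Issue widths listed in Table 1.
ISSUE_WIDTHS = [1, 2, 4, 8, 16]


def sweep_commands(flag, values, arg_files, simulator="./sim-outorder"):
    """Return one argument list per swept value (usable with subprocess.run)."""
    return [[simulator, flag, str(value), *arg_files] for value in values]


commands = sweep_commands(
    "-issue:width",
    ISSUE_WIDTHS,
    ["perl-your_name.arg", "go-your_name.arg"],
)

# Print each command so it can be pasted into the shell.
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line is meant to be run from the same directory as the command above.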

 

Now fill out the following table:

 

 

Issue BW (inst/cycle)    2-threaded SMT Throughput IPC
1                        ________
2                        ________
4                        ________
8                        ________
16                       ________

 


Table 1: Multi-program SMT

 

Now answer the following questions:

 

1) Why do you think the Throughput IPC flattens out beyond an issue BW of 4 inst/cycle? What might be an alternative bottleneck for IPC above this point?

2) Does this indicate a lack of contention for issue bandwidth above 4 inst/cycle?

 

Part 2: Pipeline Structure Modeling: ROB Size

 

In this part we’ll consider the effect of ROB size on Throughput IPC for each of our five benchmarks.  Vary the ROB size from its default value of 128 and record the Throughput IPC for each size.

 

 

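Benchmark           ROB Size: 4   ROB Size: 16   ROB Size: 64   ROB Size: 256   ROB Size: 1024
anagram.alpha       ________      ________       ________       ________        ________
go.alpha            ________      ________       ________       ________        ________
compress95.alpha    ________      ________       ________       ________        ________
cc1.alpha           ________      ________       ________       ________        ________
perl.alpha          ________      ________       ________       ________        ________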

 


Table 2: Throughput IPC by ROB size

 

Now answer the following questions:

 

1) Why do you think Throughput IPC rises sharply as the ROB size increases toward the default value, but changes little beyond it? Does this suggest that the steady-state ROB occupancy falls within a narrow range?

2) For which benchmark is the steady-state ROB occupancy probably the smallest? For which is it the largest?

 

Part 3: Pipeline Structure Modeling: LSQ Size

 

In this part we’ll consider the effect of LSQ size on Throughput IPC for each of our five benchmarks.  Vary the LSQ size from its default value of 48 and record the Throughput IPC for each size.

 

 

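Benchmark           LSQ Size: 3   LSQ Size: 12   LSQ Size: 48   LSQ Size: 192   LSQ Size: 768
anagram.alpha       ________      ________       ________       ________        ________
go.alpha            ________      ________       ________       ________        ________
compress95.alpha    ________      ________       ________       ________        ________
cc1.alpha           ________      ________       ________       ________        ________
perl.alpha          ________      ________       ________       ________        ________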

 


Table 3: Throughput IPC by LSQ size

 

Now answer the following questions:

 

1) Why do you think Throughput IPC rises sharply as the LSQ size increases toward the default value, but changes little beyond it? Does this suggest that the steady-state LSQ occupancy falls within a narrow range?

2) go executes about twice the percentage of loads and stores that cc1 does.  Can you come up with a plausible explanation for why cc1’s steady-state LSQ occupancy could still be slightly greater?