Compiling using SimpleScalar

This exercise involves writing C/C++ programs and compiling them with the PISA compiler to produce programs that can be used to benchmark computer systems. A benchmark, broadly speaking, is any program whose purpose is to measure a performance characteristic of a computer system. In this part we will concentrate on the performance of the CPU, using measures such as the number of floating-point operations per second (FLOPS), the number of integer operations per second (IOPS) and the number of instructions per second (IPS). Keep in mind that even though these measures relate primarily to CPU performance, other factors, such as main memory speed, can influence the results we obtain. How much they do depends on the exact nature of the program being run.

See PISA_compiler_installation.doc for the procedure for installing and configuring the PISA compiler. The files needed are simpletools-2v0.tar, simpleutils-2v0.tar and gcc-2.7.2.3.ss.tar.gz.

You will begin by writing a number of simple C programs. The main performance indicator we need is the execution time of these programs. On modern computers most tasks take so little time that they cannot be measured accurately enough for our purposes, so we will often need to repeat the benchmarked task many times.
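As a minimal sketch of the repeat-and-average idea, assuming a native C compiler and the standard clock() function from <time.h>, the hypothetical helper seconds_per_rep below runs a task many times and divides the total CPU time by the number of repetitions. The function task and the variable sink are hypothetical stand-ins for whatever is being benchmarked. When the programs are run under SimpleScalar, the elapsed time is taken from the simulator's statistics instead, so this only illustrates the principle.

```c
#include <time.h>

/* Hypothetical stand-in for "the task we are benchmarking".
   volatile stops the compiler from removing the updates. */
static volatile long sink = 0;

static void task(void)
{
    sink = sink + 1;
}

/* Hypothetical helper: runs fn() reps times and returns the average
   CPU time per repetition in seconds, measured with clock().
   Returns 0.0 when reps is not positive. */
static double seconds_per_rep(void (*fn)(void), long reps)
{
    clock_t start, end;
    long i;

    if (reps <= 0)
        return 0.0;

    start = clock();
    for (i = 0; i < reps; i++)
        fn();
    end = clock();

    /* Total CPU time divided by the number of repetitions. */
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)reps;
}
```

For example, seconds_per_rep(task, 10000000L) times ten million calls; a single call would usually finish faster than clock() can resolve.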

 

1)      Write a hello world program in C. Place the hello world print statement after an empty for loop that runs from 1 to 1,000,000 (one million). Compile it using home/simplescalar/bin/sslittle-na-sstrix-gcc. The command format is

     sslittle-na-sstrix-gcc -o hello hello.c

Then use sim-safe to simulate the executable hello. Run sim-safe so that both the simulator output and the program output are saved, collect the statistics (simulation elapsed time and total number of instructions), and make sure that the program executes correctly, i.e., without errors.
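If the simulator output is saved to a file, the two statistics can also be read back programmatically. The sketch below assumes that sim-safe prints its statistics as lines of the form "name value # description", with the names sim_num_insn (instructions executed) and sim_elapsed_time (elapsed time in seconds); check these names against your own output. The function find_stat is hypothetical and scans a string holding the saved output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: searches a saved simulator output `dump` for a
   line that starts with `name` followed by whitespace, and parses the
   number after it into *value. Returns 1 if found, 0 otherwise. */
static int find_stat(const char *dump, const char *name, double *value)
{
    const char *line = dump;
    size_t len = strlen(name);

    while (line != NULL && *line != '\0') {
        if (strncmp(line, name, len) == 0 &&
            (line[len] == ' ' || line[len] == '\t')) {
            /* %lf skips the blanks between the name and the value. */
            return sscanf(line + len, "%lf", value) == 1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}
```

Usage: read the saved file into a buffer, then call find_stat(buf, "sim_num_insn", &n) and find_stat(buf, "sim_elapsed_time", &t).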

Report the total number of instructions executed and the total elapsed time for the hello world program, and fill in Table 1 below.

Now compile the same program with optimization turned on by adding the -O2 flag to the command above. The command becomes

sslittle-na-sstrix-gcc -O2 -o hello_O2 hello.c

(Give the new executable a different name, in this case hello_O2, so that you don't overwrite the unoptimized version.)

Now use sim-safe to simulate hello_O2, again saving both the simulator output and the program output, collecting the statistics (elapsed time and total number of instructions) and making sure that the program executes correctly, i.e., without errors.

Report the total number of instructions executed and the total elapsed time for the hello world program compiled with -O2, and fill in Table 1 below.

Now repeat the above procedure using the -O3 flag (for example, naming the executable hello_O3). Report the total number of instructions executed and the total elapsed time for the hello world program compiled with -O3, and fill in Table 1 below.

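Table 1: hello

| Optimization    | Total # of instructions | Total elapsed time (s) |
|-----------------|-------------------------|------------------------|
| No optimization |                         |                        |
| -O2             |                         |                        |
| -O3             |                         |                        |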


Did you observe any difference between the unoptimized and -O2 versions of hello world?

Did you observe any difference between the unoptimized and -O3 versions of hello world?

Did you observe any difference between the -O2 and -O3 versions of hello world?

Briefly describe what -O2 and -O3 do. (Hint: search Google for the keywords "compiler optimization levels".)
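As one concrete example of what an optimization level can do, the sketch below writes out loop-invariant code motion by hand: loop_naive recomputes an expression that never changes inside the loop, while loop_hoisted computes it once before the loop. Both function names are hypothetical, and whether gcc applies this particular transformation to your programs at -O2 or -O3 is something to check against your measured instruction counts rather than assume.

```c
/* Recomputes the invariant expression on every iteration. */
static int loop_naive(long n, int b, int c, int d, int e)
{
    int a = 0;
    long i;

    for (i = 1; i <= n; i++)
        a = b + c * d / e;
    return a;
}

/* Same result with the invariant expression hoisted out of the loop.
   A compiler must make sure such a move cannot introduce a fault
   (e.g. dividing by zero when the loop never runs); this hand
   version simply assumes e != 0. */
static int loop_hoisted(long n, int b, int c, int d, int e)
{
    int a = 0;
    int t = b + c * d / e;   /* computed once, outside the loop */
    long i;

    for (i = 1; i <= n; i++)
        a = t;
    return a;
}
```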

Plot a histogram (bar chart) comparing the instruction counts and elapsed times for the different optimization levels.

2)      Now we will create two very simple synthetic benchmarks. For the first benchmark (call it test1.c), put the following arithmetic expression inside a for loop that runs from 1 to 1,000,000 (one million): a = b + c * d /e f. Declare a, b, c, d, e and f as integers, and initialize b, c, d, e and f to 2, 3, 10, 5 and 8, respectively.

Compile test1.c as in step 1, then use sim-safe to simulate the executable. As before, save both the simulator output and the program output, collect the statistics (elapsed time and total number of instructions) and make sure that the program executes correctly, i.e., without errors.

Report the total number of instructions executed and the total elapsed time for test1, and fill in Table 2.

Now use sim-profile to simulate the same executable. Save both the simulator output and the program output, collect the instruction-mix statistics (integer operations) and make sure that the program executes correctly, i.e., without errors. The -iclass option of sim-profile reports the instruction-class distribution, including integer and floating-point computation.

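Report the number of integer operations executed by test1 and their percentage of all instructions, and fill in Table 2. The integer operations per second (IOPS) can then be obtained by dividing the integer operation count by the elapsed time.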
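The arithmetic behind IOPS, FLOPS and the percentage columns is simple; the hypothetical helpers below spell it out. Keep in mind that the elapsed time reported by the simulator is measured on the host machine running the simulation, so rates computed from it describe the simulation run.

```c
/* Hypothetical helper: operations per second (IOPS or FLOPS)
   = operation count / elapsed seconds. Returns 0.0 if the elapsed
   time is not positive, to avoid dividing by zero. */
static double ops_per_second(double op_count, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return op_count / elapsed_seconds;
}

/* Hypothetical helper: part as a percentage of total, e.g. integer
   operations as a share of all instructions executed. */
static double percent_of(double part, double total)
{
    if (total <= 0.0)
        return 0.0;
    return 100.0 * part / total;
}
```

For instance, 6,000,000 integer operations in 2 seconds would give ops_per_second(6000000.0, 2.0) = 3,000,000 IOPS (illustrative numbers, not measurements).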

Repeat the above procedure using -O2 optimization and fill in Table 2.

Repeat the above procedure using -O3 optimization and fill in Table 2.

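Table 2: test1

| Optimization    | Total # of instructions | Total elapsed time (s) | Integer operations | % of instructions | Floating-point operations | % of instructions |
|-----------------|-------------------------|------------------------|--------------------|-------------------|---------------------------|-------------------|
| No optimization |                         |                        |                    |                   |                           |                   |
| -O2             |                         |                        |                    |                   |                           |                   |
| -O3             |                         |                        |                    |                   |                           |                   |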

Did you observe any difference between the unoptimized and -O2 versions of test1?

Did you observe any difference between the unoptimized and -O3 versions of test1?

Did you observe any difference between the -O2 and -O3 versions of test1?

Plot a histogram (bar chart) comparing the results for the different optimization levels.

3)      For the second program (call it test2.c), change the integer arithmetic to floating-point arithmetic (you only need to change the variable declarations from int to float). Note that the first program cannot be used to measure FLOPS (floating-point operations per second), because it performs only integer arithmetic.

Repeat everything you did for test1.c, except that you should now report floating-point operations and FLOPS instead of integer operations and IOPS. Fill in Table 3.

Repeat the above procedure using -O2 optimization and fill in Table 3.

Repeat the above procedure using -O3 optimization and fill in Table 3.

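Table 3: test2

| Optimization    | Total # of instructions | Total elapsed time (s) | Floating-point operations | % of instructions | Integer operations | % of instructions |
|-----------------|-------------------------|------------------------|---------------------------|-------------------|--------------------|-------------------|
| No optimization |                         |                        |                           |                   |                    |                   |
| -O2             |                         |                        |                           |                   |                    |                   |
| -O3             |                         |                        |                           |                   |                    |                   |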

            
  
Did you observe any difference between the unoptimized and -O2 versions of test2?

Did you observe any difference between the unoptimized and -O3 versions of test2?

Did you observe any difference between the -O2 and -O3 versions of test2?

Plot a histogram (bar chart) comparing the results for the different optimization levels.