ECEn 424 Homework Set #7
Upload a pdf file containing your solutions to the problems below to
Learning Suite before 11:00pm on the assigned date.
- Problem 6.24 from the text
- Problem 424-4:
Chapter 5 presents a series of performance measurements showing
the performance benefits of a sequence of optimizations to the
original combine1() function. For this problem, you will repeat that
sequence of optimizations starting with this function:
void dotproduct1(vec_ptr u, vec_ptr v, data_t *dest)
{
    long int i;
    *dest = 1.0;
    for (i = 0; i < vec_length(u); i++)
    {
        data_t val1;
        data_t val2;
        get_vec_element(u, i, &val1);
        get_vec_element(v, i, &val2);
        *dest = *dest + val1 * val2;
    }
}
Start with this tar file
that includes dotproduct1() and a version of the getcpe timing code
used in the previous homework set. Compiling and running the initial
getcpe program should give you the CPE for the original code.
Consistent with the treatment in the text, you should create a
different version of the function for each of the six required
optimizations listed below. Follow the naming conventions of the
text -- dotproduct5() should be the version with 2x loop
unrolling. For this assignment, you need only consider the case
where data_t is a double.
- 2. Move the call to vec_length out of the loop
- 3. Directly access the vector data
- 4. Accumulate results in a local variable
- 5. Unroll the loop by 2
- 6. Unroll the loop by 2 with 2-way parallelism
- 7. Unroll the loop by 2 and reassociate
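As a reminder of what the last three items mean, here is a minimal sketch of the three unrolling patterns applied to a plain array sum rather than to the dot product. The function names and the use of a bare double array (instead of vec_ptr) are illustrative assumptions and are not part of the provided code; your own versions must follow the dotproduct naming and operate on the vector type.

```c
/* Illustrative only: hypothetical helpers that sum a plain double array.
 * They show the loop shapes, not the dot-product solution itself. */

/* Unroll by 2, one accumulator: every add still depends on the previous one. */
double sum_unroll2(const double *a, long n)
{
    long i;
    double acc = 0.0;
    for (i = 0; i + 1 < n; i += 2)
        acc = (acc + a[i]) + a[i + 1];
    for (; i < n; i++)          /* finish a leftover element if n is odd */
        acc += a[i];
    return acc;
}

/* Unroll by 2 with 2-way parallelism: two independent accumulators. */
double sum_unroll2_par(const double *a, long n)
{
    long i;
    double acc0 = 0.0;
    double acc1 = 0.0;
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += a[i];           /* even-indexed elements */
        acc1 += a[i + 1];       /* odd-indexed elements  */
    }
    for (; i < n; i++)
        acc0 += a[i];
    return acc0 + acc1;
}

/* Unroll by 2 and reassociate: a[i] + a[i + 1] does not depend on acc,
 * so only one add per iteration sits on the acc dependency chain. */
double sum_unroll2_reassoc(const double *a, long n)
{
    long i;
    double acc = 0.0;
    for (i = 0; i + 1 < n; i += 2)
        acc = acc + (a[i] + a[i + 1]);
    for (; i < n; i++)
        acc += a[i];
    return acc;
}
```

Because the second and third forms change the order of the floating-point additions, their results can differ slightly from the first on general data, even though the three are mathematically equivalent.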
Your submission should include the C source code for all 6 new
versions of the dotproduct function along with the reported CPE of
each. (No other source code needs to be submitted.)
After you obtain all 7 required CPE measurements, you should write a
paragraph comparing your results with those in the book and, where
possible, explaining the differences. Finally, state what can be
inferred from your results about the functional units in the
processor of the system you used. (Can you determine both latency
and throughput bounds, for example?)
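For reference, the text defines CPE as the slope of a least-squares line fitted to cycle counts measured at several vector lengths. The sketch below shows that calculation in isolation. The function name cpe_slope and its interface are hypothetical, and the supplied getcpe code may organize the computation differently.

```c
#include <stddef.h>

/* Hypothetical helper: least-squares slope of cycles versus element count.
 * n[k] is the vector length used in run k and cycles[k] is the number of
 * cycles that run took. Returns 0.0 when the slope is undefined, that is,
 * when there are fewer than two distinct lengths. */
double cpe_slope(const double *n, const double *cycles, size_t count)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    double denom;
    size_t k;

    for (k = 0; k < count; k++) {
        sx  += n[k];
        sy  += cycles[k];
        sxx += n[k] * n[k];
        sxy += n[k] * cycles[k];
    }
    denom = (double)count * sxx - sx * sx;
    if (denom == 0.0)
        return 0.0;
    return ((double)count * sxy - sx * sy) / denom;
}
```

The fixed overhead of a call ends up in the intercept of the fitted line, so it does not inflate the per-element figure.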
- Problem 424-5:
You are to determine the branch misprediction penalty for at
least one machine that you have access to. The technique employed
is described on pages 215-216 of the text. The absdiff() and
measurement code in this tar
file will serve as a good starting point. First, you should
write a short paragraph describing the technique used to measure the
branch misprediction penalty. (Study the code and read the book.)
Second, report the penalty you calculated and identify the
platform you obtained it on. Finally, find a compiler option that
generates code for the absdiff() function that uses a conditional
move instruction rather than a branch. (Compile getbrpen.c with the
-S option and examine the code generated for absdiff().) Report the
optimization level you used and results from running the brnchpen
program with this version of absdiff(). How much is performance
improved when the conditional move instruction is used?
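The arithmetic behind the technique in the text can be sketched as follows. If a call takes T_OK cycles when the branch is predicted correctly and T_OK + T_MP cycles when it is mispredicted, then with misprediction probability p the average is T_avg(p) = T_OK + p * T_MP. For random data p is about 0.5, so T_MP = 2 * (T_ran - T_OK). The function name and parameter names below are hypothetical, and the sketch assumes you already have the average cycles per call for a random input pattern and for an easily predicted one.

```c
/* Hypothetical helper: estimate the branch misprediction penalty in cycles.
 * t_random      - average cycles per call with randomly ordered data
 *                 (branch assumed mispredicted about half the time)
 * t_predictable - average cycles per call with an easily predicted pattern
 * Implements T_MP = 2 * (T_ran - T_OK) from the model described above. */
double mispredict_penalty(double t_random, double t_predictable)
{
    return 2.0 * (t_random - t_predictable);
}
```

For example, with made-up values of 20 cycles for the random case and 10 cycles for the predictable case, the estimated penalty is 20 cycles. With the conditional move version you would expect the two timings to be nearly equal, since there is no longer a branch to mispredict.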
Clarifications
Problem 6.24: No programming or source code is required.
Problem 424-4: You may or may not see performance boosts at every
step. Make a few runs for each data-point and discuss your results in
your submission. Once you have the file dotprod.tar, place it in your
directory of choice. From within that directory, typing "tar xvf *"
will create a new "dotprod" subdirectory with all the files required
to build the getcpe executable. Typing "make" within that subdirectory
should produce an executable named getcpe. It should compile correctly
under Linux or Mac OS X. The initial version will measure the
performance of dotproduct1(). As you add each new version of the
function to getcpe.c, you'll need to change the measurement code (in measure()
in getcpe.c) so that the new version of your function is called.
Problem 424-5: Copy the tar file to your working directory, and type "tar
xvf *" to create a "brnchpen" subdirectory, and type "make" to produce
a "brnchpen" executable. It should compile correctly under Linux or on
a Mac. You are strongly encouraged to try this measurement on systems
with different processors -- you might be surprised how different the
results can be.
Last updated 16 January 2018