ECEn 424 Review Topics: Chapter 5

What optimization blockers are identified in the text, what form can they take in C code, and how does each limit the ability of the compiler to optimize?
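As an illustration, here is a minimal sketch of the two blockers in the style of the text's examples. The first pair shows memory aliasing, where two pointers may refer to the same location. The second shows procedure calls with side effects. The names `twiddle1`, `twiddle2`, `f`, `func1`, `func2`, and the global `counter` are illustrative, and the sketch assumes `long` data.

```c
#include <stddef.h>

/* Memory aliasing: a compiler may not rewrite twiddle1 as twiddle2,
   because the two differ when xp and yp point to the same location. */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}

/* Procedure-call side effects: f modifies global state, so a compiler
   that cannot see inside f may not replace four calls with 4 * f(). */
static long counter = 0;

long f(void) {
    return counter++;
}

long func1(void) {
    return f() + f() + f() + f();
}

long func2(void) {
    return 4 * f();
}
```

Calling `twiddle1(&x, &x)` quadruples `x`, while `twiddle2(&x, &x)` only triples it, so unless the compiler can prove `xp != yp` it must keep every load and store. Likewise, `func1` advances `counter` by 4 and `func2` advances it by 1, so the calls cannot be merged unless the compiler knows `f` has no side effects.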
What measure of performance is used in the text as a guide in improving code? Why is that measure appropriate and how is it computed?
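CPE (cycles per element) is typically obtained by timing the code for several element counts n and fitting a line, cycles = overhead + CPE * n; the slope of that line is the CPE. A minimal sketch of the fit follows, assuming the (n, cycles) measurements have already been collected. The function name `estimate_cpe` is hypothetical, and no timing code is included.

```c
#include <stddef.h>

/* Estimate CPE as the least-squares slope of cycles versus element count.
   The intercept (not returned) absorbs fixed overhead such as loop setup. */
double estimate_cpe(const double *n, const double *cycles, size_t count) {
    double sum_n = 0.0, sum_c = 0.0, sum_nn = 0.0, sum_nc = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum_n += n[i];
        sum_c += cycles[i];
        sum_nn += n[i] * n[i];
        sum_nc += n[i] * cycles[i];
    }
    double denom = (double)count * sum_nn - sum_n * sum_n;
    return ((double)count * sum_nc - sum_n * sum_c) / denom;
}
```

With synthetic data that follows cycles = 400 + 9n exactly, the function returns 9.0; real measurements are noisy, so the fitted slope is only an estimate.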
What kinds of optimizations are applied in the running example of "combine" functions in Chapter 5, and what performance improvement results from each?
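The early steps of the running example can be sketched as follows, loosely modeled on the book's vector abstraction but simplified. The `vec` struct, `vec_length`, and `get_vec_element` here are hypothetical stand-ins, and the sketch uses `long` addition as the combining operation. `combine4` applies code motion (hoisting the length computation), eliminates the per-element accessor call, and accumulates in a local variable instead of through memory.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the book's vector type. */
typedef struct {
    size_t len;
    long *data;
} vec;

size_t vec_length(const vec *v) { return v->len; }

/* Bounds-checked accessor: returns 0 on out-of-range index. */
int get_vec_element(const vec *v, size_t i, long *dest) {
    if (i >= v->len)
        return 0;
    *dest = v->data[i];
    return 1;
}

/* Unoptimized: recomputes the length every iteration, calls the
   accessor for each element, and accumulates through *dest. */
void combine1(const vec *v, long *dest) {
    *dest = 0;
    for (size_t i = 0; i < vec_length(v); i++) {
        long val;
        get_vec_element(v, i, &val);
        *dest = *dest + val;
    }
}

/* Length hoisted out of the loop, direct data access,
   and a local accumulator written to *dest only once. */
void combine4(const vec *v, long *dest) {
    size_t length = vec_length(v);
    const long *data = v->data;
    long acc = 0;
    for (size_t i = 0; i < length; i++)
        acc = acc + data[i];
    *dest = acc;
}
```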
Modern CPUs:
- What is instruction-level parallelism?
- What is different about a superscalar processor?
- What is done out of order in an out-of-order processor, and what hardware is required to support this complexity?
- What does speculative execution refer to, how is it achieved, and what are its advantages and disadvantages?
- What is retirement and how is it related to speculative execution?
- What is register renaming and why is it desirable?
- Given a short loop (in C or assembly) and details about available resources and functional-unit latencies, can you produce the corresponding data-flow graph and identify its critical path? (In other words, can you predict the CPE we would see for that code on a modern Intel CPU?)
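Once the critical path is identified, a rough CPE prediction for a reduction loop often reduces to comparing two bounds: the latency of the combining operation spread over the independent dependency chains, and the throughput bound set by functional-unit capacity and issue rate (or by the load units). The helper `predicted_cpe` below is a hypothetical, simplified model that encodes only this comparison and ignores loop overhead and other effects; the parameter values in any example are placeholders, not measurements.

```c
/* Simplified CPE model for a reduction loop.
   latency          : latency of the combining operation, in cycles
   chains           : number of independent dependency chains (accumulators)
   throughput_bound : minimum CPE imposed by functional-unit resources
   Each chain processes 1/chains of the elements, so the latency cost per
   element is latency / chains; the prediction is the larger bound. */
double predicted_cpe(double latency, int chains, double throughput_bound) {
    double latency_per_element = latency / (double)chains;
    return latency_per_element > throughput_bound ? latency_per_element
                                                  : throughput_bound;
}
```

For example, an operation with a 5-cycle latency on a single chain is latency-bound at a CPE of 5, while splitting the work over enough chains lets the throughput bound dominate.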
What is loop unrolling and what are the tradeoffs and the limitations of its use?
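A minimal sketch of "2x1" unrolling (two elements per iteration, one accumulator) for an integer sum follows. The function name `sum_unroll2x1` is hypothetical. The second loop handles a leftover element when the length is odd.

```c
#include <stddef.h>

/* 2x1 loop unrolling: process two elements per iteration with a single
   accumulator, then finish any remaining element in a cleanup loop. */
long sum_unroll2x1(const long *data, size_t length) {
    long acc = 0;
    size_t i = 0;
    /* Main loop runs while at least two elements remain
       (written as i + 1 < length to avoid unsigned underflow). */
    for (; i + 1 < length; i += 2)
        acc = (acc + data[i]) + data[i + 1];
    /* Cleanup loop: at most one element is left. */
    for (; i < length; i++)
        acc = acc + data[i];
    return acc;
}
```

Because every addition still goes through the single accumulator, this mainly reduces loop overhead (index updates and branches); it cannot by itself get below the latency bound of the combining operation.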
How does the use of multiple accumulators lead to increased performance, and what are the limitations of this approach?
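A minimal sketch of "2x2" unrolling (two elements per iteration, two accumulators) for an integer sum follows. The function name `sum_unroll2x2` is hypothetical. Even-indexed and odd-indexed elements go into separate accumulators, so the two chains of additions can proceed in parallel.

```c
#include <stddef.h>

/* 2x2 loop unrolling: two independent accumulators break the single
   sequential dependency chain into two shorter chains. */
long sum_unroll2x2(const long *data, size_t length) {
    long acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < length; i += 2) {
        acc0 = acc0 + data[i];     /* even-indexed elements */
        acc1 = acc1 + data[i + 1]; /* odd-indexed elements */
    }
    /* Cleanup loop: at most one element is left. */
    for (; i < length; i++)
        acc0 = acc0 + data[i];
    return acc0 + acc1; /* combine the partial results */
}
```

Two limitations are worth noting. With too many accumulators the values no longer fit in registers and spill to memory. For floating-point data, splitting the sum changes the order of operations, and because floating-point addition is not associative, the result may differ slightly.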
What is a reassociation transformation and how can it result in improved performance?
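A minimal sketch of "2x1a" unrolling with reassociation follows. The function name `sum_unroll2x1a` is hypothetical. The only change from plain 2x1 unrolling is the placement of the parentheses: the two data elements are combined first, so only one addition per iteration lies on the critical path through `acc`.

```c
#include <stddef.h>

/* 2x1a: 2x1 unrolling with reassociation. data[i] + data[i + 1] does not
   depend on acc, so it can execute in parallel with the previous
   iteration's update of acc. */
long sum_unroll2x1a(const long *data, size_t length) {
    long acc = 0;
    size_t i = 0;
    for (; i + 1 < length; i += 2)
        acc = acc + (data[i] + data[i + 1]);
    /* Cleanup loop: at most one element is left. */
    for (; i < length; i++)
        acc = acc + data[i];
    return acc;
}
```

For integer data the result is unchanged. For floating-point data the regrouping can change rounding, which is why a compiler will not apply it on its own without permission to relax floating-point semantics.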
In terms of CPE, what is the very best performance that was achieved on the combine functions in Chapter 5, and what techniques were used to obtain that speedup?
What is register spilling and how would you recognize it in assembly code?
What can a programmer do in the source code to increase the likelihood that the compiler will use conditional moves rather than conditional branches?
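One approach is to write the code in a functional style: compute both candidate values, then assign the results unconditionally. The sketch below contrasts an imperative version with a functional one, in the style of the text's min/max example; the names `minmax1` and `minmax2` are illustrative. Whether a given compiler actually emits conditional moves for `minmax2` depends on the compiler and its flags.

```c
/* Imperative style: a data-dependent branch decides whether to swap,
   which is hard to predict for random data. */
void minmax1(long a[], long b[], long n) {
    for (long i = 0; i < n; i++) {
        if (a[i] > b[i]) {
            long t = a[i];
            a[i] = b[i];
            b[i] = t;
        }
    }
}

/* Functional style: compute min and max, then store both
   unconditionally; this form is amenable to conditional moves. */
void minmax2(long a[], long b[], long n) {
    for (long i = 0; i < n; i++) {
        long min = a[i] < b[i] ? a[i] : b[i];
        long max = a[i] < b[i] ? b[i] : a[i];
        a[i] = min;
        b[i] = max;
    }
}
```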
How can the load latency be measured, and how long is it for the processors described in the book?
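One way to expose load latency is to time a loop in which each load's address comes from the previous load, such as walking a linked list; the CPE of that loop then approximates the load latency. A minimal sketch follows, in the style of the text's linked-list example; the names `list_ele`, `list_ptr`, and `list_len` are illustrative, and no timing code is included.

```c
#include <stddef.h>

typedef struct ELE {
    struct ELE *next;
    long data;
} list_ele, *list_ptr;

/* Each iteration's load of ls->next needs the pointer produced by the
   previous iteration's load, so the loads form one sequential chain. */
long list_len(list_ptr ls) {
    long len = 0;
    while (ls) {
        len++;
        ls = ls->next;
    }
    return len;
}
```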
What are store buffers and why do they exist? What special concerns arise for stores in the context of speculative execution?
Why might a dependence between a load and a recent store have a negative impact on performance?
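A loop that stores to one location and then loads from a location that may be the same illustrates the problem. When `src` and `dst` differ, each load is independent of the preceding store. When they are equal, each load must obtain the value written by the store just before it (a write/read dependency), which lengthens the critical path. The function `write_read` below is a sketch in the style of the text's example.

```c
/* Repeatedly store val to *dst, then load *src to compute the next val.
   If src == dst, every load depends on the immediately preceding store. */
void write_read(long *src, long *dst, long n) {
    long cnt = n;
    long val = 0;
    while (cnt) {
        *dst = val;
        val = (*src) + 1;
        cnt--;
    }
}
```

Starting from an array `{-10, 17}`, calling `write_read(&a[0], &a[1], 3)` leaves `{-10, -9}`, while calling `write_read(&a[0], &a[0], 3)` leaves `{2, 17}`, since in the second case each load sees the value just stored.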
What basic strategies should we follow in optimizing program performance?
How does a profiler such as GPROF obtain information about program performance, and how can we use such a tool to guide optimization? How accurate is the information GPROF provides?
Self test: Can you complete all practice problems in Chapter 5?

Updated for 3rd Edition of CS:APP