Design and evaluation of an optimistic CPU: the warp engine
Permanent link to Research Commons version: https://hdl.handle.net/10289/14898
Instruction pipelining, out-of-order execution, and branch prediction are techniques that improve performance in processors by manipulating the flow of instructions. These control-flow manipulations alone are not adequate to allow large numbers of instructions to execute in parallel because performance is limited by accesses to the relatively slow memory system. Performance can be improved by speculating on the outcomes of control decisions and on the values of data in memory, so that results are returned early. This thesis investigates the requirements of an architecture that speculates on control flow decisions and data values to improve performance through instruction-level parallelism. A new architecture, the WarpEngine, is presented that speculates on control flow decisions and data values. This architecture is shown to have the potential to extract performance through parallelism an order of magnitude larger than that obtained by contemporary microprocessors. Control speculation is achieved using a novel tree-based mechanism that produces multiple flows of control. This scalable mechanism is shown to generate a large group of instructions that can execute in parallel. It is also essential that memory accesses be allowed to occur out of programmed order. This form of data speculation is shown to break false data dependencies, improving performance. The use of state-saving resources is examined, and the limitations of in-order retirement schemes are shown. These results indicate that the management of these resources is critical to obtaining good performance. Virtual ordered simulation is introduced as a new simulation methodology for modelling out-of-order and speculative architectures. This novel simulation technique is unique because each instruction is inspected and processed only once and, unlike other simulation methodologies, unlimited resources can be modelled.
Individual components can be constrained in isolation so that their effect on performance can be examined in detail. Investigations performed assuming unbounded resources provide new insight into the limits imposed by individual processor components. The architecture presented shows potential for performance well beyond that of contemporary and research architectures. The insights into the limitations of processor components apply to many computer architectures.
The University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.