DYNAMO vs. ADORE: A Tale of Two Dynamic Optimizers

  • About 37 pages
  • Published 2017-09-01 in Zhejiang


Performance of In-Thread Opt. (USIII+)

Helper Thread Prefetching for Multi-Core

[Figure: timeline of the main thread on the first core and the helper thread on the second core. Labels: Main thread; First Core; Second core; Trigger to activate (about 65 cycles delay); Prefetches initiated; L2 Cache Miss; Cache miss avoided; Spin Waiting; Spin again waiting for the next trigger; time]

Performance of Dynamic Helper Thread (on Sun UltraSparc IV+)

Evaluation Environment for TLS

  • Benchmarks: SPEC2000 written in C, -O3 optimization
  • Underlying architecture: 4-core chip-multiprocessor (CMP), speculation supported by coherence
  • Simulator: superscalar with detailed memory model; simulates communication latency; models bandwidth and contention
  • Detailed, cycle-accurate simulation

[Figure: CMP block diagram with components labeled C, P, and Interconnect]

Dynamic Tuning for TLS

[Figure: chart with labels 1.17x, 1.23x, 1.37x; legend: Parallel Code Overhead]

Summary of ADORE

  • ADORE uses Hardware Performance Monitoring (HPM) capability to implement a lightweight runtime profiling system.
  • Efficient profiling and phase detection are the key to the success of dynamic native binary optimizers.
  • ADORE can speed up real-world large applications optimized by production compilers.
  • ADORE works on two architectures: Itanium and SPARC.
  • COBRA is a follow-up system to ADORE. It works on Itanium and x86.
  • ADORE/COBRA can also optimize for multi-cores.
  • ADORE has recently been applied to dynamic TLS tuning.

Conclusion

“It was the best of times, it was the worst of times…” -- opening line of “A Tale of Two Cities”

  • Best of times for research: new areas where innovations are needed
  • Worst of times for research: saturated areas where technologies are mature or well understood, hard to innovate, …

Today the wide deployment of multicore processors has brought large potential computing power, which could lead to significant performance benefits. Exploiting this potential from the hardware demands thread-level parallelism in the application.

In many sequential applications
