High-throughput sequence alignment using Graphics Processing

  • About 7,350 characters
  • About 37 pages
  • Published 2018-02-27 from Hubei

Suffix Trees
  • Example: BANANA$
  • Searching for suffixes of ‘ANANA’
  • [Figure: suffix tree of BANANA$ with edge labels A, NA, $, NA$ and BANANA$, and leaves numbered 0 to 5]

Memory Limitations
  • Suffix trees take up a fair bit of memory
  • GPUs have 100’s of MBs, but this is still small
  • Divide the target sequence into ‘k’ segments with overlaps

Cache Optimisation
  • Memory latency high, cache performance crucial
  • We’re walking a tree here, not crunching numbers down an array
  • Can store read-only data in 2D textures; nVidia caching scheme optimises access
  • Re-order and squish tree nodes into ‘texel blocks’ such that:
      • Nodes near root are level-ordered (BFS)
      • Nodes further down are ordered with descendants

Cache Optimisation
  • [Figure: tree with nodes numbered 1 to 29, next to a texture grid with cells numbered:
      0 2 4 6 8 10 12 14
      1 3 5 7 9 11 13 15
      16 18 20 22 24 26 28 30
      17 19 21 23 25 27 29 31]
  • Texture cache organized in 2x2 blocks
  • Try to place all children of a node in the same cache block
  • Shamelessly cribbed from: /software/cmatch/FastExactStringMatching.ppt

Cache Optimisation
  • Reference Sequence stored in 4x216 blocks of a 2D array
  • Sequence: A B C D E F G H …
  • [Figure: 2D layout of the sequence: A E B F C G D H ………. α Φ β Χ Γ Ψ Δ Ω]
  • Why? It worked well.

Cache Optimisation
  • Memory layouts heuristically determined
  • nVidia cache details not public
  • Cache optimisation improves execution speed ‘by several fold’.

Conclusions
  • GPGPU isn’t just good for ‘arithmetic intensive’ applications
  • 5-11x speed-up for NGS data

Conclusions
  • Fine Print: 5-11x is for the Suffix Tree kernel on the GPU
  • Reality is different!
  • 3.5x speed-up for re
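The segmentation step on the Memory Limitations slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names `split_with_overlap` and `find_all` are hypothetical, the overlap of `len(query) - 1` characters is an assumption (the slides do not say how large the overlap is), and Python's `str.find` stands in for the suffix-tree search that would run on the GPU. The point is only that an overlap of at least one character less than the query length lets a match that straddles a segment boundary be found inside a single segment.

```python
def split_with_overlap(target, k, overlap):
    """Split `target` into up to k segments.

    Each segment is extended by `overlap` characters into the next one
    (the last segment is simply cut at the end of the target).
    Returns a list of (start_offset, segment_text) pairs.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    base = -(-len(target) // k)  # ceiling division: nominal segment length
    segments = []
    for i in range(k):
        start = i * base
        if start >= len(target):
            break  # target shorter than k nominal segments
        end = min(len(target), start + base + overlap)
        segments.append((start, target[start:end]))
    return segments


def find_all(target, query, k):
    """Exact-match start positions of `query` in `target`, searched per segment.

    Assumption: overlap = len(query) - 1, so every match lies wholly
    inside at least one segment. str.find is a stand-in for the
    suffix-tree kernel described in the slides.
    """
    if not query:
        raise ValueError("query must be non-empty")
    hits = set()  # a set, because the overlap can report a match twice
    for offset, text in split_with_overlap(target, k, len(query) - 1):
        pos = text.find(query)
        while pos != -1:
            hits.add(offset + pos)  # convert back to target coordinates
            pos = text.find(query, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(split_with_overlap("ABCDEFGHIJ", 3, 2))
    print(find_all("BANANABANANA", "ANA", 3))
```

With `k = 3`, the target `BANANABANANA` is cut at positions 4 and 8, and the match of `ANA` at position 3 crosses the first cut; it is still reported because the first segment extends two characters past its nominal end. The trade-off is that the overlapping regions are indexed twice, so larger queries cost more memory per segment.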
