Analysis of Problem Diagnosis Methods for Performance Anomalies in Parallel File Systems

  • About 35,800 characters
  • About 46 pages
  • Published 2018-06-28 in Shanghai


Abstract

Parallel file systems can experience performance anomalies that are hard to diagnose and isolate. Often, the most interesting and trickiest problems to trace are not outright crash (fail-stop) failures but those that leave the system "limping but alive": it continues to operate, but with degraded performance. Targeting the diagnosis of such "limping-but-alive" problems in parallel file systems used for high-performance cluster computing (HPC), this thesis proposes a black-box method.

Observation of stripe-based parallel file systems shows that they share some common characteristics. When a performance anomaly occurs in the cluster, the performance metrics on the culprit servers exhibit observably anomalous behavior. Based on this, the thesis proposes a diagnosis method built on peer comparison. The approach uses the Kullback-Leibler (KL) divergence to compare the performance metrics of all servers and then indicts the faulty node. Further analysis can also identify the root cause.

The method diagnoses different performance problems by identifying, gathering, and analyzing OS-level, black-box performance metrics on every node in the cluster. The peer-comparison approach compares the statistical attributes of these metrics across I/O servers to identify the anomalous node. Because it is transparent to applications, the method requires no modification of source code. The approach applies generally to stripe-based parallel file systems and achieves good accuracy.

Finally, the approach is demonstrated by injecting performance problems into the file system in both Capfs and Lustre clusters.

Keywords: parallel file system, performance anomaly diagnosis, metrics collection, peer comparison

Contents: Abstract (Chinese), Abstract (English)
