Summary of Fault-Tolerance Techniques in Distributed Systems (.ppt)

Published 2016-03-02 (Hubei)

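Lecture 7 - Chapter 7: Fault Tolerance in Distributed Systems

A computer is made up of many hardware and software components. From time to time these components develop faults or errors that cause the machine to crash or a run to fail. Such faults and errors tend to occur at random: users cannot predict them, and sometimes do not even notice that an error has occurred. If a computer system has appropriate countermeasures and contingency measures for unexpected software or hardware faults, we say that the system has a degree of fault tolerance.

What is special about distributed systems is that failures are localized: one or more local components of the system fail, which may affect part of the system's functionality while the rest of the system is unaffected.

This chapter covers the basic concepts, requirements, and models of fault tolerance, how to achieve reliable communication, and how to remove a fault and resume operation once it has been detected.

A system is said to fail when it cannot provide the services it has promised.
In normal operation a system moves among a number of operating states; once an anomaly occurs, the system enters an error state.
An error state may be the cause of a system failure.
The cause of an error is called a fault.

Passive (Primary-Backup) Replication

Request Communication: The request is issued to the primary RM (replica manager) and carries a unique request id.
Coordination: The primary takes requests atomically, in order, and checks the id (it resends the stored response if the id is not new).
Execution: The primary executes the request and stores the response.
Agreement: If the request is an update, the primary sends the updated state/result, the request id, and the response to all backup RMs.
Response: The primary sends the response to the front end.

Fault Tolerance in Passive Replication

The system implements linearizability, since the primary sequences operations in order. If the primary fails, a backup becomes primary by leader election, and the surviving replica managers agree on which operations had been performed at the point when the new primary takes over. This requirement is met if the replica managers (primary and backups) are organized as a group and the primary uses view-synchronous group communication to send updates to the backups. The system then remains linearizable after the primary crashes.

Active Replication

Request Communication: The request contains a unique identifier and is multicast to all RMs by a reliable, totally ordered multicast.
Coordination: Group communication ensures that requests are delivered to each RM in the same order (but possibly at different times).
Execution: Each replica executes the request. (Correct replicas return the same result since they are running the same program, i.e., they are replicated protocols or replicated state machines.)
Agreement: No agreement phase is needed, because of the multicast delivery semantics of requests.
Resp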