- 1、原创力文档(book118)网站文档一经付费(服务费),不意味着购买了该文档的版权,仅供个人/单位学习、研究之用,不得用于商业用途,未经授权,严禁复制、发行、汇编、翻译或者网络传播等,侵权必究。。
- 2、本站所有内容均由合作方或网友上传,本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺!文档内容仅供研究参考,付费前请自行鉴别。如您付费,意味着您自己接受本站规则且自行承担风险,本站不退款、不进行额外附加服务;查看《如何避免下载的几个坑》。如果您已付费下载过本站文档,您可以点击 这里二次下载。
- 3、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等,请点击“版权申诉”(推荐),也可以打举报电话:400-050-0827(电话支持时间:9:00-18:30)。
- 4、该文档为VIP文档,如果想要下载,成为VIP会员后,下载免费。
- 5、成为VIP后,下载本文档将扣除1次下载权益。下载后,不支持退款、换文档。如有疑问请联系我们。
- 6、成为VIP后,您将拥有八大权益,权益包括:VIP文档下载权益、阅读免打扰、文档格式转换、高级专利检索、专属身份标志、高级客服、多端互通、版权登记。
- 7、VIP文档为合作方或网友上传,每下载1次, 网站将根据用户上传文档的质量评分、类型等,对文档贡献者给予高额补贴、流量扶持。如果你也想贡献VIP文档。上传文档
查看更多
基于Goldschmidt算法高性能双精度浮点除法器设计
基于Goldschmidt算法高性能双精度浮点除法器设计
摘要:针对双精度浮点除法通常运算过程复杂、延时较大这一问题,提出一种基于Goldschmidt算法设计支持IEEE754标准的高性能双精度浮点除法器方法。首先,分析Goldschmidt算法运算除法的过程以及迭代运算产生的误差;然后,提出了控制误差的方法;其次,采用了较节约面积的双查找表法确定迭代初值,迭代单元采用并行乘法器结构以提高迭代速度;最后,合理划分流水站,控制迭代过程使浮点除法可以流水执行,从而进一步提高除法器运算速率。实验结果表明,在40nm工艺下,双精度浮点除法器采用14位迭代初值流水结构,其综合cell面积为84902.2618μm2,运行频率可达2.2GHz;相比采用8位迭代初值流水结构运算速度提高了32.73%,面积增加了5.05%;计算一条双精度浮点除法的延迟为12个时钟周期,流水执行时,单条除法平均延迟为3个时钟周期,与其他处理器中基于SRT算法实现的双精度浮点除法器相比,数据吞吐率提高了3~7倍;与其他处理器中基于Goldschmidt算法实现的双精度浮点除法器相比,数据吞吐率提高了2~3倍。
关键词:浮点除法器;Goldschmidt算法;倒数查找表;高性能除法器;数字信号处理器
中图分类号: TP301.6; TP342.2 文献标志码:A
Abstract: Focusing on the issue that division is complex and needs a large delay to compute, a kind of method for designing the unit of highperformance double precision floating point divider based on Goldschmidts algorithm was proposed and it supported IEEE754 standard. Firstly, it was analyzed that how to compute division using Goldschmidts algorithm and the error produced during iterative operation. Then, the method for controlling error was proposed. Secondly, bipartite reciprocal tables were adopted to calculate initial value of iteration with area saving, and parallel multipliers were adopted in the iterative unit for accelerating. Lastly, the executed station was divided reasonably and it made floating point divider supporting pipeline execution with state machine controlling. So, the speed of divider was improved. The experimental results show that the double precision floating point divider adopted 14bit iterative initial value pipeline structure, its synthesis cell area is 84902.2618μm2, the running frequency is up to 2.2GHz with 40nm technology. Compared with 8bit iterative initial value pipeline structure, computing speed is increased by 32.73% and area is increased by 5.05%. The delay of a double precision floating division instruction is 12 cycles, and it is decreased to 3 cycles in pipeline execution. Compared with the divider based on SRT algorithm implemented in other proces
原创力文档


文档评论(0)