JOURNAL OF INTEGRATION TECHNOLOGY (集成技术) Vol. 1 No. 3
Sep. 2012
Status and Challenges on Data Analysis of High Throughput Sequencing
ZHANG Wen-li1,2
1( Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China )
2( State Key Laboratory of Computer Architecture, Beijing 100190, China )
Abstract Genes are the material basis of heredity. All life phenomena, including birth, growth, disease, aging and
death, are related to genes, and gene sequencing is one way to read life. With the development of new-generation
high-throughput sequencing technology, terabytes or more of sequence data are generated every day. Interpreting these
large-scale, complex, high-dimensional data has become harder than acquiring them; it is a critical step in current
biological research and has great practical significance. Storing, processing and analyzing massive high-throughput
sequencing data severely challenges current computer systems and computing models. Drawing on a survey, in particular a
case study of BGI (华大基因), this paper discusses the current status and problems of high-throughput sequencing data
analysis and the measures taken by the various parties involved. Meeting the challenges posed by high-throughput
sequencing data, however, still requires close cooperation among many parties and long-term, in-depth research.
Keywords genome; high-throughput sequencing; data analysis; cloud computing; workflow