Predictive Analysis of Horizontally Transferred Genes Based on Multiple Features

  • About 55,300 words
  • About 56 pages
  • Published 2018-05-18 in Shanghai


Abstract

Horizontal gene transfer (HGT, also called lateral gene transfer) is the transfer of genetic material from one lineage to another other than to offspring, and it has played a key role in species evolution and microbial genome diversification. Transfers can occur between both closely and distantly related species or strains, and are thought to be frequent events. Among single-celled organisms, HGT is perhaps the dominant form of genetic transfer. In addition, HGT has been proposed to result in the emergence of novel human diseases and poses several risks to humans. As sequence data have accumulated, evidence for rampant HGT has increased dramatically. Thus, detecting HGT has enormous practical significance, both for better understanding the impact of HGT on genome evolution and for identifying new drug targets.

A number of computational methods for finding horizontally transferred genes have been proposed in the past decades; however, none of them has yet provided a reliable detector. At present, there are two primary strategies for detecting horizontally transferred genes: phylogenetic approaches and parametric approaches. Phylogenetic approaches are time-consuming and insufficiently robust. In existing parametric approaches, either only a single compositional property participates in the detection process, or the results obtained from each single property are simply combined. Different properties may carry different information, so a single property cannot sufficiently capture the information encoded by gene sequences. In addition, the class imbalance problem in the datasets, which also results in large errors in gene detection, has not been considered by the published methods that are based on machine learning.

In light of these caveats, in this study we have developed a new strategy (Hgtident), which uses a support vector machine (SVM) to detect horizontally transferred genes by combining the unusual properties effectively, and which improves detection accuracy. Hgtident includes the introduction of more representative datasets, optimization of the SVM model, feature selection based on a genetic algorithm (GA), handling of the imbalance problem in the datasets, and extensive performance evaluation via systematic cross-validation methods. Through feat
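
Parametric approaches of the kind the abstract mentions look for genes whose sequence composition deviates from that of the host genome. The following is a minimal sketch of how several compositional properties could be combined into one feature vector for a classifier such as an SVM. It assumes GC content and dinucleotide frequencies as the properties; these choices and all function names are hypothetical illustrations, not the features or code used by Hgtident.

```python
from collections import Counter
from itertools import product
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Relative frequency of each overlapping k-mer over the ACGT alphabet.

    k-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def composition_features(gene: str, genome: str, k: int = 2) -> List[float]:
    """Hypothetical feature vector: deviation of a gene from its genome.

    The first entry is the GC-content difference; the remaining 4**k
    entries are per-k-mer frequency differences in sorted k-mer order.
    """
    gene_freq = kmer_frequencies(gene, k)
    genome_freq = kmer_frequencies(genome, k)
    features = [gc_content(gene) - gc_content(genome)]
    features += [gene_freq[m] - genome_freq[m] for m in sorted(gene_freq)]
    return features
```

Each gene then contributes one vector of 1 + 4^k numbers, so multiple properties enter the classifier jointly instead of being thresholded one at a time. Class imbalance and feature selection, which the abstract also names, would be handled in the training step and are not shown here.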
