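2026 Bridging the Digital to Physical Divide: Evaluation Report on the Benchtop DNA Acquisition Capabilities of Large Language Model Agents (English Version).docx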

  • About 164,100 words
  • About 148 pages
  • Posted 2026-02-12 (Hunan)


Bridging the Digital to Physical Divide

Evaluating LLM Agents on Benchtop DNA Acquisition

Summary


Over the past several years there have been striking developments in the performance of artificial intelligence (AI), and of large language models (LLMs) in particular. This has driven a need for more robust evaluations of model capabilities, especially those relevant to security-critical domains. One important class of capability evaluations examines LLM agents that interact with their environment, testing their ability to apply information situationally and act autonomously. In this report, we describe the design and execution of an agent evaluation that probes capabilities in applied molecular biology.

Benchtop DNA synthesis is relevant to threat models in which a malicious actor attempts to create a viral pathogen.¹ We focus our evaluation on this task, presenting our agents with a simulated interface to a benchtop DNA synthesizer. We then prompt the agents to assist a hypothetical non-expert user in creating one of two DNA sequences of interest: one encoding enhanced green fluorescent protein (eGFP) and one encoding an influenza hemagglutinin (HA) protein. Agents are also asked to generate a protocol for handling the synthesized DNA in a laboratory setting.

We originally tested five models that were at or near the frontier of LLM capability in mid-2025: GPT-4.1 and o3 from OpenAI, Claude Sonnet 4 and Opus 4 from Anthropic, and Gemini 2.5 Pro from Google. We added GPT-5 to our model set immediately after its release in August 2025, and in January 2026 we also tested Claude Opus 4.5 and Gemini 3 Pro. All LLMs were tested within ReAct agent scaffolds, and the core evaluation was implemented in the Inspect AI library, a framework for LLM evaluations developed by the UK AI Security Institute.²

The key findings from our testing include:

• The most recent models we tested, GPT-5, Opus 4.5, and Gemini 3 Pro, reliably design eGFP DNA segments that s…
