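Bridging the Digital-to-Physical Divide (2026): Evaluation Report on Large Language Model Agents' Benchtop DNA Acquisition Capabilities (English Edition)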

About 164,100 characters, about 148 pages. Posted 2026-02-12 from Hunan.


Bridging the Digital to Physical Divide

Evaluating LLM Agents on Benchtop DNA Acquisition

Summary

There have been striking developments over the past several years in the performance of artificial intelligence (AI) and large language models (LLMs) in particular. This has driven a need for more robust evaluations of model capabilities, particularly those relevant to security-critical domains. One important class of capability evaluations examines LLM agents that interact with their environment, testing their ability to apply information situationally and act autonomously. In this report, we describe the design and execution of an agent evaluation designed to probe capabilities in applied molecular biology.

Benchtop DNA synthesis is relevant to threat models in which a malicious actor attempts to create a viral pathogen.¹ We focus our evaluation on this task, presenting our agents with a simulated interface to a benchtop DNA synthesizer. We then prompt the agents to assist a hypothetical non-expert user in creating one of two DNA sequences of interest: an enhanced green fluorescent protein (eGFP) and a sequence encoding an influenza hemagglutinin (HA) protein. Agents are also asked to generate a protocol for handling the synthesized DNA in a laboratory setting.

We originally tested a set of five models that were at or near the frontier of LLM capability in mid-2025: GPT-4.1 and o3 from OpenAI, Claude Sonnet 4 and Opus 4 from Anthropic, and Gemini 2.5 Pro from Google. Immediately after its release in August 2025, we added GPT-5 to our model set, then tested Claude Opus 4.5 and Gemini 3 Pro in January 2026 as well. All LLMs were tested within ReAct agent scaffolds, and the core evaluation was implemented in the Inspect AI library, a framework for LLM evaluations developed by the UK AI Security Institute.²

The key findings from our testing include:

• The most recent models we tested, GPT-5, Opus 4.5, and Gemini 3 Pro, reliably design eGFP DNA segments that s
