2025 OpenAI o3-mini Technical Report (English edition), information security materials



OpenAI o3-mini System Card

OpenAI

January 31, 2025

1 Introduction

The OpenAI o model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment [1]. This brings OpenAI o3-mini to parity with state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence.

Under the Preparedness Framework, OpenAI's Safety Advisory Group (SAG) recommended classifying the OpenAI o3-mini (Pre-Mitigation) model as Medium risk overall. It scores Medium risk for Persuasion, CBRN (chemical, biological, radiological, nuclear), and Model Autonomy, and Low risk for Cybersecurity. Only models with a post-mitigation score of Medium or below can be deployed, and only models with a post-mitigation score of High or below can be developed further.
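The deployment and development thresholds described above can be sketched as a small ordering check. This is an illustrative sketch only, not OpenAI's tooling: the names `Risk`, `can_deploy`, `can_develop_further`, and `overall_risk` are hypothetical, the `CRITICAL` level comes from the Preparedness Framework rather than from this passage, and treating the overall rating as the highest category score is an assumption that happens to be consistent with the scores reported here.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Ordered risk levels; CRITICAL is assumed from the Preparedness Framework."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def can_deploy(post_mitigation: Risk) -> bool:
    # Only models scoring Medium or below after mitigation can be deployed.
    return post_mitigation <= Risk.MEDIUM


def can_develop_further(post_mitigation: Risk) -> bool:
    # Only models scoring High or below after mitigation can be developed further.
    return post_mitigation <= Risk.HIGH


def overall_risk(category_scores: dict[str, Risk]) -> Risk:
    # Assumption: the overall rating is the highest score across categories.
    return max(category_scores.values())


# Pre-mitigation category scores reported for o3-mini in the text above.
O3_MINI_PRE_MITIGATION = {
    "Persuasion": Risk.MEDIUM,
    "CBRN": Risk.MEDIUM,
    "Model Autonomy": Risk.MEDIUM,
    "Cybersecurity": Risk.LOW,
}

if __name__ == "__main__":
    overall = overall_risk(O3_MINI_PRE_MITIGATION)
    print(overall.name)
    # The thresholds apply to post-mitigation scores; the pre-mitigation
    # overall is passed here purely to exercise the functions.
    print(can_deploy(overall), can_develop_further(overall))
```

Note that the thresholds are defined on post-mitigation scores, while the category scores quoted above are pre-mitigation, so the final calls only demonstrate how the checks behave.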

Due to improved coding and research engineering performance, OpenAI o3-mini is the first model to reach Medium risk on Model Autonomy (see Section 5, Preparedness Framework Evaluations). However, it still performs poorly on evaluations designed to test real-world ML research capabilities relevant for self-improvement, which is required for a High classification. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols.

This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness
