Benchmarks
Event. Tests used to evaluate model performance, such as SWE-bench, MMLU-Pro, GPQA, and IFEval.
199 mentions, 143 connections, last seen: 2026-04-29
Relationship graph
Relationships (199)
Applied to (78)
通义千问 Max, InfoTok, AI research assistant, AI safety, Fast-dVLA, Agent, AI search, GLM-5, Gemini 3, embodied intelligence, Mythos, ADeLe, RLHF, SFT, MASFactory, DeepScientist V1.5, Claude 3.5 Sonnet, Claude Code, GPT-5 Mini-R, SleepFM, Mythos Preview, Claude Mythos, Vector Database, InternVL2-8B, MemPalace, Transformer, LeVERB, PaperWritingBench, HappyHorse 1.0, Synergy, Qwen2.5, Psi-R2, Claude Opus 4.6, Chain-of-Thought, StarVLA, LatentUM, M2.7, Claude Sonnet 4.6, AI coding assistant, robotics, Claude Opus 4.6, vidu Q3, Qwen3, DeepSeek R1, Gemini 2.0, Gemini 2.5 Pro, AI customer service, AI coding assistant, Elephant, StreamingVLA, GPT-Rosalind, image generation, autonomous driving, o3, GPT-5, Harness, LLaMA 4, LLaMA 3, Images 2.0, GPT Image 2, Qwen3.6-27B, GPT-5.5, MiMo-V2.5-Pro, PRET, AlphaFold, medical AI, DASES framework, GPT-4o, code generation, AI alignment, DeepSeek V3, WorldScape, edge computing, code generation, 清华大学, SenseNova U1, LDA-1B, MotuBrain
Uses (55)
Qwen3, MMDuet2, DGM-Hyperagents, Claude Opus 4.6, GPT-5, GLM-5, MiniMax, GPT-4o, Claude, Gemini, LLaMA 3, 深度求索 DeepSeek, ChatGPT, EchoZ-1.0, GPT-5.2, MetaClaw, TurboQuant, RaBitQ, CLIP, DINO, LongCat-Next, Gemini 2.5 Pro, HappyHorse 1.0, Agent, LLaMA 4, Kimi Moonshot, Claude Code, Psi-R2, Claude Mythos, Claude 4.5 Sonnet, Gemini 3.1 Pro, vidu Q3, 面壁智能, PRM-as-a-Judge, RoboPulse, Anthropic, TuriX Superpower, 阿里巴巴, Meta AI, OpenAI, Gemini 3 Pro, LingBot-Map, GO-2, GPT-Rosalind, StreamingVLA, AURA, AtomVLA, GPT-5.5, MCP, DeepSeek V4, Google DeepMind, VideoAuto-R1, LIVR, MotuBrain, SenseNova U1
Released (18)
Leads (12)
Uses technology (9)
Advisor (1)
Related articles (199)