Issue 39: openai/evals

Today's Summary

OpenAI Blog: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Code…

GitHub karpathy: AI agents running research on single-GPU nanochat training automatically

GitHub anthropics: Public repository for Agent Skills

GitHub openai: Skills Catalog for Codex

GitHub karpathy: A positive developer community for builders and agents.

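Summary + Take: A collection of notebooks/recipes showcasing som… | Take: anthropics/claude-cookbooks is better judged by its practical adoption value, rather than…

Summary + Take: Evals is a framework for evaluating LLMs and LLM… | Take: Rather than headline specs, what to watch with openai/evals is whether, in reasoning quality, retrieval performance, or usability, it brings…

Summary + Take: Official, Anthropic-managed directory of high qu… | Take: The point of anthropics/claude-plugins-official is not novelty, but…

Summary + Take: Claude Code is an agentic coding tool that lives… | Take: For anthropics/claude-code, the better question is whether it improves multi-step collaboration, memory management, and…

Summary + Take: A lightweight, powerful framework for multi-agen… | Take: For openai/openai-agents-python, the better question is whether it improves multi-step collaboration, …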

The next phase of enterprise AI

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents.

Link: https://openai.com/index/next-phase-of-enterprise-ai

Take: With The next phase of enterprise AI, what really matters is whether it will affect a team's model selection, performance limits, and product experience.

karpathy/autoresearch

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: AI agents running research on single-GPU nanochat training automatically

Link: https://github.com/karpathy/autoresearch

Take: With karpathy/autoresearch, what really matters is whether it will affect a team's model selection, performance limits, and product experience.

anthropics/skills

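Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: Public repository for Agent Skills

Link: https://github.com/anthropics/skills

Take: The point of anthropics/skills is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.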

openai/skills

Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Skills Catalog for Codex

Link: https://github.com/openai/skills

Take: openai/skills is better judged by its practical adoption value than by whether it has generated new buzz.

karpathy/KarpathyTalk

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: A positive developer community for builders and agents.

Link: https://github.com/karpathy/KarpathyTalk

Take: karpathy/KarpathyTalk is better judged by its practical adoption value than by whether it has generated new buzz.

anthropics/claude-cookbooks

Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Link: https://github.com/anthropics/claude-cookbooks

Take: anthropics/claude-cookbooks is better judged by its practical adoption value than by whether it has generated new buzz.

openai/evals

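Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Link: https://github.com/openai/evals

Take: Rather than headline specs, what to watch with openai/evals is whether it brings real improvements in reasoning quality, retrieval performance, or usability.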

anthropics/claude-plugins-official

Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: Official, Anthropic-managed directory of high quality Claude Code Plugins.

Link: https://github.com/anthropics/claude-plugins-official

Take: The point of anthropics/claude-plugins-official is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

anthropics/claude-code

Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

Link: https://github.com/anthropics/claude-code

Take: For anthropics/claude-code, the better question is whether it improves multi-step collaboration, memory management, and reliable delivery, not just how the demo looks.

openai/openai-agents-python

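Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: A lightweight, powerful framework for multi-agent workflows

Link: https://github.com/openai/openai-agents-python

Take: For openai/openai-agents-python, the better question is whether it improves multi-step collaboration, memory management, and reliable delivery, not just how the demo looks.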

openai/codex-plugin-cc

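Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Use Codex from Claude Code to review code or delegate tasks.

Link: https://github.com/openai/codex-plugin-cc

Take: The point of openai/codex-plugin-cc is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.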

anthropics/courses

Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: Anthropic's educational courses

Link: https://github.com/anthropics/courses

Take: anthropics/courses is better judged by its practical adoption value than by whether it has generated new buzz.

openai/codex

Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Lightweight coding agent that runs in your terminal

Link: https://github.com/openai/codex

Take: openai/codex is better judged by its practical adoption value than by whether it has generated new buzz.

karpathy/LLM101n

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: LLM101n: Let's build a Storyteller

Link: https://github.com/karpathy/LLM101n

Take: karpathy/LLM101n is better judged by its practical adoption value than by whether it has generated new buzz.

karpathy/nanoGPT

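Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: The simplest, fastest repository for training/finetuning medium-sized GPTs.

Link: https://github.com/karpathy/nanoGPT

Take: Rather than headline specs, what to watch with karpathy/nanoGPT is whether it brings real improvements in reasoning quality, retrieval performance, or usability.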

Session is shutting down in 90 days

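Source: Hacker News Frontpage

Tags: #research_community #core

Author:

Original: Session has announced that it will shut down within 90 days, and is urging users to migrate and to support the related infrastructure.

Link: https://getsession.org/donate

Take: It is common for P2P messaging tools to be dragged down by running in "life-support mode." Session's shutdown is another reminder that decentralization also needs commercial sustainability.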

Launch HN: Relvy (YC F24) On-call runbooks, automated

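Source: Hacker News Frontpage

Tags: #research_community #core

Author:

Original: Relvy, a YC F24 company, has launched an on-call runbook automation product that aims to have SRE incident playbooks execute automatically.

Link: https://www.relvy.ai

Take: The biggest barrier to on-call automation is trust, not technology. Whether Relvy takes off depends on whether enterprises are willing to hand second-line operational permissions to an agent.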

High-Precision Estimation of the State-Space Complexity of Shogi via the Monte Carlo Method

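Source: arXiv cs.AI

Tags: #research_community #core

Author:

Original: Uses the Monte Carlo method to estimate the state-space complexity of Shogi with high precision, a fresh re-estimate in classic AI game analysis.

Link: https://arxiv.org/abs/2604.06189

Take: The value of a paper like this lies in the method rather than the conclusion: it offers a numerical estimation procedure that can be ported to other complex discrete games.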

Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

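Source: arXiv cs.AI

Tags: #research_community #core

Author:

Original: The study finds that models tend toward "blind refusal" when users want to get around unreasonable rules, and discusses how alignment over-generalizes.

Link: https://arxiv.org/abs/2604.06233

Take: This is the "other side" of alignment: a refusal is also a decision, and a refusal with no sense of context hurts legitimate requests. Worth a close read for alignment engineers.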

Toward Reducing Unproductive Container Moves: Predicting Service Requirements and Dwell Times

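Source: arXiv cs.AI

Tags: #research_community #core

Author:

Original: Addresses the port container rehandling optimization problem, using ML to predict service requirements and dwell times in order to reduce unproductive moves.

Link: https://arxiv.org/abs/2604.06251

Take: A good example of "legacy industry + AI": the value lies not in algorithmic novelty but in cutting a very specific operational cost.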