Today's Digest

OpenAI Blog:A pilot program to support independent safety and alignment research and develop the next generation of talent

OpenAI Blog:Explore our ambitious, people-first industrial policy ideas for the AI era—focused on expanding opportunity, sharing prosperity, a…

OpenAI Engineering: OpenAI Engineering describes equipping the Responses API with a computer environment, a sign that model calls are moving toward a more complete agent runtime.

OpenAI Engineering: OpenAI explains its approach to the Model Spec and its relevance to agents.

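Anthropic Engineering: Anthropic shows that in agentic coding benchmarks, infrastructure noise can cause score swings that may even exceed the gaps between models on the leaderboard. This matters for agent evals.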

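Summary + take: Anthropic on harness design: agent runtime, context, and safety-boundary design problems. | Take: From Harness design for long-running application…

Summary + take: Karpathy 2025 LLM RLVR reasoning test-time compu… | Take: Rather than surface-level specs, the question for 2025 LLM Year in Review is whether it shows gains in reasoning quality, …

Summary + take: OpenAI outlines the next phase of enterprise AI,… | Take: For The next phase of enterprise AI, what really matters is whether it will…

Summary + take: AI agents running research on single-GPU nanocha… | Take: For karpathy/autoresearch, what really matters is whether it will affect teams' model selection, …

Summary + take: Public repository for Agent Skills | Take: What matters about anthropics/skills is not novelty but whether it improves engineering efficiency, deployment stability…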

Announcing the OpenAI Safety Fellowship

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: A pilot program to support independent safety and alignment research and develop the next generation of talent

Link: https://openai.com/index/introducing-openai-safety-fellowship

Take: Announcing the OpenAI Safety Fellowship is better judged by its practical adoption value than by whether it stirs up fresh discussion.

Industrial policy for the Intelligence Age

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: Explore our ambitious, people-first industrial policy ideas for the AI era—focused on expanding opportunity, sharing prosperity, and building resilient institutions as advanced intelligence evolves.

Link: https://openai.com/index/industrial-policy-for-the-intelligence-age

Take: Industrial policy for the Intelligence Age is better judged by its practical adoption value than by whether it stirs up fresh discussion.

From model to agent: Equipping the Responses API with a computer environment

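Source: OpenAI Engineering

Tags: #uncategorized #core

Author:

Original: OpenAI Engineering describes equipping the Responses API with a computer environment, a sign that model calls are moving toward a more complete agent runtime.

Link: https://openai.com/index/equip-responses-api-computer-environment/

Take: What matters about From model to agent: Equipping the Responses API with a comp... is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.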

Inside our approach to the Model Spec

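Source: OpenAI Engineering

Tags: #uncategorized #core

Author:

Original: OpenAI explains its approach to the Model Spec and its relevance to agents.

Link: https://openai.com/index/our-approach-to-the-model-spec

Take: Following Inside our approach to the Model Spec, the thing to watch is whether safety incidents change the compliance bar enterprises set for procurement, integration, and production launch.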

Quantifying infrastructure noise in agentic coding evals

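Source: Anthropic Engineering

Tags: #uncategorized #core

Author:

Original: Anthropic shows that in agentic coding benchmarks, infrastructure noise can cause score swings that may even exceed the gaps between models on the leaderboard. This matters for agent evals.

Link: https://www.anthropic.com/engineering/infrastructure-noise

Take: Rather than surface-level specs, Quantifying infrastructure noise in agentic coding evals should be judged on whether it brings real improvements in reasoning quality, retrieval effectiveness, or usability.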
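If infrastructure noise can rival leaderboard gaps, a basic check before trusting a ranking is to rerun each model several times and compare the gap in mean scores with the standard error of that gap. A minimal sketch, assuming you already have per-run pass rates; the function name, the 2-standard-error threshold, and the sample numbers are illustrative assumptions, not Anthropic's methodology.

```python
import math
import statistics


def gap_exceeds_noise(scores_a, scores_b, z=2.0):
    """Compare two models' mean pass rates against run-to-run noise.

    scores_a / scores_b: pass rates from repeated runs of the same benchmark
    (each list needs at least two runs). The z threshold is an assumption.
    """
    mean_a = statistics.mean(scores_a)
    mean_b = statistics.mean(scores_b)
    gap = mean_a - mean_b
    # Unpooled (Welch-style) standard error of the difference of two means.
    se = math.sqrt(
        statistics.variance(scores_a) / len(scores_a)
        + statistics.variance(scores_b) / len(scores_b)
    )
    # The gap counts as real only if it clearly exceeds the noise.
    return gap, se, abs(gap) > z * se


if __name__ == "__main__":
    # Made-up reruns for two hypothetical models.
    gap, se, significant = gap_exceeds_noise(
        [0.62, 0.55, 0.66, 0.58], [0.60, 0.57, 0.63, 0.59]
    )
    print(f"gap={gap:.3f} se={se:.3f} significant={significant}")
```

With these made-up reruns the half-point gap is far smaller than its standard error, so the two models would not be meaningfully ranked.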

Harness design for long-running application development

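Source: Anthropic Engineering

Tags: #uncategorized #core

Author:

Original: Anthropic on harness design: agent runtime, context, and safety-boundary design problems.

Link: https://www.anthropic.com/engineering/harness-design-long-running-apps

Take: Following Harness design for long-running application development, the thing to watch is whether safety incidents change the compliance bar enterprises set for procurement, integration, and production launch.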

2025 LLM Year in Review

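Source: Andrej Karpathy

Tags: #uncategorized #core

Author:

Original: Karpathy on LLMs in 2025: RLVR, reasoning, test-time compute.

Link: https://karpathy.bearblog.dev/year-in-review-2025/

Take: Rather than surface-level specs, 2025 LLM Year in Review should be judged on whether it points to real improvements in reasoning quality, retrieval effectiveness, or usability.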
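RLVR (reinforcement learning from verifiable rewards) trains on tasks whose answers a program can check, so the reward comes from a verifier rather than a learned preference model. A minimal sketch of such a reward check, assuming a math-style task whose final answer is the last number in the model's completion; the extraction rule and both function names are hypothetical, not taken from Karpathy's post.

```python
import re


def extract_final_answer(completion: str):
    """Assumed convention: the last number in the completion is the answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the reference string."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference else 0.0


if __name__ == "__main__":
    print(verifiable_reward("2 + 2 = 4, so the answer is 4", "4"))
    print(verifiable_reward("The answer is 5", "4"))
```

Real verifiers are usually stricter (normalizing formats, running unit tests for code), but the principle is the same: the reward is computed, not judged.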

The next phase of enterprise AI

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents.

Link: https://openai.com/index/next-phase-of-enterprise-ai

Take: For The next phase of enterprise AI, what really matters is whether it affects teams' model selection, performance limits, and product experience.

karpathy/autoresearch

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: AI agents running research on single-GPU nanochat training automatically

Link: https://github.com/karpathy/autoresearch

Take: For karpathy/autoresearch, what really matters is whether it affects teams' model selection, performance limits, and product experience.
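One simple way to structure an automated research loop like the one the repository describes is greedy hill climbing: propose a change, run a short training job, and keep the change only if validation loss improves. A minimal sketch under that assumption; `train_and_eval` and `propose_change` are hypothetical stand-ins (a toy loss surface and a random perturbation instead of real nanochat training and an LLM agent), not code from karpathy/autoresearch.

```python
import random


def train_and_eval(config):
    """Hypothetical stand-in for a short single-GPU run; returns validation loss.
    A toy quadratic is used here so the sketch runs without a GPU."""
    return (config["lr"] - 3e-4) ** 2 * 1e6 + (config["depth"] - 12) ** 2 * 0.01


def propose_change(config, rng):
    """Hypothetical 'agent' step: perturb one hyperparameter at random."""
    new = dict(config)  # copy so the current best config is never mutated
    if rng.random() < 0.5:
        new["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    else:
        new["depth"] = max(1, config["depth"] + rng.choice([-2, 2]))
    return new


def research_loop(config, steps, seed=0):
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    for _ in range(steps):
        candidate = propose_change(config, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:  # keep the change only if validation loss improves
            config, best_loss = candidate, loss
    return config, best_loss


if __name__ == "__main__":
    best_config, best_loss = research_loop({"lr": 1e-3, "depth": 6}, steps=50)
    print(best_config, round(best_loss, 4))
```

The keep-only-if-better rule makes the loop safe to leave unattended: a bad proposal costs one training run but never degrades the current best configuration.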

anthropics/skills

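Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: Public repository for Agent Skills

Link: https://github.com/anthropics/skills

Take: What matters about anthropics/skills is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.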

openai/skills

Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Skills Catalog for Codex

Link: https://github.com/openai/skills

Take: openai/skills is better judged by its practical adoption value than by whether it stirs up fresh discussion.

karpathy/KarpathyTalk

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: A positive developer community for builders and agents.

Link: https://github.com/karpathy/KarpathyTalk

Take: karpathy/KarpathyTalk is better judged by its practical adoption value than by whether it stirs up fresh discussion.

anthropics/claude-cookbooks

Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Link: https://github.com/anthropics/claude-cookbooks

Take: anthropics/claude-cookbooks is better judged by its practical adoption value than by whether it stirs up fresh discussion.

openai/evals

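Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Link: https://github.com/openai/evals

Take: Rather than surface-level specs, openai/evals should be judged on whether it brings real improvements in reasoning quality, retrieval effectiveness, or usability.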

anthropics/claude-plugins-official

Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: Official, Anthropic-managed directory of high quality Claude Code Plugins.

Link: https://github.com/anthropics/claude-plugins-official

Take: What matters about anthropics/claude-plugins-official is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

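Source: Hacker News Frontpage

Tags: #research_community #core

Author:

Original: The SkyPilot team on "research-driven agents": before writing code, the agent first does a literature review and background research.

Link: https://blog.skypilot.co/research-driven-agents/

Take: This reflects a new stage in agent design: a shift from "start coding fast" to "understand first," which matters most for complex tasks that are hard to debug.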

Show HN: I built a Cargo-like build tool for C/C++

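Source: Hacker News Frontpage

Tags: #research_community #core

Author:

Original: A Cargo-like build tool for C/C++ that aims to bring the simple dependency-management experience of the Rust ecosystem to C/C++.

Link: https://github.com/randerson112/craft

Take: The pain point of C/C++ is not the language itself but the wilderness of build and dependency management. Even if tools like this never take off, they keep raising the baseline.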

Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases

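Source: Meta Engineering

Tags: #engineering_ai_infra_blogs #extended

Author:

Original: Meta explains how it merged the WebRTC forks that had diverged across 50+ products back into the mainline, reducing long-term maintenance burden.

Link: https://engineering.fb.com/2026/04/09/developer-tools/escaping-the-fork-how-meta-modernized-webrtc-across-50-use-cases/

Take: This is valuable for infra engineers: how a large company escapes the debt of maintaining its own fork is a choice every mature engineering team faces sooner or later.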

Emperor penguin and Antarctic fur seal now endangered

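Source: Hacker News Frontpage

Tags: #research_community #core

Author:

Original: The IUCN has listed the emperor penguin and the Antarctic fur seal as endangered, mainly because climate change is degrading their habitat.

Link: https://iucn.org/press-release/202604/emperor-penguin-and-antarctic-fur-seal-now-endangered-due-climate-change-iucn

Take: Unrelated to AI but worth keeping in the digest: data like this is one of the most important real-world anchors for AI applications such as climate modeling and field monitoring.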

Deep Agents Deploy: an open alternative to Claude Managed Agents

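Source: LangChain Blog

Tags: #ai_engineering_blogs #core

Author:

Original: LangChain's Deep Agents Deploy has entered beta, positioned as a model-agnostic, open-source agent harness that competes with Claude Managed Agents.

Link: https://blog.langchain.com/deep-agents-deploy-an-open-alternative-to-claude-managed-agents/

Take: For teams wary of single-vendor lock-in, this is a key option: it brings managed-agent capabilities into a stack that can be self-hosted and can switch models.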
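What "model-agnostic harness" buys you can be shown with a small design sketch: the harness owns the task loop and history and depends only on a narrow model interface, so switching vendors means swapping one object. A minimal sketch under that assumption; `ChatModel`, `EchoModel`, `UpperModel`, and `AgentHarness` are hypothetical names, not the API of Deep Agents Deploy or LangChain.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only thing the harness needs from any model provider."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical backend, standing in for one vendor."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """A second hypothetical backend, standing in for another vendor."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class AgentHarness:
    """Minimal harness: keeps the conversation history; the model is injected."""

    def __init__(self, model: ChatModel):
        self.model = model
        self.history: list[str] = []

    def run(self, task: str) -> str:
        self.history.append(task)
        reply = self.model.complete("\n".join(self.history))
        self.history.append(reply)
        return reply


if __name__ == "__main__":
    for backend in (EchoModel(), UpperModel()):
        print(AgentHarness(backend).run("hi"))
```

Because `ChatModel` is a structural `Protocol`, backends need not inherit from anything; any object with a matching `complete` method can be plugged in.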