Issue 39 | openai/evals
Today's Summary
OpenAI Blog: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Code…
GitHub karpathy: AI agents running research on single-GPU nanochat training automatically
GitHub anthropics: Public repository for Agent Skills
GitHub openai: Skills Catalog for Codex
GitHub karpathy: A positive developer community for builders and agents.
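Summary + take: A collection of notebooks/recipes showcasing som… | Take: anthropics/claude-cookbooks is better judged by its practical adoption value than by…
Summary + take: Evals is a framework for evaluating LLMs and LLM… | Take: Rather than surface specs, the question for openai/evals is whether, in reasoning quality, retrieval effectiveness, or usability, it delivers…
Summary + take: Official, Anthropic-managed directory of high qu… | Take: The core of anthropics/claude-plugins-official lies not in novelty but…
Summary + take: Claude Code is an agentic coding tool that lives… | Take: For anthropics/claude-code, the real question is whether it improves multi-step collaboration, memory management, and…
Summary + take: A lightweight, powerful framework for multi-agen… | Take: For openai/openai-agents-python, the real question is whether it improves multi-step collaboration,…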
The next phase of enterprise AI
Tags: #ai_engineering_blogs #core
Author:
Original: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents.
karpathy/autoresearch
Tags: #github_orgs #extended
Author:
Original: AI agents running research on single-GPU nanochat training automatically
anthropics/skills
Tags: #github_orgs #extended
Author:
Original: Public repository for Agent Skills
openai/skills
Original: Skills Catalog for Codex
karpathy/KarpathyTalk
Tags: #github_orgs #extended
Author:
Original: A positive developer community for builders and agents.
anthropics/claude-cookbooks
Tags: #github_orgs #extended
Author:
Original: A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
openai/evals
Tags: #github_orgs #extended
Author:
Original: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
anthropics/claude-plugins-official
Tags: #github_orgs #extended
Author:
Original: Official, Anthropic-managed directory of high quality Claude Code Plugins.
anthropics/claude-code
Tags: #github_orgs #extended
Author:
Original: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows, all through natural language commands.
openai/openai-agents-python
Tags: #github_orgs #extended
Author:
Original: A lightweight, powerful framework for multi-agent workflows
openai/codex-plugin-cc
Tags: #github_orgs #extended
Author:
Original: Use Codex from Claude Code to review code or delegate tasks.
anthropics/courses
Tags: #github_orgs #extended
Author:
Original: Anthropic's educational courses
openai/codex
Tags: #github_orgs #extended
Author:
Original: Lightweight coding agent that runs in your terminal
karpathy/LLM101n
Tags: #github_orgs #extended
Author:
Original: LLM101n: Let's build a Storyteller
karpathy/nanoGPT
Tags: #github_orgs #extended
Author:
Original: The simplest, fastest repository for training/finetuning medium-sized GPTs.
Session is shutting down in 90 days
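Tags: #research_community #core
Author:
Original: Session has announced that it will shut down within 90 days, and is urging users to migrate and to support the related infrastructure.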
Launch HN: Relvy (YC F24) On-call runbooks, automated
Tags: #research_community #core
Author:
Original: Relvy, a YC F24 company, has launched an on-call runbook automation product that aims to execute SREs' incident playbooks automatically.
High-Precision Estimation of the State-Space Complexity of Shogi via the Monte Carlo Method
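Tags: #research_community #core
Author:
Original: Uses the Monte Carlo method to estimate the state-space complexity of shogi with high precision, a fresh reassessment in the tradition of classic game-playing AI analysis.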
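As a rough illustration of the general idea (not the paper's own method, which this digest does not describe), Monte Carlo state-space estimation can work by sampling uniformly from a superset whose size is known exactly, measuring the fraction of samples that are legal states, and scaling the superset size by that fraction. The toy below is a hypothetical stand-in: instead of shogi positions it counts placements of k non-attacking rooks on an n × n board, so the estimate can be checked against the closed form C(n,k)²·k!. All function names are hypothetical.

```python
import math
import random


def is_legal(cells: list[int], n: int) -> bool:
    """Toy legality rule (hypothetical): no two rooks share a row or a column."""
    rows = {c // n for c in cells}
    cols = {c % n for c in cells}
    return len(rows) == len(cells) and len(cols) == len(cells)


def estimate_state_count(n: int, k: int, trials: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimate of the number of legal k-rook placements on an n x n board.

    Draws uniform k-subsets of the n*n cells (a superset of exactly known size),
    measures the fraction that are legal, and scales by the superset size.
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    superset_size = math.comb(n * n, k)
    hits = sum(is_legal(rng.sample(range(n * n), k), n) for _ in range(trials))
    p = hits / trials
    # Binomial standard error of the fraction, scaled to a count.
    se = superset_size * math.sqrt(p * (1 - p) / trials)
    return superset_size * p, se


def exact_state_count(n: int, k: int) -> int:
    # Choose k rows and k columns, then match them one-to-one: C(n,k)^2 * k!
    return math.comb(n, k) ** 2 * math.factorial(k)


if __name__ == "__main__":
    est, se = estimate_state_count(n=5, k=3, trials=100_000)
    print(f"estimate = {est:.1f} +/- {se:.1f}, exact = {exact_state_count(5, 3)}")
```

The standard error shrinks roughly as 1/sqrt(trials). When legal states are a tiny fraction of the superset, as with realistic board games where almost every random arrangement is illegal, plain rejection sampling like this becomes very inefficient, and more refined sampling schemes are typically needed.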
Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules
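Tags: #research_community #core
Author:
Original: The study finds that models tend toward "blind refusal" when users want to get around unreasonable rules, and discusses how alignment can over-generalize.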
Toward Reducing Unproductive Container Moves: Predicting Service Requirements and Dwell Times
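Tags: #research_community #core
Author:
Original: Tackles container-reshuffling optimization at ports, using ML to predict service requirements and dwell times in order to reduce unproductive moves.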