Today's Digest

GitHub karpathy: The source repository for Karpathy's personal blog, which hosts his public technical articles, project pages, and long-form writing.

Simon Willison: I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential chan…

Simon Willison: microsoft/VibeVoice. VibeVoice is Microsoft's Whisper-style audio model for speech-to-text, MIT licensed and with speaker diarizati…

GitHub anthropics: anthropics/claude-agent-sdk-typescript, a recently updated repository.

Simon Willison: For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's c…

Summary + opinion: Our evaluation of OpenAI's GPT-5.5 cyber ca… | Opinion: judging from Our evaluation of OpenAI's GPT-5.5 cyber ca…

Summary + opinion: Zig has one of the most stringent anti-LLM polic… | Opinion: The Zig project's rationale for their firm an…

Summary + opinion: tiktoken is a fast BPE tokeniser for use with Op… | Opinion: openai/tiktoken is better judged by its practical adoption value than by whether it has generated new discussi…

Summary + opinion: We need RSS for sharing abundant vibe-coded apps… | Opinion: We need RSS for sharing abundant vibe-coded a…

Summary + opinion: Visualize harmony chat data and codex sessions i… | Opinion: for openai/euphony, what really matters is whether it will affect a team's model selection, performance envelope, and product…

karpathy/karpathy.github.io

Source: GitHub karpathy

Tags: #github_orgs #engineering-value

Author:

Original: The source repository for Karpathy's personal blog, which hosts his public technical articles, project pages, and long-form writing.

Link: https://github.com/karpathy/karpathy.github.io

Opinion: The value of a personal-site repository like this lies not in its code volume but in how it often surfaces, ahead of time, the research, tools, and narrative directions the author is about to make public.

LLM 0.32a0 is a major backwards-compatible refactor

Source: Simon Willison

Tags: #ai_engineering_blogs #trend-signal

Author:

Original: I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while.

Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response:

    import llm

    model = llm.get_model("gpt-5.5")
    response = model.prompt("Capital of France?")
    print(response.text())

This made sense when I started working on the library back in April 2023. A lot has changed since then! LLM provides an abstraction over thousands of different models via its plugin system. The original abstraction - of text input that returns text output - was no longer able to represent everything I needed it to.

Over time LLM itself has grown attachments to handle image, audio, and video input, then schemas for outputting structured JSON, then tools for executing tool calls. Meanwhile LLMs kept evolving, adding reasoning support and the ability to return images and all kinds of other interesting capabilities. LLM needs to evolve to better handle the diversity of input and output types that can be processed by today's frontier models.

The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.

Prompts as a sequence of messages

LLMs accept input as text, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat that input as a sequence of conversational turns. The first turn might look like this:

    user: Capital of France?
    assistant:

(The model then gets to fill out the reply from the assistant.) But each subsequent turn needs to replay the entire conversation up to that point, as a sort of screenplay:

    user: Capital of France?
    assistant: Paris
    user: Germany?
    assistant:

Most of the JSON APIs from the major vendors follow this pattern. Here's what the above looks like using the OpenAI chat completions API, which has been widely imitated by other providers:

    curl https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-5.5",
        "messages": [
          {"role": "user", "content": "Capital of France?"},
          {"role": "assistant", "content": "Paris"},
          {"role": "user", "content": "Germany?"}
        ]
      }'

Prior to 0.32, LLM modeled these as conversations:

    model = llm.get_model("gpt-5.5")
    conversation = model.conversation()
    r1 = conversation.prompt("Capital of France?")
    print(r1.text())  # Outputs "Paris"
    r2 = conversation.prompt("Germany?")
    print(r2.text())  # Outputs "Berlin"

This worked if you were building a conversation with the model from scratch, but it didn't provide a way to feed in a previous conversation from the start. This made tasks like building an emulation of the OpenAI chat completions API much harder than they should have been. The llm CLI tool worked around this through a custom mechanism for persisting and inflating conversations using SQLite, but that never became a stable part of the LLM API - and there are many places you might want to use the Python library without committing to SQLite as the storage layer.

The new alpha now supports this:

    import llm
    from llm import user, assistant

    model = llm.get_model("gpt-5.5")
    response = model.prompt(messages=[
        user("Capital of France?"),
        assistant("Paris"),
        user("Germany?"),
    ])
    print(response.text())

The llm.user() and llm.assistant() functions are new builder functions designed to be used within that messages=[] array.
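To make that chat-completions emulation use case concrete, here is a minimal sketch (not from the release notes) of how the new messages= input could back an OpenAI-style shim. The complete() helper and its role mapping are illustrative assumptions, not part of the LLM library:

    import llm
    from llm import user, assistant

    # Hypothetical helper: translate OpenAI-style {"role": ..., "content": ...}
    # dicts into the alpha's new message builder objects.
    ROLE_BUILDERS = {"user": user, "assistant": assistant}

    def complete(openai_messages, model_id="gpt-5.5"):
        msgs = [ROLE_BUILDERS[m["role"]](m["content"]) for m in openai_messages]
        return llm.get_model(model_id).prompt(messages=msgs).text()

    print(complete([
        {"role": "user", "content": "Capital of France?"},
        {"role": "assistant", "content": "Paris"},
        {"role": "user", "content": "Germany?"},
    ]))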
The previous prompt= option still works, but LLM upgrades it to a single-item messages array behind the scenes. You can also now reply to a response, as an alternative to building a conversation:

    response2 = response.reply("How about Hungary?")
    print(response2)  # Default __str__() calls .text()

Streaming parts

The other major new interface in the alpha concerns streaming results back from a prompt. Previously, LLM supported streaming like this:

    response = model.prompt("Generate an SVG of a pelican riding a bicycle")
    for chunk in response:
        print(chunk, end="")

Or this async variant:

    import asyncio
    import llm

    model = llm.get_async_model("gpt-5.5")
    response = model.prompt("Generate an SVG of a pelican riding a bicycle")

    async def run():
        async for chunk in response:
            print(chunk, end="", flush=True)

    asyncio.run(run())

Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content. Some models can even execute tools on the server-side, for example OpenAI's code interpreter tool or Anthropic's web search. This means the results from the model can combine text, tool calls, tool outputs and other formats. Multi-modal output models are starting to emerge too, which can return images or even snippets of audio intermixed into that streaming response.

The new LLM alpha models these as a stream of typed message parts. Here's what that looks like as a Python API consumer:

    import asyncio
    import llm

    model = llm.get_model("gpt-5.5")
    prompt = "invent 3 cool dogs, first talk about your motivations"

    def describe_dog(name: str, bio: str) -> str:
        """Record the name and biography of a hypothetical dog."""
        return f"{name}: {bio}"

    def sync_example():
        response = model.prompt(prompt, tools=[describe_dog])
        for event in response.stream_events():
            if event.type == "text":
                print(event.chunk, end="", flush=True)
            elif event.type == "tool_call_name":
                print(f"\nTool call: {event.chunk}", end="", flush=True)
            elif event.type == "tool_call_args":
                print(event.chunk, end="", flush=True)

    async def async_example():
        model = llm.get_async_model("gpt-5.5")
        response = model.prompt(prompt, tools=[describe_dog])
        async for event in response.astream_events():
            if event.type == "text":
                print(event.chunk, end="", flush=True)
            elif event.type == "tool_call_name":
                print(f"\nTool call: {event.chunk}", end="", flush=True)
            elif event.type == "tool_call_args":
                print(event.chunk, end="", flush=True)

    sync_example()
    asyncio.run(async_example())

Sample output (from just the first sync example):

    My motivation: create three memorable dogs with distinct "cool" styles—one cinematic, one adventurous, and one charmingly chaotic—so each feels like they could star in their own story.

    Tool call: describe_dog({"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet who wears tiny aviator goggles and loves sprinting along moonlit beaches. Nova is fearless, elegant, and rumored to outrun drones just for fun."})
    Tool call: describe_dog({"name": "Mochi Thunderbark", "bio": "A fluffy corgi with a dramatic black-and-gold bandana and the confidence of a rock star. Mochi is short, loud, loyal, and leads a neighborhood 'security patrol' made entirely of squirrels."})
    Tool call: describe_dog({"name": "Atlas Snowfang", "bio": "A massive white husky with ice-blue eyes and a backpack full of trail snacks.
    Atlas is calm, heroic, and always knows the way home—even during blizzards, fog, or confusing camping trips."})

At the end of the response you can call response.execute_tool_calls() to actually run the functions that were requested, or send a response.reply() to have those tools called and their return values sent back to the model:

    print(response.reply("Tell me about the dogs"))

This new mechanism for streaming different token types means the CLI tool can now display "thinking" text in a different color from the text in the final response. The thinking text goes to stderr so it won't affect results that are piped into other tools. This example uses Claude Sonnet 4.6 (with an updated streaming event version of the llm-anthropic plugin), as Anthropic's models return their reasoning text as part of the response:

    llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' -o thinking_display 1

You can suppress the output of reasoning tokens using the new -R/--no-reasoning flag. Surprisingly that ended up being the only CLI-facing change in this release.

A mechanism for serializing and deserializing responses

As mentioned earlier, LLM has quite inflexible code at the moment for persisting conversations to SQLite. I've added a new mechanism in 0.32a0 that should provide Python API users a way to roll their own alternative:

    serializable = response.to_dict()
    # serializable is a JSON-style dictionary - store it anywhere you like,
    # then inflate it:
    response = Response.from_dict(serializable)

The dictionary this returns is actually a TypedDict defined in the new llm/serialization.py module.

What's next?

I'm releasing this as an alpha so I can upgrade various plugins and exercise the new design in real world environments for a few days. I expect the stable 0.32 release will be very similar to this alpha, unless alpha testing reveals some design flaw in the way I've put this all together.

There's one remaining large task: I'd like to redesign the SQLite logging system to better capture the more finely grained details that are returned by this new abstraction. Ideally I'd like to model this as a graph, to best support situations like an OpenAI-style chat completions API where the same conversations are constantly extended and then repeated with every prompt. I want to be able to store those without duplicating them in the database. I'm undecided as to whether that should be a feature in 0.32 or I should hold it for 0.33.

Tags: projects python ai annotated-release-notes generative-ai llms llm
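One note on the serialization hook described in the post: since to_dict() returns a JSON-style dictionary, a plain JSON file is one plausible storage layer. A minimal sketch under stated assumptions - the Response import path and the behavior of the inflated object are guesses, not documented API:

    import json
    import llm
    from llm import Response  # assumed import path for the Response class

    response = llm.get_model("gpt-5.5").prompt("Capital of France?")

    # Persist the response as JSON instead of relying on LLM's SQLite logging.
    with open("response.json", "w") as fp:
        json.dump(response.to_dict(), fp)

    # Later, inflate it back into a Response object.
    with open("response.json") as fp:
        restored = Response.from_dict(json.load(fp))
    print(restored.text())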

Link: https://simonwillison.net/2026/Apr/29/llm/#atom-everything

Opinion: Judging from LLM 0.32a0 is a major backwards-compatible refactor, what deserves attention next is whether security incidents change enterprise procurement, integration, and pre-launch compliance gates.

microsoft/VibeVoice

Source: Simon Willison

Tags: #ai_engineering_blogs #engineering-value

Author:

Original: microsoft/VibeVoice. VibeVoice is Microsoft's Whisper-style audio model for speech-to-text, MIT licensed and with speaker diarization built into the model. Microsoft released it on January 21st, 2026 but I hadn't tried it until today.

Here's a one-liner to run it on a Mac with uv, mlx-audio (by Prince Canuma) and the 5.71GB mlx-community/VibeVoice-ASR-4bit MLX conversion of the 17.3GB VibeVoice-ASR model, in this case against a downloaded copy of my recent podcast appearance with Lenny Rachitsky:

    uv run --with mlx-audio mlx_audio.stt.generate \
      --model mlx-community/VibeVoice-ASR-4bit \
      --audio lenny.mp3 \
      --output-path lenny \
      --format json \
      --verbose \
      --max-tokens 32768

The tool reported back:

    Processing time: 524.79 seconds
    Prompt: 26615 tokens, 50.718 tokens-per-sec
    Generation: 20248 tokens, 38.585 tokens-per-sec
    Peak memory: 30.44 GB

So that's 8 minutes 45 seconds for an hour of audio (running on a 128GB M5 Max MacBook Pro). I've tested it against .wav and .mp3 files and they both worked fine. If you omit --max-tokens it defaults to 8192, which is enough for about 25 minutes of audio. I discovered that through trial-and-error and quadrupled it to guarantee I'd get the full hour. That command reported using 30.44GB of RAM at peak, but in Activity Monitor I observed 61.5GB of usage during the prefill stage and around 18GB during the generating phase.

Here's the resulting JSON. The key structure looks like this:

    [
      {
        "text": "And an open question for me is how many other knowledge work fields are actually prone to these agent loops?",
        "start": 13.85,
        "end": 19.5,
        "duration": 5.65,
        "speaker_id": 0
      },
      {
        "text": "Now that we have this power, people almost underestimate what they can do with it.",
        "start": 19.5,
        "end": 22.78,
        "duration": 3.280000000000001,
        "speaker_id": 1
      },
      {
        "text": "Today, probably 95% of the code that I produce, I didn't type it myself. I write so much of my code on my phone. It's wild.",
        "start": 22.78,
        "end": 30.0,
        "duration": 7.219999999999999,
        "speaker_id": 0
      }
    ]

Since that's an array of objects we can open it in Datasette Lite, making it easier to browse. Amusingly that Datasette Lite view shows three speakers - it identified Lenny and me for the conversation, and then a separate Lenny for the voice he used for the additional intro and the sponsor reads!

VibeVoice can only handle up to an hour of audio, so running the above command transcribed just the first hour of the podcast. To transcribe more than that you'd need to split the audio, ideally with a minute or so of overlap so you can avoid errors from partially transcribed words at the split point. You'd also need to then line up the identified speaker IDs across the multiple segments.

Tags: microsoft python datasette-lite uv mlx prince-canuma speech-to-text
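As a rough illustration of the split-with-overlap approach the post describes, here is a hedged sketch that drives ffmpeg from Python. The chunk length, overlap, and file naming are assumptions, not a tested recipe from the post:

    import subprocess

    CHUNK = 3600    # VibeVoice handles up to an hour of audio at a time
    OVERLAP = 60    # roughly a minute of overlap to avoid split-word errors

    def split_audio(path: str, total_seconds: int) -> list[tuple[str, int]]:
        """Cut `path` into overlapping chunks; returns (filename, start) pairs."""
        chunks = []
        start, i = 0, 0
        while start < total_seconds:
            out = f"chunk_{i:03d}.mp3"
            # -ss/-t select the window; -c copy avoids re-encoding the audio.
            subprocess.run(
                ["ffmpeg", "-y", "-ss", str(start), "-t", str(CHUNK),
                 "-i", path, "-c", "copy", out],
                check=True,
            )
            chunks.append((out, start))
            start += CHUNK - OVERLAP
            i += 1
        return chunks

Each chunk would then go through the same mlx_audio.stt.generate command; you'd offset every segment's start/end timestamps by the chunk's start time, and reconcile speaker IDs across the overlapping minute.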

Link: https://simonwillison.net/2026/Apr/27/vibevoice/#atom-everything

Opinion: The point of microsoft/VibeVoice is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

anthropics/claude-agent-sdk-typescript

Source: GitHub anthropics

Tags: #github_orgs #engineering-value

Author:

Original: anthropics/claude-agent-sdk-typescript, a recently updated repository.

Link: https://github.com/anthropics/claude-agent-sdk-typescript

Opinion: For anthropics/claude-agent-sdk-typescript, the more useful question is whether it will enter teams' default toolchains, not its short-term discussion buzz.

Tracking the history of the now-deceased OpenAI Microsoft AGI clause

Source: Simon Willison

Tags: #ai_engineering_blogs #engineering-value

Author:

Original: For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's commercial IP rights to OpenAI's technology would be null and void. That clause appeared to end today. I decided to try and track its expression over time on openai.com.

OpenAI, July 22nd 2019, in Microsoft invests in and partners with OpenAI to support us building beneficial AGI (emphasis mine):

    OpenAI is producing a sequence of increasingly powerful AI technologies, which requires a lot of capital for computational power. The most obvious way to cover costs is to build a product, but that would mean changing our focus. Instead, we intend to license some of our pre-AGI technologies, with Microsoft becoming our preferred partner for commercializing them.

But what is AGI? The OpenAI Charter was first published in April 2018 and has remained unchanged at least since this March 11th 2019 archive.org capture:

    OpenAI's mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.

Here's the problem: if you're going to sign an agreement with Microsoft that is dependent on knowing when "AGI" has been achieved, you need something a little more concrete. In December 2024 The Information reported the details (summarized here outside of their paywall by TechCrunch):

    Last year's agreement between Microsoft and OpenAI, which hasn't been disclosed, said AGI would be achieved only when OpenAI has developed systems that have the ability to generate the maximum total profits to which its earliest investors, including Microsoft, are entitled, according to documents OpenAI distributed to investors. Those profits total about $100 billion, the documents showed.

So AGI is now whenever OpenAI's systems are capable of generating $100 billion in profit? In October 2025 the process changed to being judged by an "independent expert panel". In The next chapter of the Microsoft–OpenAI partnership:

    The agreement preserves key elements that have fueled this successful partnership—meaning OpenAI remains Microsoft's frontier model partner and Microsoft continues to have exclusive IP rights and Azure API exclusivity until Artificial General Intelligence (AGI). Once AGI is declared by OpenAI, that declaration will now be verified by an independent expert panel. Microsoft's IP rights to research, defined as the confidential methods used in the development of models and systems, will remain until either the expert panel verifies AGI or through 2030, whichever is first.

OpenAI on February 27th, 2026, in Joint Statement from OpenAI and Microsoft:

    AGI definition and processes are unchanged. The contractual definition of AGI and the process for determining if it has been achieved remains the same.

OpenAI today, April 27th 2026, in The next phase of the Microsoft OpenAI partnership (emphasis mine):

    Microsoft will continue to have a license to OpenAI IP for models and products through 2032. Microsoft's license will now be non-exclusive. Microsoft will no longer pay a revenue share to OpenAI. Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI's technology progress at the same percentage but subject to a total cap.

As far as I can tell "independent of OpenAI's technology progress" is a declaration that the AGI clause is now dead. Here's The Verge coming to the same conclusion: The AGI clause is dead.

My all-time favorite commentary on OpenAI's approach to AGI remains this 2023 hypothetical by Matt Levine:

    And the investors wailed and gnashed their teeth but it's true, that is what they agreed to, and they had no legal recourse. And OpenAI's new CEO, and its nonprofit board, cut them a check for their capped return and said "bye" and went back to running OpenAI for the benefit of humanity. It turned out that a benign, carefully governed artificial superintelligence is really good for humanity, and OpenAI quickly solved all of humanity's problems and ushered in an age of peace and abundance in which nobody wanted for anything or needed any Microsoft products. And capitalism came to an end.

Tags: computer-history microsoft ai openai

Link: https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/#atom-everything

Opinion: The point of Tracking the history of the now-deceased OpenAI Microsoft AG... is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

Source: Simon Willison

Tags: #ai_engineering_blogs #trend-signal

Author:

Original: Our evaluation of OpenAI's GPT-5.5 cyber capabilities. The UK's AI Security Institute previously evaluated Claude Mythos; now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now. Tags: ai openai generative-ai llms anthropic claude ai-security-research gpt

Link: https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-everything

Opinion: Judging from Our evaluation of OpenAI's GPT-5.5 cyber capabilities, what deserves attention next is whether security incidents change enterprise procurement, integration, and pre-launch compliance gates.

The Zig project's rationale for their firm anti-AI contribution policy

Source: Simon Willison

Tags: #ai_engineering_blogs #ecosystem-shift

Author:

Original: Zig has one of the most stringent anti-LLM policies of any major open source project:

    No LLMs for issues. No LLMs for pull requests. No LLMs for comments on the bug tracker, including translation. English is encouraged, but not required. You are welcome to post in your native language and rely on others to have their own translation tools of choice to interpret your words.

The most prominent project written in Zig may be the Bun JavaScript runtime, which was acquired by Anthropic in December 2025 and, unsurprisingly, makes heavy use of AI assistance. Bun operates its own fork of Zig, and recently achieved a 4x performance improvement on Bun compile after adding "parallel semantic analysis and multiple codegen units to the llvm backend". Here's that code. But @bunjavascript says:

    We do not currently plan to upstream this, as Zig has a strict ban on LLM-authored contributions.

(Update: here's a Zig core contributor providing details on why they wouldn't accept that particular patch independent of the LLM issue - parallel semantic analysis is a long planned feature but has implications "for the Zig language itself".)

In Contributor Poker and Zig's AI Ban (via Lobste.rs), Zig Software Foundation VP of Community Loris Cro explains the rationale for this strict ban. It's the best articulation I've seen yet for a blanket ban on LLM-assisted contributions:

    In successful open source projects you eventually reach a point where you start getting more PRs than what you're capable of processing. Given what I mentioned so far, it would make sense to stop accepting imperfect PRs in order to maximize ROI from your work, but that's not what we do in the Zig project. Instead, we try our best to help new contributors to get their work in, even if they need some help getting there. We don't do this just because it's the "right" thing to do, but also because it's the smart thing to do.

Zig values contributors over their contributions. Each contributor represents an investment by the Zig core team - the primary goal of reviewing and accepting PRs isn't to land new code, it's to help grow new contributors who can become trusted and prolific over time. LLM assistance breaks that completely. It doesn't matter if the LLM helps you submit a perfect PR to Zig - the time the Zig team spends reviewing your work does nothing to help them add new, confident, trustworthy contributors to their overall project.

Loris explains the name here:

    The reason I call it "contributor poker" is because, just like people say about the actual card game, "you play the person, not the cards". In contributor poker, you bet on the contributor, not on the contents of their first PR.

This makes a lot of sense to me. It relates to an idea I've seen circulating elsewhere: if a PR was mostly written by an LLM, why should a project maintainer spend time reviewing and discussing that PR as opposed to firing up their own LLM to solve the same problem?

Tags: anthropic zig ai llms ai-ethics open-source javascript ai-assisted-programming generative-ai bun

Link: https://simonwillison.net/2026/Apr/30/zig-anti-ai/#atom-everything

Opinion: The point of The Zig project's rationale for their firm anti-AI contribut... is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

openai/tiktoken

Source: GitHub openai

Tags: #github_orgs #ecosystem-shift

Author:

Original: tiktoken is a fast BPE tokeniser for use with OpenAI's models.
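For a sense of its practical use, a minimal sketch based on tiktoken's documented API; the sample strings are invented, and the right encoding for any newer model should be checked rather than assumed:

    import tiktoken

    # Look up a named BPE encoding and round-trip some text through it.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("tiktoken is a fast BPE tokeniser")
    print(tokens)              # a short list of integer token IDs
    print(enc.decode(tokens))  # the original string back

    # Or resolve the encoding registered for a specific model:
    enc = tiktoken.encoding_for_model("gpt-4")
    print(len(enc.encode("Count my tokens")))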

Link: https://github.com/openai/tiktoken

Opinion: openai/tiktoken is better judged by its practical adoption value than by whether it has generated new discussion buzz.

We need RSS for sharing abundant vibe-coded apps

Source: Simon Willison

Tags: #ai_engineering_blogs #learning-value

Author:

Original: We need RSS for sharing abundant vibe-coded apps. Matt Webb:

    I would love an RSS web feed for all those various tools and apps pages, each item with an "Install" button. (But install to where?) The lesson here is that when vibe-coding accelerates app development, apps become more personal, more situated, and more frequent. Shipping a tool or a micro-app is less like launching a website and more like posting on a blog.

This inspired me to have Claude add an Atom feed (and icon) to my /elsewhere/tools/ page, which itself is populated by content from my tools.simonwillison.net site.

Tags: atom matt-webb rss ai vibe-coding
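To make the idea concrete, here is a hedged sketch (not Simon's or Matt's actual implementation) of a tiny Atom feed for a tools page, built with Python's standard library; the feed title and example entry are invented:

    import datetime
    from xml.etree import ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    ET.register_namespace("", ATOM)

    def tools_feed(tools):
        """Build a minimal Atom feed with one entry per micro-app."""
        feed = ET.Element(f"{{{ATOM}}}feed")
        ET.SubElement(feed, f"{{{ATOM}}}title").text = "Tools"
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        ET.SubElement(feed, f"{{{ATOM}}}updated").text = now
        for tool in tools:
            entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
            ET.SubElement(entry, f"{{{ATOM}}}title").text = tool["title"]
            ET.SubElement(entry, f"{{{ATOM}}}id").text = tool["url"]
            ET.SubElement(entry, f"{{{ATOM}}}link", href=tool["url"])
            ET.SubElement(entry, f"{{{ATOM}}}updated").text = tool["updated"]
        return ET.tostring(feed, encoding="unicode")

    print(tools_feed([{
        "title": "Example micro-app",            # invented entry
        "url": "https://example.com/tools/app",  # hypothetical URL
        "updated": "2026-04-30T00:00:00Z",
    }]))

An "Install" button would presumably hang off each entry's link element, which brings back Matt's open question: install to where?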

Link: https://simonwillison.net/2026/Apr/30/rss-vibe-coded-apps/#atom-everything

Opinion: The point of We need RSS for sharing abundant vibe-coded apps is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

openai/euphony

Source: GitHub openai

Tags: #github_orgs #learning-value

Author:

Original: Visualize harmony chat data and codex sessions in your browser.

Link: https://github.com/openai/euphony

Opinion: For openai/euphony, what really matters is whether it will affect a team's model selection, performance envelope, and product experience.