Issue 98: Our response to the TanStack npm supply chain attack

Today's summary

OpenAI Blog: See how sales teams can use Codex to create pipeline briefs, meeting prep packets, forecast reviews, account plans, and stalled-de…

OpenAI Blog: Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software developmen…

OpenAI Blog: Use Codex anywhere with the ChatGPT mobile app. Monitor, steer, and approve coding tasks in real time across devices and remote en…

OpenAI Blog: Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respo…

OpenAI Blog: Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access a…

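Summary + view: A suite of plugins for legal workflows | Commentary: For anthropics/claude-for-legal, the better question is whether it improves multi-step collaboration, …

Summary + view: openai/openai-builder-lab recently updated repos… | Commentary: What matters about openai/openai-builder-lab is not novelty but whether it improves engineering…

Summary + view: LLM training in simple, raw C/CUDA | Commentary: karpathy/llm.c is better judged by its practical adoption value than by whether it has generated new buzz…

Summary + view: OpenAI details its response to the TanStack “Min… | Commentary: In light of Our response to the TanStack npm supply cha…

Summary + view: See how finance teams can use Codex to build MBR… | Commentary: What matters about How finance teams use Codex is not novelty but whether it improves…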

How sales teams use Codex

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: See how sales teams can use Codex to create pipeline briefs, meeting prep packets, forecast reviews, account plans, and stalled-deal diagnoses from real work inputs.

Link: https://openai.com/academy/codex-for-work/how-sales-teams-use-codex

View: For How sales teams use Codex, the better question is whether it improves multi-step collaboration, memory management, and reliable delivery, not just how the demo looks.

Sea's View on the Future of Agentic Software Development with Codex

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software development in Asia.

Link: https://openai.com/index/sea-david-chen

View: What deserves more attention is whether Sea's View on the Future of Agentic Software Development wit... actually changes product rollout, engineering efficiency, the distribution landscape, or platform control, rather than just generating buzz.

Work with Codex from anywhere

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: Use Codex anywhere with the ChatGPT mobile app. Monitor, steer, and approve coding tasks in real time across devices and remote environments.

Link: https://openai.com/index/work-with-codex-from-anywhere

View: For Work with Codex from anywhere, what really matters is whether it will affect a team's model selection, performance limits, and product experience.

Helping ChatGPT better recognize context in sensitive conversations

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respond more safely.

Link: https://openai.com/index/chatgpt-recognize-context-in-sensitive-conversations

View: For Helping ChatGPT better recognize context in sensitive conver..., what really matters is whether it will affect a team's model selection, performance limits, and product experience.

Building a safe, effective sandbox to enable Codex on Windows

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions.

Link: https://openai.com/index/building-codex-windows-sandbox

View: Building a safe, effective sandbox to enable Codex on Window... is better judged by its practical adoption value than by whether it has generated new buzz.

anthropics/claude-for-legal

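Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: A suite of plugins for legal workflows

Link: https://github.com/anthropics/claude-for-legal

View: For anthropics/claude-for-legal, the better question is whether it improves multi-step collaboration, memory management, and reliable delivery, not just how the demo looks.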

openai/openai-builder-lab

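Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: openai/openai-builder-lab: recently updated repository.

Link: https://github.com/openai/openai-builder-lab

View: What matters about openai/openai-builder-lab is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.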

karpathy/llm.c

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: LLM training in simple, raw C/CUDA

Link: https://github.com/karpathy/llm.c

View: karpathy/llm.c is better judged by its practical adoption value than by whether it has generated new buzz.

Our response to the TanStack npm supply chain attack

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: OpenAI details its response to the TanStack “Mini Shai-Hulud” supply chain attack, outlines protections taken to secure systems and signing certificates, and explains why macOS users must update OpenAI apps by June 12, 2026. Learn what happened, what was affected, and how OpenAI is strengthening defenses against evolving software supply chain threats.

Link: https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack

View: In light of Our response to the TanStack npm supply chain attack, the thing to watch going forward is whether security incidents change the compliance bar for enterprise procurement, integration, and pre-launch review.

How finance teams use Codex

Source: OpenAI Blog

Tags: #ai_engineering_blogs #core

Author:

Original: See how finance teams can use Codex to build MBRs, reporting packs, variance bridges, model checks, and planning scenarios from real work inputs.

Link: https://openai.com/academy/how-finance-teams-use-codex

View: What matters about How finance teams use Codex is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.

anthropics/financial-services

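Source: GitHub anthropics

Tags: #github_orgs #extended

Author:

Original: anthropics/financial-services: recently updated repository.

Link: https://github.com/anthropics/financial-services

View: What matters about anthropics/financial-services is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.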

openai/snap-o

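Source: GitHub openai

Tags: #github_orgs #extended

Author:

Original: Lightweight Android capture developer tools for macOS

Link: https://github.com/openai/snap-o

View: What matters about openai/snap-o is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.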

karpathy/deep-vector-quantization

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: VQVAEs, GumbelSoftmaxes and friends

Link: https://github.com/karpathy/deep-vector-quantization

View: karpathy/deep-vector-quantization is better judged by its practical adoption value than by whether it has generated new buzz.

karpathy/transformers

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Link: https://github.com/karpathy/transformers

View: karpathy/transformers is better judged by its practical adoption value than by whether it has generated new buzz.

karpathy/build-nanogpt

Source: GitHub karpathy

Tags: #github_orgs #extended

Author:

Original: Video+code lecture on building nanoGPT from scratch

Link: https://github.com/karpathy/build-nanogpt

View: karpathy/build-nanogpt is better judged by its practical adoption value than by whether it has generated new buzz.

Running Python code in a sandbox with MicroPython and WASM

Source: Simon Willison

Tags: #ai_engineering_blogs #core

Author:

Original: I've been experimenting with different approaches to running code in a sandbox for several years now, but my latest attempt feels like it might finally have all of the characteristics I've been looking for. I've released it as an alpha package called micropython-wasm and I'm using it for a code execution sandbox plugin for Datasette Agent called datasette-agent-micropython.
Contents: Why do I want a sandbox?; What I want from a sandbox; WebAssembly looks really promising here; MicroPython in WebAssembly; Building the first version; Try it yourself; Should you trust my vibe-coded sandbox?
Why do I want a sandbox?
My key open source projects - Datasette, LLM, even sqlite-utils - all support plugins. I absolutely love plugins as a mechanism for extending software. A carefully designed plugin system reduces the risk involved in trying new things to almost nothing - even the wildest ideas won't leave a lasting influence on the core application itself. My software can grow a new feature overnight and I don't even have to review a pull request!
There's one major drawback: my plugin systems all use Python and Pluggy, and plugin code executes with full privileges within my applications. A buggy or malicious plugin could break everything or leak private data.
I'd love to be able to run plugin-style code in an environment where it is unable to read unapproved files, connect to a network, or generally operate in a way that's risky or harmful to the rest of the application or the user's computer.
My interest covers more than just plugins. For Datasette in particular there are many features I'd like to support where arbitrary code execution would be useful. I've already experimented with this for Datasette Enrichments, where code can be used to transform values stored in a table.
I'd love to build a mechanism where you can run code on a schedule that fetches JSON from an approved location, runs a tiny bit of code to reformat it into a list of dictionaries, then inserts those as rows in a SQLite database table.
What I want from a sandbox
My goal is to execute code safely within my own Python applications. Here's what I need:
- Dependencies that cleanly install from PyPI, including binary wheels across multiple platforms if necessary. I don't want people using my software to have to take any extra steps beyond directly installing my Python package.
- Executed code must be subject to both memory and CPU limits. I don't want while True: s += "longer string" to crash my application or the user's computer.
- File access must be strictly controlled. Either no filesystem access at all, or I get to define exactly which files can be read and which files can be written to.
- Network access is controlled as well. Sandboxed code should not be able to communicate with anything without going through a layer I fully control.
- Support for interaction with host functions. A sandbox isn't much use if I can't carefully expose selected platform features to the code that it's running.
- It has to be robust, supported, and clearly documented. I've lost count of the number of sandbox projects I've seen in repos with warnings that they aren't actively maintained!
WebAssembly looks really promising here
Web browsers operate in the most hostile environment imaginable when it comes to malicious code. Their job is to download and execute untrusted code from the web on almost every page load.
Given this, JavaScript engines should be excellent candidates for sandboxes. Sadly those engines are also extremely complicated, and are not designed for easy embedding in other projects. Most of the V8-in-Python projects I've seen are infrequently maintained and come with warnings not to use them with completely untrusted code.
WebAssembly is a much better candidate.
It was designed from the start to support all of the characteristics I care about and has been tested in browsers for nearly a decade. The wasmtime Python library brings WASM to Python, is actively maintained, and has binary wheels.
MicroPython in WebAssembly
WebAssembly engines like wasmtime run WebAssembly binaries. Some programming languages like Rust are easy to compile directly to WebAssembly. Dynamic languages like JavaScript and Python are harder - they support language primitives like eval() which means they need a full interpreter available at runtime.
To run Python we need a full Python interpreter compiled to WebAssembly, wired up in a way that makes it easy to feed it code, hook up host functions and access the results.
Pyodide offers an outstanding package for running Python using WebAssembly in the browser, but using Pyodide in server-side Python isn't supported. The most recent advice I could find was from October 2024 stating "Pyodide is built by the Emscripten toolchain and can only run in a browser or Node.js".
The other day I decided to take a look at MicroPython as an option for this. The MicroPython site says:
"MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library and is optimised to run on microcontrollers and in constrained environments."
WebAssembly sure feels like a constrained environment to me!
Building the first version
I had GPT-5.5 Pro do some research for me, which turned up this PR against MicroPython by Yamamoto Takahashi titled "Experimental WASI support for ports/unix". It then produced this research.md document, so I let Codex Desktop and GPT-5.5 high loose on it to see what would happen:
"read the research.md document and build this. You will probably need to write a script that compiles a custom WASM version of MicroPython as part of this project - fetch the MicroPython code to a /tmp directory for this as part of that script."
It worked.
I now had a prototype Python library that could execute Python code inside a WebAssembly sandbox!
The trickiest piece to solve was persistent interpreter state. The WASM build we are using here exposes a single entry point which starts the interpreter, runs the code and then stops the interpreter at the end. This works fine for one-off scripts, but for Datasette Agent I want variables and functions to stay resident in memory so I can reuse them across multiple code execution calls.
A neat thing about working with coding agents is that you can get from an idea to a proof of concept quickly. I prompted:
"For keeping variables resident: what if we ran code inside micropython itself which called a host function get_next_python_code() and then passed that to eval() - and that host function blocked until new code was available, maybe by running in a thread with a queue? Could that or a similar idea help here?"
After some iteration we got to a version of this that works! In Python code you can now do this:
    from micropython_wasm import MicroPythonSession
    with MicroPythonSession() as session:
        print(session.run("x = 10\nprint(x)").stdout)
        print(session.run("x += 5\nprint(x)").stdout)
        print(session.run("print(x * 2)").stdout)
Under the hood this starts a thread, sets up a request queue and then sends messages to that queue for the session.run() command, each time waiting on a reply queue for the result of that execution. Inside WASM the MicroPython interpreter blocks waiting for a __session_next__() host function to return the next line of code, which it runs eval() on before calling __session_result__({"id": request_id, "ok": True}) when each block has been successfully executed.
The other piece of complexity was supporting host functions, so my Python library could selectively expose functions that could then be called by code running in MicroPython. Codex ended up solving this with 78 lines of C which ends up compiled into the 362KB WebAssembly blob I'm distributing with the package.
I am by no means a C programmer, but I've read the C and had two different models explain it to me (here's Claude's explanation) and I've subjected it to a barrage of tests. The great thing about working with WebAssembly is that if the C turns out to be fatally flawed the worst that can happen is the WebAssembly execution will fail with an exception. I can live with that risk.
Memory limits are directly supported by wasmtime. CPU limits are a little harder: wasmtime offers a "fuel" concept to limit how many operations a WebAssembly call can execute, and that's the correct fit for this problem, but the units are hard to reason about. I'm experimenting with a 20 million default "fuel" setting now but I'm not confident that it's the most appropriate value.
Try it yourself
The micropython-wasm alpha is now live on PyPI. You can try it from your own Python code as described in the README. I've also added a simple CLI mode in version 0.1a2, which means you can try it using uvx without first installing it, like so:
    uvx micropython-wasm -c 'print("Hello world")'
To see it run out of fuel:
    uvx micropython-wasm -c 's = ""
    while True:
        s += "longer"'
Outputs:
    micropython-wasm: guest exited with code 1
You can also try it in Datasette Agent like this:
    uvx llm keys set openai
Paste in an OpenAI key, then:
    uvx --with datasette-agent --with datasette-agent-micropython --prerelease allow datasette --internal internal.db -s plugins.datasette-llm.default_model gpt-5.5 --root -o
Then navigate to http://127.0.0.1:8001/-/agent and run the prompt:
    show me some micropython
You can try a live demo of that plugin running in Datasette Agent by signing into agent.datasette.io with your GitHub account.
Should you trust my vibe-coded sandbox?
Having complained about immature, loosely-maintained sandboxing libraries, it's deeply ironic that I've now built my own! I deliberately slapped an alpha release version on it, and I'm not ready to recommend it to anyone who isn't willing to take a significant risk.
I've put it through enough testing that I'm OK using it myself. I've shipped my first plugin that uses it, datasette-agent-micropython. I've also locked GPT-5.5 xhigh in that Datasette Agent plugin and challenged it to break out of the sandbox, and so far it has not managed to.
I'm hoping this implementation can convince some companies with professional security teams and high-stakes problems to commit to using Python in WebAssembly as a sandboxing approach and open source their own solutions.
Tags: python, sandboxing, ai, datasette, webassembly, generative-ai, llms, ai-assisted-programming, codex, datasette-agent, micropython
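
Editor's sketch: the persistent-session design described above (a worker thread, a request queue, and a reply queue) can be illustrated with the standard library alone. This is a minimal sketch of the pattern, not the micropython-wasm implementation: ToySession and its reply dictionary are hypothetical names, and the worker runs code with plain exec() in the host interpreter, so nothing here is sandboxed.

```python
import contextlib
import io
import queue
import threading


class ToySession:
    """Hypothetical stand-in for a persistent session. NOT a sandbox."""

    def __init__(self):
        self._requests = queue.Queue()
        self._replies = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)

    def __enter__(self):
        self._worker.start()
        return self

    def __exit__(self, *exc_info):
        self._requests.put(None)  # sentinel: ask the worker loop to stop
        self._worker.join()

    def _loop(self):
        namespace = {}  # stays alive between run() calls
        while True:
            code = self._requests.get()  # blocks until the host sends code
            if code is None:
                break
            stdout = io.StringIO()
            try:
                with contextlib.redirect_stdout(stdout):
                    exec(code, namespace)
                reply = {"ok": True, "stdout": stdout.getvalue()}
            except Exception as error:
                reply = {"ok": False, "stdout": stdout.getvalue(),
                         "error": repr(error)}
            self._replies.put(reply)

    def run(self, code):
        """Send one block of code and wait for its result."""
        self._requests.put(code)
        return self._replies.get()


with ToySession() as session:
    first = session.run("x = 10\nprint(x)")
    second = session.run("x += 5\nprint(x)")  # x survived the first call
    failed = session.run("print(undefined_name)")
```

The point of the design is that the interpreter loop never exits between calls; it simply blocks on the request queue, which is why variables such as x remain available to later run() calls.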

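Link: https://simonwillison.net/2026/Jun/6/micropython-in-a-sandbox/#atom-everything

View: In light of Running Python code in a sandbox with MicroPython and WASM, the thing to watch going forward is whether security incidents change the compliance bar for enterprise procurement, integration, and pre-launch review.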
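
Editor's sketch: the "fuel" idea mentioned in the article (a budget of operations that aborts runaway code) can be shown by analogy in plain Python. This sketch swaps in a different technique: it counts executed source lines with sys.settrace, which is not how wasmtime meters WebAssembly and offers no isolation. OutOfFuel and run_with_fuel are hypothetical names.

```python
import sys


class OutOfFuel(Exception):
    """Raised when the guest code uses up its line budget."""


def run_with_fuel(source, fuel):
    """Run source with exec(), allowing at most `fuel` executed lines."""
    remaining = fuel

    def tracer(frame, event, arg):
        nonlocal remaining
        if event == "line":
            remaining -= 1
            if remaining < 0:
                raise OutOfFuel(f"budget of {fuel} lines exhausted")
        return tracer  # keep tracing inside this frame

    namespace = {}
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(compile(source, "<guest>", "exec"), namespace)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return namespace


result = run_with_fuel("x = 1 + 1", fuel=10)

try:
    run_with_fuel('s = ""\nwhile True:\n    s += "longer"', fuel=1000)
    ran_out = False
except OutOfFuel:
    ran_out = True
```

As with the 20 million default the article mentions, the budget's unit (here, executed lines) is arbitrary, so a sensible value has to be found by experiment.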

datasette-agent-edit 0.1a0

Source: Simon Willison

Tags: #ai_engineering_blogs #core

Author:

Original: Release: datasette-agent-edit 0.1a0
I'm planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, updating large SQL queries, and editing SVG files.
Agentic editing of text is a little tricky to get right. My favorite published design for this is for the Claude text editor, which implements the following tools:
- view - view sections of a file, with line numbers added to every line.
- str_replace - find an exact old_str and replace it with new_str - fail if the original string is not unique.
- insert - insert the specified text after the specified line number.
Rather than recreate these patterns for every plugin that needs them, I decided to create this base plugin, datasette-agent-edit, which implements the core tools in a way that allows them to be adapted for other plugins.
Tags: ai, datasette, generative-ai, llms, llm-tool-use, datasette-agent

Link: https://simonwillison.net/2026/Jun/7/datasette-agent-edit/#atom-everything

View: What matters about datasette-agent-edit 0.1a0 is not novelty but whether it improves engineering efficiency, deployment stability, or developer workflows.
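
Editor's sketch: the three editing tools listed above (view, str_replace, insert) are simple enough to outline as pure functions over a string. This is a minimal sketch under assumptions, not the datasette-agent-edit or Claude text editor API: the signatures, the "N: line" numbering format, and the choice to also reject a missing old_str are all illustrative.

```python
def view(text, start=1, end=None):
    """Return lines start..end (1-indexed, inclusive) with line numbers."""
    lines = text.splitlines()
    end = len(lines) if end is None else end
    selected = lines[start - 1:end]
    return "\n".join(f"{number}: {line}"
                     for number, line in enumerate(selected, start=start))


def str_replace(text, old_str, new_str):
    """Replace old_str with new_str, failing unless it occurs exactly once."""
    count = text.count(old_str)
    if count != 1:
        raise ValueError(f"old_str must match exactly once, found {count}")
    return text.replace(old_str, new_str, 1)


def insert(text, line_number, new_text):
    """Insert new_text after line_number (0 inserts at the very top)."""
    lines = text.splitlines(keepends=True)
    if not 0 <= line_number <= len(lines):
        raise ValueError(f"line_number {line_number} is out of range")
    if line_number > 0 and not lines[line_number - 1].endswith("\n"):
        lines[line_number - 1] += "\n"  # terminate a final unterminated line
    if not new_text.endswith("\n"):
        new_text += "\n"
    lines.insert(line_number, new_text)
    return "".join(lines)


document = "alpha\nbeta\ngamma\n"
numbered = view(document, 2, 3)
replaced = str_replace(document, "beta", "BETA")
inserted = insert(document, 1, "inserted")
```

Requiring a unique match is what makes str_replace safe for an agent: an ambiguous edit fails loudly instead of silently changing the wrong occurrence.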

Siri AI at WWDC 2026

Source: Simon Willison

Tags: #ai_engineering_blogs #core

Author:

Original: Given how badly burned anyone who took Apple's 2024 WWDC Apple Intelligence announcements at face value was, I'm holding to a strict "I'll believe it when I see it" policy for everything they announced today.
The new Siri AI features do at least look feasible with today's technology, especially since Apple are licensing a custom Gemini-derived model that they can run on their own Private Cloud Compute.
It sounds like they'll be taking advantage of vision LLMs to extract information from the user's screen, which neatly sidesteps the need for every existing application to ship custom code in order to integrate with Apple Intelligence. Vision LLMs were a much less mature category in June 2024.
The new Core AI library looks like a good step in enabling developers to finally take full advantage of Apple's hardware for running their own models. It integrates with Meta's open source PyTorch ecosystem, using these Core AI PyTorch extensions:
Core AI PyTorch Extensions
coreai-torch is a Python package that bridges PyTorch and Core AI. You can use it to bring up an existing PyTorch model exported as a torch.export.ExportedProgram into a Core AI AIProgram ready to run on Apple hardware, traversing the FX graph node-by-node and mapping ATen operators to Core AI operations.
You can install an iOS 27 Developer Beta today, which supposedly has the new features - but you then have to make it through a waiting list for access to the new Siri AI. Aaron Perris from MacRumors reports having made it off the waitlist, so we may start seeing credible reports on how well Siri AI works in the very near future.
Update: These Private Cloud Compute Gemini models are running in Google Cloud, and using NVIDIA hardware.
According to Expanding Private Cloud Compute on Apple's Security Research blog:
For the most demanding tasks, including agentic tool-use and complex reasoning, we worked with Google and NVIDIA to extend our PCC infrastructure to Google Cloud systems using NVIDIA GPUs, while maintaining Apple's powerful security and privacy protections.
PCC on Google Cloud leverages many of the same architectural security patterns as PCC on Apple silicon to implement these layered protections: initial network data parsing for each request happens in a dedicated process within its own namespace, shared inference software is recycled with a short time-to-live duration, and attested keys are held in a separate, dedicated confidential VM isolated from external inputs. As with PCC on Apple silicon, all binaries will be published for public inspection.
Tags: vision-llms, apple, generative-ai, ai, llms, gemini, nvidia, google

Link: https://simonwillison.net/2026/Jun/8/wwdc/#atom-everything

View: In light of Siri AI at WWDC 2026, the thing to watch going forward is whether security incidents change the compliance bar for enterprise procurement, integration, and pre-launch review.

Unlocking AI flexibility in Europe: A guide to cross-region inference for EU data processing and model access

Source: AWS Machine Learning Blog

Tags: #engineering_ai_infra_blogs #extended

Author:

Original: With access to the latest generative AI models and high-performance accelerated compute in high global demand, AWS customers need tools to take advantage of model availability and capacity across multiple AWS Regions, while still meeting their security and privacy requirements. Cross-Region Inference (CRIS) on Amazon Bedrock meets these needs by automatically routing requests across multiple

Link: https://aws.amazon.com/blogs/machine-learning/unlocking-ai-flexibility-in-europe-a-guide-to-cross-region-inference-for-eu-data-processing-and-model-access/

View: In light of Unlocking AI flexibility in Europe: A guide to cross-region..., the thing to watch going forward is whether security incidents change the compliance bar for enterprise procurement, integration, and pre-launch review.

It’s safe to close your laptop now: Hosting coding agents on Amazon Bedrock AgentCore

Source: AWS Machine Learning Blog

Tags: #engineering_ai_infra_blogs #extended

Author:

Original: Amazon Bedrock AgentCore Runtime gives each agent session its own isolated microVM with a persistent workspace, secure tool access through Gateway, and built-in observability, so you can run Claude Code, Codex, Kiro, and Cursor in parallel without sharing secrets, ports, or filesystems. Close the lid, go to dinner, and pick up where you left off tomorrow.

Link: https://aws.amazon.com/blogs/machine-learning/its-safe-to-close-your-laptop-now-hosting-coding-agents-on-amazon-bedrock-agentcore/

View: In light of It’s safe to close your laptop now: Hosting coding agents on..., the thing to watch going forward is whether security incidents change the compliance bar for enterprise procurement, integration, and pre-launch review.