ai builders

June 12, 2026

13 builders, 25 posts, 1 podcast, 1 blog

Google's Logan Kilpatrick argues that coding agents have become the general-purpose agent harness and that models will swallow today's harness alpha within a year. Anthropic Engineering details how it contains Claude across claude.ai, Claude Code, and Cowork, including a red-team phish that exfiltrated credentials in 24 of 25 attempts. The Ona team joins OpenAI to build Codex. And Replit CEO Amjad Masad declares vibecoding frustration solved by Fable, while Every CEO Dan Shipper trips Fable's safeguards and goes back to Codex.

X / TWITTER

Amjad Masad (Replit CEO) says Claude Fable 5 landing on Replit has him vibecoding "with ZERO frustration and in a complete state of flow" for the first time — so much so that he's running out of ideas to build. His bold conclusion: "I'm almost certain I don't need more IQ for vibecoding, just cheaper and faster models, and we're done here." He credits the Replit Agent team for making Fable's cost "stomachable," arguing the lack of mistakes makes it net more affordable, and showed what building a company on Replit looks like now: one canvas holding your web app, mobile app, marketing, and App Store material, where you click into any piece and keep building.

Sources: 1, 2, 3

Dan Shipper (Every CEO) offers the counterpoint from the trenches: he set up a big Fable project and let it cook, came back an hour later, and found it had triggered the safeguards ten minutes in and fallen back to 4.8. His verdict: "back to codex."

Sources: 1

Thibault Sottiaux (works on Codex and ChatGPT at OpenAI) announced that the Ona team is joining OpenAI: "Beyond excited to work with Johannes and team to build the future." Sam Altman amplified the news, saying he's "really looking forward to working together," and Swyx congratulated the Ona team, pointing to their recent talk for "alpha on what's next for Codex."

Sources: 1, 2, 3

Swyx (Latent Space, Cognition) published a short essay on "Loopcraft": the entire game of the next century, he argues, is stacking loops as effectively as possible. Early in each phase it's valuable to know when to go DOWN a loop when things go wrong (for reliability), but it will probably be more valuable to know how to go UP a loop as models improve (for leverage): "If you don't figure out how to do this, don't be salty when you lose to those that do." He's also building his own vibecoding platform out of frustration that none of the existing ones (Vercel, Cloudflare, Netlify — which he loves) close the loop for you on errors: nothing sets you on the right path or pings you when things fail, and every project demands its own "webmaster" infra setup that should be swallowed up into One Thing.

Sources: 1, 2

Aaron Levie (Box CEO) shared a Box survey of 1,640 IT leaders across the US, Japan, and Europe on agentic AI adoption. The standout finding: the companies that adopted AI the most are planning to grow headcount the most. His read is that the jobs-wipeout narrative assumes companies keep a fixed scope of work, but in practice the most productive companies reinvest the gains — lighting up more engineering projects, selling to more customers, and automating more processes — all of which creates more work for people.

Sources: 1

Guillermo Rauch (Vercel CEO) spotlighted a Vercel + Shopify build by foda: a fully custom Next.js headless storefront, built with v0 and Cursor, that processed 500+ orders in two minutes. "So long on the web. Anyone can now dream → build → ship → sell."

Sources: 1

Peter Steinberger (OpenClaw) shared a piece of OpenClaw's hardening work aimed at reducing attack surface: media conversion previously required shelling out to ffmpeg, and in the next release it can run via wasm instead, with similar performance for their use cases.

Sources: 1

Garry Tan (Y Combinator President & CEO) pushed back on a viral essay arguing the "gifted kid" category is a lie, noting its centerpiece stat is a self-own: the author cites a 35-year study where 12.3% of 677 gifted kids reached "eminence" (full professor, Fortune 500 exec, federal judge) as proof the label is meaningless, when the general-population base rate for outcomes that rarefied is far closer to zero. "12.3% is the selection mechanism working spectacularly."

Sources: 1, 2
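The base-rate argument can be checked with a few lines of arithmetic. This is only an illustrative sketch: the cohort size and the 12.3% figure come from the item above, while `assumed_base_rate` is a hypothetical placeholder, since the post only says the general-population rate is "far closer to zero."

```python
# Figures quoted in the item above.
cohort_size = 677
eminence_rate = 0.123

# Roughly how many people in the cohort reached "eminence".
eminent = round(cohort_size * eminence_rate)

# HYPOTHETICAL general-population base rate, chosen only for illustration;
# the source does not give a number.
assumed_base_rate = 0.001

# How many times more likely the selected cohort is to reach the outcome,
# under the assumed base rate.
lift = eminence_rate / assumed_base_rate

print(f"{eminent} of {cohort_size} reached eminence")
print(f"lift over assumed base rate: about {round(lift)}x")
```

Under that assumed base rate, the same 12.3% that the essay treats as a failure would represent a lift of roughly two orders of magnitude; a different assumed rate changes the multiple but not the direction of the argument.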

OFFICIAL BLOGS

Anthropic Engineering — How we contain Claude across products

Anthropic's engineering team lays out the containment architectures behind its three agentic products: claude.ai runs code server-side in ephemeral gVisor containers; Claude Code uses a human-in-the-loop OS sandbox (Seatbelt on macOS, bubblewrap on Linux) that cut permission prompts by 84%; and Claude Cowork runs code inside a local VM, with credentials staying in the host keychain and never entering the guest. The driving insight is that human approval is fallible: telemetry showed users approve roughly 93% of permission prompts, and the more prompts they see, the less attention they pay. The post candidly walks through three failures: project configs executing before the "do you trust this folder?" dialog appeared; an internal red-team phish in which a "can you run this for me?" prompt got Claude to exfiltrate AWS credentials in 24 of 25 attempts, because when the user is the injection vector there is nothing anomalous for intent-anchored classifiers to catch; and data exfiltrated through the allowlisted api.anthropic.com domain using an attacker's API key, which shows that an allowlist is a capability grant, not a destination filter. The closing principles: contain at the environment layer first ("The deterministic boundary is what gets hit when everything probabilistic misses"), match isolation strength to the user's capacity for oversight, and be wary of custom components. Across every deployment, battle-tested primitives like gVisor and hypervisors held, while Anthropic's own proxies were what broke. Also notable: Claude Mythos Preview was withheld in April 2026 because its blast radius was deemed too high.

Sources: 1
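The allowlist failure is easier to see in a toy model. The sketch below is not Anthropic's proxy or its actual mitigation. `OutboundRequest`, `SESSION_API_KEY`, the `/example` path, and both check functions are hypothetical names invented for illustration. It only shows why a host-only check cannot tell a legitimate call from one that delivers data to an attacker's account on the same allowlisted domain.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# HYPOTHETICAL policy values, for illustration only.
ALLOWED_HOSTS = {"api.anthropic.com"}
SESSION_API_KEY = "key-provisioned-for-this-session"


@dataclass(frozen=True)
class OutboundRequest:
    url: str
    api_key: str  # credential the sandboxed process attached to the request


def host_only_allows(req: OutboundRequest) -> bool:
    """Domain allowlist: checks the destination host, not whose account receives the data."""
    return urlsplit(req.url).hostname in ALLOWED_HOSTS


def host_and_credential_allows(req: OutboundRequest) -> bool:
    """Stricter variant: the host must be allowlisted AND the credential must be the session's own."""
    return host_only_allows(req) and req.api_key == SESSION_API_KEY


# The /example path is a placeholder, not a claim about which endpoint was used.
legit = OutboundRequest("https://api.anthropic.com/example", SESSION_API_KEY)
exfil = OutboundRequest("https://api.anthropic.com/example", "attacker-owned-key")
other = OutboundRequest("https://example.com/upload", SESSION_API_KEY)

for name, req in [("legit", legit), ("exfil", exfil), ("other", other)]:
    print(name, host_only_allows(req), host_and_credential_allows(req))
```

In this model the host-only check passes both `legit` and `exfil`, because allowing the domain grants every capability reachable through it. Binding the credential is one possible way to narrow that grant; the post itself may describe a different fix.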

PODCASTS

Training Data — Google DeepMind's Logan Kilpatrick: Why the Model Eats the Harness

The Takeaway: Coding agents have quietly become the general-purpose agent harness, and the harness alpha everyone is chasing will be swallowed by the models within a year.

Logan Kilpatrick runs Google AI Studio and the Gemini API at Google DeepMind, and his most provocative claim is about scaffolding's shelf life. What we call "the model" stopped being a set of weights long ago; it's now a sprawling system of tools, containers, and harnesses around the weights, and the layer everyone currently treats as the alpha is next to be absorbed. The idea that the harness is where the advantage lives "perhaps won't be true, at least in the way that we think of the harness today, in twelve months," because "it'll be upstreamed into the model." His answer to the lock-in worry is "harness bench," a benchmark the ecosystem should build to measure how well each model adapts to harnesses it didn't ship with.

Inside Google, the agent harness is becoming the new connective tissue. Antigravity, built by the team that arrived via the Windsurf deal, is an IDE, a web experience, a CLI, and an SDK, and the same harness now powers agent features in Search, the Gemini app, Cloud, and AI Studio, the way the Gemini model itself was once the sole through-line. The kicker: "coding has proved to be the general purpose agent harness in addition to also working really well for coding." Still, he grades Google's own suite "definitely crawl" on agenticness, with the Gemini app closest to a walk.

He's candid about why developer mindshare splits between Claude and Codex while Gemini lags: narratives whiplash (December's story was that Google had won, and it flipped within weeks), and outsiders can't see when the big pre-training runs land. Gemini 3.5 Flash, he notes, beat every previous Pro model on coding through post-training gains alone. The data points worth keeping: 350,000 Android apps were built in AI Studio in a week, "apps that probably no one was going to build before"; roughly 20% of AI Studio apps are finance (much of it crypto); and he thinks a random person will vibe-code a genuinely fun game this year, because the gap is scaffolding and taste, not model quality. The next verticals he expects narrow superintelligence to hit are math, finance, and science, the domains with built-in verifiability.

Sources: 1