AI Builders

June 11, 2026

18 builders · 37 posts · 1 podcast

Anthropic Labs head Mike Krieger argues Fable 5 shifts the scarce skill from prompting to delegation backed by verification loops. Box's Aaron Levie publishes eval numbers showing Fable beating Opus 4.8 across nearly every industry, and Anthropic's Thariq details how Fable edited its own launch video end to end. OpenAI's Thibault Sottiaux flags an unexplained 48-hour Codex usage spike, while Zara Zhang, Madhu Guru, and Peter Yang weigh in on cross-team agents, start-with-the-best-model strategy, and builder careers.

X / TWITTER

Aaron Levie (Box CEO) published results from Box's internal AI Complex Work Eval pitting Claude Fable 5 against Opus 4.8 on hard, real-world enterprise document tasks, and Fable posted big jumps across almost every industry: Media & Entertainment 78% vs 61%, Technology 81% vs 73%, Financial Services 89% vs 83%, Healthcare 66% vs 60%. The differentiators he highlights: Fable doesn't take shortcuts on complex reasoning, gets multi-step calculations right, and is significantly more consistent across runs. One telling example: on a genre profitability projection, Fable recognized that a 20% Argentine tax deduction was already embedded in the source spreadsheet figures, while Opus applied it again on top, a compounding error that drove its score negative. Fable will be available shortly in Box AI Studio for customers to build agents with.

Sources: 1
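
The double-deduction error in the Box example can be sketched with a few lines of arithmetic. This is a minimal illustration, not the eval's actual task: the revenue figure and the `apply_deduction` helper are hypothetical, and only the 20% rate comes from the post.

```python
def apply_deduction(amount: int, rate_pct: int) -> int:
    """Return the amount left after deducting rate_pct percent (integer math)."""
    return amount * (100 - rate_pct) // 100


TAX_RATE_PCT = 20          # the 20% deduction from the example
gross_revenue = 1_000_000  # hypothetical pre-tax figure, not from the eval

# Assumption: the source spreadsheet stores figures that are already net of tax.
spreadsheet_value = apply_deduction(gross_revenue, TAX_RATE_PCT)

# Correct reading: the spreadsheet figure is used as-is.
correct_net = spreadsheet_value

# Double-counted reading: the same deduction is applied a second time.
double_counted_net = apply_deduction(spreadsheet_value, TAX_RATE_PCT)

print(correct_net)         # 800000
print(double_counted_net)  # 640000
```

Applying the deduction twice understates the figure by a further 20%, and that error carries into every projection built on top of it.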

Thariq (works on Claude Code at Anthropic) made a video answering a question a lot of people asked: how he used Fable to edit its own launch video. The model wrote code and tool calls to run transcription services, drive ffmpeg, do color grading, use the Figma MCP, and build and render Remotion UI; he never touched a video editor. He also shared the deck from the video for anyone who wants to go through it themselves.

Sources: 1, 2

Zara Zhang (builder) argues teams should build agents and skills for their cross-functional counterparts: if a design team builds a design agent trained on the brand's guidelines for the marketing team, marketing can produce on-brand assets without bugging designers, and orgs can move from being organized by "functions" to being organized by "loops". She also observes that most San Francisco startups are selling to each other: when she asks founders who their target audience is, 90% of the time it's "engineering and product teams, AI-native startups", while very few build for the other 99% of the world. And she notes that the output of an agency increasingly looks like a folder of files for agents instead of one-off assets: "Get paid for your mind, not your hands."

Sources: 1, 2, 3

Madhu Guru (previously a product leader at Google on Gemini, Veo, and Nano Banana) shares the rule of thumb his team gave enterprise customers, who kept getting the quality/cost tradeoff wrong by starting with the smallest, cheapest model: if you're replacing a traditional ML model with an LLM, start small, because you already know what good looks like; if you're building something new, start with the most capable model, think magically, and figure out what's actually possible first. Once you have a high-quality working application, move to a smaller model while maintaining quality.

Sources: 1
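
The rule of thumb above can be written down as a small decision helper. This is only a sketch of the stated rule: the `ProjectKind` enum, the function names, and the tier labels are hypothetical and do not refer to any real model lineup.

```python
from enum import Enum


class ProjectKind(Enum):
    REPLACES_TRADITIONAL_ML = "replaces a traditional ML model"
    NEW_APPLICATION = "builds something new"


def starting_model(kind: ProjectKind) -> str:
    """Pick the tier to start with, following the rule of thumb."""
    if kind is ProjectKind.REPLACES_TRADITIONAL_ML:
        # You already know what good looks like, so start small.
        return "smallest"
    # For something new, find out what is possible first.
    return "most capable"


def next_move(current: str, quality_is_high: bool) -> str:
    """Decide what to do once a prototype exists."""
    if current == "most capable" and quality_is_high:
        return "try a smaller model while maintaining quality"
    return "keep iterating on the current model"


print(starting_model(ProjectKind.NEW_APPLICATION))  # most capable
```

The order matters in this sketch: the move to a smaller model only happens after the application already works at high quality.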

Thibault Sottiaux (works on Codex and ChatGPT at OpenAI) confirms a sharp spike in the growth of Codex token consumption over the last 48 hours, calling it "unusual when we don't launch something." He also welcomed two new team members, Clint and Michael, saying he's excited about what they'll do together to contribute to the cybersecurity field and accelerate defenders across the globe.

Sources: 1, 2

Peter Yang (writes practical AI tutorials for 150K+ readers) tells builders to give themselves permission to build: the traditional career ladder pushes everyone to become a "leader" whose calendar fills with product reviews, cross-functional alignment, and performance calibrations, and he knows a lot of builders who spent their best years climbing the wrong ladder. The good news is this is finally changing: companies are rewarding builders and ICs more than ever, and even managers are increasingly expected to do IC work. Separately, he notes the more he uses Codex, the more ambitious his requests get.

Sources: 1, 2

Dan Shipper (Every CEO) points to reshoring news as something he predicted on Lenny's podcast last year: when AI makes each individual employee much more productive, it becomes appealing to reshore certain jobs back to the US to be close to customers.

Sources: 1

Guillermo Rauch (Vercel CEO) writes that what he loves about Silicon Valley is that the future is up for grabs, ready for anyone to build: he gets angel-investing intros to all kinds of people and takes everyone equally seriously, "2 lads & a dog, or a 5-time award-winning entrepreneur. No place more meritocratic." He's also teasing special announcements at Vercel Ship in London next week.

Sources: 1, 2

Garry Tan (Y Combinator President & CEO) recommends Nessie as the best way to move your existing context, memory, and history out of ChatGPT, Perplexity, and Gemini and into the other places you keep memory, including OpenClaw / Hermes Agent. He praises its OpenClaw and MCP servers as "ace."

Sources: 1

Google Labs announced that Project Genie access is expanding again: starting today, Google AI Ultra 5X subscribers (the latest tier) globally can access Project Genie.

Sources: 1

Claude (official Anthropic account) announced that scheduled deployments and environment variables in vaults are available today on the Claude Platform. The account also spotlighted Cursor cofounder Michael Truell in its The Problem Solvers series: Cursor went from 15 to 700 people in two years, and over 60% of the Fortune 500 now build with its AI coding platform.

Sources: 1, 2

Josh Woodward (VP at Google, leading Google Labs, the Gemini app, and AI Studio) handled a Gemini outage in public: he posted that the team was on it with some fixes already in, and later confirmed that everything was back up and running.

Sources: 1, 2

PODCASTS

AI & I by Every — How Anthropic Uses Claude Fable 5 With Mike Krieger

The Takeaway: With a Mythos-class model, the scarce skill shifts from prompting to delegation: align on intent upfront, build verification loops, then trust the model to carry complex work through to completion.

Mike Krieger cofounded Instagram and now runs Anthropic Labs, which makes him one of the few people who used Claude Fable 5 daily for months before launch. His first reaction was feeling "like a total newbie again": his habits for prompting and decomposing tasks were suddenly obsolete. His new pattern is to hold an architectural planning conversation upfront, ask the model to produce an HTML page or diagram so the team can align, then hand off whole chunks of work, often overnight. He regularly wishes Claude a good night and wakes up to finished tasks; when a remote service died mid-task one night, the model scaffolded a temporary backend, documented the workaround, and fixed it when the service came back. Over one weekend of kicking off work between hikes with his kids, he built a media tracker whose agent can modify the app from inside the app, a project on the scale of what once took five days of all-nighters for Instagram v1.

The specifics worth stealing: Fable is the first model where he actively varies effort levels, dropping to medium for UI tweaks, and he routes quick questions to Sonnet ("this is not a Fable-worthy question"). On the high price, his reframe is to measure what it costs to complete a task to your satisfaction rather than per turn; Fable wins by skipping the nine or ten "that's not quite what I meant" follow-ups. The training advance he singles out is judgment: the model pushes back on code review feedback ("I thought about it and I still disagree") and nagged him for days about a feature flag he never flipped. Verification is the new discipline: every Claude PR at Anthropic ships with a screenshot gallery or video, and he gives Claude video captures plus ffmpeg so it can catch animation jank a screenshot would miss. Software engineering, he argues, isn't over but different: ownership, production intuition, and judging what's worth building stay human. The line that stuck came from a recruiting colleague using an internal Fable tool: "It is the first time in my life where I feel like the thing that's in my head and the thing that exists in the world is right next to each other."
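
Krieger's cost reframe, measuring the cost to finish a task rather than the cost per turn, comes down to simple arithmetic. The sketch below uses made-up cost units and turn counts purely for illustration; they are assumptions, not real pricing, and `cost_to_complete` is a hypothetical helper.

```python
def cost_to_complete(cost_per_turn: int, turns_needed: int) -> int:
    """Total cost of getting a task done to your satisfaction."""
    return cost_per_turn * turns_needed


# Hypothetical numbers: the pricier model is 5x more expensive per turn
# but finishes in a single turn.
premium_total = cost_to_complete(cost_per_turn=5, turns_needed=1)

# The cheaper model needs the first attempt plus nine
# "that's not quite what I meant" follow-ups.
cheaper_total = cost_to_complete(cost_per_turn=1, turns_needed=1 + 9)

print(premium_total, cheaper_total)  # 5 10
```

Under these assumed numbers the model with the higher per-turn price is the cheaper one per completed task; with different numbers the comparison could go the other way, which is why the unit of measurement matters.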

Sources: 1