ai builders

June 3, 2026

15 builders · 38 posts · 1 podcast

Listen Labs CEO Alfred Wahlforss argues that as AI makes building cheap, knowing what to build becomes the scarce skill, and describes a customer-research simulation that predicts a person's answers with up to 95% accuracy and beat ChatGPT at picking a conference-talk title. Meanwhile, Vercel's Guillermo Rauch pushes a "YES-CODE" thesis, Box's Aaron Levie calls model routing inevitable, Roblox's Peter Yang says narrow SaaS is getting harder to monetize, and Anthropic's Thariq and OpenAI's Thibault Sottiaux ship Claude Code Workflows and Codex upgrades, respectively.

X / TWITTER

Google Labs & Gemini VP Josh Woodward shipped a fix for a long-standing papercut: Thinking Levels are now available in Gemini across Web, iOS, and Android, letting users dial reasoning depth on every platform rather than just one.

OpenAI's Thibault Sottiaux, who leads Codex and ChatGPT, rolled out a batch of upgrades aimed at everyday knowledge work: business-plan users can now host and share websites, plugins and skills are vastly improved for a broad range of roles, and users can give an agent feedback through visual annotations directly inside docs, slides, and sheets.

Anthropic's Thariq, who works on Claude Code, called Workflows the biggest upgrade to Claude Code's capabilities since skills and subagents, and is especially excited about the non-technical tasks it now unlocks. He published a deep dive on best practices and examples alongside the launch.

Roblox product lead Peter Yang argued that narrow, single-purpose SaaS is getting harder to monetize. His reasoning: AI skills can often solve the same problem in a more flexible, personalized way; AI-native agents like Codex and Claude Code carry the user's personal context and memory, giving them far more leverage than a standalone tool; and while people will pay hundreds or thousands of dollars for human-touch services, a $20/month SaaS now gets mentally benchmarked against a Claude or ChatGPT subscription. He thinks larger enterprise SaaS products that span multiple jobs (like Figma) are still fine.

Replit CEO Amjad Masad announced a partnership with Microsoft to let everyone in the enterprise build and deploy safe, secure Fabric data apps, made possible by Microsoft's new Rayfin SDK. He also argued that traditional SWE benchmarks don't capture app-building ability, pointing to ViBench as a better measure.

Vercel CEO Guillermo Rauch laid out a "YES-CODE" thesis: the entire "no-code" category was built on the assumption that code is expensive, difficult, and scarce, but coding agents have flipped that equation so that code is now cheap, easy, and abundant. He recalled co-founder Malte Ubl batting away an analyst's "is Vercel a no-code platform?" with "no, it's the absolute opposite," and framed Vercel's mission as building the easiest cloud for agents, one that you never graduate from. In a separate post he argued that human language is now the new API to the world: the way you go directly from intent to tangible things without translating into machine instructions.

Box CEO Aaron Levie predicted that as token budgets become a larger share of operating expenses, model routing is the inevitable conclusion and one of the biggest areas of differentiation for the applied AI layer. By understanding the work patterns in your domain and building strong evals, you can peel off individual use cases and send them to cheaper models once quality is sufficient. He thinks most use cases still need frontier performance for now, and that enterprises won't be able to figure this out alone, so products that intelligently route workflows to the right model tier will aggregate demand.
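The routing idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tier names, prices, use cases, eval pass rates, and the `route` helper are hypothetical and do not come from Levie's post or any real product. It only shows the shape of the decision: send a use case to the cheapest tier whose eval score clears a quality bar, and fall back to the frontier tier otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, for illustration only


# Hypothetical model tiers, ordered from cheapest to most expensive.
TIERS = [Tier("small", 0.2), Tier("mid", 1.0), Tier("frontier", 5.0)]

# Hypothetical eval pass rates per use case and tier (made-up numbers).
EVAL_SCORES = {
    "invoice_extraction": {"small": 0.97, "mid": 0.98, "frontier": 0.99},
    "contract_review": {"small": 0.71, "mid": 0.86, "frontier": 0.96},
}


def route(use_case: str, quality_bar: float = 0.95) -> Tier:
    """Return the cheapest tier whose eval score meets the quality bar."""
    scores = EVAL_SCORES.get(use_case)
    if scores is None:
        # No evals for this use case yet: stay on the frontier tier.
        return TIERS[-1]
    for tier in TIERS:  # cheapest first
        if scores.get(tier.name, 0.0) >= quality_bar:
            return tier
    # No tier clears the bar: use the strongest tier available.
    return TIERS[-1]


if __name__ == "__main__":
    for case in ("invoice_extraction", "contract_review", "unknown_task"):
        print(case, "->", route(case).name)
```

In this sketch a use case only moves off the frontier tier once evals exist and a cheaper tier clears the bar, which mirrors the point that most use cases stay on frontier models for now.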

Every CEO Dan Shipper asked the community for honest reactions to Opus 4.8 a week after release. His team was extremely bullish in testing but found the public response more tepid, and he floated a theory: by nature the model pushes back on your framing a little more, making results high-variance — sometimes it does something amazing, sometimes it disagrees in a way that's obviously wrong.

VC investor Zara Zhang surfaced data from OpenAI's latest Codex report: knowledge workers now represent about 20% of Codex users and are adopting it more than three times as fast as developers, with the fastest-growing task types being Data Analysis (+110% week over week), Research (+37%), and Knowledge Artifacts (+36%).

FPV Ventures partner Nikunj Kothari advised founders that the best ones treat AI/timing, funding, distribution, market, product, and revenue each as a necessary component of the business, not the whole of it. He's seeing too many pitches that tout just one of these as the core reason to invest. With the seed-to-A gap widening, he says you need to show how you're capturing several of them in a way that's hard to copy, and to communicate long-term ambition, since that may be the only thing that decides whether you raise the next round.

OpenAI CEO Sam Altman weighed in on AI policy, arguing the US should lead by continuing to develop the very best models, making sure they're safe, and getting cyber tools into the hands of trusted defenders. He said the new executive order gets the balance right.

PODCASTS

Training Data — Knowing What Your Customers Want, All the Time: Listen Labs' Alfred Wahlforss

The Takeaway: as AI makes building cheap, the scarce skill becomes knowing what to build — and Listen Labs is turning customer research into a real-time, simulatable layer.

Alfred Wahlforss is founder and CEO of Listen Labs, an AI-first customer research platform that runs thousands of voice interviews simultaneously and already serves 20% of the Fortune 500, including Microsoft, Anthropic, and Sweetgreen. His framing of why this matters now is sharp: "as we get closer to AGI, it will be easier to build things, but the hard part will be knowing what to build."

The counterintuitive findings are where it gets interesting. People are more honest with an AI interviewer, not less: it's a nonjudgmental, almost therapeutic entity, and because interviews are asynchronous you can pay participants less than you would for an interview with a human. The real moat isn't the model but the audience: Wahlforss says 80% of engineering goes into sourcing the right people, because every product is driven by a power law. Even Sweetgreen, seemingly for everyone, has a core segment (urban, high-income, mostly female, knows what seed oils are) that drives most of the revenue.

The frontier is simulation. After enough interviews with one person, Listen can predict that person's answers with up to 95% accuracy, and it's training "augmented response" audiences that can be queried instantly, even from inside coding agents via an MCP. Wahlforss tested 100 titles for a conference talk against his simulated customer panel and beat ChatGPT, which picked the wrong one. His take on durable advantage for vertical AI companies: proprietary evals. Listen lifted its "is the interview annoying" eval from 20% to 85%, then deliberately raised the bar with a harder one and dropped back to 20%. The climb itself is the moat.
