Building a Good Vertical Agent
How do you build an agent that actually wins in a domain — one customers pick because it's better? The basics of an agent have been standardized: it's a while-loop around a model...
How do you build an agent that actually wins in a domain — one customers pick because it's better?
Peter Wang(Anaconda 联合创始人、PyData 创造者)基于他近一年构建 Shortcut(电子表格垂直 Agent)的经验,阐述了如何构建真正在垂直领域胜出的 Agent。核心观点:人人都能写 Agent,但把 Agent 做好靠的不是技巧,而是对领域的深刻理解和耐心。
Agent 的基础已标准化
The basics of an agent have been standardized over the past year: it's a while-loop around a model that calls tools until the task is done. Give it a filesystem, give it a shell, and let it do most things through that. You can write it in an afternoon, and most people have.
人人都能做 Agent,但好 Agent 不靠技巧
Everyone can build an agent — it really isn't that hard, and, as I'll spell out, it isn't that deep either. What separates a good one from a toy isn't cleverness; it's a real understanding of your domain and the patience to do some tedious, careful work.
关键单元是 Harness
The unit that matters is the harness that loops across models, data, and tools.
核心架构设计决策决定了 Agent 好坏:
This is the most important design decision in the entire architecture, and the one that separates good agent design from bad.
Harness 需要随 Agent 进化
The agent harness needs to evolve with the agent. A recipe that works for one model size can behave very differently at another size. Each model has its own...
闭环:Trained Model + Harness + RL Environment
相关推文(6月10日):
attempting to close the big beautiful loop of trained model, harness (shortcut), and rl environment (mog).
三个核心组件形成闭环:
- Trained Model — 专门训练的模型
- Harness (Shortcut) — 编排层 / Agent 框架
- RL Environment (Mog) — 强化学习环境
案例研究:Shortcut 电子表格 Agent
- 定位:最准确的电子表格 Agent("the most accurate spreadsheet agent around")
- 部署:已部署到三家最大的金融机构
- 技术栈:SpreadJS surface(Excel 插件 + Shortcut Web)
- 能力:LBO、DCF、三报表模型等金融建模
- 评价:在 Excel 世界锦标赛案例中得分 >80%,速度比人类快 10 倍
关键洞察
- Agent 基础是商品:while-loop + model + tools,一个下午就能写完。竞争壁垒不在这里。
- 垂直 Agent 的护城河是领域深度:真正理解领域 + 耐心做枯燥但关键的工作。
- Harness 是核心架构单元:不是模型,不是数据,是横跨模型、数据和工具的编排层。
- 工具设计决定 Agent 上限:用 DSL / 编程语言的完整表达力,而非受限的接口。
- 模型规模变化时 Harness 需要适配:同一个 recipe 在不同模型规模上表现可能完全不同。
- 最终目标是闭环:训练模型 → Harness 部署 → RL 环境评估 → 反馈改进模型,形成飞轮。
来源: Peter Wang @ X