Building SQLite with an AI Swarm — The Reality of Multi-Agent Division of Labor
Six AI agents (Claude, Codex, Gemini) built a 19,000-line Rust SQLite clone in parallel. Analyzing the real costs of multi-agent coordination and task division.
Overview
Six AI agents (2 Claude, 2 Codex, 2 Gemini) collaborated in parallel to build a SQLite-compatible database engine in Rust. The result: approximately 19,000 lines of code passing 282 unit tests — a remarkably complete system.
This experiment was detailed in Kian Kyars’ blog post and gained 63 points on Hacker News.
Architecture: Software Development as a Distributed System
The core idea is simple: treat software engineering like a distributed system. Coordination happens through git, lock files, tests, and merge discipline.
Workflow
```mermaid
graph TD
    B[Bootstrap Phase<br/>One Claude builds foundation] --> W1[Worker 1<br/>Claude]
    B --> W2[Worker 2<br/>Claude]
    B --> W3[Worker 3<br/>Codex]
    B --> W4[Worker 4<br/>Codex]
    B --> W5[Worker 5<br/>Gemini]
    B --> W6[Worker 6<br/>Gemini]
    W1 --> G[Git Main Branch]
    W2 --> G
    W3 --> G
    W4 --> G
    W5 --> G
    W6 --> G
    G --> T[Test Validation<br/>sqlite3 Oracle]
```
Agent Loop
Each agent runs an infinite loop:
- Pull latest main branch
- Claim one scoped task (lock file)
- Implement + test against sqlite3 as oracle
- Update shared progress docs/notes
- Push
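The "claim one scoped task" step can be sketched as an atomic lock-file create. This is a minimal sketch, not the project's actual mechanism: the `locks/` directory layout, task names, and `try_claim` function are all hypothetical.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Try to claim a task by atomically creating a lock file.
/// Returns Ok(true) if this agent won the claim, Ok(false) if
/// another agent already holds it.
fn try_claim(locks_dir: &Path, task: &str, agent: &str) -> std::io::Result<bool> {
    let lock_path = locks_dir.join(format!("{task}.lock"));
    // create_new fails with AlreadyExists if the file is present,
    // which makes the claim atomic on a local filesystem.
    match OpenOptions::new().write(true).create_new(true).open(&lock_path) {
        Ok(mut f) => {
            // Record the owner so a stale lock can be traced and cleaned up.
            writeln!(f, "{agent}")?;
            Ok(true)
        }
        Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("swarm-locks-demo");
    std::fs::create_dir_all(&dir)?;
    let first = try_claim(&dir, "wal-recovery", "claude-1")?;
    let second = try_claim(&dir, "wal-recovery", "codex-1")?;
    println!("first claim: {first}, second claim: {second}");
    std::fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Because the lock lives in the repository, claims and releases become commits — which is exactly why coordination shows up so prominently in the commit log.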
Implemented Features
The completeness of this swarm-built SQLite clone is remarkable:
| Layer | Components |
|---|---|
| Parser | SQL parser |
| Planner | Stats-aware query planning |
| Executor | Volcano model executor |
| Storage | Pager, B+ trees |
| Transactions | WAL, recovery, transaction semantics |
| Features | JOINs, aggregates, indexing, grouped aggregates |
Total: 154 commits over 2 days (2026-02-10 to 02-12).
The Coordination Tax
The most fascinating finding is the coordination tax.
```
Total commits: 154
Coordination commits: 84 (54.5%)
├── Lock claims
├── Lock releases
├── Stale lock cleanup
└── Task coordination
```
More than half of all commits (84 of 154, or 54.5%) were pure coordination overhead rather than feature work. This suggests that multi-agent parallel throughput depends heavily on lock hygiene and disciplined stale-lock cleanup.
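One way to measure this tax is to classify commit subjects and compute the ratio. A minimal sketch, assuming coordination commits carry recognizable prefixes such as `lock:` or `coord:` (the swarm's actual message convention is not shown in the source):

```rust
/// Classify a commit subject as coordination overhead or real work.
/// The prefixes here are illustrative, not the project's actual convention.
fn is_coordination(subject: &str) -> bool {
    ["lock:", "unlock:", "stale:", "coord:"]
        .into_iter()
        .any(|p| subject.starts_with(p))
}

/// Fraction of commits that are pure coordination.
fn coordination_ratio(subjects: &[&str]) -> f64 {
    let coord = subjects.iter().filter(|s| is_coordination(s)).count();
    coord as f64 / subjects.len() as f64
}

fn main() {
    let log = [
        "lock: claim wal-recovery",
        "feat: implement WAL frame checksums",
        "unlock: release wal-recovery",
        "feat: grouped aggregates in executor",
    ];
    // 2 of these 4 commits are coordination, so the ratio is 0.5.
    println!("coordination tax: {:.1}%", coordination_ratio(&log) * 100.0);
}
```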
Success Factors
1. Oracle-Based Validation + High Test Cadence
Using sqlite3 as the ground-truth oracle for validation was decisive: every behavior could be checked against the real thing. Fast feedback loops through cargo test and ./test.sh kept agents on track.
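A differential check against the oracle can be sketched as follows. This is an assumption-laden sketch: `run_oracle` shells out to the real `sqlite3` binary (assumed to be on PATH), and the engine output in `main` is a stand-in for the clone's own executor; the function names are hypothetical.

```rust
use std::process::Command;

/// Normalize query output so formatting differences don't cause false mismatches.
fn normalize(out: &str) -> Vec<String> {
    out.lines()
        .map(|l| l.trim().to_string())
        .filter(|l| !l.is_empty())
        .collect()
}

/// Run a SQL statement through the real sqlite3 CLI (the oracle).
fn run_oracle(sql: &str) -> std::io::Result<String> {
    let out = Command::new("sqlite3").arg(":memory:").arg(sql).output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Compare the clone's output against the oracle's for the same query.
fn matches_oracle(engine_out: &str, oracle_out: &str) -> bool {
    normalize(engine_out) == normalize(oracle_out)
}

fn main() {
    match run_oracle("SELECT 1+1;") {
        Ok(oracle) => {
            // Stand-in for the clone's executor output on the same query.
            let engine = "2\n";
            println!("oracle agrees: {}", matches_oracle(engine, &oracle));
        }
        Err(e) => eprintln!("sqlite3 not available: {e}"),
    }
}
```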
2. Strong Module Boundaries
```mermaid
graph LR
    P[Parser] --> PL[Planner] --> E[Executor] --> S[Storage]
```
Clear module boundaries — parser → planner → executor → storage — allowed agents to work on orthogonal slices with minimal merge conflicts.
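Such boundaries can be expressed as narrow trait interfaces, so each agent owns one stage and only touches the types flowing across its boundary. The trait and type names below are illustrative, not the project's actual API:

```rust
// Illustrative stage interfaces; the real project's types are richer.
struct Ast(String);       // parser output (placeholder)
struct Plan(String);      // planner output (placeholder)
struct Rows(Vec<String>); // executor output (placeholder)

trait Parser { fn parse(&self, sql: &str) -> Ast; }
trait Planner { fn plan(&self, ast: Ast) -> Plan; }
trait Executor { fn execute(&self, plan: Plan) -> Rows; }

struct NaiveParser;
impl Parser for NaiveParser {
    fn parse(&self, sql: &str) -> Ast { Ast(sql.trim_end_matches(';').to_string()) }
}

struct PassthroughPlanner;
impl Planner for PassthroughPlanner {
    fn plan(&self, ast: Ast) -> Plan { Plan(format!("SCAN({})", ast.0)) }
}

struct EchoExecutor;
impl Executor for EchoExecutor {
    fn execute(&self, plan: Plan) -> Rows { Rows(vec![plan.0]) }
}

fn main() {
    // Stages compose only through the trait boundaries, so swapping one
    // agent's implementation never forces edits in another's module.
    let rows = EchoExecutor.execute(PassthroughPlanner.plan(NaiveParser.parse("SELECT 1;")));
    println!("{:?}", rows.0);
}
```

The design choice is the point: an agent rewriting the planner can only break the `Planner` trait contract, which the shared tests catch immediately.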
3. Shared State Docs as Runtime, Not Documentation
PROGRESS.md and design notes functioned as system runtime state, not mere documentation. This highlights the importance of shared state management in multi-agent collaboration.
Limitations and Lessons
The Missing Coalescer
A coalescer agent was built to clean up duplication and drift, but it only ran once at project end. Gemini couldn’t complete the full deduplication and stopped midway. The coalescer should run as frequently as the other agents.
Untrackable Token Usage
Different platforms use different usage formats, making it impossible to determine which agent contributed most.
Documentation Explosion
PROGRESS.md grew to 490 lines, and the notes directory accumulated massive amounts of documentation — a visible cost of inter-agent communication.
Connection to Prior Work
This experiment aligns with Verdent AI’s multi-agent SWE-bench results. Where Verdent demonstrated the effectiveness of parallel execution on benchmarks, this SQLite project shows multi-agent division of labor working on real system construction.
Key shared insights:
- Give agents a narrow interface, a common truth source, and fast feedback, and you get compounding throughput on real systems code
- Tests are the anti-entropy force
Key Summary
| Metric | Value |
|---|---|
| Agent count | 6 (Claude 2 + Codex 2 + Gemini 2) |
| Lines of code | ~19,000 (Rust) |
| Commits | 154 |
| Coordination overhead | 54.5% |
| Tests | 282 passing |
| Development time | 2 days |
Conclusion
This experiment reveals both the promise and limits of multi-agent development. Six agents building a working 19,000-line database engine in 2 days is impressive, but the fact that over half of all commits were coordination overhead cannot be ignored.
Parallelism is powerful, but only with strict task boundaries. And tests aren’t just quality assurance — they’re the core mechanism fighting entropy in agent systems.