Building SQLite with an AI Swarm — The Reality of Multi-Agent Division of Labor

Six AI agents (Claude, Codex, Gemini) built a 19,000-line Rust SQLite clone in parallel. Analyzing the real costs of multi-agent coordination and task division.

Overview

Six AI agents (2 Claude, 2 Codex, 2 Gemini) collaborated in parallel to build a SQLite-compatible database engine in Rust. The result: approximately 19,000 lines of code passing 282 unit tests — a remarkably complete system.

This experiment was detailed in Kian Kyars’ blog post and gained 63 points on Hacker News.

Architecture: Software Development as a Distributed System

The core idea is simple: treat software engineering like a distributed system. Coordination happens through git, lock files, tests, and merge discipline.

Workflow

graph TD
    B[Bootstrap Phase<br/>One Claude builds foundation] --> W1[Worker 1<br/>Claude]
    B --> W2[Worker 2<br/>Claude]
    B --> W3[Worker 3<br/>Codex]
    B --> W4[Worker 4<br/>Codex]
    B --> W5[Worker 5<br/>Gemini]
    B --> W6[Worker 6<br/>Gemini]
    W1 --> G[Git Main Branch]
    W2 --> G
    W3 --> G
    W4 --> G
    W5 --> G
    W6 --> G
    G --> T[Test Validation<br/>sqlite3 Oracle]

Agent Loop

Each agent runs an infinite loop:

  1. Pull latest main branch
  2. Claim one scoped task (lock file)
  3. Implement + test against sqlite3 as oracle
  4. Update shared progress docs/notes
  5. Push
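The claim/release steps of this loop can be sketched in shell. Everything here is illustrative, not the project's actual scripts: the `locks/` directory, the lock-file naming, and the `claim_task`/`release_task` helpers are all assumptions.

```shell
#!/bin/sh
# Illustrative sketch of one agent's claim/release cycle. The locks/
# layout and helper names are assumptions, not the project's real scripts.
set -eu

AGENT="${AGENT:-worker-1}"
LOCK_DIR="${LOCK_DIR:-locks}"
mkdir -p "$LOCK_DIR"

claim_task() {
    # A lock file named after the task is the claim; creating it with
    # noclobber (set -C) fails if another agent got there first.
    if ( set -C; printf '%s\n' "$AGENT" > "$LOCK_DIR/$1.lock" ) 2>/dev/null; then
        echo "claimed $1"
    else
        echo "$1 already claimed by $(cat "$LOCK_DIR/$1.lock")" >&2
        return 1
    fi
}

release_task() {
    rm -f "$LOCK_DIR/$1.lock"
}

# One iteration: pull, claim, implement + test, note progress, push.
# git pull --rebase
claim_task parser-joins && {
    # ... implement; validate against the sqlite3 oracle (cargo test) ...
    release_task parser-joins
}
# git push
```

Committing the lock file itself (rather than using it purely locally) is what makes the claim visible to the other five agents through git, which is also why every claim and release shows up in the commit log.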

Implemented Features

The completeness of this swarm-built SQLite clone is remarkable:

Layer | Components
Parser | SQL parser
Planner | Stats-aware query planning
Executor | Volcano-model executor
Storage | Pager, B+ trees
Transactions | WAL, recovery, transaction semantics
Features | JOINs, aggregates, indexing, grouped aggregates

Total: 154 commits over 2 days (2026-02-10 to 02-12).

The Coordination Tax

The most fascinating finding is the coordination tax.

Total commits: 154
Coordination commits: 84 (54.5%)
├── Lock claims
├── Lock releases
├── Stale lock cleanup
└── Task coordination

54.5% of all commits were pure coordination overhead. This demonstrates that multi-agent parallel throughput depends heavily on lock hygiene and stale-lock cleanup discipline.
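The stale-lock cleanup item above amounts to a time-based sweep. A minimal sketch, assuming a `locks/` directory and a 30-minute staleness threshold (both invented here for illustration):

```shell
#!/bin/sh
# Illustrative stale-lock sweep. A crashed or stalled agent leaves its
# claim behind, blocking that task for everyone; threshold is assumed.
set -eu

LOCK_DIR="${LOCK_DIR:-locks}"
MAX_AGE_MIN=30
mkdir -p "$LOCK_DIR"

# Report and delete any lock not touched within the threshold.
find "$LOCK_DIR" -name '*.lock' -mmin +"$MAX_AGE_MIN" -print -delete
```

Without a sweep like this, one crashed agent permanently removes a task from the pool, so cleanup discipline directly bounds how much parallelism survives over a multi-day run.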

Success Factors

1. Oracle-Based Validation + High Test Cadence

Using sqlite3 as the ground truth oracle for validation was decisive. Fast feedback loops through cargo test and ./test.sh kept agents on track.
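The oracle pattern boils down to piping the same SQL through the reference `sqlite3` and through the clone, then comparing output. A minimal sketch; the `ORACLE`/`CLONE` variables and the clone's binary path are assumptions, not the project's actual `test.sh`:

```shell
#!/bin/sh
# Sketch of oracle-based validation; ORACLE/CLONE defaults are assumed.
oracle_check() {
    # $1: SQL text. Succeeds iff the clone matches the reference output.
    sql="$1"
    expected=$(printf '%s\n' "$sql" | "${ORACLE:-sqlite3}")
    actual=$(printf '%s\n' "$sql" | "${CLONE:-./target/release/sqlite_clone}")
    [ "$expected" = "$actual" ]
}

# Usage, assuming both binaries are available:
# oracle_check 'SELECT 1 + 1;' && echo PASS || echo FAIL
```

The appeal of this setup is that no agent ever argues about correctness: disagreement with the oracle is a failing test, not a design discussion.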

2. Strong Module Boundaries

graph LR
    P[Parser] --> PL[Planner] --> E[Executor] --> S[Storage]

Clear module boundaries — parser → planner → executor → storage — allowed agents to work on orthogonal slices with minimal merge conflicts.

3. Shared State Docs as Runtime, Not Documentation

PROGRESS.md and design notes functioned as system runtime state, not mere documentation. This highlights the importance of shared state management in multi-agent collaboration.
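As a toy illustration of docs-as-runtime-state, an agent might append structured, timestamped entries that its peers read on their next pull. The helper name and entry format below are invented, not the project's actual format:

```shell
#!/bin/sh
# Illustrative append-only progress entry; the format is assumed.
note_progress() {
    # $1: agent name, $2: message. Appending keeps writes merge-friendly:
    # concurrent agents add lines instead of rewriting shared sections.
    printf '%s %s: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1" "$2" >> PROGRESS.md
}

note_progress worker-3 "executor: hash join merged, grouped aggregates next"
```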

Limitations and Lessons

The Missing Coalescer

A coalescer agent was built to clean up duplication and drift, but it only ran once at project end. Gemini couldn’t complete the full deduplication and stopped midway. The coalescer should run as frequently as the other agents.

Untrackable Token Usage

Each platform reports token usage in a different format, so it was impossible to determine which agent contributed most.

Documentation Explosion

PROGRESS.md grew to 490 lines, and the notes directory accumulated massive amounts of documentation — a visible cost of inter-agent communication.

Connection to Prior Work

This experiment aligns with Verdent AI’s multi-agent SWE-bench results. Where Verdent demonstrated the effectiveness of parallel execution on benchmarks, this SQLite project shows that multi-agent division of labor carries over to real system construction.

Key shared insights:

  • Give agents a narrow interface, a common truth source, and fast feedback, and you get compounding throughput on real systems code
  • Tests are the anti-entropy force

Key Summary

Metric | Value
Agent count | 6 (2 Claude + 2 Codex + 2 Gemini)
Lines of code | ~19,000 (Rust)
Commits | 154
Coordination overhead | 54.5%
Tests | 282 passing
Development time | 2 days
Development time2 days

Conclusion

This experiment reveals both the promise and limits of multi-agent development. Six agents building a working 19,000-line database engine in 2 days is impressive, but the fact that over half of all commits were coordination overhead cannot be ignored.

Parallelism is powerful, but only with strict task boundaries. And tests aren’t just quality assurance — they’re the core mechanism fighting entropy in agent systems.


About the Author

Kim Jangwook

Full-Stack Developer specializing in AI/LLM

Building AI agent systems, LLM applications, and automation solutions with 10+ years of web development experience. Sharing practical insights on Claude Code, MCP, and RAG systems.