OpenCastle


Agentic Development & Multi-Agent Orchestration

Master AI-assisted software development — from effective single-agent vibecoding to coordinating specialist agent teams interactively or through automated batch execution with orchestration frameworks like OpenCastle.

What Is Agentic Development?

Agentic development — sometimes called vibecoding — is a workflow where you collaborate with AI coding agents instead of writing every line yourself. You describe what you want, and the agent handles the how: writing code, running tests, fixing errors, and iterating until the task is done.

This isn't about blindly accepting AI output. The best results come from treating the agent as a capable junior developer — one that needs clear direction, good context, and code review.

Key mindset shift: Your job moves from "writing code" to "directing, reviewing, and verifying code." The quality of your input directly determines the quality of the output.

Multi-Agent Development

Why Multi-Agent?

A single AI agent is powerful — but it has limits. Context windows fill up. Prompts lose focus when mixing UI, backend, tests, and security in one conversation. Knowledge is shallow across too many domains.

Multi-agent development solves this by assigning specialist agents to focused tasks — just like a real engineering team. A Team Lead decomposes work, delegates to experts, and verifies results. Each agent operates within a well-scoped context, producing higher quality output.

Key Insight: Multi-agent orchestration isn't about replacing developers — it's about giving each AI agent the right scope, the right context, and the right expertise to do its best work.
DO

Decompose into specialist tasks

"Developer: implement the form. Testing Expert: write E2E tests. Security Expert: audit auth flow."
DON'T

Overload one agent with everything

"Build the feature, write tests, add docs, fix accessibility, and optimize performance."

Orchestration Patterns

In multi-agent development, an orchestrator (or Team Lead) coordinates the work. It analyzes requirements, creates a plan, assigns tasks to specialist agents, and verifies their output — similar to a tech lead managing a team of engineers.

Orchestrator Pattern (diagram): Team Lead decomposes and delegates → Developer, Tester, Reviewer → Team Lead verifies and delivers.

Key orchestration principles:

  • Task decomposition — break work into the smallest meaningful, independently verifiable units
  • File partitioning — parallel agents never touch the same files, eliminating merge conflicts
  • Dependency ordering — tasks are phased so dependent work waits for prerequisites
  • Independent verification — every output is reviewed before being accepted
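Dependency ordering can be made concrete with a small TypeScript sketch that groups tasks into phases, so dependent work waits for its prerequisites. The `Task` shape, the `planPhases` function, and the task ids are hypothetical. They are not OpenCastle's actual plan format.

```typescript
// Hypothetical task shape for illustration; not OpenCastle's spec format.
interface Task {
  id: string;
  dependsOn: string[]; // ids of prerequisite tasks
}

// Layer tasks into phases: a task joins a phase only once every prerequisite
// finished in an earlier phase. Tasks within one phase can run in parallel.
function planPhases(tasks: Task[]): string[][] {
  const remaining = new Map(tasks.map((t) => [t.id, t] as const));
  const done = new Set<string>();
  const phases: string[][] = [];

  while (remaining.size > 0) {
    const ready = [...remaining.values()]
      .filter((t) => t.dependsOn.every((dep) => done.has(dep)))
      .map((t) => t.id);
    if (ready.length === 0) {
      // Nothing can start: a cycle, or a dependency on a task that doesn't exist.
      throw new Error(`Unresolvable dependencies: ${[...remaining.keys()].join(", ")}`);
    }
    for (const id of ready) {
      remaining.delete(id);
      done.add(id);
    }
    phases.push(ready);
  }
  return phases;
}

const phases = planPhases([
  { id: "schema", dependsOn: [] },
  { id: "api", dependsOn: ["schema"] },
  { id: "ui", dependsOn: ["api"] },
  { id: "e2e-tests", dependsOn: ["api", "ui"] },
  { id: "docs", dependsOn: [] },
]);
console.log(JSON.stringify(phases));
// → [["schema","docs"],["api"],["ui"],["e2e-tests"]]
```

Tasks in the same phase do not depend on each other. That makes them candidates for running in parallel, as long as their files don't overlap.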
DO

Define clear file ownership per agent

"Developer A owns src/components/. Developer B owns src/api/. No overlap."
DON'T

Let parallel agents edit the same files

Two agents both modifying layout.tsx at the same time → merge conflicts and lost work
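File ownership like the example above can be checked mechanically. This sketch assumes ownership is expressed as directory prefixes per agent. It flags overlapping ownership and any changed file that falls outside an agent's directories. The agent names, paths, and helper functions are hypothetical.

```typescript
// Hypothetical ownership map: agent name -> directory prefixes it may edit.
type Ownership = Record<string, string[]>;

// Ensure a trailing slash so "src/api" does not match "src/api-legacy/...".
function normalize(dir: string): string {
  return dir.endsWith("/") ? dir : `${dir}/`;
}

// Two agents overlap if one owned prefix contains another agent's prefix.
function findOverlaps(ownership: Ownership): string[] {
  const entries = Object.entries(ownership).flatMap(([agent, dirs]) =>
    dirs.map((d) => ({ agent, dir: normalize(d) })),
  );
  const overlaps: string[] = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      const a = entries[i];
      const b = entries[j];
      if (a.agent !== b.agent && (a.dir.startsWith(b.dir) || b.dir.startsWith(a.dir))) {
        overlaps.push(`${a.agent}:${a.dir} <-> ${b.agent}:${b.dir}`);
      }
    }
  }
  return overlaps;
}

// Changed files that fall outside the agent's owned directories.
function outOfBounds(ownership: Ownership, agent: string, changedFiles: string[]): string[] {
  const owned = (ownership[agent] ?? []).map(normalize);
  return changedFiles.filter((file) => !owned.some((dir) => file.startsWith(dir)));
}

const ownership: Ownership = {
  "developer-a": ["src/components"],
  "developer-b": ["src/api"],
};
console.log(findOverlaps(ownership)); // → []
console.log(outOfBounds(ownership, "developer-a", ["src/components/Form.tsx", "src/app/layout.tsx"]));
// → [ 'src/app/layout.tsx' ]
```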

Specialist Agents

Instead of one generalist AI trying to handle everything, multi-agent systems assign focused roles with domain-specific instructions, tools, and context. This mirrors how real engineering teams operate — you wouldn't ask a backend engineer to also design the UI and write the security audit.

  • Premium: orchestration and cross-system coordination; tasks requiring deep reasoning and multi-step planning. Agents: Team Lead
  • Quality: feature implementation, UI components, security audits; core development work that demands precision and nuanced reasoning. Agents: Developer · UI/UX Expert · Security Expert · Architect
  • Standard: CMS integration, database work, API design, performance profiling; domain-specific tasks with broad context needs. Agents: Content Engineer · Database Engineer · API Designer · Performance Expert · Researcher
  • Fast: testing, data pipelines, DevOps, releases; terminal-heavy tasks that benefit from rapid iteration cycles. Agents: Testing Expert · Data Expert · DevOps Expert · Release Manager
  • Economy: documentation, copy, SEO, code review; tasks that benefit from speed and cost-efficiency over raw reasoning power. Agents: Documentation Writer · Reviewer · Copywriter · SEO Specialist
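One way to encode these tiers is a role-to-tier lookup that an orchestrator consults before dispatching a task. This is a hypothetical configuration sketch. The tier and role names come from the tiers above, but the `roleTiers` map and the `tierFor` function are illustrative. Mapping each tier to a concrete model is left to your runtime.

```typescript
type Tier = "premium" | "quality" | "standard" | "fast" | "economy";

// Role -> tier, following the tier list on this page (hypothetical config shape).
const roleTiers = new Map<string, Tier>([
  ["Team Lead", "premium"],
  ["Developer", "quality"],
  ["UI/UX Expert", "quality"],
  ["Security Expert", "quality"],
  ["Architect", "quality"],
  ["Content Engineer", "standard"],
  ["Database Engineer", "standard"],
  ["API Designer", "standard"],
  ["Performance Expert", "standard"],
  ["Researcher", "standard"],
  ["Testing Expert", "fast"],
  ["Data Expert", "fast"],
  ["DevOps Expert", "fast"],
  ["Release Manager", "fast"],
  ["Documentation Writer", "economy"],
  ["Reviewer", "economy"],
  ["Copywriter", "economy"],
  ["SEO Specialist", "economy"],
]);

// Fail loudly for unknown roles instead of silently defaulting to an expensive tier.
function tierFor(role: string): Tier {
  const tier = roleTiers.get(role);
  if (tier === undefined) throw new Error(`No tier configured for role: ${role}`);
  return tier;
}

console.log(tierFor("Security Expert")); // → quality
console.log(tierFor("Documentation Writer")); // → economy
```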
DO

Match agent tier to task complexity

Quality-tier model for security audits. Economy model for docs updates. Right tool for the job.
DON'T

Use the most expensive model for everything

Running a premium model to update README files wastes budget without adding value.

Quality Gates

In single-agent workflows, you manually verify output. In multi-agent systems, quality gates automate verification — ensuring every agent's work passes review before being accepted.

  • Deterministic checks — lint, type-check, and test suites run automatically
  • Fast review — a lightweight reviewer agent checks acceptance criteria, file partitions, and regressions
  • Panel review — for high-stakes changes (security, auth, DB migrations), multiple independent reviewers vote PASS/BLOCK
  • Self-improvement — lessons from failures are captured and fed back into future agent prompts
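The gate ordering can be sketched as a short-circuiting pipeline. In this hypothetical TypeScript sketch, gates are plain functions that return PASS or BLOCK. Real gates would wrap lint, type-check, and test runs, or call reviewer agents. The panel rule used here is an assumption, not a documented policy: a BLOCK from any single reviewer vetoes the change.

```typescript
type Verdict = "PASS" | "BLOCK";

interface Change {
  files: string[];
  highStakes: boolean; // e.g. touches security, auth, or DB migrations
}

// A gate inspects a change and returns a verdict (plain functions in this sketch).
type Gate = (change: Change) => Verdict;

// Assumed panel rule: every reviewer votes, and any BLOCK blocks the change.
function panelReview(reviewers: Gate[], change: Change): Verdict {
  const votes = reviewers.map((review) => review(change));
  return votes.includes("BLOCK") ? "BLOCK" : "PASS";
}

function runGates(
  change: Change,
  deterministic: Gate[],
  fastReview: Gate,
  panel: Gate[],
): { verdict: Verdict; failedAt?: string } {
  // 1. Deterministic checks (lint, type-check, tests) run first and short-circuit.
  for (let i = 0; i < deterministic.length; i++) {
    if (deterministic[i](change) === "BLOCK") {
      return { verdict: "BLOCK", failedAt: `deterministic check ${i + 1}` };
    }
  }
  // 2. A lightweight reviewer checks acceptance criteria, partitions, regressions.
  if (fastReview(change) === "BLOCK") return { verdict: "BLOCK", failedAt: "fast review" };
  // 3. Panel review only for high-stakes changes.
  if (change.highStakes && panelReview(panel, change) === "BLOCK") {
    return { verdict: "BLOCK", failedAt: "panel review" };
  }
  return { verdict: "PASS" };
}

const pass: Gate = () => "PASS";
const blocksAuth: Gate = (c) => (c.files.some((f) => f.startsWith("src/auth/")) ? "BLOCK" : "PASS");

console.log(runGates({ files: ["src/auth/login.ts"], highStakes: true }, [pass, pass], pass, [pass, blocksAuth, pass]));
// → { verdict: 'BLOCK', failedAt: 'panel review' }
```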

Quality gates turn multi-agent development from "hope it works" into a verified pipeline. Every agent's output is independently checked before it touches the main branch.

DO

Verify at every step of the pipeline

Agent delivers → fast review → lint/typecheck → tests → merge. Every gate must pass.
DON'T

Trust agent output without verification

Merging code because "the agent said it works" — always run tests and reviews.

Architecture

The Ralph Wiggum Loop

The Ralph Wiggum Loop, invented by Geoffrey Huntley, is a single-agent technique for autonomous coding. It's a simple bash loop, `while :; do cat PROMPT.md | claude-code; done`, in which one AI agent iterates repeatedly on a task until it's done. It's named after the Simpsons character, who is cheerfully persistent despite failures: eventual consistency through brute force.

  • One task per iteration — each loop gets a fresh context window; the spec in PROMPT.md is deterministically loaded every time
  • Backpressure — tests, type checkers, and linters act as gates; bad code is rejected automatically before the next iteration
  • Self-improvement — the agent updates AGENT.md and fix_plan.md between runs, accumulating knowledge across iterations
  • Brute force meets persistence — the loop is the hero, not the model; it runs until the work is done
  • Best for greenfield — works well as a single-agent technique for new projects (Geoffrey Huntley built an entire programming language, CURSED, using this technique)
Single-agent primitive: Ralph Wiggum shows that one agent looping with specs and backpressure can build real software. Gas Town scales this idea to many agents working in parallel with a full orchestration layer.
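The loop translates almost directly into code. This sketch is a TypeScript adaptation, not Huntley's script. `readSpec`, `runAgent`, and `checksPass` are hypothetical callbacks that stand in for reading PROMPT.md, invoking the agent CLI, and running the backpressure checks. Unlike the bash one-liner above, which loops forever, the sketch adds an iteration cap and stops once the checks pass.

```typescript
interface LoopOptions {
  readSpec: () => string; // re-reads PROMPT.md so every iteration starts from the same spec
  runAgent: (spec: string) => void; // one agent run in a fresh context window
  checksPass: () => boolean; // backpressure: tests, type checker, linter
  maxIterations: number; // safety cap added for this sketch; the bash loop has none
}

function ralphLoop({ readSpec, runAgent, checksPass, maxIterations }: LoopOptions): number {
  for (let i = 1; i <= maxIterations; i++) {
    runAgent(readSpec());
    if (checksPass()) return i; // the gates accept the work: stop looping
  }
  throw new Error(`Checks still failing after ${maxIterations} iterations`);
}

// Simulated run: a stand-in "agent" that gets the work right on its third attempt.
let attempts = 0;
const iterations = ralphLoop({
  readSpec: () => "Implement the parser described in specs/. Run the tests after each change.",
  runAgent: () => {
    attempts++;
  },
  checksPass: () => attempts >= 3,
  maxIterations: 10,
});
console.log(iterations); // → 3
```

In practice, `runAgent` might spawn the agent CLI with `spawnSync` from `node:child_process` and pass the spec through its `input` option. The exact command depends on your agent.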

Gas Town Architecture

Steve Yegge's Gas Town takes the looping-agent idea to industrial scale — dozens of Claude Code instances working in parallel through a rigorous work representation system called the MEOW stack (Molecular Expression of Work):

Gas Town Model (diagram): the Mayor, Deacon, Polecats, and Witness agents work on top of Beads (Git-backed JSONL). The MEOW stack (Formulas → Molecules → Epics → Beads) keeps state in Git, GUPP ("If there is work on your hook, YOU MUST RUN IT") keeps agents moving, and the Refinery (merge queue) lands their work.
  • Beads — the atomic unit of work; one-issue-per-line JSON files tracked in Git (a Git-backed JSONL issue tracker built in Go)
  • MEOW stack — layered work representation: Beads → Epics → Molecules → Formulas; state persists in Git so work survives crashes (Nondeterministic Idempotence)
  • GUPP (Gastown Universal Propulsion Principle) — "If there is work on your hook, YOU MUST RUN IT"; keeps agents driving forward across session crashes
  • Polecats — ephemeral per-rig workers; they spin up on demand in swarms, produce Merge Requests, get merged, then are fully decommissioned with names recycled — cattle, not pets
  • Deacon — the "daemon beacon"; a patrol agent that runs workflows in a loop, propagating DYFJ ("Do Your Job") signals to workers, managing orchestration, plugins, and session recycling; has helper workers called Dogs (including Boot)
  • Witness — the actual worker health monitor; checks on polecats and refineries and detects stuck or zombie workers
  • Convoy — the ticketing and work-order system that wraps work into a delivery unit, maintaining order and dependencies across the pipeline
  • Refinery — a merge queue that serializes changes from parallel workers, preventing conflicts before they reach the main branch
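The Beads and GUPP ideas fit in a few lines of TypeScript. Work items are stored one per line as JSON, so a restarted agent re-reads the file and resumes whatever is still on its hook. This is a toy model. The field names (`id`, `title`, `status`, `hook`) and the ids are hypothetical, and they do not reflect the real Beads schema or Gas Town's `gt` tooling.

```typescript
// Toy model of Beads + GUPP. Field names are hypothetical, not the real Beads schema.
interface Bead {
  id: string;
  title: string;
  status: "open" | "in_progress" | "closed";
  hook?: string; // the agent this work is hung on
}

// One JSON object per line; blank lines are ignored.
function parseBeads(jsonl: string): Bead[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Bead);
}

function toJsonl(beads: Bead[]): string {
  return beads.map((b) => JSON.stringify(b)).join("\n") + "\n";
}

// GUPP in miniature: if unfinished work is on this agent's hook, that is the next job.
function nextOnHook(beads: Bead[], agent: string): Bead | undefined {
  return beads.find((b) => b.hook === agent && b.status !== "closed");
}

// Closing is idempotent: applying it twice (say, after a crash and restart) gives the same state.
function closeBead(beads: Bead[], id: string): Bead[] {
  return beads.map((b): Bead => (b.id === id ? { ...b, status: "closed" } : b));
}

const beads = parseBeads(
  [
    '{"id":"gt-1","title":"Add login form","status":"closed","hook":"polecat-1"}',
    '{"id":"gt-2","title":"Wire auth API","status":"in_progress","hook":"polecat-1"}',
  ].join("\n"),
);
console.log(nextOnHook(beads, "polecat-1")?.id); // → gt-2
```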

Gas Town vs OpenCastle

OpenCastle takes these ideas and rebuilds them for the IDE-native developer workflow — TypeScript over Go, SQLite over Git JSONL, standard git tooling over custom infrastructure:

| Aspect | Gas Town | OpenCastle |
|---|---|---|
| Language | Go + custom tooling | TypeScript (Node.js 22+) |
| State storage | Beads (Git-backed JSONL) | SQLite WAL (zero deps, built into Node.js) |
| Worker model | Claude Code instances in tmux | Any AI runtime (Copilot, Claude Code, Cursor, OpenCode) |
| Setup | tmux + Claude Code + Go binary (gt) | npx opencastle init (works in any repo) |
| Orchestration | MEOW stack (formulas → molecules → beads) | YAML spec → deterministic execution |
| Isolation | Git worktrees (same as OpenCastle) | Git worktrees (standard tooling) |
| Integration | Standalone platform | Plugs into existing IDE workflows |
| Learning curve | Steep: proprietary concepts | Gentle: standard tools, familiar patterns |

Why OpenCastle

Benefits

  • Easy adoption — one command install (npx opencastle init), works with your existing IDE and AI runtime
  • Minimal setup — YAML config, no dedicated infrastructure, no proprietary tooling to learn
  • Crash-safe execution — SQLite WAL persists all task state; interrupted runs resume from where they stopped
  • Isolated workers — git worktrees give each parallel agent its own branch, eliminating mid-flight merge conflicts
  • Multi-runtime support — bring your own AI: GitHub Copilot, Claude Code, Cursor, and OpenCode all supported
  • Observable by default — SQL-queryable state, NDJSON event logs, and a real-time dashboard out of the box
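Worktree isolation relies on standard git. `git worktree add -b <branch> <path> <base>` creates a new branch and checks it out in its own directory. The sketch only builds those commands. The `agent/` branch prefix, the `.worktrees/` directory, and the `worktreeCommands` helper are hypothetical choices, not OpenCastle's actual layout.

```typescript
// Hypothetical task shape: one worktree and one branch per parallel task.
interface WorkerTask {
  id: string;
}

// Build argv arrays for `git worktree add -b <branch> <path> <base>`.
function worktreeCommands(tasks: WorkerTask[], base = "main"): string[][] {
  return tasks.map((t) => [
    "git",
    "worktree",
    "add",
    "-b",
    `agent/${t.id}`, // new branch for this worker
    `.worktrees/${t.id}`, // separate checkout directory
    base, // commit the branch starts from
  ]);
}

const cmds = worktreeCommands([{ id: "task-ui" }, { id: "task-api" }]);
for (const argv of cmds) console.log(argv.join(" "));
// git worktree add -b agent/task-ui .worktrees/task-ui main
// git worktree add -b agent/task-api .worktrees/task-api main
```

Each argv could be executed with `execFileSync(argv[0], argv.slice(1))` from `node:child_process`. Once the branch is merged, `git worktree remove <path>` cleans up.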
Best Practices

Write Great Prompts

The quality of agent output is directly proportional to the quality of your instructions. Vague prompts produce vague results. Specific prompts with context produce production-ready code.

DO

Be specific and provide context

"Add a `lastLogin` timestamp field to the User model in `src/models/user.ts`. Update the login handler in `src/auth/login.ts` to set it on successful authentication. Use the existing `updatedAt` pattern for the database update."
DON'T

Give vague, context-free instructions

"Track when users log in."
DO

Reference existing files and patterns

"Create a new API route at `src/api/invoices.ts` following the same pattern as `src/api/orders.ts` — validate with Zod, use the existing `db` client, and return typed responses."
DON'T

Expect the agent to guess your conventions

"Make an invoices endpoint."
DO

Specify acceptance criteria

"The component should handle loading, error, and empty states. It needs to be keyboard-accessible and work at mobile breakpoints. Add unit tests for the data transformation logic."
DON'T

Leave requirements implicit

"Build a user list component."

Build Tools, Not Tasks

This is the single most impactful pattern for agentic development. When facing a large refactoring or repetitive task, ask the agent to create a tool or script that performs the operation — rather than asking it to do the work directly.

Why Tools Beat Direct Edits

  • Repeatable — Run the tool again if new files appear or requirements change
  • Verifiable — Review the tool's logic once, trust it for 500 files
  • Reviewable — A 50-line script is easier to code review than 200 scattered file edits
  • Debuggable — When something goes wrong, you debug one script — not hunt through hundreds of diffs
  • Safe — Run with --dry-run first, apply when confident
DO — Ask for a tool

Create a codemod

"Write a codemod script that finds all React components using the old `<Button variant='primary'>` API and replaces them with `<Button color='brand'>`. Support both JSX and TSX. Add a --dry-run flag that prints changes without writing."
DON'T — Do it manually

Edit every file one by one

"Update every Button component in the codebase to use `color='brand'` instead of `variant='primary'`."

With 80+ files, the agent will miss some, introduce inconsistencies, or hit context limits mid-way.
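The requested tool might look roughly like this regex-based TypeScript sketch. It rewrites `variant='primary'` (single or double quotes) to `color='brand'` only inside `<Button>` opening tags, and it supports a dry run. The regex stops at the first `>`, so it misses tags whose props contain arrow functions. A production codemod should parse the code with an AST tool such as jscodeshift or ts-morph. The `run` and `sourceFiles` helpers are illustrative.

```typescript
import { readFileSync, readdirSync, statSync, writeFileSync } from "node:fs";
import { extname, join } from "node:path";

// Only rewrite the attribute when it appears inside a <Button ...> opening tag.
const BUTTON_TAG = /<Button\b[^>]*>/g;
// Capture the quote style so 'primary' and "primary" both keep their quotes.
const PRIMARY_VARIANT = /\bvariant=(['"])primary\1/g;

function transformSource(source: string): string {
  return source.replace(BUTTON_TAG, (tag) => tag.replace(PRIMARY_VARIANT, "color=$1brand$1"));
}

// Recursively yield .jsx and .tsx files, skipping node_modules.
function* sourceFiles(dir: string): Generator<string> {
  for (const name of readdirSync(dir)) {
    if (name === "node_modules") continue;
    const path = join(dir, name);
    if (statSync(path).isDirectory()) yield* sourceFiles(path);
    else if ([".jsx", ".tsx"].includes(extname(path))) yield path;
  }
}

// Returns how many files would change (dry run) or did change.
function run(root: string, dryRun: boolean): number {
  let changed = 0;
  for (const file of sourceFiles(root)) {
    const before = readFileSync(file, "utf8");
    const after = transformSource(before);
    if (after === before) continue;
    changed++;
    if (dryRun) console.log(`[dry-run] would update ${file}`);
    else writeFileSync(file, after);
  }
  return changed;
}

// Example invocation: run("src", process.argv.includes("--dry-run"));
```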

DO — Ask for a tool

Create a migration script

"Write a script that reads all YAML config files in `config/`, converts them to TOML format, and writes the output to `config-v2/`. Preserve comments where possible."
DON'T — Do it manually

Convert each file individually

"Convert config/database.yml to TOML. Now convert config/auth.yml. Now convert config/cache.yml..."

Each conversion is a separate context window with no shared logic. Errors compound instead of being fixed once.

DO — Ask for a tool

Build a linter rule

"Create an ESLint rule that flags any direct `console.log` calls in `src/` and suggests using our logger utility from `src/lib/logger` instead. Include auto-fix support."
DON'T — Do it manually

Find-and-replace across the codebase

"Replace all console.log calls with logger calls in the codebase."

A rule catches new violations too. Manual replacement is a one-shot fix that regresses immediately.
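A custom ESLint rule is an object with `meta` and a `create(context)` function that returns AST visitors. Problems are reported through `context.report`, and fixes use `fixer.replaceText`. The sketch assumes your logger exposes a hypothetical `logger.info` method. For brevity, it does not insert the `src/lib/logger` import. Minimal local types stand in for `@types/eslint`.

```typescript
// Minimal structural types so the sketch compiles without @types/eslint.
interface Node {
  type: string;
  [key: string]: any;
}
interface Fixer {
  replaceText(node: Node, text: string): unknown;
}
interface RuleContext {
  report(descriptor: { node: Node; message: string; fix?: (fixer: Fixer) => unknown }): void;
}

const noConsoleLog = {
  meta: {
    type: "suggestion",
    fixable: "code", // required for rules that provide auto-fixes
    docs: { description: "Disallow console.log; use the project logger instead" },
    schema: [],
  },
  create(context: RuleContext) {
    return {
      // Matches calls shaped exactly like console.log(...).
      CallExpression(node: Node) {
        const callee = node.callee;
        if (
          callee?.type === "MemberExpression" &&
          callee.object?.type === "Identifier" &&
          callee.object.name === "console" &&
          callee.property?.type === "Identifier" &&
          callee.property.name === "log"
        ) {
          context.report({
            node,
            message: "Use the logger from src/lib/logger instead of console.log.",
            // Hypothetical logger API: swaps the callee, keeps the arguments.
            fix: (fixer) => fixer.replaceText(callee, "logger.info"),
          });
        }
      },
    };
  },
};
```

To limit the rule to `src/`, register it through a local plugin in a flat-config entry with `files: ["src/**"]`.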

Scope Your Requests

AI agents work best with focused, well-scoped tasks. Large, multi-concern requests lead to mediocre results everywhere instead of excellent results somewhere. Break big features into small, independently testable pieces.

DO

Break work into focused tasks

"Step 1: Create the database schema for invoices with these fields: id, amount, status, customerId, createdAt. Step 2: Build the API route with list and create endpoints. Step 3: Build the UI table component."
DON'T

Ask for everything at once

"Build a complete invoicing system with database, API, UI, PDF export, email notifications, and Stripe integration."
DO

Ship and verify incrementally

"Let's start with just the data model and API. Once that's working and tested, we'll add the UI."
DON'T

Wait until everything is done to test anything

"Build the full feature, I'll test it all at the end."

Verify, Don't Trust

AI agents are confident, fast, and sometimes wrong. Always verify their output. Run tests, inspect diffs, and check edge cases. The agent is a tool — you're still the engineer.
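Verification is easier to make routine when it takes one command. This sketch runs each check in turn, streams its output, and prints a PASS/FAIL summary. The `verify` helper and the commands in the example comment are hypothetical placeholders for your project's own scripts.

```typescript
import { spawnSync } from "node:child_process";

interface Check {
  name: string;
  cmd: string;
  args: string[];
}

// Run every check, show its output, and report whether all of them passed.
function verify(checks: Check[]): boolean {
  let allPassed = true;
  for (const { name, cmd, args } of checks) {
    const result = spawnSync(cmd, args, { stdio: "inherit" });
    const ok = result.status === 0; // null status (killed by signal) counts as failure
    console.log(`${ok ? "PASS" : "FAIL"}  ${name}`);
    if (!ok) allPassed = false;
  }
  return allPassed;
}

// Example (hypothetical project scripts):
// verify([
//   { name: "type check", cmd: "npx", args: ["tsc", "--noEmit"] },
//   { name: "tests", cmd: "npm", args: ["test"] },
// ]);
```

The helper deliberately keeps going after a failure, so you see every result at once instead of stopping at the first red check.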

DO

Ask the agent to prove its work

"After making the changes, run the test suite and show me the results. Also run the type checker."
DON'T

Merge without checking

"Looks good, ship it." (without reviewing the diff or running tests)
DO

Review diffs before committing

"Show me the git diff of all changes so I can review before we commit."
DON'T

Let the agent commit and push on its own

"Just push all changes to main." (skipping review entirely)

Use Context Effectively

Agents don't automatically know your project's conventions, architecture, or unwritten rules. The more relevant context you provide, the better the output matches your codebase.

DO

Point to existing patterns

"Create a new service following the same pattern as `src/services/userService.ts` — use dependency injection, return Result types, and log with our structured logger."
DON'T

Assume conventions are obvious

"Create an order service."

The agent might use a completely different pattern — classes instead of functions, throw instead of Result, console.log instead of your logger.
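Here is a hypothetical sketch of the style the prompt describes: a function-based service that receives its dependencies as arguments and returns a `Result` instead of throwing. The names (`createOrderService`, `OrderRepo`, `Logger`, `Result`) are illustrative and do not come from any real codebase.

```typescript
// A Result is either a success value or an error; callers must check `ok`.
type Result<T, E = string> = { ok: true; value: T } | { ok: false; error: E };

interface Order {
  id: string;
  total: number;
}
interface OrderRepo {
  findById(id: string): Order | undefined;
}
interface Logger {
  info(msg: string, fields?: Record<string, unknown>): void;
}

// Dependencies are injected, so tests can pass fakes instead of a real DB or logger.
function createOrderService(deps: { repo: OrderRepo; logger: Logger }) {
  return {
    getOrder(id: string): Result<Order> {
      const order = deps.repo.findById(id);
      if (!order) {
        deps.logger.info("order not found", { id }); // structured fields, no console.log
        return { ok: false, error: `Order ${id} not found` };
      }
      return { ok: true, value: order };
    },
  };
}

const service = createOrderService({
  repo: { findById: (id) => (id === "o-1" ? { id, total: 42 } : undefined) },
  logger: { info: () => {} },
});
const found = service.getOrder("o-1");
const missing = service.getOrder("o-404");
if (found.ok) console.log(found.value.total); // → 42
if (!missing.ok) console.log(missing.error); // → Order o-404 not found
```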

DO

Mention constraints and requirements

"We use Tailwind CSS v4 with our custom design tokens. Colors come from `theme.colors` — don't use arbitrary hex values. All components must support dark mode."
DON'T

Let the agent make style decisions

"Style the card component nicely."

Iterate, Don't Restart

When the agent produces something close but not right, give targeted feedback and ask for adjustments. Starting over from scratch throws away correct work and burns context.

DO

Give specific, targeted feedback

"The table component looks good, but the sorting logic is wrong — it should sort by date descending by default, not ascending. Also, the empty state should show the illustration from `public/empty-inbox.svg`."
DON'T

Throw away everything and start over

"This is wrong. Redo the whole thing."

Without saying what's wrong, the agent repeats the same mistakes. And you lose the parts that were already correct.

DO

Acknowledge what's right

"The API route structure and validation are perfect. Just two changes: rename the `getData` function to `fetchInvoices`, and add pagination support using cursor-based pagination."
DON'T

Provide only negative feedback

"That's not what I wanted."