This is a submission for the GitHub Copilot CLI Challenge
Parallel, evidence-verified orchestration of real GitHub Copilot CLI sessions.
## What I Built
Copilot Swarm Orchestrator is an external process orchestrator for the GitHub Copilot CLI.
It doesn't dispatch work inside Copilot or coordinate agents behind an abstraction layer. Each agent is a fully independent `copilot -p` subprocess, spawned on its own git branch, with its own transcript. The orchestrator's job is everything that happens around those sessions: planning, scheduling, branch isolation, verification, self-repair, and merge.
The key word is external. The orchestrator sits outside Copilot's runtime. It doesn't see inside the session or steer it mid-flight. It builds the plan, sets up the branch, hands the agent a prompt with dependency context from prior steps, launches the subprocess, waits for it to finish, then independently verifies what happened by parsing the /share transcript after the fact. If the evidence checks out, the branch merges. If it doesn't, the Repair Agent kicks in.
Here's what that looks like in practice:
- Breaks goals into dependency-aware steps: A local Planner maps out the work. A PM Agent validates the plan for cycles, duplicate steps, and missing dependencies before any Copilot sessions get spent.
- Runs independent steps in parallel waves: Steps with no mutual dependencies execute simultaneously. Adaptive concurrency adjusts limits based on success rates and rate-limit signals.
- Spawns each agent as its own subprocess: Every step gets its own `copilot -p` process on its own git branch. There is no session reuse, no shared state between agents, no internal routing. With `--strict-isolation`, cross-wave context is restricted to entries backed by verified transcript evidence only.
- Verifies results from transcript evidence, not self-reporting: The verifier parses each `/share` transcript looking for concrete proof: git commit SHAs, test runner output, build success markers, file-change records. Claims the agent made are cross-referenced against this evidence. An optional Governance Critic scores each step on weighted axes (build, test, lint, commit, claim) and flags drift before merge, with auto-pause for human approval.
- Self-repairs on failure: A Repair Agent classifies failures (build, test, missing-artifact, dependency, timeout) and applies targeted fix strategies with accumulated context, up to three retries per step before fallback re-execution.
- Maintains a persistent, resumable audit trail: Full execution state is saved per session ID. Resume interrupted runs with `--resume`. The `audit` command generates Markdown reports with timeline, cost breakdown, gate results, and evidence.
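The planning-and-waves flow above boils down to topological grouping of a dependency graph. Here is a minimal, illustrative sketch of the idea, not the orchestrator's actual code; the `Step` shape and function names are assumptions:

```typescript
// Hypothetical sketch: group dependency-aware steps into parallel waves.
// Steps whose dependencies are all satisfied run together; each completed
// wave unblocks the next.
interface Step {
  id: string;
  deps: string[]; // ids of steps that must be verified first
}

function buildWaves(steps: Step[]): string[][] {
  const done = new Set<string>();
  const pending = new Map<string, Step>(
    steps.map((s) => [s.id, s] as [string, Step])
  );
  const waves: string[][] = [];

  while (pending.size > 0) {
    const wave = [...pending.values()]
      .filter((s) => s.deps.every((d) => done.has(d)))
      .map((s) => s.id);

    // A cycle or missing dependency yields no runnable steps: the kind of
    // broken graph the PM Agent is described as rejecting up front.
    if (wave.length === 0) throw new Error("dependency cycle or missing step");

    wave.forEach((id) => {
      done.add(id);
      pending.delete(id);
    });
    waves.push(wave);
  }
  return waves;
}
```

Two independent steps land in the same wave; a step depending on both lands in the next one.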
Nothing is simulated. No undocumented flags. No cloud-based middleware.
Every agent is a real, isolated CLI process. Every verification is done after the fact from parsed output. Every merge requires proof.
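Mechanically, "a real, isolated CLI process" is just a child process. A minimal sketch of what launching one agent could look like, assuming only the documented `copilot -p` flag; the helper name and everything around it (timeouts, branch setup, adaptive concurrency) are illustrative, not the project's actual code:

```typescript
import { spawn } from "node:child_process";

// Hypothetical sketch: each agent is an independent subprocess. The
// orchestrator launches it, waits for exit, and keeps the output so
// verification can happen after the fact.
function runSubprocess(cmd: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] });
    let out = "";
    child.stdout.on("data", (chunk) => (out += chunk));
    child.on("close", (code) =>
      code === 0 ? resolve(out) : reject(new Error(`agent exited ${code}`))
    );
  });
}

// An agent step is then nothing more than (branch setup omitted):
//   await runSubprocess("copilot", ["-p", promptWithDependencyContext]);
```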
## Demo
Repository: https://github.com/moonrunnerkc/copilot-swarm-orchestrator
Quick demo command:
`npm start demo-fast`
This launches two independent Copilot CLI subprocesses in parallel on separate git branches and prints live, interleaved output so you can see concurrency in action. Takes about 30 seconds.
For a more complete showcase, the dashboard-showcase demo builds a React + Chart.js analytics dashboard with an Express API using four agents across three waves (roughly 6 minutes):
`npm start demo dashboard-showcase`
Other demos scale up from there:
- `npm start demo todo-app` (4 agents, ~15 min)
- `npm start demo api-server` (6 agents, JWT auth + Prisma + Docker)
- `npm start demo full-stack-app` (7 agents, Playwright E2E + CI/CD)
- `npm start demo saas-mvp` (8 agents, Stripe + analytics + security audit)
Each run produces an auditable trail (`plans/`, `runs/`, `proof/`) showing what each agent did, what transcript evidence was verified, and what was merged. You can also inspect runs in the browser:
`npm start web-dashboard` (http://localhost:3002)
Screencast (fresh project interaction):
Screenshots (existing project interaction):
## My Experience with GitHub Copilot CLI
This project was built with Copilot CLI, not "wrapped around" it.
I treated the Copilot CLI as a "Compute Engine" for intelligence. Each session is an opaque subprocess. The orchestrator never reaches inside it. Instead, the value comes from what I built around it:
- Dependency Planning: Automating the order of operations, with a PM Agent that catches broken dependency graphs before any Copilot sessions get spent.
- Bounded Agent Scopes: Eight agent profiles (six specialists, a Repair Agent, and a PM Agent), each with a defined purpose, scope boundaries, and refusal rules. Custom agents are supported via YAML config.
- Post-Hoc Verification + Governance: The orchestrator doesn't trust what the agent says it did. It parses the transcript after execution and checks for real evidence. The optional Critic scores steps on weighted quality axes and auto-pauses for human approval when it finds flags.
- Per-Step Branch Isolation: Every agent works on its own branch. Nothing touches `main` until verification passes. `--strict-isolation` goes further, restricting cross-wave context to entries backed by verified transcripts.
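The post-hoc verification idea is simple to sketch: scan the transcript for hard evidence and gate the merge on it. The patterns below are illustrative stand-ins, not the orchestrator's actual matchers; real transcripts would need far more careful parsing:

```typescript
// Hypothetical sketch: extract concrete evidence from a /share transcript
// instead of trusting the agent's own summary of what it did.
interface Evidence {
  commitShas: string[];
  testsPassed: boolean;
  buildOk: boolean;
}

function extractEvidence(transcript: string): Evidence {
  return {
    // Full 40-char SHAs printed by git commands in the transcript.
    commitShas: transcript.match(/\b[0-9a-f]{40}\b/g) ?? [],
    // Illustrative success markers a test runner might emit.
    testsPassed: /\b\d+ passing\b|\ball tests passed\b/i.test(transcript),
    buildOk: /build (succeeded|completed successfully)/i.test(transcript),
  };
}

// The merge gate then demands proof, not claims:
function mayMerge(e: Evidence): boolean {
  return e.commitShas.length > 0 && e.testsPassed && e.buildOk;
}
```

A transcript with a real commit SHA, a passing-test line, and a build-success marker clears the gate; anything less routes to the Repair Agent.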
Copilot accelerates implementation. The orchestrator adds structure, coordination, and evidence checks on top of that, from the outside.
The result is a workflow where Copilot can move fast, fail safely, and leave behind proof instead of vibes.
## Key Constraints (Intentional)
- Official Integration: Uses only documented Copilot CLI flags (such as `-p`, `--model`, `--share`).
- Zero Emulation: Does not embed or emulate Copilot; it uses your local authenticated CLI session. Each agent is a standalone subprocess.
- Evidence-Based: Does not guarantee correctness. Verification is transcript-based, parsing commands and outputs after the session ends, rather than trusting semantic claims. Governance scores are heuristic indicators, not authoritative judgments.
- Local Control: All execution is explicit, inspectable, and reversible (work happens on branches before merge).
- Persistent Learning: A knowledge base captures execution patterns across runs (dependency ordering, anti-patterns, failure modes) and feeds them back into future planning via lean mode (`--lean`).
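The Repair Agent's failure taxonomy (build, test, missing-artifact, dependency, timeout) amounts to mapping failure signals to targeted fix strategies. Here is a hypothetical sketch of that idea; the patterns and names are assumptions, not the project's implementation:

```typescript
// Hypothetical sketch: classify a failed step so a targeted fix strategy
// can be chosen instead of blindly re-running the whole step.
type FailureType = "build" | "test" | "missing-artifact" | "dependency" | "timeout";

function classifyFailure(log: string, timedOut: boolean): FailureType {
  if (timedOut) return "timeout";
  if (/cannot find module|unresolved dependency/i.test(log)) return "dependency";
  if (/error TS\d+|compilation failed|build failed/i.test(log)) return "build";
  if (/\b\d+ failing\b|tests? failed/i.test(log)) return "test";
  return "missing-artifact"; // nothing verifiable was produced
}

// Illustrative retry policy: up to three targeted repairs, then fallback.
const MAX_REPAIRS = 3;
function nextAction(attempt: number, type: FailureType): string {
  return attempt < MAX_REPAIRS ? `repair:${type}` : "fallback:re-execute";
}
```

The classification matters because the fix differs per type: a dependency failure wants context about the missing module, while a test failure wants the failing assertions fed back into the repair prompt.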
## Why It Matters
GitHub Copilot CLI is a game-changer for individual commands, but real-world engineering is rarely a single task. It's a web of interdependent changes running in parallel: updating the API, adjusting the frontend, and ensuring the test suite doesn't regress.
Copilot Swarm Orchestrator transitions the CLI from a "helper" to a Strategic Partner:
- Eliminates the "Babysitting" Tax: Instead of waiting for one prompt to finish so you can paste the next, you delegate an entire "Wave" of work. It handles the tedious context-switching and branch management while you stay in the flow. If a step fails, the Repair Agent handles it (classified by failure type, with targeted strategies) so you don't have to babysit retries either.
- Trust Through Transparency: Every run generates a local proof/ folder with parsed transcripts, verification reports, and merge records. You aren't just getting code; you're getting a documented chain of evidence showing how that code was produced and validated. Sessions are resumable, auditable, and inspectable via the web dashboard.
- Unlocks Professional Scaling: It proves that "Agentic" workflows don't require expensive cloud infrastructure or complex enterprise setups. Multi-repo orchestration, persistent sessions, plan caching, and a knowledge base that learns across runs, all running from your local terminal.
In short, this moves the terminal from a Chat Box to a Mission Control Center. It's about giving developers the one thing AI is supposed to provide but often costs: Time.
License: ISC
Built with: TypeScript, Node.js 18+, GitHub Copilot CLI (563 tests, 66 source files, 16,489 lines)

