OrchestrAI
AI agent orchestration platform that transforms high-level goals into actionable plans through automated research, synthesis, and a living knowledge graph you can steer in real-time.
Built an AI agent orchestration platform from scratch — input a north star goal, watch parallel research agents investigate, then steer the results through an interactive knowledge graph. Features 5 orchestration patterns, 6 LLM provider integrations, real-time budget tracking, and human-in-the-loop controls. 529 passing tests across 44 test files.
The Problem
Building complex software projects requires extensive research, careful planning, and coordinated execution. Traditional approaches either rely on manual research that's time-consuming and incomplete, lean on single AI agents that lack the depth for comprehensive analysis, or skip the structured planning needed for successful implementation. I wanted something that could take a high-level idea and systematically turn it into a researched, validated, executable plan.
The mental model was simple: what if you could give an AI system a "north star" — like "build a real-time collaboration tool for designers" — and it would automatically decompose that into research questions, dispatch parallel agents to investigate each one, synthesize the findings, detect conflicts, and generate a roadmap you could actually follow?
How It Works
The pipeline flows like this:
- North Star Decomposition — Your goal gets broken into research questions across 7 categories: Market, Technical, User Needs, Compliance, Competition, Architecture, and Integration
- Research Swarm — Parallel agents investigate each question independently with web search and structured citations
- Synthesis — Findings get merged with theme identification, conflict detection, and coverage gap analysis
- Plan Generation — Creates plan.md, roadmap.md, and Architecture Decision Records (ADRs) with full provenance tracking
- Knowledge Graph — Everything connects in a living graph you can explore, steer, and inject new questions into
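The first stage of that pipeline can be sketched in TypeScript. This is an illustrative stub, not the real decomposition code — the names `decompose`, `ResearchQuestion`, and `Finding` are assumptions; only the seven categories come from the description above.

```typescript
// Illustrative types for the pipeline's data flow (names are assumptions).
type Category =
  | "Market" | "Technical" | "UserNeeds" | "Compliance"
  | "Competition" | "Architecture" | "Integration";

interface ResearchQuestion { id: string; category: Category; text: string }
interface Finding { questionId: string; summary: string; citations: string[] }

// Decomposition stub: one research question per category for a given north star.
// The real system presumably generates several questions per category via an LLM.
function decompose(northStar: string): ResearchQuestion[] {
  const categories: Category[] = [
    "Market", "Technical", "UserNeeds", "Compliance",
    "Competition", "Architecture", "Integration",
  ];
  return categories.map((category, i) => ({
    id: `q${i + 1}`,
    category,
    text: `What does "${northStar}" require from a ${category} perspective?`,
  }));
}
```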
Design Decisions
Interactive Knowledge Graph
The centerpiece of the UI. Built with React Flow, the graph visualizes 10 node types and 7 edge types — from your north star goal down to individual research findings, decisions, and tasks. Every plan item links back to the research that informed it. You can click any node to see its provenance chain.
I chose a graph visualization over a traditional list/tree because the relationships between research findings are non-linear. A market insight might inform an architecture decision which contradicts a compliance requirement. Those connections need to be visible, not buried in documents.
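The provenance-chain click-through could work roughly like this: walk the edges upward from any node until you reach the north star. The edge shape and direction here are assumptions for illustration, not the app's actual graph schema.

```typescript
// Hypothetical edge shape; the real schema has 7 distinct edge types.
interface Edge { from: string; to: string; type: string }

// Climb "upstream" from a node to build its provenance chain.
// Assumes the provenance subgraph is acyclic with at most one parent per node.
function provenanceChain(nodeId: string, edges: Edge[]): string[] {
  const chain = [nodeId];
  let current = nodeId;
  for (;;) {
    const parent = edges.find((e) => e.from === current);
    if (!parent) break;
    chain.push(parent.to);
    current = parent.to;
  }
  return chain;
}
```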
5 Orchestration Patterns
Built five distinct patterns for different scenarios: Supervisor (lead agent delegates), Map-Reduce (parallel with aggregation), Pipeline (sequential stages), Consensus (multiple agents vote), and Hierarchical (multi-level delegation). The system can switch patterns mid-execution based on budget, failure rates, or complexity changes.
Dynamic pattern switching was the hardest design problem. When a Supervisor pattern is burning budget too fast, the system needs to gracefully hand off to a Map-Reduce pattern without losing context. I designed a state machine that preserves in-flight work during transitions.
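The handoff idea can be sketched as a small state transition that carries in-flight work across the pattern switch. This is a minimal sketch under assumed names (`OrchestratorState`, `maybeDowngrade`, the 0.8 burn-rate trigger), not the actual state machine.

```typescript
type Pattern = "supervisor" | "mapReduce" | "pipeline" | "consensus" | "hierarchical";

interface OrchestratorState {
  pattern: Pattern;
  inFlight: string[];   // task IDs currently executing
  completed: string[];  // task IDs already finished
}

// Hand off to a new pattern; in-flight work is preserved, not restarted.
function switchPattern(state: OrchestratorState, next: Pattern): OrchestratorState {
  return { ...state, pattern: next };
}

// Example trigger: downgrade Supervisor to Map-Reduce when budget burn is too fast.
// The 0.8 threshold is illustrative.
function maybeDowngrade(state: OrchestratorState, burnRate: number): OrchestratorState {
  return state.pattern === "supervisor" && burnRate > 0.8
    ? switchPattern(state, "mapReduce")
    : state;
}
```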
Budget Controls & Cost Transparency
Multi-agent systems use 8-15x the tokens of a single agent. I built real-time cost monitoring with per-agent breakdowns, automatic model tier downgrades when budget consumption crosses thresholds (50%, 80%, 95%), and cost projections for remaining tasks. The UI shows exactly where every dollar is going.
The tiered routing strategy — Opus for planning, Sonnet for coding, Haiku for quick tasks — cuts costs 50-80% compared to running everything on the most capable model.
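A minimal sketch of threshold-based tier routing, assuming the three thresholds above map to successive downgrades. Treating the 95% threshold as a hard stop is my assumption; the real system may instead halt only new task dispatch or ask for a budget-increase approval.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

// Map budget consumption to a model tier. Thresholds mirror the 50%/80%/95%
// gates described above; the exact routing table is an assumption.
function tierFor(spent: number, budget: number): Tier {
  const used = spent / budget;
  if (used >= 0.95) throw new Error("budget nearly exhausted: halt new tasks");
  if (used >= 0.8) return "haiku";   // cheapest tier for the final stretch
  if (used >= 0.5) return "sonnet";  // mid tier once half the budget is gone
  return "opus";                     // most capable tier while budget is healthy
}
```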
Human-in-the-Loop Controls
Approval gates let you require human sign-off before critical actions (plan execution, budget increases, external API calls). Stop hooks create checkpoints throughout execution where you can pause, inspect, and redirect. I call this the "Ralph Pattern" — named after the ability to pull the emergency brake at any point.
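An approval gate reduces to a queue of pending requests that block agents until a human resolves them. The API shape below (`ApprovalGate`, `request`, `resolve`) is a sketch of the idea, not the project's actual interface.

```typescript
type Action = { kind: "executePlan" | "increaseBudget" | "externalApiCall"; detail: string };

class ApprovalGate {
  private pending = new Map<number, { action: Action; resolve: (ok: boolean) => void }>();
  private nextId = 0;

  // Agents call this before a critical action and await the verdict.
  request(action: Action): { id: number; verdict: Promise<boolean> } {
    const id = this.nextId++;
    const verdict = new Promise<boolean>((resolve) => {
      this.pending.set(id, { action, resolve });
    });
    return { id, verdict };
  }

  // What the human sees in the approvals UI.
  queue(): Action[] {
    return [...this.pending.values()].map((p) => p.action);
  }

  // The UI calls this when the human approves or rejects.
  resolve(id: number, approved: boolean): void {
    this.pending.get(id)?.resolve(approved);
    this.pending.delete(id);
  }
}
```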
Plan/Act Mode Toggle
A simple but critical UX decision: Plan mode lets you explore research, steer the graph, and inject questions without any execution happening. Act mode activates the agents. This separation prevents accidental execution and gives users confidence to explore freely.
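The guarantee behind the toggle is a single guard on the execution path: nothing dispatches unless the mode is Act. A tiny sketch (function names are illustrative):

```typescript
type Mode = "plan" | "act";

// Central gate on agent dispatch. In Plan mode this is a no-op,
// so graph exploration can never trigger execution by accident.
function dispatchAgents(mode: Mode, run: () => void): boolean {
  if (mode !== "act") return false;
  run();
  return true;
}
```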
The Web UI
Built the dashboard in Next.js 16 with React 19 and 7 Zustand stores managing different slices of state:
- Graph Store — Knowledge graph nodes, edges, and layout
- Session Store — Execution sessions and Plan/Act mode
- Budget Store — Real-time cost tracking
- Approvals Store — Human approval queue
- Agent Pool Store — Parallel agent visualization (up to 8 concurrent)
- Memory Store — Persistent cross-session knowledge bank
- Connection Store — WebSocket health
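Each store holds one slice of state with subscriptions driving re-renders. The Budget Store's shape below is a guess from the features described; the store itself is a vanilla, self-contained stand-in for what zustand's `create` provides in the real app.

```typescript
// Assumed shape of the Budget Store's state slice.
interface BudgetState {
  spent: number;
  limit: number;
  perAgent: Record<string, number>;  // per-agent cost breakdown
}

// Minimal zustand-style store: getState / setState / subscribe.
function createStore<T>(initial: T) {
  let state = initial;
  const listeners = new Set<(s: T) => void>();
  return {
    getState: () => state,
    setState: (partial: Partial<T>) => {
      state = { ...state, ...partial };     // shallow merge, like zustand
      listeners.forEach((l) => l(state));   // notify subscribers (UI re-renders)
    },
    subscribe: (l: (s: T) => void) => {
      listeners.add(l);
      return () => listeners.delete(l);     // unsubscribe handle
    },
  };
}

const budgetStore = createStore<BudgetState>({ spent: 0, limit: 100, perAgent: {} });
```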
The dashboard streams updates via Socket.io WebSockets, so you see agents working in real-time — research being gathered, synthesis happening, the graph growing node by node.
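The streaming updates amount to a typed event map the stores subscribe to. The event names and payloads below are invented for illustration — the actual Socket.io protocol isn't documented here — but they show how each store reacts to its own slice of the stream.

```typescript
// Hypothetical server→client events; names and payloads are assumptions.
interface ServerEvents {
  "graph:nodeAdded": { nodeId: string; type: string };
  "agent:progress": { agentId: string; tokensUsed: number };
  "budget:update": { spent: number; projected: number };
}

// Tiny typed emitter standing in for the Socket.io client connection.
class Emitter {
  private handlers = new Map<keyof ServerEvents, ((p: never) => void)[]>();

  on<K extends keyof ServerEvents>(ev: K, fn: (p: ServerEvents[K]) => void) {
    const list = this.handlers.get(ev) ?? [];
    list.push(fn as (p: never) => void);
    this.handlers.set(ev, list);
  }

  emit<K extends keyof ServerEvents>(ev: K, payload: ServerEvents[K]) {
    (this.handlers.get(ev) ?? []).forEach((fn) =>
      (fn as (p: ServerEvents[K]) => void)(payload),
    );
  }
}
```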
Other UI features: Command palette (Cmd+K) for quick navigation, confidence badges on tasks (0-100% completion likelihood), session replay for stepping through past executions event by event, and a settings page for configuring LLM providers.
Architecture
The system has three tiers:
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 16, React Flow, Zustand | Interactive dashboard and graph |
| Real-time | Socket.io WebSocket server | Bidirectional streaming |
| Backend | TypeScript engine, 18 modules | Orchestration, research, planning |
State persistence is deliberately low-infrastructure: Markdown files committed to Git. plan.md, roadmap.md, and memory.md are human-readable, diffable, and require zero database setup. Every agent commit includes attribution metadata so you can trace who (or what) changed what.
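One plausible shape for that attribution metadata is git commit trailers. The trailer keys below are my assumption — the source only says commits carry attribution — but the sketch shows why the scheme stays diffable and grep-able.

```typescript
// Hypothetical attribution trailers appended to each agent commit message.
interface Attribution { agentId: string; pattern: string; sessionId: string }

function commitMessage(summary: string, a: Attribution): string {
  return [
    summary,
    "",                          // blank line separating subject from trailers
    `Agent: ${a.agentId}`,       // who (or what) made the change
    `Pattern: ${a.pattern}`,     // orchestration pattern active at commit time
    `Session: ${a.sessionId}`,   // execution session for replay/tracing
  ].join("\n");
}
```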
What I Learned
Before building the graph UI, I was thinking about AI orchestration as a sequential pipeline. The graph visualization revealed that research findings form a web of dependencies and contradictions that can't be captured in a linear flow. The UI changed my mental model of the system.
My first test run without budget limits cost $47 in 6 minutes. Multi-agent systems need cost controls from day one, not as an afterthought. The tiered routing and automatic model downgrades turned a $47 experiment into a $4 one.
The technical implementation of approval gates is straightforward. The hard part is designing when to interrupt the user vs. when to proceed autonomously. Too many interruptions and the system feels useless. Too few and users lose trust. I'm still iterating on the right balance.