OrchestrAI
AI agent orchestration platform that transforms high-level goals into actionable plans through automated research, synthesis, and a living knowledge graph you can steer in real-time.
Built an AI agent orchestration platform from scratch — input a north star goal, watch parallel research agents investigate, then steer the results through an interactive knowledge graph. Features 5 orchestration patterns, 6 LLM provider integrations, real-time budget tracking, and human-in-the-loop controls. 529 passing tests across 44 test files.
The Problem
Building complex software projects requires extensive research, careful planning, and coordinated execution. Traditional approaches either rely on manual research that's time-consuming and incomplete, lean on single AI agents that lack the depth for comprehensive analysis, or skip the structured planning needed for successful implementation. I wanted something that could take a high-level idea and systematically turn it into a researched, validated, executable plan.
The mental model was simple: what if you could give an AI system a "north star" — like "build a real-time collaboration tool for designers" — and it would automatically decompose that into research questions, dispatch parallel agents to investigate each one, synthesize the findings, detect conflicts, and generate a roadmap you could actually follow?
How It Works
The pipeline flows like this:
- North Star Decomposition — Your goal gets broken into research questions across 7 categories: Market, Technical, User Needs, Compliance, Competition, Architecture, and Integration
- Research Swarm — Parallel agents investigate each question independently with web search and structured citations
- Synthesis — Findings get merged with theme identification, conflict detection, and coverage gap analysis
- Plan Generation — Creates plan.md, roadmap.md, and Architecture Decision Records (ADRs) with full provenance tracking
- Knowledge Graph — Everything connects in a living graph you can explore, steer, and inject new questions into
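The first stage of that pipeline can be sketched in TypeScript. This is an illustrative stub, not the real decomposition code — the names `decompose`, `ResearchQuestion`, and `Finding` are assumptions; only the seven categories come from the description above.

```typescript
// Illustrative types for the pipeline's data flow (names are assumptions).
type Category =
  | "Market" | "Technical" | "UserNeeds" | "Compliance"
  | "Competition" | "Architecture" | "Integration";

interface ResearchQuestion { id: string; category: Category; text: string }
interface Finding { questionId: string; summary: string; citations: string[] }

// Decomposition stub: one research question per category for a given north star.
// The real system presumably generates several questions per category via an LLM.
function decompose(northStar: string): ResearchQuestion[] {
  const categories: Category[] = [
    "Market", "Technical", "UserNeeds", "Compliance",
    "Competition", "Architecture", "Integration",
  ];
  return categories.map((category, i) => ({
    id: `q${i + 1}`,
    category,
    text: `What does "${northStar}" require from a ${category} perspective?`,
  }));
}
```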
Design Decisions
Interactive Knowledge Graph
The centerpiece of the UI. Built with React Flow, the graph visualizes 10 node types and 7 edge types — from your north star goal down to individual research findings, decisions, and tasks. Every plan item links back to the research that informed it. You can click any node to see its provenance chain.
I chose a graph visualization over a traditional list/tree because the relationships between research findings are non-linear. A market insight might inform an architecture decision which contradicts a compliance requirement. Those connections need to be visible, not buried in documents.
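The provenance-chain click-through could work roughly like this: walk the edges upward from any node until you reach the north star. The edge shape and direction here are assumptions for illustration, not the app's actual graph schema.

```typescript
// Hypothetical edge shape; the real schema has 7 distinct edge types.
interface Edge { from: string; to: string; type: string }

// Climb "upstream" from a node to build its provenance chain.
// Assumes the provenance subgraph is acyclic with at most one parent per node.
function provenanceChain(nodeId: string, edges: Edge[]): string[] {
  const chain = [nodeId];
  let current = nodeId;
  for (;;) {
    const parent = edges.find((e) => e.from === current);
    if (!parent) break;
    chain.push(parent.to);
    current = parent.to;
  }
  return chain;
}
```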
5 Orchestration Patterns
Built five distinct patterns for different scenarios: Supervisor (lead agent delegates), Map-Reduce (parallel with aggregation), Pipeline (sequential stages), Consensus (multiple agents vote), and Hierarchical (multi-level delegation). The system can switch patterns mid-execution based on budget, failure rates, or complexity changes.
Dynamic pattern switching was the hardest design problem. When a Supervisor pattern is burning budget too fast, the system needs to gracefully hand off to a Map-Reduce pattern without losing context. I designed a state machine that preserves in-flight work during transitions.
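The handoff idea can be sketched as a small state transition that carries in-flight work across the pattern switch. This is a minimal sketch under assumed names (`OrchestratorState`, `maybeDowngrade`, the 0.8 burn-rate trigger), not the actual state machine.

```typescript
type Pattern = "supervisor" | "mapReduce" | "pipeline" | "consensus" | "hierarchical";

interface OrchestratorState {
  pattern: Pattern;
  inFlight: string[];   // task IDs currently executing
  completed: string[];  // task IDs already finished
}

// Hand off to a new pattern; in-flight work is preserved, not restarted.
function switchPattern(state: OrchestratorState, next: Pattern): OrchestratorState {
  return { ...state, pattern: next };
}

// Example trigger: downgrade Supervisor to Map-Reduce when budget burn is too fast.
// The 0.8 threshold is illustrative.
function maybeDowngrade(state: OrchestratorState, burnRate: number): OrchestratorState {
  return state.pattern === "supervisor" && burnRate > 0.8
    ? switchPattern(state, "mapReduce")
    : state;
}
```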
Budget Controls & Cost Transparency
Multi-agent systems use 8-15x the tokens of a single agent. I built real-time cost monitoring with per-agent breakdowns, automatic model tier downgrades when budget consumption crosses thresholds (50%, 80%, 95%), and cost projections for remaining tasks. The UI shows exactly where every dollar is going.
The tiered routing strategy — Opus for planning, Sonnet for coding, Haiku for quick tasks — cuts costs 50-80% compared to running everything on the most capable model.
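A minimal sketch of threshold-based tier routing, assuming the three thresholds above map to successive downgrades. Treating the 95% threshold as a hard stop is my assumption; the real system may instead halt only new task dispatch or ask for a budget-increase approval.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

// Map budget consumption to a model tier. Thresholds mirror the 50%/80%/95%
// gates described above; the exact routing table is an assumption.
function tierFor(spent: number, budget: number): Tier {
  const used = spent / budget;
  if (used >= 0.95) throw new Error("budget nearly exhausted: halt new tasks");
  if (used >= 0.8) return "haiku";   // cheapest tier for the final stretch
  if (used >= 0.5) return "sonnet";  // mid tier once half the budget is gone
  return "opus";                     // most capable tier while budget is healthy
}
```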
Human-in-the-Loop Controls
Approval gates let you require human sign-off before critical actions (plan execution, budget increases, external API calls). Stop hooks create checkpoints throughout execution where you can pause, inspect, and redirect. I call this the "Ralph Pattern" — named after the ability to pull the emergency brake at any point.
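An approval gate reduces to a queue of pending requests that block agents until a human resolves them. The API shape below (`ApprovalGate`, `request`, `resolve`) is a sketch of the idea, not the project's actual interface.

```typescript
type Action = { kind: "executePlan" | "increaseBudget" | "externalApiCall"; detail: string };

class ApprovalGate {
  private pending = new Map<number, { action: Action; resolve: (ok: boolean) => void }>();
  private nextId = 0;

  // Agents call this before a critical action and await the verdict.
  request(action: Action): { id: number; verdict: Promise<boolean> } {
    const id = this.nextId++;
    const verdict = new Promise<boolean>((resolve) => {
      this.pending.set(id, { action, resolve });
    });
    return { id, verdict };
  }

  // What the human sees in the approvals UI.
  queue(): Action[] {
    return [...this.pending.values()].map((p) => p.action);
  }

  // The UI calls this when the human approves or rejects.
  resolve(id: number, approved: boolean): void {
    this.pending.get(id)?.resolve(approved);
    this.pending.delete(id);
  }
}
```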
Plan/Act Mode Toggle
A simple but critical UX decision: Plan mode lets you explore research, steer the graph, and inject questions without any execution happening. Act mode activates the agents. This separation prevents accidental execution and gives users confidence to explore freely.
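The guarantee behind the toggle is a single guard on the execution path: nothing dispatches unless the mode is Act. A tiny sketch (function names are illustrative):

```typescript
type Mode = "plan" | "act";

// Central gate on agent dispatch. In Plan mode this is a no-op,
// so graph exploration can never trigger execution by accident.
function dispatchAgents(mode: Mode, run: () => void): boolean {
  if (mode !== "act") return false;
  run();
  return true;
}
```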
The Web UI
Built the dashboard in Next.js 16 with React 19 and 7 Zustand stores managing different slices of state:
- Graph Store — Knowledge graph nodes, edges, and layout
- Session Store — Execution sessions and Plan/Act mode
- Budget Store — Real-time cost tracking
- Approvals Store — Human approval queue
- Agent Pool Store — Parallel agent visualization (up to 8 concurrent)
- Memory Store — Persistent cross-session knowledge bank
- Connection Store — WebSocket health
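Each store holds one slice of state with subscriptions driving re-renders. The Budget Store's shape below is a guess from the features described; the store itself is a vanilla, self-contained stand-in for what zustand's `create` provides in the real app.

```typescript
// Assumed shape of the Budget Store's state slice.
interface BudgetState {
  spent: number;
  limit: number;
  perAgent: Record<string, number>;  // per-agent cost breakdown
}

// Minimal zustand-style store: getState / setState / subscribe.
function createStore<T>(initial: T) {
  let state = initial;
  const listeners = new Set<(s: T) => void>();
  return {
    getState: () => state,
    setState: (partial: Partial<T>) => {
      state = { ...state, ...partial };     // shallow merge, like zustand
      listeners.forEach((l) => l(state));   // notify subscribers (UI re-renders)
    },
    subscribe: (l: (s: T) => void) => {
      listeners.add(l);
      return () => listeners.delete(l);     // unsubscribe handle
    },
  };
}

const budgetStore = createStore<BudgetState>({ spent: 0, limit: 100, perAgent: {} });
```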
The dashboard streams updates via Socket.io WebSockets, so you see agents working in real-time — research being gathered, synthesis happening, the graph growing node by node.
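The streaming updates amount to a typed event map the stores subscribe to. The event names and payloads below are invented for illustration — the actual Socket.io protocol isn't documented here — but they show how each store reacts to its own slice of the stream.

```typescript
// Hypothetical server→client events; names and payloads are assumptions.
interface ServerEvents {
  "graph:nodeAdded": { nodeId: string; type: string };
  "agent:progress": { agentId: string; tokensUsed: number };
  "budget:update": { spent: number; projected: number };
}

// Tiny typed emitter standing in for the Socket.io client connection.
class Emitter {
  private handlers = new Map<keyof ServerEvents, ((p: never) => void)[]>();

  on<K extends keyof ServerEvents>(ev: K, fn: (p: ServerEvents[K]) => void) {
    const list = this.handlers.get(ev) ?? [];
    list.push(fn as (p: never) => void);
    this.handlers.set(ev, list);
  }

  emit<K extends keyof ServerEvents>(ev: K, payload: ServerEvents[K]) {
    (this.handlers.get(ev) ?? []).forEach((fn) =>
      (fn as (p: ServerEvents[K]) => void)(payload),
    );
  }
}
```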
Other UI features: Command palette (Cmd+K) for quick navigation, confidence badges on tasks (0-100% completion likelihood), session replay for stepping through past executions event by event, and a settings page for configuring LLM providers.
Architecture
The system has three tiers:
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 16, React Flow, Zustand | Interactive dashboard and graph |
| Real-time | Socket.io WebSocket server | Bidirectional streaming |
| Backend | TypeScript engine, 18 modules | Orchestration, research, planning |
State persistence is deliberately low-infrastructure: Markdown files committed to Git. plan.md, roadmap.md, and memory.md are human-readable, diffable, and require zero database setup. Every agent commit includes attribution metadata so you can trace who (or what) changed what.
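One plausible shape for that attribution metadata is git commit trailers. The trailer keys below are my assumption — the source only says commits carry attribution — but the sketch shows why the scheme stays diffable and grep-able.

```typescript
// Hypothetical attribution trailers appended to each agent commit message.
interface Attribution { agentId: string; pattern: string; sessionId: string }

function commitMessage(summary: string, a: Attribution): string {
  return [
    summary,
    "",                          // blank line separating subject from trailers
    `Agent: ${a.agentId}`,       // who (or what) made the change
    `Pattern: ${a.pattern}`,     // orchestration pattern active at commit time
    `Session: ${a.sessionId}`,   // execution session for replay/tracing
  ].join("\n");
}
```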
What I Learned
Before building the graph UI, I was thinking about AI orchestration as a sequential pipeline. The graph visualization revealed that research findings form a web of dependencies and contradictions that can't be captured in a linear flow. The UI changed my mental model of the system.
My first test run without budget limits cost $47 in 6 minutes. Multi-agent systems need cost controls from day one, not as an afterthought. The tiered routing and automatic model downgrades turned a $47 experiment into a $4 one.
The technical implementation of approval gates is straightforward. The hard part is designing when to interrupt the user vs. when to proceed autonomously. Too many interruptions and the system feels useless. Too few and users lose trust. I'm still iterating on the right balance.