A multi-agent system that turns a product team's PRDs, roadmap, and SQL library into structured context AI can act on — making agentic AI a reliable contributor to product development, not a guesser.
Shadowed PMs and engineers on real agent sessions, ran onboarding retros with the last four hires, reviewed the PRD lifecycle, and traced bad AI outputs back to their source.
Specs written for a launch rarely stay authoritative six months later. When only humans read them, drift is invisible. When AI reads them too, drift becomes a bug you can fix.
The roadmap on the wall diverges from what engineers and PMs actually believe. No single agent-readable source keeps them honest.
Most bad AI output traced to skipping a planning step, not weak models. Planning is the product.
Implicit agent and tool discovery misses about a third of the time. Explicit invocation and a clear manifest is what makes AI a trustworthy teammate.
Interviewed all 7 teammates across PM, backend, frontend, QA, and data. Grouped findings into three roles around the PRD lifecycle: author, consumer, and metric owner.
PRD author. Drives specs, roadmap, launches — wants an AI partner that actually knows the product.
PRD consumer. Turns specs into shipping software, uses AI coding assistants daily.
Metric owner. Pairs every PRD with a query and a dashboard.
Not a wiki, not a PM SaaS, not an AI wrapper. Hive is an agentic AI product for PM work, sitting on a repo-native Team OS.
Agent-native IDEs are reliable enough to trust on product artifacts. And "context, context window, compaction, thinking room" are now first-class product concerns for any team shipping with AI.
A shared agents/ folder (agents, commands, skills), a product-development/ folder (PRDs/RFCs/roadmap/SQL), and a team/ folder (onboarding, retros).
Writing happens in the IDE and PR. The app is for reading and running — never for authoring. One surface, one job.
At the top of the repo, a manifest that hands every agent a map of the codebase, the team, and the domain graph before any work starts.
Handled upstream in PRs and chat. Duplicating them in-app would fragment attention without earning the complexity.
PM, Engineer, Analyst agents, each built from real team conventions. Every non-trivial write produces a plan file checked into the repo.
~70% hit rate isn't safe for team-wide workflows. Explicit invocation trades a little magic for a lot of reliability.
Prioritization lens: does this change make the next agentic task cheaper — in context tokens, in planning time, in rediscovery? If not, don't ship.
A single YAML map from domain → feature → PRD, RFC, plan, SQL, dashboards, tickets. Agents traverse the graph instead of guessing — and the same map anchors every human navigating the workspace.
The PM agent drafts and edits PRDs with domain awareness. The Engineer agent reads PRDs + schemas to plan work. The Analyst agent runs reviewed SQL and cites its source PRD. Each loaded with real team conventions as a system prompt.
Real-time interaction with live context injected from the loaders layer. Compaction-resistant by design — ground truth lives in files, so any session can be resumed or rebuilt from source.
Every multi-step action starts with a plan file in the repo. The plan is both a review artifact and future-session context — planning stops being overhead and starts being inventory.
One PR for product review, a separate PR for engineering review. Reviews are faster, history is cleaner, and long agent sessions behave better on the smaller change units.
Within the first month, plan-first went from a suggested practice to an unspoken team norm — plan files now appear in PRs without prompting.
The shared agent layer turned one engineer's handy command into every engineer's default on the next pull. Compounding beat any single feature launch.
Reviews moved faster, history got cleaner, and long agent sessions held up better because work was chunked the way the review actually happens.
The three-part Team OS structure + domain index became the reference other product teams in the org asked to replicate for their own workspaces.
Once I stopped treating PRDs as deliverables and started treating them as context for the next session (human or AI), every design decision got easier. The PRD isn't the artifact; the agent-readable context it produces is.
The highest-leverage change wasn't a better model — it was forcing a plan-first workflow. Planning is the product; prompting is just the last mile.
Multi-agent systems only work when the substrate (root manifest + PRD graph + shared conventions) makes the team legible. The agents are the surface; the repo is the engine.
Ship the shared agent layer on day one. Individual agents are nice; a team's shared agents compound. I underestimated how much adoption came from “I pulled main and my tools got better,” not from any single feature launch.