Day 19: PlanWright

Day 19 of 30 — the second leg of the Autopilot arc

PlanWright is the project-management tool built for AI-native software development. It is live today at planwright.tools, and it is live as an MCP server — mcp.planwright.tools — that exposes the full work-surface tool set to any agent runtime that speaks Model Context Protocol. Nadeem Haider is coming in to build out the product and the company alongside this launch. PlanWright is the second leg of the Autopilot arc the run opened yesterday with NameIntel — and where NameIntel proved the buyer-change thesis (the agent as a first-class buyer), PlanWright proves the supervisor-change thesis: the human is a first-class supervisor of the work, and the tool that makes that role legible at audit-chain rigor is the tool we are launching today.

Yesterday’s Day 18 NameIntel post named the schedule slip plainly: the original calendar had NameIntel on Day 16 (Saturday) and PlanWright on Day 17 (Sunday); the weekend slipped on the x402 facilitator-integration debugging, both builds compressed forward, NameIntel landed on Day 18 and PlanWright lands today on Day 19. The discipline of the run is to name slips when they happen. The Autopilot arc lands fully now, two days behind the original calendar and structurally on time.

The problem

Every engineering organization I have talked to in the last six months is stuck staring into the same chasm.

On the near side of the chasm is the SDLC that’s been running for a decade: a product manager pre-digests objectives into user stories; engineers pick up tickets from Jira or Linear; pull requests open against GitHub; peer reviewers inspect; tests run; staging deploys; QA exercises; release; repeat every two weeks. The ceremony has decades of operational learning baked in — peer review catches bugs, separation of duties satisfies auditors, staged rollout protects production. It isn’t stupid. It is expensive and slow.

On the far side is what software development looks like when coding agents are doing the implementation: humans synthesize chaos into Objectives at the top, agents claim cards and write code in the middle, humans read the agent’s return log and accept the work at the bottom, and a cryptographic audit log runs through the whole thing. Both sides work. The far side is faster by an order of magnitude. That is not a hypothetical claim — the Velocity Launch portfolio is 13 production-shipping products in 13 days, and the production-method substrate underneath every one of those repos is the pattern PlanWright was built around.

The problem is that the bridge does not exist yet.

Jira was built for human ceremonies. Linear is faster but still assumes a human author on every issue. GitHub PRs assume the code was written by an authenticated user. Notion is a doc store. Asana is a task tracker. They all work fine on the near side of the chasm. None of them enforce the human-bookend pattern. None of them log a cryptographic chain of custody on the status transitions. None of them treat a coding agent as a first-class actor with claim-and-return semantics. If you try to cross the chasm with the existing tools, you have two options. Option A: extend the existing tools with custom integration glue, hope your auditor accepts your homemade audit trail, hope your engineers don’t drift into shortcuts. Option B: don’t cross — keep the turbocharger model and watch the frontier move away from you in the windscreen. Neither option is good. Option A is fragile and expensive. Option B is competitive suicide on an eighteen-month horizon.

That is the gap the launch is for.

What PlanWright does

A Kanban board built around the human-bookend pattern. The card is the unit of work; the column is the lifecycle stage; the transition is the cryptographically signed event. The four moves of the cycle run on top of that substrate.

Move 1 — Congeal. Humans synthesize chaos into Objectives. The Product Manager or the founder or the lead engineer sits with Claude Desktop with the PlanWright MCP server connected, pulls in customer transcripts and Slack threads and screenshots and brand context and analytics dashboards and CRM data through other MCP servers, and produces well-formed Objective cards with machine-readable acceptance criteria and references to the relevant context files in the repo. The cards land in PlanWright via planwright_create_objective and planwright_push_context_file. The context file is the load-bearing artifact — typically 800–2,500 words of structured prose plus a handful of code references — and it carries the customer language, the architectural constraints, the prior-art references, the explicit out-of-scope boundaries, and the acceptance criteria the human will check against at the end of the cycle.

Move 2 — Publish. A human schedules the Objective into a release via planwright_schedule_objective. The release is the unit of human commitment; the Objective is the unit of agent work. Decoupling them is what lets a team run 20–60 Objectives per week per repo through the agent pool without the release cadence going haywire. PlanWright records the schedule, increments the release’s planned scope, and notifies the agent pool that there is pickup-eligible work.

Move 3 — Pickup. A coding agent (Claude Code by default in our installations, though the surface is agent-neutral and works equally well with Cursor, Cline, Continue, Aider, or any MCP-speaking runtime) claims a scheduled Objective via planwright_claim_objective. The agent reads the context file before it reads the code. The order matters: agents that open the codebase first start pattern-matching against existing code and miss the architectural intent the human captured in the context file. The agent records its plan via planwright_append_plan, works on a dev branch named after the Objective, opens a draft PR against the dev integration branch, records the diff, the test results, and a structured self-assessment via planwright_record_diff, and hands back to the human via planwright_request_acceptance. If the agent encounters an ambiguity that requires human judgement, it calls planwright_append_note with the question and pauses; the notes-and-resume protocol keeps agents from making bad judgement calls when they should be asking instead.
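For readers wiring up an agent runtime, the pickup sequence can be sketched as a series of MCP tool calls. MCP carries tool invocations as JSON-RPC 2.0 requests with the method tools/call; the tool names below come from the description above, but the argument names (objective_id, plan, diff, and so on) and the OBJ-123 identifier are assumptions for illustration, not the published schemas. Read the real schemas from the tools/list surface before relying on any of this.

```typescript
// MCP tool calls travel as JSON-RPC 2.0 requests with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// ASSUMPTION: every argument name here is hypothetical. Only the tool names
// are taken from the post; the real input schemas live behind tools/list.
function pickupSequence(objectiveId: string): ToolCallRequest[] {
  return [
    toolCall(1, "planwright_claim_objective", { objective_id: objectiveId }),
    toolCall(2, "planwright_append_plan", {
      objective_id: objectiveId,
      plan: "Read the context file, then change the handler and add tests.",
    }),
    toolCall(3, "planwright_record_diff", {
      objective_id: objectiveId,
      diff: "(unified diff goes here)",
      tests_passed: true,
    }),
    toolCall(4, "planwright_request_acceptance", { objective_id: objectiveId }),
  ];
}

const pickupRequests = pickupSequence("OBJ-123");
console.log(JSON.stringify(pickupRequests[0], null, 2));
```

The ordering is the point: claim first, plan before diff, and the request for acceptance last, so the human always receives a complete return log.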

Move 4 — Accept. A human reviewer (the CTO, the lead engineer, sometimes the founder if the Objective is customer-facing) opens the PR, reads the diff and the agent’s return log, runs the acceptance criteria, and either accepts the work (the Objective branch merges into the dev integration branch) or rejects with feedback (the agent gets the feedback as an appended note and another claim window). The accept move is the audit trail — every accepted change carries a structured record: the context file (the human’s authored intent), the agent’s plan and diff and self-assessment, the human’s acceptance decision, the test results, the time-to-completion, and the agent identity. That structured record is the load-bearing artifact for the regulatory shape coming in 2026.
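The four moves amount to a small state machine with a human at each end. A minimal sketch, assuming state names (created, scheduled, claimed, awaiting_acceptance, accepted) that are mine rather than PlanWright's, and treating accept and reject as moves without tool names because the post does not name those tools:

```typescript
// ASSUMPTION: state and move names are hypothetical; only the planwright_*
// tool names in the `tool` field come from the post.
type State = "created" | "scheduled" | "claimed" | "awaiting_acceptance" | "accepted";
type Actor = "human" | "agent";
type MoveName = "schedule" | "claim" | "request_acceptance" | "accept" | "reject";

interface Move {
  name: MoveName;
  from: State;
  to: State;
  actor: Actor;
  tool: string | null; // null where the post names no tool
}

const MOVES: Move[] = [
  { name: "schedule", from: "created", to: "scheduled", actor: "human", tool: "planwright_schedule_objective" },
  { name: "claim", from: "scheduled", to: "claimed", actor: "agent", tool: "planwright_claim_objective" },
  { name: "request_acceptance", from: "claimed", to: "awaiting_acceptance", actor: "agent", tool: "planwright_request_acceptance" },
  { name: "accept", from: "awaiting_acceptance", to: "accepted", actor: "human", tool: null },
  // A rejection reopens the claim window, so the card becomes claimable again.
  { name: "reject", from: "awaiting_acceptance", to: "scheduled", actor: "human", tool: null },
];

// Returns the next state, or throws if this actor may not make this move now.
function nextState(current: State, actor: Actor, moveName: MoveName): State {
  const move = MOVES.find((m) => m.from === current && m.actor === actor && m.name === moveName);
  if (!move) throw new Error(`illegal move: ${actor} cannot ${moveName} from ${current}`);
  return move.to;
}
```

In this sketch an agent can never accept its own work: the only moves out of awaiting_acceptance belong to a human, which is the bookend the pattern is named for.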

The cryptographic audit chain

Every transition on every card produces a signed record with the issuer identity (human or agent, both first-class), the timestamp, the prior state hash, and the transition payload. The signatures form a chain — any sequence of transitions is verifiable independently of PlanWright. The keys are managed per workspace; the verifier is public; the signature primitive is Ed25519; the chain export is a SOC 2-evidence-shape JSON bundle that drops cleanly into the evidence package your auditor will ask for.
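To make the chain concrete, here is a minimal sketch of a signed, hash-linked transition log using Node's built-in crypto module. The record fields follow the description above (issuer, timestamp, prior state hash, payload); the canonical serialization, the use of SHA-256 for linking, and the single workspace key signing every record are my assumptions for illustration, not PlanWright's actual wire format.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface TransitionRecord {
  issuer: string;    // human or agent identity
  timestamp: string; // ISO 8601
  priorHash: string; // hash of the previous record, "" for the first
  payload: string;   // serialized transition payload
  signature: string; // base64 Ed25519 signature over the four fields above
}

// ASSUMPTION: a fixed field order so signer and verifier hash identical bytes.
function canonical(r: Omit<TransitionRecord, "signature">): Buffer {
  return Buffer.from(JSON.stringify([r.issuer, r.timestamp, r.priorHash, r.payload]));
}

function recordHash(r: TransitionRecord): string {
  return createHash("sha256").update(canonical(r)).update(r.signature).digest("hex");
}

function appendRecord(
  chain: TransitionRecord[], issuer: string, payload: string, timestamp: string, privateKey: KeyObject,
): TransitionRecord[] {
  const last = chain[chain.length - 1];
  const unsigned = { issuer, timestamp, priorHash: last ? recordHash(last) : "", payload };
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, canonical(unsigned), privateKey).toString("base64");
  return [...chain, { ...unsigned, signature }];
}

// Needs only the public key and the exported records, not the issuing service.
function verifyChain(chain: TransitionRecord[], publicKey: KeyObject): boolean {
  let expectedPrior = "";
  for (const r of chain) {
    if (r.priorHash !== expectedPrior) return false; // broken link
    if (!verify(null, canonical(r), publicKey, Buffer.from(r.signature, "base64"))) return false;
    expectedPrior = recordHash(r);
  }
  return true;
}

const workspaceKeys = generateKeyPairSync("ed25519");
let auditChain: TransitionRecord[] = [];
auditChain = appendRecord(auditChain, "human:reviewer", "scheduled", "2026-01-05T14:00:00Z", workspaceKeys.privateKey);
auditChain = appendRecord(auditChain, "agent:claude-code", "claimed", "2026-01-05T14:02:00Z", workspaceKeys.privateKey);
console.log(verifyChain(auditChain, workspaceKeys.publicKey)); // true

// Editing a stored record after the fact invalidates its signature.
const tamperedChain = auditChain.map((r, i) => (i === 0 ? { ...r, payload: "accepted" } : r));
console.log(verifyChain(tamperedChain, workspaceKeys.publicKey)); // false
```

The verifier takes only the exported records and a public key, which is the property that matters: a reviewer can run it without access to, or trust in, the database that produced the export.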

The point of the cryptographic chain is not cryptographic theater. The point is that a SOC 2 auditor or a FedRAMP reviewer or an EU AI Act high-risk classification reviewer can verify the audit chain without trusting PlanWright. That is the wedge for regulated-industry adoption — the audit chain is not “our database said it happened,” it is mathematically defensible against a hostile-database-modification scenario. In the next eighteen months, every regulatory framework that touches AI-assisted software development is going to converge on the same question: who authorized this code change and who reviewed it before it shipped? PlanWright is built to make the answer to that question structurally complete and independently verifiable.

A precise distinction the launch holds: PlanWright is not itself SOC 2 / FedRAMP / ISO certified at launch. The wedge is that PlanWright produces the evidence shape those frameworks require for AI-agent-produced code. We are running the SOC 2 Type I process now and expect to be Type II audit-ready by Q4. The launch posture is that the audit chain is the product, and the precision around what is and is not certified is part of the discipline.

The repository becomes a contract

When agents are doing the implementation, the repo needs to begin with — and maintain — relevant machine-readable context for the project. The conventions, the architectural intent, the business context, the economic constraints, the explicit out-of-scope boundaries. PlanWright treats CLAUDE.md, AGENTS.md, .cursorrules, .clinerules, .windsurfrules, README.md, and ARCHITECTURE.md as first-class context substrates, automatically discovers them on planwright_set_repo, and references them from every card so the agent reads the architectural intent before it reads the code. The team can also attach explicit per-Objective context files for the load-bearing cases — the technical RFC, the customer-language transcript, the schema fragment that anchors the Objective in the existing codebase.
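The discovery step can be pictured as a filter over the repo's file listing. A minimal sketch, assuming discovery matches on exact file name at any depth (the post does not say where planwright_set_repo looks, or whether matching is case-sensitive); the function name and the sample paths are hypothetical:

```typescript
// The seven conventions named above.
const CONTEXT_FILENAMES: string[] = [
  "CLAUDE.md",
  "AGENTS.md",
  ".cursorrules",
  ".clinerules",
  ".windsurfrules",
  "README.md",
  "ARCHITECTURE.md",
];

// ASSUMPTION: exact, case-sensitive match on the final path segment, at any depth.
function discoverContextFiles(repoPaths: string[]): string[] {
  return repoPaths.filter((p) => {
    const fileName = p.split("/").pop() ?? "";
    return CONTEXT_FILENAMES.includes(fileName);
  });
}

const discovered = discoverContextFiles([
  "src/index.ts",
  "CLAUDE.md",
  "docs/ARCHITECTURE.md",
  "readme.txt",
]);
console.log(discovered); // [ 'CLAUDE.md', 'docs/ARCHITECTURE.md' ]
```

Whatever the real matching rules are, the useful output is the same: a list of paths that every card can reference so the agent opens them before it opens the code.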

The discipline that makes this work, and the discipline that takes the longest to internalize: the context file has to be good enough that a senior engineer who has never seen the codebase could read it and know what to do. That is the bar. An agent will execute against a thin context file — it will just produce thin work, and the accept-or-reject move at the end of the cycle will catch the thinness, and the cycle will burn an Objective slot. The cost of a thin context file is a wasted agent run plus a human review cycle. The cost of a thick context file is forty-five minutes of human + Claude Desktop work at the front of the cycle. The trade is enormously asymmetric in favor of thick context files, and the operator discipline of writing them is the single biggest predictor of throughput we have observed across the Velocity Launch portfolio.

Who is using it

The Velocity Launch portfolio — 13 products shipped in 13 days as of this morning across cybersecurity (SecureStackScan, SecureLink, CompliancePulse, CyberSavi LMS, CyberSavIQ, GovernAI, TrainTogether), discoverability (PodToSite, GEOPress), services / GTM (CogleGroup, CounselExpress, ChamberAdvance, PlanCheckers), agentic infrastructure (NameIntel), and a local-services game-day build (Swole Labor Services) — runs on PlanWright as the production-method substrate. Every accepted change in every one of those repos carries the structured Objective + context file + agent diff + human acceptance record. The numbers on the landing page are pulled from the actual PlanWright export — not estimated — and the audit chain on every accepted change is independently verifiable.

Outside the Velocity Launch portfolio, the first wave of early-access customers is running PlanWright on real production codebases this month. Several engineering organizations in cybersecurity, regulated legal, and financial services are in the early-access cohort; named references with consent on file appear on the landing page, and we are deliberately picking the early-access cohort for shape — engineering organizations that have audit obligations and are starting to run coding agents in production — not for volume. If you are at one of those companies and the chasm described above is the one you are standing at, the contact section below opens the conversation.

Three design choices that put PlanWright on the right side of the supervisor change

The agent is a first-class actor in the data model, not a footnote. The default posture across the SaaS PM world today is human-first: a Jira issue has an assignee that is a person, a Linear ticket has a worker that is a person, a GitHub PR has an author that is an authenticated GitHub user. PlanWright inverts that — the card has an issuer (human) and a worker (human or agent, both first-class), the transitions carry the worker identity, and the audit chain records the agent runtime and the prompt provenance on every claim. The data model was rewritten for the buyer change Sequoia named in Services: The New Software and the production-method change the Dark Factory essay frames. If the agent is going to do the work, the tool the agent claims work from has to treat the agent as a real actor, not a webhook.
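In type terms, "first-class" means the agent is a variant of the worker type rather than a string in a comment field. A minimal sketch with hypothetical field names; the post specifies only that the issuer is a human, that the worker is a human or an agent, and that agent claims record the runtime and the prompt provenance:

```typescript
// ASSUMPTION: all field names are illustrative, not PlanWright's schema.
interface HumanActor {
  kind: "human";
  id: string;
}

interface AgentActor {
  kind: "agent";
  id: string;
  runtime: string;          // which agent runtime made the claim
  promptProvenance: string; // reference to the prompt the agent ran under
}

type CardWorker = HumanActor | AgentActor;

interface Card {
  objectiveId: string;
  issuer: HumanActor;        // the issuer is always a human
  worker: CardWorker | null; // null until someone claims the card
}

function claimCard(card: Card, worker: CardWorker): Card {
  if (card.worker !== null) throw new Error(`${card.objectiveId} is already claimed`);
  return { ...card, worker };
}

function describeWorker(card: Card): string {
  const w = card.worker;
  if (w === null) return "unclaimed";
  // The discriminant lets the compiler prove `runtime` exists on this branch.
  return w.kind === "agent" ? `agent ${w.id} on ${w.runtime}` : `human ${w.id}`;
}
```

Because the issuer field only admits a HumanActor, an agent-issued Objective is a compile-time error in this sketch, not a policy someone has to remember to enforce.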

The audit chain is built in, not bolted on. Most existing PM tools can produce an audit log on request; PlanWright produces a cryptographically signed audit chain by default, on every transition, with public verification. The difference is the difference between “our database says this happened” and “this happened and the signature chain proves it independently.” For regulated-industry adoption, that distinction is the wedge. The CISO / CTO / audit committee at a financial-services or healthcare or federal-contractor engineering organization is going to ask the same question in 2026: who authorized this code change and who reviewed it before it shipped? PlanWright answers that question structurally, and the answer is independently verifiable.

The repository is the contract. Every card references the repo’s machine-readable context substrate. The agent reads the context before it reads the code. If the context isn’t there, the agent makes it up and you get drift — which is the load-bearing argument for why the existing PM tools fail at the chasm, even when patched with custom glue. The PM tool has to make context a first-class entity. PlanWright does.

The thesis behind the launch — the supervisor change

PlanWright sits inside Theme #4 of the 30-day run — the Autopilot economy — and inside the single-venture theme Future of SDLC on the SEQUENCING calendar. Two long-form companions drop the same morning as this launch:

→ Crossing the Chasm to the Dark Factory: the thesis. Why every engineering organization is stuck at the same chasm, what the far side actually looks like, and why the bridge has to be cryptographically auditable. The launch is the product form of the essay; the essay is the conceptual setup for the launch. “PlanWright is the bridge to a well-supervised Dark Factory with industrial guard rails and observability.”

→ The SDLC When Intelligence Is Cheap: the second companion, scheduled alongside the first. Its production data is slated for publication in full on Sunday (May 24).

Day 18’s NameIntel launch proved the buyer-change face of the same thesis — the agent is a first-class buyer, paid per call in USDC, with no account and no subscription. Day 19’s PlanWright launch proves the supervisor-change face — the human is a first-class supervisor, with the audit-chain rigor regulated-industry organizations actually need. NameIntel and PlanWright are not two unrelated products; they are two faces of the same buyer-change/supervisor-change pair that defines the next eighteen months of the SDLC.

Pricing & business model

Free tier. One workspace, one repo, full audit chain on, agent-claim path enabled. No credit card. The free tier is real and useful and it is the right shape for an engineering team that wants to try the human-bookend pattern on one project this quarter.
Team — $39 per seat per month + per-agent-action metering. The human side is per seat; the agent side is metered per claim/diff/return at fractions of a cent. The agent doesn’t have a seat, doesn’t have a month, and doesn’t want a subscription it has to remember to cancel — same argument as NameIntel x402 yesterday. The bill itself is a cost-control instrument, not an opaque seat charge.
Enterprise. Private deployment, SSO, audit-export bundle that drops into a SOC 2 evidence package, named onboarding, custom signing-key custody policies. Contact todd@silverbackcto.com. The reference target is the regulated-industry engineering organization — financial services, healthcare, federal contractors, energy — where audit obligations are non-negotiable and the per-deployment install is worth the conversation.

The unit economics on the agent-side metering are deliberately set to clear margin at small per-action prices: the transition records are small, the storage is cheap, the signing is essentially free at the margin, and the upstream cost is dominated by the GitHub integration and the workspace key custody. The team-tier metering becomes profitable on agent-call volume — which is exactly the call pattern a human-bookend SDLC produces. The free tier subsidizes itself out of paid conversion within 60 days based on the early-access cohort data.
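A worked example of how a Team-tier bill composes. The $39 seat price is from the pricing above; the per-action price is not published here beyond "fractions of a cent," so the $0.002 figure below is purely an assumption, as are the team size and the three-metered-actions-per-Objective count. Amounts are kept in integer micro-dollars to avoid floating-point drift.

```typescript
const SEAT_PRICE_MICROS = 39_000_000; // $39 per seat per month, from the pricing above
const ACTION_PRICE_MICROS = 2_000;    // ASSUMPTION: $0.002 per metered action

interface MonthlyUsage {
  seats: number;
  meteredActions: number; // claims + diffs + returns
}

function monthlyBillMicros(u: MonthlyUsage): number {
  return u.seats * SEAT_PRICE_MICROS + u.meteredActions * ACTION_PRICE_MICROS;
}

function formatUsd(micros: number): string {
  return `$${(micros / 1_000_000).toFixed(2)}`;
}

// Hypothetical team: 5 seats, 40 Objectives per week for 4 weeks,
// 3 metered actions per Objective (claim, diff, return) = 480 actions.
const exampleUsage: MonthlyUsage = { seats: 5, meteredActions: 40 * 4 * 3 };
console.log(formatUsd(monthlyBillMicros(exampleUsage))); // $195.96
```

Under these assumed numbers the seats are $195.00 and the agent side is $0.96, which illustrates the shape of the claim above: the metered line only becomes material at high agent-call volume.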

The Velocity Process notes

What Claude Code handled:
The Next.js 16 + React 19 + Tailwind v4 marketing and board UI at planwright.tools: landing, board, card detail, audit-chain viewer, pricing, onboarding, the llms.txt and robots.txt written in the product’s own voice, and the public verifier at /verify.
The TypeScript MCP server (mcp-planwright): JSON-RPC 2.0 server, twelve tool implementations with their input/output schemas, the tools/list discovery surface, and the per-workspace auth and tenancy fence.
The cryptographic audit-chain engine: Ed25519 signing per transition, per-workspace key rotation, public verifier, and chain export in SOC 2-evidence-shape JSON.
The card data model: Objective entity with verbose description, machine-readable acceptance criteria, repo-and-branch reference, context-file reference set, dependency graph, and structured return-log slot.
The GitHub integration: App-install model with short-lived tokens, PR creation, PR-status mirroring, and commit-signing pass-through.
The repo-context discovery and indexing layer.
The agent identity surface, with per-runtime auth and prompt-provenance recording.
The Stripe + per-agent-action metering ledger.
The AWS CDK v2 stack on VelocityStack: API Gateway HTTP API v2, ten Lambdas, the workspace + card + transition + context-file storage tier, KMS-backed key custody, Secrets Manager wiring, and a monthly budget alarm.
The GitHub Actions OIDC deploy pipeline with staging + prod environments.

What required human judgement: four decisions.

The audit-chain primitive choice. The simpler launch would have been a tamper-evident database log with a public hash export and a “we’ll add signatures later” disclosure. The decision was to ship Ed25519 signatures on every transition on Day 1, even though the engineering cost was a week of additional work. The reason is that the wedge for regulated-industry adoption is the cryptographic chain; without it, PlanWright is “Jira with a webhook for agents,” and the regulated-industry CTO conversation never starts. The signatures are the product, not a feature flag.

The agent identity surface. The path of least resistance was a single API token per workspace that any agent could carry; the decision was to authenticate per agent runtime with verifiable identity per claim, even though it requires per-runtime integration work that grows linearly with the agent ecosystem. The reason is that the audit chain has to record which agent did the work, not just “an agent,” for the evidence shape to be defensible.

The context-substrate decision. The instinct was to require a PlanWright-proprietary context format (planwright.toml); the decision was to auto-discover the existing conventions (CLAUDE.md, AGENTS.md, .cursorrules, .clinerules, .windsurfrules, README.md, ARCHITECTURE.md) so PlanWright drops into an existing repo without a migration cost. The framing is that PlanWright meets the repo where the repo lives; the operational payoff is that early-access installs take an afternoon, not a week.

The release-vs-Objective decoupling. Releases ship every 3–7 days; Objectives complete every few hours. Running them as the same entity (the way Jira sometimes does) collapses the cadence; running them as decoupled entities (the way PlanWright does) is what lets a team run 20–60 Objectives per week per repo through the agent pool without the release cadence going haywire. The Velocity Launch portfolio is the proof.

What broke, and what the readiness pass caught: three integrity problems made it into the late drafts of the launch surface and were pulled before go-live.

(1) The landing page initially carried a “SOC 2 Type II certified” badge under the audit-chain section. That was false at launch: PlanWright is running the Type I process now and expects to be Type II audit-ready by Q4. The badge was replaced with the precise wording “produces the audit-chain evidence shape your SOC 2 reviewer will require for AI-agent-produced code” and the HN first comment carries the certification gap as an explicit honest-disclosures line item. The Day 8 GovernAI “GA4 not wired” discipline applies here at full strength: name the certification gap before the commenter does.

(2) The customer logo wall initially carried logos for two enterprises that had agreed verbally but not in writing to be named. Removed; only customers with written consent on file appear at launch. The Day 18 NameIntel “Trusted by 50+ naming firms” pre-launch failure mode is the same class; the discipline applies.

(3) The Velocity Launch portfolio internal-use proof was initially stated as “30+ products,” a forward-looking claim. Tightened to “13 production-shipping products in 13 days,” the number that is actually true as of this morning. The forward-looking version is reserved for the Day 30 retrospective.

The fourth issue the readiness pass surfaced was less an integrity break and more a scope question: the MCP server’s planwright_record_diff tool was returning a non-deterministic agent self-assessment because the Haiku prompt that generated the self-assessment narrative was non-deterministic. The fix was to make the self-assessment a structured criterion-by-criterion pass/fail with optional free-text rationale, rather than a free-form narrative. The structured shape is dramatically easier for a human reviewer to scan and dramatically harder for an agent to “agree with itself” in the wrong way.
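The structured self-assessment described above can be sketched as one verdict per acceptance criterion plus a check that none was skipped. The type and field names here are hypothetical, and the AC-1 style identifiers are placeholders, not the real planwright_record_diff schema:

```typescript
// ASSUMPTION: field names are illustrative, not the planwright_record_diff schema.
interface CriterionVerdict {
  criterionId: string;
  pass: boolean;
  rationale?: string; // optional free text
}

interface AssessmentSummary {
  missing: string[]; // criteria the agent gave no verdict on
  failed: string[];  // criteria the agent marked as failing
  allPassed: boolean;
}

// Compares the agent's verdicts against the Objective's acceptance criteria.
function summarizeAssessment(criteria: string[], verdicts: CriterionVerdict[]): AssessmentSummary {
  const byId = new Map<string, CriterionVerdict>(
    verdicts.map((v): [string, CriterionVerdict] => [v.criterionId, v]),
  );
  const missing = criteria.filter((c) => !byId.has(c));
  const failed = criteria.filter((c) => byId.get(c)?.pass === false);
  return { missing, failed, allPassed: missing.length === 0 && failed.length === 0 };
}

const summary = summarizeAssessment(
  ["AC-1", "AC-2", "AC-3"],
  [
    { criterionId: "AC-1", pass: true },
    { criterionId: "AC-2", pass: false, rationale: "pagination test still fails" },
  ],
);
console.log(summary);
```

A silent omission shows up in `missing` rather than being smoothed over by prose, which is what makes the structured shape quicker for a reviewer to scan than a narrative.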

What I’d do differently: ship a public read-only audit-chain explorer alongside the board on Day 1, not in Week 2. The single strongest piece of social proof for a cryptographic-audit-chain product is letting a curious engineering leader paste in a card export and verify the signatures themselves, in their browser, without an account. The work to produce the explorer is small; the failure was treating it as a follow-up rather than as part of the launch surface. The Day 20 DALZDx launch should not repeat that pattern — whatever the load-bearing proof artifact is for DALZDx, it ships with the launch, not after.

What’s next this week

Day 19 (today): planwright.tools is live; mcp.planwright.tools is live; the cryptographic audit chain is signing every transition with Ed25519 signatures and the public verifier at /verify is up; both Substack companions are scheduled before 11 AM ET (Crossing the Chasm to the Dark Factory + The SDLC When Intelligence Is Cheap); the Product Hunt submission and the Show HN submission go up at 11 AM ET on the second pulse.
Day 20 (Wed May 20): was going to be DALZDx (physician-built MSK clinical decision support, Non-Device CDS safe harbor). However, we have a game-day substitution!
This week: ship the public read-only audit-chain explorer (the gap the readiness pass surfaced); publish two named-customer install case studies (with consent on file); open the regulated-industry-CTO conversations from the Day 19 inbound; close the Nadeem Haider operator-build-out announcement with a follow-up post on the venture-formation side once the equity structure is signed; monitor the agent-call volume on mcp.planwright.tools and publish the first-week readout in Friday’s weekly recap.
Sunday (May 24): the Autopilot arc closes its 30-day window with the SDLC essay’s six-week-running production data published in full — agent-call volume, acceptance rate by Objective shape, hard-reject rate by context-file thickness, time-to-acceptance by agent runtime. The data is the proof; the essay is the publication shape.

Want to talk?

If you run engineering at a regulated company — financial services, healthcare, federal, energy — and the chasm I described above is the one you are standing in front of, the early-access program is live at planwright.tools and the contact for direct conversations is todd@silverbackcto.com or my calendar. Bring the actual codebase and the actual audit requirements; we’ll talk about how the install looks. We are deliberately picking the early-access cohort for shape, not for volume.

If you build coding agents or run an agent runtime and you want PlanWright to be the work surface your agent claims from — point your client at mcp.planwright.tools, read the tool surface, file the integration. The MCP tool descriptions are written for an LLM caller; the schemas are public. Same email for partnership.

If you just want to try the human-bookend pattern on one project this quarter, the free tier is live at planwright.tools. One workspace, one repo, full audit chain on, no credit card. Connect your GitHub repo, set your CLAUDE.md as the context substrate, issue your first Objective, and let an agent claim it. The whole loop takes an afternoon to install.

PlanWright is co-built with Nadeem Haider and lives inside the Autopilot-economy thesis the next two weeks of the Velocity Launch are organized around. Live today at planwright.tools. Tomorrow: DALZDx — physician-built MSK clinical decision support. The Healthcare arc opens.