Product · May 13, 2026 · KlyHub Team

Mode 2 architecture, explained

Most SaaS-meets-AI products are built on what we call Mode 1: bundle inference, mark up tokens, expose a chat surface to the customer. The vendor takes the margin on each token your AI generates. The customer gets convenience.

KlyHub is built on Mode 2. The customer brings their own AI subscription. We hold zero inference cost. We hold zero AI risk. We compete on context quality and structural integrity, not on who can resell GPT-4o mini the cheapest.

This essay explains the architectural consequences of that choice — because "BYO-AI" is a marketing line that hides a lot of careful engineering decisions underneath.

The shape of Mode 1

A Mode 1 architecture looks roughly like this:

Customer browser ─→ SaaS API ─→ LLM API ($)
                       │
                       └──→ DB (knowledge, history)

The vendor pays the LLM bill. The vendor decides the model. The vendor decides the prompt structure. The vendor's gross margin is whatever is left after the inference invoice is subtracted from the customer's subscription. As model prices drop, the arbitrage shrinks. As model prices rise (or pricing switches from per-token to per-call), the vendor either eats the hit or raises prices and watches churn.
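
To make the margin squeeze concrete, here is a small arithmetic sketch. Every number in it is hypothetical (the subscription price, the monthly token volume, and the per-million-token prices are invented for illustration); only the formula, subscription minus inference bill, comes from the description above.

```python
def mode1_gross_margin(subscription: float,
                       tokens_millions: float,
                       price_per_million: float) -> float:
    """Gross margin per customer per month: subscription minus inference bill."""
    return subscription - tokens_millions * price_per_million


# Hypothetical figures, for illustration only.
SUBSCRIPTION = 100.0  # dollars per customer per month
USAGE = 20.0          # million tokens per customer per month

# Same customer, same usage, two different upstream token prices.
margin_before = mode1_gross_margin(SUBSCRIPTION, USAGE, price_per_million=2.0)
margin_after = mode1_gross_margin(SUBSCRIPTION, USAGE, price_per_million=4.0)

print(margin_before, margin_after)
```

Under these assumed numbers, doubling the token price cuts the per-customer margin from 60 to 20 dollars without the customer changing anything.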

Mode 1 is also a closed loop. The customer doesn't easily plug in another LLM. They can't redirect spend to the AI they already pay for. They're stuck inside the vendor's chat surface, with the vendor's model selection, on the vendor's roadmap.

The shape of Mode 2

Mode 2 inverts the diagram:

Customer AI client (Claude / ChatGPT / Cursor) ─→ KlyHub MCP ─→ DB
                                                       │
                                                       └─→ tenant context

The customer's existing AI subscription does the inference. KlyHub serves context over the Model Context Protocol — an open standard that all the major AI clients now implement. The customer connects once, in their AI client's settings, with a per-tenant URL like mcp.klyhub.com/v1/acme-corp. From then on, every conversation the customer has with their AI has access to the full, structured, version-controlled company knowledge base.
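
As a rough sketch of what a per-tenant URL implies on the server side, the snippet below pulls the tenant slug out of a URL shaped like the example above. The routing rule (a "v1" segment followed by exactly one tenant segment) and the function name tenant_from_url are assumptions made for illustration, not a description of KlyHub's actual router.

```python
from urllib.parse import urlsplit


def tenant_from_url(url: str) -> str:
    """Return the tenant slug from a URL of the assumed form host/v1/<tenant>."""
    # Accept URLs written without a scheme, as in the example in the text.
    parts = urlsplit(url if "://" in url else "https://" + url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "v1":
        raise ValueError(f"not a per-tenant MCP URL: {url!r}")
    return segments[1]


print(tenant_from_url("mcp.klyhub.com/v1/acme-corp"))
```

A request whose path does not match the assumed shape is rejected outright rather than mapped to a default tenant.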

We bill the customer for the context layer. They keep paying their AI vendor for inference. Neither side has a margin squeeze when token prices move.

The engineering consequences

Mode 2 isn't just a billing change — it shapes every layer of the system.

1. No prompt engineering surface

We don't have a chat UI. We don't have system prompts. We don't have model selection. We deliver structured data via MCP tools (search_entities, get_entity, list_motions) and the customer's AI client handles all the generative work. Our QA story is dramatically simpler: assert that the tools return correct data, period. No "the model said something weird" tickets.
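
Here is a minimal sketch of what "assert that the tools return correct data" can look like. The tool names search_entities and get_entity come from the text above; their signatures, the Entity fields, and the in-memory STORE are hypothetical stand-ins, and no MCP library is involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    # Hypothetical fields; the real entity shape is not described in this post.
    id: str
    name: str
    summary: str


# In-memory stand-in for the tenant's knowledge base.
STORE = {
    "e1": Entity("e1", "Refund policy", "Customers may return goods within 30 days."),
    "e2": Entity("e2", "Onboarding checklist", "Steps for a new hire's first week."),
}


def get_entity(entity_id: str) -> Optional[Entity]:
    """Return one entity by id, or None if it does not exist."""
    return STORE.get(entity_id)


def search_entities(query: str) -> list[Entity]:
    """Case-insensitive substring match over entity names and summaries."""
    q = query.lower()
    return [e for e in STORE.values()
            if q in e.name.lower() or q in e.summary.lower()]


# Deterministic checks: no model output involved, only data in and data out.
assert get_entity("e1").name == "Refund policy"
assert get_entity("missing") is None
assert [e.id for e in search_entities("REFUND")] == ["e1"]
```

Because the tools are plain data lookups, the checks are ordinary equality assertions rather than judgments about generated text.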

2. Permissions are honest

In a Mode 1 chat, the vendor's backend hits the LLM with whatever context it needs. The user has no idea what was sent. Mode 2 makes the data transfer explicit: the AI client tells the user "I want to call search_entities", the user approves (or auto-approves), and the data flows. Audit trail at the tool-call level. No mysterious prompt leakage.
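
One way to picture an audit trail at the tool-call level is a wrapper that records every call before running it. This is a sketch under assumptions: the audited decorator, the AUDIT_LOG list, the record fields, and the search_entities signature are all invented for illustration.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory audit sink; a real system would persist these records.
AUDIT_LOG: list[dict] = []


def audited(tool):
    """Record tenant, tool name, and arguments for every call to `tool`."""
    @functools.wraps(tool)
    def wrapper(tenant_id: str, **kwargs):
        AUDIT_LOG.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "tenant": tenant_id,
            "tool": tool.__name__,
            "args": kwargs,
        })
        return tool(tenant_id, **kwargs)
    return wrapper


@audited
def search_entities(tenant_id: str, query: str) -> list[str]:
    # Placeholder body: a real tool would query the tenant's data here.
    return []
```

Each record names the tool and its arguments, so a reviewer can see exactly which data was requested in a given call.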

3. Multi-tenant isolation works

Because we don't pass tenant data into a shared LLM session pool, we don't have to worry about cross-tenant leakage through prompt injection. Every MCP request carries a tenant-scoped OAuth token. Every database query runs inside a transaction that first issues SET LOCAL app.tenant_id. Row-level security in Postgres catches anything that slips through. The customer literally cannot see another tenant's data, even if they wanted to.
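
The transaction-plus-RLS pattern can be sketched as follows. Assumptions, stated loudly: the entities table, its text tenant_id column, and the policy name are hypothetical; the connection is assumed to be in autocommit mode with psycopg-style %s parameters; and the recording classes exist only so the sketch runs without a database. The sketch calls Postgres's set_config(..., true), the transaction-local equivalent of SET LOCAL, because SET does not accept bind parameters.

```python
from contextlib import contextmanager

# Hypothetical schema: an `entities` table with a text `tenant_id` column.
# Note: table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set.
RLS_POLICY_SQL = """
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON entities
    USING (tenant_id = current_setting('app.tenant_id'));
"""


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Open a transaction whose app.tenant_id is set for that transaction only."""
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        # is_local=true scopes the setting to this transaction, like SET LOCAL.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise
    finally:
        cur.close()


class RecordingCursor:
    """Stand-in cursor that records statements instead of running them."""
    def __init__(self, log):
        self.log = log

    def execute(self, sql, params=None):
        self.log.append((sql, params))

    def close(self):
        pass


class RecordingConnection:
    """Stand-in connection used only to demonstrate statement order."""
    def __init__(self):
        self.log = []

    def cursor(self):
        return RecordingCursor(self.log)


conn = RecordingConnection()
with tenant_transaction(conn, "acme-corp") as cur:
    cur.execute("SELECT id, name FROM entities")

for sql, params in conn.log:
    print(sql, params)
```

The tenant setting is applied before any query runs and disappears when the transaction ends, so a pooled connection does not carry one tenant's identity into the next request.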

4. The data layer is the moat

Mode 1 vendors compete on UX polish and prompt tuning. Mode 2 vendors compete on the quality, structure, and durability of the data they expose. Our entire product investment is in the 5-layer ontology, the AI-guided intake, the revision history, the permissions model. The customer's AI keeps getting smarter (model providers will see to that); our job is to keep giving it better material to work with.

5. We don't lose customers when a new AI ships

The day a new frontier model drops, our customers can switch their AI subscription and KlyHub still works exactly the same. We're not betting on any particular model winning. We're betting on "structured context for AIs" being a permanent need regardless of which AI is winning this quarter.

What we give up

Mode 2 isn't free. We lose three things:

The chat UX moment. When a competitor demoes their slick chat, we show MCP wiring. It's less viscerally impressive at first glance. We trade it for compounding context value.
The inference margin. Some vendors will subsidize early customers by taking a loss on inference. We can't match that with cash; we have to match it with product depth.
The "all in one app" pitch. Customers using KlyHub are going to spend most of their time in someone else's UI. We have to be okay with being a back-end service that quietly improves their existing tools.

Why this matters now

The MCP standard makes Mode 2 viable. Without it, the only ways to give an AI client your data were to copy and paste it, upload files, or build a one-off integration per client. With MCP, one server reaches all major clients. The unit economics of building a Mode 2 product changed in late 2025; we're shipping the playbook that change unlocked.

If you're building anything in the AI space, the Mode 1 / Mode 2 fork is the strategic decision that matters most. Picking right shapes the next decade of your business.