Deploying your first Agent

Mon, 01 Jan 0001 00:00:00 +0000

This guide deploys a custom agent into a running Krypton cluster, verifies the gateway route, and shows where scaling signals come from. Krypton treats your container as a black box that speaks A2A, MCP, or plain HTTP.

Ports & endpoints — the two-minute mental model

Krypton exposes two HTTP services. Talk to the right one:

Port	Service	What lives here
8080	Gateway	All agent traffic. Anything under `/v1/agents/{ns}/{name}/*` is reverse-proxied to your pod — the protocol RPC at `/`, the A2A card at `/.well-known/agent-card.json`, OAuth callbacks at `/oauth/...`, MCP SSE streams, anything. The gateway strips the `/v1/agents/{ns}/{name}` prefix; your container sees the original sub-path.
8090	Control plane	Web UI and introspection APIs (`/v1/agents`, `/v1/agents/{ns}/{name}/mcp/tools`, `/v1/agents/{ns}/{name}/status`). Operator tooling only — never the path your clients use to invoke an agent.

Rule of thumb: if it’s a normal A2A / MCP / HTTP client, point it at :8080. If it’s a browser or kubectl-adjacent tool, :8090.

Deploy Your First LLM

Mon, 01 Jan 0001 00:00:00 +0000

Krypton lets you serve an LLM the same way you manage the rest of your cluster: declare a resource, let the controller create the workload, and send traffic through the gateway.

Krypton serves Hugging Face GGUF models with llama.cpp. A Model resource points at a repo and file, the controller creates a Deployment and Service, and Krypton exposes the model through OpenAI-compatible endpoints:

GET /v1/models
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings

Any OpenAI SDK can use the gateway as its base_url.

Tutorials on Krypton Runtime

Deploying your first Agent

Ports & endpoints — the two-minute mental model

Deploy Your First LLM