Quickstart
From a fresh project to a streamed reply, in three curl commands. Ingram
Cloud runs an AI assistant for each smith (your end-user): it holds
conversations, remembers facts per smith, calls tools, and pauses for human
approval. Your backend drives it over /v1; this console is where you watch
and debug it.
Every request needs two headers:
Authorization: Bearer <token>
IC-Api-Version: 2026-05-01
Base URL: https://api.cloud.ingram.tech
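As a minimal sketch, the two headers and the base URL can be wrapped in one small helper so every call carries them. This uses only the Python standard library; the name `build_request` is a hypothetical helper chosen for illustration and is not part of the API. It only constructs the request and sends nothing.

```python
import json
import urllib.request

BASE_URL = "https://api.cloud.ingram.tech"
API_VERSION = "2026-05-01"


def build_request(path, token, body=None, method="POST"):
    """Build (but do not send) a request carrying the two required headers.

    `build_request` is a hypothetical helper, not an official client.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "IC-Api-Version": API_VERSION,
    }
    data = None
    if body is not None:
        # JSON bodies also need a Content-Type, as in the curl examples.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the headers easy to inspect or log (minus the token) while you debug.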
1. Mint a tenant-admin token
In the console: Settings → API Keys → Tenant admin. The token is shown once. Store it as a server-side secret. It is bound to this project (the tenant lives inside the token, never in a URL), so it can never touch another project's data.
export IC_TOKEN="tha_live_…" # tenant-admin token (server-side only)
2. Add a model-provider key
Smiths run on a model-provider key. Add your own (BYOK) OpenAI, Anthropic, or
Gemini key in Settings → Models to run on your provider account. A run that
resolves to a provider with no key available fails with model_key_missing.
3. Create a smith
One smith per end-user, per agent. external_id is your own user id; a smith is
keyed by (external_id, agent_id), so "ensure smith" on every login is
idempotent (one agent → one smith per user; several agents → one each).
# Authorization: tenant-admin token (server-side only)
curl https://api.cloud.ingram.tech/v1/smiths \
  -H "Authorization: Bearer $IC_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: smith-create-user_123" \
  -d '{
    "external_id": "user_123",
    "display_name": "Ada Lovelace",
    "instructions": "You are a helpful assistant.",
    "model": "claude-sonnet-4-6"
  }'
# → 201 { "id": "smt_…", "external_id": "user_123",
#         "config_source": "override", "agent_id": "agt_…", … }
If you omit agent_id, the smith lands on your project's default agent; the
config fields you pass (instructions, model, …) ride on top as this smith's
overrides (config_source: "override"). Once you have more than a handful of
smiths, design the behaviour once as an agent instead of repeating it per
smith.
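The "ensure smith on every login" pattern can be sketched in Python as a function that derives the request body and the Idempotency-Key from your own user id. The helper name `ensure_smith_call` is hypothetical, and two details are assumptions: that agent_id is accepted as a body field, and that appending the agent id to the key is a sensible way to keep keys distinct when one user has several smiths.

```python
def ensure_smith_call(external_id, display_name, agent_id=None, overrides=None):
    """Return (extra_headers, body) for POST /v1/smiths.

    Hypothetical helper: it builds the payload only and sends nothing.
    """
    body = {"external_id": external_id, "display_name": display_name}
    key = f"smith-create-{external_id}"
    if agent_id is not None:
        # Assumption: agent_id is a body field. A smith is keyed by
        # (external_id, agent_id), so the key includes both here.
        body["agent_id"] = agent_id
        key = f"{key}-{agent_id}"
    # Per-smith config overrides such as instructions or model.
    body.update(overrides or {})
    return {"Idempotency-Key": key}, body


headers, body = ensure_smith_call(
    "user_123",
    "Ada Lovelace",
    overrides={"instructions": "You are a helpful assistant."},
)
```

Because the key is a pure function of the ids, calling this on every login produces the same request each time, which is what makes the retry safe.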
4. Get a response
# Authorization: tenant-admin token (server-side only)
curl https://api.cloud.ingram.tech/v1/smiths/smt_…/runs \
  -H "Authorization: Bearer $IC_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "Say hello!" }],
    "thread_id": "chat_001"
  }'
# → { "id": "run_…", "status": "completed",
#     "output": { "content": "Hello! …" }, "usage": { … } }
Pass the same thread_id on the next turn and the smith keeps the
conversation; omit it and each run starts a fresh thread.
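A small Python sketch of the multi-turn behaviour: the same thread_id is reused for each turn, and the reply text is read from output.content as in the response above. The helpers `run_body` and `reply_text` are hypothetical names, and the sample response is made up to mirror the documented shape.

```python
def run_body(text, thread_id=None, stream=False):
    """Build the JSON body for POST /v1/smiths/{id}/runs (hypothetical helper)."""
    body = {"input": [{"role": "user", "content": text}]}
    if thread_id is not None:
        # Reusing one thread_id keeps the conversation going.
        body["thread_id"] = thread_id
    if stream:
        body["stream"] = True
    return body


def reply_text(run):
    """Pull the assistant text out of a completed, non-streamed run."""
    return run["output"]["content"]


# Two turns of the same conversation share a thread_id.
turn_1 = run_body("Say hello!", thread_id="chat_001")
turn_2 = run_body("And now in French.", thread_id="chat_001")

# Made-up response in the shape shown above, for illustration only.
sample_run = {"id": "run_1", "status": "completed", "output": {"content": "Hello!"}}
text = reply_text(sample_run)
```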
5. Stream it
Set "stream": true and read Server-Sent Events. For a first integration you
only need two event types. Append message.delta chunks, stop on
run.completed:
# Authorization: tenant-admin token (server-side only)
curl -N https://api.cloud.ingram.tech/v1/smiths/smt_…/runs \
  -H "Authorization: Bearer $IC_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "Tell me a story." }],
    "stream": true
  }'
# event: run.started data: {"v":1,"run_id":"run_…","smith_id":"smt_…",…}
# event: message.delta data: {"v":1,"run_id":"run_…","delta":"Once"}
# event: message.delta data: {"v":1,"run_id":"run_…","delta":" upon"}
# …
# event: run.completed data: {"v":1,"run_id":"run_…","stop_reason":"end_turn"}
The full envelope (tool calls, approvals, pauses) is on Runs & streaming.
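The "append message.delta, stop on run.completed" loop can be sketched in Python as a small Server-Sent Events reader. It assumes the wire format is standard SSE, with event: and data: on separate lines and a blank line ending each event (the comments above condense them onto one line). The function names are hypothetical, and every other event type is simply ignored.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded SSE text lines."""
    event_name = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_lines:
                yield event_name or "message", json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


def collect_reply(lines: Iterable[str]) -> str:
    """Concatenate message.delta chunks and stop at run.completed."""
    parts = []
    for name, payload in iter_sse_events(lines):
        if name == "message.delta":
            parts.append(payload["delta"])
        elif name == "run.completed":
            break
    return "".join(parts)
```

With a live response you would feed it the decoded lines of the HTTP body, for example `(line.decode("utf-8") for line in response)` when using `urllib.request.urlopen`.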
6. Try the Playground
No code needed: the console's Playground is a chat against any smith. Pick a smith, type a message, and the reply streams in token by token. Chats are multi-turn, so the smith keeps context across the conversation. A side panel shows the live config the chat runs against (model, instructions, tools, memory), and tool calls appear inline as they execute. Each turn links to its run detail page, which shows status, transcript, and a timed span waterfall with per-call cost. "New chat" starts a fresh thread.
The Playground isn't a separate toy — every message is a real /v1 run. Flip
the side panel to View as API and it shows the exact, runnable
POST /v1/smiths/{id}/runs call behind the latest turn (with the live
smith_id and thread_id), so what you click here is what your backend sends.
On a brand-new account the Playground readies a smith keyed to you (the
signed-in operator) so there's always one to chat with — your own real smith,
not a shared default. In production you create one smith per end-user the same
way, keyed to their id.
Already have an OpenAI integration?
If you already run an OpenAI Chat Completions or Responses loop, adopting Ingram
Cloud is essentially a client-configuration change: point baseURL at
https://api.cloud.ingram.tech/v1 and swap the key for a smith token (or a
tenant-admin token plus the user field set to the smith's external_id). Your
existing request keeps working: system/instructions, your client-side tools and
their results, multimodal image parts, and model (now the inference-LLM override)
are all honored as-is, and the stream is standard. See
OpenAI-compatible API for the full surface; the agent (its
persona, server-side MCP tools, memory) layers on top of the smith without changing
your call.
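A hedged Python sketch of that switch, written without any SDK so it stays self-contained. Both helper names are hypothetical. The `base_url` and `api_key` keys mirror the constructor arguments of the OpenAI Python SDK; whether the compatible surface needs anything else (for example the IC-Api-Version header) is not stated here, so check the OpenAI-compatible API page before relying on this.

```python
import copy

INGRAM_OPENAI_BASE_URL = "https://api.cloud.ingram.tech/v1"


def client_settings(token):
    """Settings to hand to an OpenAI-style client (hypothetical helper)."""
    return {"base_url": INGRAM_OPENAI_BASE_URL, "api_key": token}


def with_smith_user(payload, external_id):
    """Copy an existing Chat Completions payload and set `user`.

    Only needed with a tenant-admin token, where `user` carries the
    smith's external_id; the original payload is left untouched.
    """
    out = copy.deepcopy(payload)
    out["user"] = external_id
    return out


existing = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Say hello!"}],
}
retargeted = with_smith_user(existing, "user_123")
```

Everything else in the payload is passed through unchanged, which matches the claim that the existing request keeps working.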
Where to go next
- Core concepts: the five nouns everything else builds on.
- Auth & tokens: before any browser or device talks to the API.
- Agents: design behaviour for a fleet, not per smith.
- Tools & approvals: let a smith call your code.