Quickstart

From a fresh project to a streamed reply, in three curl commands. Ingram Cloud runs an AI assistant for each smith (your end-user): it holds conversations, remembers facts per smith, calls tools, and pauses for human approval. Your backend drives it over /v1; this console is where you watch and debug it.

Every request needs two headers:

Authorization: Bearer <token>
IC-Api-Version: 2026-05-01

Base URL: https://api.cloud.ingram.tech
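
As a rough sketch, the two required headers and the base URL can be wrapped in one helper so that every call carries them. The helper below only builds the request and never sends it. build_request and the placeholder token are illustrative names, not part of Ingram Cloud.

```python
import json
import urllib.request

BASE_URL = "https://api.cloud.ingram.tech"
API_VERSION = "2026-05-01"


def build_request(path, token, body):
    """Build (without sending) a JSON POST carrying the two required headers.

    build_request is a hypothetical helper name, not an Ingram Cloud API.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "IC-Api-Version": API_VERSION,
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method="POST"
    )


# Placeholder token for illustration; read the real one from a server-side secret.
req = build_request("/v1/smiths", "example-token", {"external_id": "user_123"})
print(req.full_url)
```

Sending is then a matter of passing the request to urllib.request.urlopen from your backend. urllib stores header names in a normalized case, which is harmless because HTTP header names are case-insensitive.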

1. Mint a tenant-admin token

In the console: Settings → API Keys → Tenant admin. The token is shown once. Store it as a server-side secret. It is bound to this project (the tenant lives inside the token, never in a URL), so it can never touch another project's data.

export IC_TOKEN="tha_live_…"   # tenant-admin token (server-side only)

2. Add a model-provider key

Every run needs a model-provider key. Add your own (BYOK) OpenAI, Anthropic, or Gemini key in Settings → Models to run on your provider account. A run that resolves to a provider with no available key fails with model_key_missing.

3. Create a smith

Create one smith per end-user, per agent. external_id is your own user ID. A smith is keyed by (external_id, agent_id), so calling "ensure smith" on every login is idempotent: with one agent, each user gets one smith; with several agents, each user gets one smith per agent.

# Authorization: tenant-admin token (server-side only)
curl https://api.cloud.ingram.tech/v1/smiths \
  -H "Authorization: Bearer $IC_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: smith-create-user_123" \
  -d '{
    "external_id": "user_123",
    "display_name": "Ada Lovelace",
    "instructions": "You are a helpful assistant.",
    "model": "claude-sonnet-4-6"
  }'
# → 201 { "id": "smt_…", "external_id": "user_123",
#         "config_source": "override", "agent_id": "agt_…", … }

If you omit agent_id, the smith lands on your project's default agent, and the config fields you pass (instructions, model, …) are applied on top as this smith's overrides (config_source: "override"). Once you have more than a handful of smiths, define the behaviour once as an agent instead of repeating it per smith.
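
To make the (external_id, agent_id) keying concrete, here is a minimal sketch of how a backend might derive the request body and Idempotency-Key for an "ensure smith" call on login. ensure_smith_call is a hypothetical helper. The key format extends the "smith-create-user_123" pattern from the curl example and is an assumption, not a documented rule, as is the "agt_example" id.

```python
def ensure_smith_call(external_id, agent_id=None, **config):
    """Return (idempotency_key, json_body) for POST /v1/smiths.

    Hypothetical helper. The key format is an assumption modelled on the
    "smith-create-user_123" value used in the curl example.
    """
    body = {"external_id": external_id, **config}
    key = f"smith-create-{external_id}"
    if agent_id is not None:
        # A smith is keyed by (external_id, agent_id), so a named agent
        # goes into both the body and the idempotency key.
        body["agent_id"] = agent_id
        key = f"{key}-{agent_id}"
    return key, body


# Default agent: no agent_id, config fields become this smith's overrides.
default_key, default_body = ensure_smith_call(
    "user_123", display_name="Ada Lovelace", model="claude-sonnet-4-6"
)

# Same user on a second, explicitly named agent ("agt_example" is made up).
agent_key, agent_body = ensure_smith_call("user_123", agent_id="agt_example")

print(default_key, agent_key)
```

Because the same inputs always produce the same key and body, repeating the call on every login sends an identical request each time.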

4. Get a response

# Authorization: tenant-admin token (server-side only)
curl https://api.cloud.ingram.tech/v1/smiths/smt_…/runs \
  -H "Authorization: Bearer $IC_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [{ "role": "user", "content": "Say hello!" }],
    "thread_id": "chat_001"
  }'
# → { "id": "run_…", "status": "completed",
#     "output": { "content": "Hello! …" }, "usage": { … } }

Pass the same thread_id on the next turn and the smith keeps the conversation; omit it and each run starts a fresh thread.
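
The thread rule can be sketched as a small body builder: reuse one thread_id for every turn of a conversation, and leave it out for a one-off run. run_body is a hypothetical helper name, and the message texts are made up for illustration.

```python
def run_body(text, thread_id=None):
    """Build the JSON body for POST /v1/smiths/{id}/runs (hypothetical helper)."""
    body = {"input": [{"role": "user", "content": text}]}
    if thread_id is not None:
        # Same thread_id on every turn keeps the conversation going.
        body["thread_id"] = thread_id
    return body


first_turn = run_body("Say hello!", thread_id="chat_001")
second_turn = run_body("Now say it in French.", thread_id="chat_001")
one_off = run_body("Unrelated question")  # no thread_id: a fresh thread

print(first_turn["thread_id"] == second_turn["thread_id"], "thread_id" in one_off)
```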

5. Stream it

Set "stream": true and read Server-Sent Events. A first integration needs only two event types: append each message.delta chunk to the reply, and stop when run.completed arrives.

# Authorization: tenant-admin token (server-side only)
curl -N https://api.cloud.ingram.tech/v1/smiths/smt_…/runs \
  -H "Authorization: Bearer $IC_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -d '{ "input": [{ "role": "user", "content": "Tell me a story." }],
        "stream": true }'

# event: run.started     data: {"v":1,"run_id":"run_…","smith_id":"smt_…",…}
# event: message.delta   data: {"v":1,"run_id":"run_…","delta":"Once"}
# event: message.delta   data: {"v":1,"run_id":"run_…","delta":" upon"}
# …
# event: run.completed   data: {"v":1,"run_id":"run_…","stop_reason":"end_turn"}

The full envelope (tool calls, approvals, pauses) is on Runs & streaming.
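
A minimal sketch of the "append message.delta, stop on run.completed" loop, assuming the stream uses standard Server-Sent Events framing (an event: line and a data: line per event, ended by a blank line). collect_reply is a hypothetical helper, the sample lines are hand-written from the example above with a made-up run id, and all other event types are simply ignored.

```python
import json


def collect_reply(lines):
    """Join message.delta chunks from an iterable of SSE lines.

    Stops at the first run.completed event. Hypothetical helper that
    assumes standard SSE framing.
    """
    event, data_lines, chunks = None, [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line == "":
            # A blank line ends the current event: act on it, then reset.
            if event is not None and data_lines:
                payload = json.loads("\n".join(data_lines))
                if event == "message.delta":
                    chunks.append(payload["delta"])
                elif event == "run.completed":
                    break
            event, data_lines = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data_lines.append(value)
    return "".join(chunks)


sample = [
    "event: run.started",
    'data: {"v":1,"run_id":"run_example"}',
    "",
    "event: message.delta",
    'data: {"v":1,"run_id":"run_example","delta":"Once"}',
    "",
    "event: message.delta",
    'data: {"v":1,"run_id":"run_example","delta":" upon"}',
    "",
    "event: run.completed",
    'data: {"v":1,"run_id":"run_example","stop_reason":"end_turn"}',
    "",
]
print(collect_reply(sample))  # prints: Once upon
```

In a real integration the lines would come from the HTTP response body, decoded as UTF-8 and read line by line, rather than from a list.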

6. Try the Playground

No code needed: the console's Playground is a chat against any smith. Pick a smith, type a message, and the reply streams in token by token. Chats are multi-turn, so the smith keeps context across the conversation. A side panel shows the live config the chat runs against (model, instructions, tools, memory), and tool calls appear inline as they execute. Each turn links to its run detail page, which shows status, transcript, and a timed span waterfall with per-call cost. "New chat" starts a fresh thread.

The Playground isn't a separate toy — every message is a real /v1 run. Flip the side panel to View as API and it shows the exact, runnable POST /v1/smiths/{id}/runs call behind the latest turn (with the live smith_id and thread_id), so what you click here is what your backend sends. On a brand-new account the Playground readies a smith keyed to you (the signed-in operator) so there's always one to chat with — your own real smith, not a shared default. In production you create one smith per end-user the same way, keyed to their id.

Already have an OpenAI integration?

If you already run an OpenAI Chat Completions or Responses loop, adopting Ingram Cloud is essentially one change: point baseURL at https://api.cloud.ingram.tech/v1 and swap the key for a smith token (or a tenant-admin token + the user field as the smith's external_id). Your existing request keeps working — system/instructions, your client-side tools and their results, multimodal image parts, and model (now the inference-LLM override) are all honored as-is, and the stream is standard. See OpenAI-compatible API for the full surface; the agent (its persona, server-side MCP tools, memory) layers on top of the smith without changing your call.
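
As a hedged sketch of the tenant-admin variant, the request below is what a Chat Completions client would send once its base URL points at Ingram Cloud. It rests on several assumptions. The /chat/completions path is the one OpenAI clients append to their base URL and is not spelled out in this section. The IC-Api-Version header is included only because this page says every request needs it. chat_completions_request and the placeholder token are hypothetical. The request is built, not sent.

```python
import json
import urllib.request

OPENAI_COMPAT_BASE = "https://api.cloud.ingram.tech/v1"


def chat_completions_request(token, external_id, messages, model):
    """Build (without sending) an OpenAI-style Chat Completions request.

    Hypothetical helper. The "user" field carries the smith's external_id,
    as described for the tenant-admin token option.
    """
    body = {"model": model, "messages": messages, "user": external_id}
    return urllib.request.Request(
        OPENAI_COMPAT_BASE + "/chat/completions",  # assumed path, see lead-in
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "IC-Api-Version": "2026-05-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )


compat_req = chat_completions_request(
    "example-token",
    "user_123",
    [{"role": "user", "content": "Say hello!"}],
    "claude-sonnet-4-6",
)
print(compat_req.full_url)
```

With an OpenAI SDK the same effect comes from setting the client's base URL and API key. The OpenAI-compatible API page remains the authority on which fields and headers that surface accepts.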

Where to go next