Now in private beta

AI agents that leave a paper trail.

Ingram Cloud runs a stateful, tool-using agent for every one of your end-users and puts every run on the record. Every model call, tool call, approval, and dollar is traced end to end and replayable. One REST API behind all of it.

Model-agnostic — bring your own OpenAI, Anthropic, or Gemini key.

run.trace · run_7fa2
00:00.0  run.started        priya · "where’s my refund for order #8821?"
00:00.4  tool.executing     orders.lookup(id: 8821)
00:01.1  tool.completed     shipped · refund eligible
00:01.2  approval.required  paused · refund.issue($48.00)
00:02.6  approval.resolved  approved by you
00:02.9  message.completed  "Your $48 refund is on its way."
00:03.0  run.completed      5 steps · 3,812 tokens · $0.0241
Auditable by default

Nothing your agent does is a black box.

Hosted AI usually means handing a prompt to an opaque service and hoping. Here, every run is a recorded sequence of steps you can open, replay, and cost — and everything that happens lands on one append-only feed.

Trace every run, end to end

Each run is a recorded sequence of steps: every model call, tool invocation, and decision, timed and costed. Replay any run.

run.started → tool.executing → run.completed
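As a rough illustration of what a "timed and costed" trace could look like on the client side, here is a minimal TypeScript sketch. The `TraceEvent` shape and the `summarizeRun` helper are hypothetical and are not the Ingram Cloud schema; only the event names come from the trace shown earlier.

```typescript
// Hypothetical shape of one recorded trace event; field names are assumptions.
interface TraceEvent {
  atMs: number;          // offset from the start of the run, in milliseconds
  event: string;         // e.g. "run.started", "tool.executing", "run.completed"
  tokens?: number;       // tokens consumed at this point, if any
  costMicroUsd?: number; // cost in millionths of a dollar (integers avoid float drift)
}

interface RunSummary {
  events: number;
  tokens: number;
  costMicroUsd: number;
  durationMs: number;
}

// Fold a run's recorded events into totals.
function summarizeRun(trace: TraceEvent[]): RunSummary {
  let tokens = 0;
  let costMicroUsd = 0;
  let durationMs = 0;
  for (const e of trace) {
    tokens += e.tokens ?? 0;
    costMicroUsd += e.costMicroUsd ?? 0;
    durationMs = Math.max(durationMs, e.atMs);
  }
  return { events: trace.length, tokens, costMicroUsd, durationMs };
}
```

Keeping cost as an integer number of micro-dollars is a design choice of this sketch, so that per-event costs add up exactly.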

Account for every token

Usage and dollar cost are attributed down to the individual smith (the per-user instance of your agent), so you always know which user spent what and can meter it onward.

budget.threshold
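Per-user attribution of this kind can be pictured as a simple roll-up. The sketch below is an assumption-laden illustration: `UsageRecord`, `usageBySmith`, and `atOrOverBudget` are hypothetical names, and how the platform actually computes a `budget.threshold` event is not stated here.

```typescript
// Hypothetical usage record; field names are illustrative assumptions.
interface UsageRecord {
  smith: string;        // the end-user whose smith incurred the usage
  tokens: number;
  costMicroUsd: number; // cost in millionths of a dollar
}

interface SmithUsage {
  tokens: number;
  costMicroUsd: number;
}

// Sum tokens and cost per smith.
function usageBySmith(records: UsageRecord[]): Map<string, SmithUsage> {
  const totals = new Map<string, SmithUsage>();
  for (const r of records) {
    const t = totals.get(r.smith) ?? { tokens: 0, costMicroUsd: 0 };
    t.tokens += r.tokens;
    t.costMicroUsd += r.costMicroUsd;
    totals.set(r.smith, t);
  }
  return totals;
}

// List the smiths whose accumulated spend has reached the given budget.
function atOrOverBudget(totals: Map<string, SmithUsage>, budgetMicroUsd: number): string[] {
  const flagged: string[] = [];
  totals.forEach((usage, smith) => {
    if (usage.costMicroUsd >= budgetMicroUsd) flagged.push(smith);
  });
  return flagged;
}
```

The same per-smith totals could feed your own billing if you meter customers onward.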

Approve before it acts

Gate sensitive tools behind a human. Runs pause on approval, wait for your sign-off, and resume at the exact step they left off.

approval.required → approval.resolved
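The pause-and-resume behaviour can be modelled as a small state machine. This is a sketch under stated assumptions, not the platform's implementation: `RunState`, `advance`, and the `step.done` event are hypothetical, and treating a denied approval as a stopped run is a guess made for the example.

```typescript
// Hypothetical run state; only the approval.* event names come from the page.
interface RunState {
  status: "running" | "paused" | "completed" | "stopped";
  step: number;         // index of the step the run is currently on
  pendingTool?: string; // the tool awaiting sign-off while paused
}

type RunEvent =
  | { type: "step.done" } // hypothetical: the current step finished
  | { type: "approval.required"; tool: string }
  | { type: "approval.resolved"; approved: boolean }
  | { type: "run.completed" };

// Pure transition function: returns the next state for a given event.
function advance(state: RunState, event: RunEvent): RunState {
  if (state.status === "running") {
    if (event.type === "approval.required") {
      // Pause on the same step so the run can resume exactly here.
      return { status: "paused", step: state.step, pendingTool: event.tool };
    }
    if (event.type === "step.done") return { status: "running", step: state.step + 1 };
    if (event.type === "run.completed") return { status: "completed", step: state.step };
    return state;
  }
  if (state.status === "paused" && event.type === "approval.resolved") {
    // Approved: resume at the step where the run paused.
    // Denied: stop the run (an assumption of this sketch).
    return { status: event.approved ? "running" : "stopped", step: state.step };
  }
  return state; // any other event is ignored in this state
}
```

Because the paused state keeps the step index, resuming after sign-off needs no replay of earlier steps.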
GET /v1/events · append-only
evt_9c4  run.completed      priya  12:04:21
evt_9c3  approval.resolved  priya  12:04:18
evt_9c2  approval.required  priya  12:04:17
evt_9c1  budget.threshold   marco  12:03:55
evt_9c0  deployment.inbound alice  12:03:40
evt_9bf  run.started        priya  12:04:14
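One way to consume an append-only feed is to remember the last event you processed and handle only what is newer. The sketch below assumes the feed arrives newest first, as in the listing above; the `FeedEvent` fields and the `newEventsSince` helper are hypothetical, and the real query parameters of `GET /v1/events` are not covered here.

```typescript
// Hypothetical event shape, mirroring the columns of the feed listing.
interface FeedEvent {
  id: string;   // e.g. "evt_9c4"
  type: string; // e.g. "run.completed"
  user: string; // e.g. "priya"
}

// Given a newest-first feed and the ID of the last event already processed,
// return only the unseen events, oldest first, ready to be handled in order.
// If the ID is null or not present in this page, every event counts as new.
function newEventsSince(feed: FeedEvent[], lastSeenId: string | null): FeedEvent[] {
  const idx = lastSeenId === null ? -1 : feed.findIndex((e) => e.id === lastSeenId);
  const fresh = idx === -1 ? feed : feed.slice(0, idx);
  return fresh.slice().reverse();
}
```

Since an append-only feed never rewrites old entries, storing a single last-seen ID is enough to resume after a restart.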
One design, many users

Design once. Run a private one per person.

You design an agent — its instructions, model, tools, and memory — and publish versioned snapshots. For each of your end-users, Ingram Cloud runs an isolated clone of it: its own memory, conversations, and connections.

We call that running clone a smith. Roll a new version out to the whole fleet at once, or pin and override a single one. Each token is scoped to its tenant, so data never crosses between users or projects.
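The relationship between a fleet-wide rollout and a per-smith pin can be sketched as a lookup with a fallback. The `Fleet` type and the `resolveVersion` and `rollOut` helpers are hypothetical illustrations of the idea, not the platform's data model.

```typescript
// Hypothetical model: one published version for the fleet, plus optional pins.
interface Fleet {
  publishedVersion: number;  // the snapshot every unpinned smith runs
  pins: Map<string, number>; // smith ID -> pinned snapshot version
}

// A pinned smith keeps its own version; everyone else follows the fleet.
function resolveVersion(fleet: Fleet, smith: string): number {
  return fleet.pins.get(smith) ?? fleet.publishedVersion;
}

// Rolling out a new version changes the fleet default and leaves pins untouched.
function rollOut(fleet: Fleet, version: number): Fleet {
  return { publishedVersion: version, pins: fleet.pins };
}
```

In this model a rollout is a single write to the fleet default, which is why it reaches every unpinned smith at once.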

Agent · the design
support-concierge · v7 · published
instructions · model · tools · memory

Smiths · one per user
priya · live thread
marco · 2 channels
alice · 312 memories
The runtime

The state and reach agents need in production.

Memory, tools, models, and channels — managed for you, behind one API and one console.

Memory that persists per person

Every smith keeps its own three-tier memory — core facts, recall, and archival history — so conversations resume where they left off. One user's data can never surface in another's.

Tools & MCP, server-side

Connect any MCP server or reach for the built-ins. Smiths call tools on the server, and each smith's OAuth connections are stored in isolation, never in your app.

Model-agnostic, BYOK

Bring your own provider keys and pick the model per agent. Move between OpenAI, Anthropic, and Gemini without touching a line of your code.

Every channel, out of the box

Slack, Telegram, WhatsApp, and email. An inbound message wakes the right smith; the reply goes back to the same conversation, on the same thread.

Developer experience

An API-first platform, not a UI.

Everything in the console runs on the public /v1 REST API, the same surface you build on. Drive agents from your backend, or drop in the OpenAI-compatible endpoint and keep the SDK you already use.

  • OpenAI-compatible /v1/chat/completions — keep your SDK
  • Infrastructure as Code with the Pulumi provider
  • Signed webhooks for every lifecycle event
  • Idempotent writes and a versioned, dated API
  • Per-project isolation with cryptographically scoped tokens
  • Meter and bill your own customers on top
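app.ts
// Drop-in: point the OpenAI-compatible provider at a smith and stream the reply.
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { streamText } from "ai";

// Assumption: the smith token is supplied through the environment.
const SMITH_TOKEN = process.env.SMITH_TOKEN ?? "";
const prompt = "Where's my refund for order #8821?";

const ingram = createOpenAICompatible({
  name: "ingram",
  baseURL: "https://api.cloud.ingram.tech/v1",
  apiKey: SMITH_TOKEN,
});

const { textStream } = streamText({
  model: ingram(""), // empty ID: the smith's configured model
  prompt,
});

// Print the reply as it streams in.
for await (const chunk of textStream) {
  process.stdout.write(chunk);
}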
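For the signed webhooks mentioned in the list above, a receiver typically recomputes the signature over the raw request body and compares it in constant time. The sketch below assumes an HMAC-SHA256 signature encoded as hex; the actual signing scheme, header name, and secret format are not described on this page, so treat `signPayload` and `verifyWebhook` as hypothetical and check the platform's documentation before relying on them.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute a hex HMAC-SHA256 signature over the raw body (assumed scheme).
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Return true only if the received signature matches the one we compute.
function verifyWebhook(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the body can change it and invalidate the signature.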

Put an agent in your product this week.

Create a project, mint a token, and stream your first reply in minutes — every run on the record from the first one. No infrastructure to stand up.