AI agents that leave a paper trail.
Ingram Cloud runs a stateful, tool-using agent for every one of your end-users — and puts every run on the record. Each model call, tool, approval, and dollar, traced end to end and replayable. One REST API behind all of it.
Model-agnostic — bring your own OpenAI, Anthropic, or Gemini key.
Nothing your agent does is a black box.
Hosted AI usually means handing a prompt to an opaque service and hoping. Here, every run is a recorded sequence of steps you can open, replay, and cost — and everything that happens lands on one append-only feed.
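As a rough illustration of what reading that feed could look like, the sketch below folds a list of events into one summary per run, with steps in order, elapsed time, and cost. The event names are the ones shown on this page; the `FeedEvent` fields (`runId`, `smithId`, `at`, `costUsd`) and the `summarizeRuns` helper are assumptions made for the example, not the documented schema.

```typescript
// Hypothetical event shape: field names are assumptions, not Ingram Cloud's schema.
type FeedEvent = {
  type: string;     // e.g. "run.started", "tool.executing", "run.completed"
  runId: string;
  smithId: string;
  at: number;       // timestamp in milliseconds
  costUsd?: number; // cost attributed to this step, if any
};

type RunSummary = {
  runId: string;
  smithId: string;
  startedAt: number;
  steps: string[];
  durationMs: number;
  costUsd: number;
};

// Fold an append-only feed into one summary per run, preserving event order.
function summarizeRuns(feed: FeedEvent[]): Map<string, RunSummary> {
  const runs = new Map<string, RunSummary>();
  for (const ev of feed) {
    let run = runs.get(ev.runId);
    if (!run) {
      run = {
        runId: ev.runId,
        smithId: ev.smithId,
        startedAt: ev.at,
        steps: [],
        durationMs: 0,
        costUsd: 0,
      };
      runs.set(ev.runId, run);
    }
    run.steps.push(ev.type);
    run.durationMs = ev.at - run.startedAt;
    run.costUsd += ev.costUsd ?? 0;
  }
  return runs;
}
```

Because the feed is append-only, a fold like this can be re-run from the start at any time and should produce the same summaries.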
Trace every run, end to end
Each run is a recorded sequence of steps — every model call, tool invocation, and decision, timed and costed. Replay any of them.
run.started → tool.executing → run.completed
Account for every token
Usage and dollar cost are attributed down to the individual smith (the per-user agent instance), so you always know which user spent what and can meter it onward.
budget.threshold
Approve before it acts
Gate sensitive tools behind a human. Runs pause when approval is required, wait for your sign-off, and resume at the exact step where they left off.
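The pause-and-resume behavior can be pictured as a small state machine. The sketch below is a toy model under stated assumptions: the `RunState` shape, the `GATED` tool list, and the `advance` and `resolveApproval` functions are all hypothetical and only illustrate the idea of stopping at a gated step and continuing from that same step once a human signs off.

```typescript
// Toy model of an approval gate. All names here are hypothetical.
type Step = { tool: string };

type RunState = {
  steps: Step[];
  next: number; // index of the next step to execute
  status: "running" | "awaiting_approval" | "completed" | "rejected";
  log: string[];
};

// Tools that need a human sign-off before they run (assumed for the example).
const GATED = new Set<string>(["send_payment"]);

// Execute steps until the run finishes or reaches a gated tool.
function advance(run: RunState, approved = false): RunState {
  if (run.status !== "running") return run;
  let allow = approved;
  while (run.next < run.steps.length) {
    const step = run.steps[run.next];
    if (GATED.has(step.tool) && !allow) {
      run.status = "awaiting_approval";
      run.log.push("approval.required");
      return run; // run.next still points at the gated step
    }
    allow = false; // an approval covers only the step it was granted for
    run.log.push(`tool.executing:${step.tool}`);
    run.next += 1;
  }
  run.status = "completed";
  run.log.push("run.completed");
  return run;
}

// Record the human decision and, if approved, resume at the paused step.
function resolveApproval(run: RunState, approve: boolean): RunState {
  if (run.status !== "awaiting_approval") {
    throw new Error("run is not waiting for approval");
  }
  run.log.push("approval.resolved");
  if (!approve) {
    run.status = "rejected";
    return run;
  }
  run.status = "running";
  return advance(run, true);
}
```

The property to notice is that `next` is never advanced past a gated step until approval arrives, so resuming needs no replay of earlier steps.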
approval.required → approval.resolved
Design once. Run a private one per person.
You design an agent — its instructions, model, tools, and memory — and publish versioned snapshots. For each of your end-users, Ingram Cloud runs an isolated clone of it: its own memory, conversations, and connections.
We call that running clone a smith. Roll a new version out to the whole fleet at once, or pin and override a single one. The token carries the tenant, so data never crosses between users or projects.
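One way to think about fleet rollout versus pinning is a lookup that prefers a per-smith pin and otherwise falls back to the latest published version. The `Fleet` type and `versionFor` function below are hypothetical names for this sketch, not part of the documented API.

```typescript
// Hypothetical model of version resolution for a fleet of smiths.
type Fleet = {
  published: number;         // latest published snapshot version of the agent
  pins: Map<string, number>; // smithId -> pinned version override
};

// A pinned smith keeps its version; everyone else follows the fleet.
function versionFor(fleet: Fleet, smithId: string): number {
  return fleet.pins.get(smithId) ?? fleet.published;
}
```

Under this model, publishing a new version is a single update to `published`, and pinned smiths are unaffected until their pin is removed.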
The state and reach agents need in production.
Memory, tools, models, and channels — managed for you, behind one API and one console.
Memory that persists per person
Every smith keeps its own three-tier memory — core facts, recall, and archival history — so conversations resume where they left off. One user's data can never surface in another's.
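To make the three tiers concrete, here is a minimal in-memory sketch. How Ingram Cloud actually defines and stores each tier is not stated on this page, so the semantics below are assumptions: core holds durable facts, recall holds the most recent turns, and anything that overflows recall moves to archival. `MemoryStore` and its methods are hypothetical.

```typescript
// Hypothetical sketch: tier semantics and names are assumptions for illustration.
type SmithMemory = {
  core: Map<string, string>; // durable facts, always included in context
  recall: string[];          // most recent conversation turns
  archival: string[];        // older turns, searched on demand
};

class MemoryStore {
  private bySmith = new Map<string, SmithMemory>();
  private recallLimit: number;

  constructor(recallLimit = 3) {
    this.recallLimit = recallLimit;
  }

  // Each smith gets its own memory object; nothing is shared between smiths.
  private forSmith(smithId: string): SmithMemory {
    let mem = this.bySmith.get(smithId);
    if (!mem) {
      mem = { core: new Map<string, string>(), recall: [], archival: [] };
      this.bySmith.set(smithId, mem);
    }
    return mem;
  }

  setFact(smithId: string, key: string, value: string): void {
    this.forSmith(smithId).core.set(key, value);
  }

  addTurn(smithId: string, turn: string): void {
    const mem = this.forSmith(smithId);
    mem.recall.push(turn);
    // Overflow from recall moves into archival history instead of being dropped.
    while (mem.recall.length > this.recallLimit) {
      mem.archival.push(mem.recall.shift() as string);
    }
  }

  // What a smith would see at the start of its next turn: facts, then recent turns.
  context(smithId: string): string[] {
    const mem = this.forSmith(smithId);
    const facts = Array.from(mem.core, ([k, v]) => `${k}: ${v}`);
    return [...facts, ...mem.recall];
  }

  searchArchive(smithId: string, term: string): string[] {
    return this.forSmith(smithId).archival.filter((t) => t.includes(term));
  }
}
```

Keying every read and write by `smithId` is what keeps one user's memory out of another's in this model.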
Tools & MCP, server-side
Connect any MCP server or reach for the built-ins. Smiths call tools on the server, and each one's OAuth connections are stored in isolation — never in your app.
Model-agnostic, BYOK
Bring your own provider keys and pick the model per agent. Move between OpenAI, Anthropic, and Gemini without touching a line of your code.
Every channel, out of the box
Slack, Telegram, WhatsApp, and email. An inbound message wakes the right smith; the reply goes back to the same conversation, on the same thread.
An API-first platform, not a UI.
Everything in the console is the public /v1 REST API — the same surface you build on. Drive agents from your backend, or drop in the OpenAI-compatible endpoint and keep the SDK you already use.
- OpenAI-compatible /v1/chat/completions — keep your SDK
- Infrastructure as Code with the Pulumi provider
- Signed webhooks for every lifecycle event
- Idempotent writes and a versioned, dated API
- Per-project isolation with cryptographically scoped tokens
- Meter and bill your own customers on top
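For readers new to idempotent writes, the sketch below shows the general technique: the caller attaches a unique key to each write, and a retry with the same key returns the stored result instead of performing the write again. This is an in-memory model with hypothetical names (`IdempotentWriter`, `idempotencyKey`); how the /v1 API actually accepts such a key is not specified here.

```typescript
// Generic illustration of idempotent writes. Names are hypothetical.
type WriteRequest = { idempotencyKey: string; body: string };
type WriteResult = { id: number; body: string };

class IdempotentWriter {
  private seen = new Map<string, WriteResult>();
  private nextId = 1;

  apply(req: WriteRequest): WriteResult {
    const prior = this.seen.get(req.idempotencyKey);
    if (prior) return prior; // retry: replay the stored result, no second write
    const result: WriteResult = { id: this.nextId, body: req.body };
    this.nextId += 1;
    this.seen.set(req.idempotencyKey, result);
    return result;
  }

  // Number of writes that were actually performed.
  get writeCount(): number {
    return this.nextId - 1;
  }
}
```

This is why a client can safely retry after a timeout: the worst case is receiving the original result a second time.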
```typescript
// drop-in: point the OpenAI-compatible
// provider at a smith and stream
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { streamText } from "ai";

const ingram = createOpenAICompatible({
  name: "ingram",
  baseURL: "https://api.cloud.ingram.tech/v1",
  apiKey: SMITH_TOKEN,
});

const { textStream } = streamText({
  model: ingram(""), // the smith's configured model
  prompt,
});
```
Put an agent in your product this week.
Create a project, mint a token, and stream your first reply in minutes — every run on the record from the first one. No infrastructure to stand up.