Vercel AI SDK
@ingram-tech/ai-sdk-adapter is the batteries-included way to drive a smith from
the Vercel AI SDK. It is a thin, idiomatic extension: a
pre-configured provider plus small helpers for the three things Ingram Cloud adds
on top — smith identity, server-side memory, and human-in-the-loop approvals.
It stands on the OpenAI-compatible API, so your app speaks
the OpenAI Chat Completions wire format end to end. There is no custom SSE
envelope to parse and no streamText replacement to learn — the smith looks
like any other model. If you want the raw wire format, read the
OpenAI-compatible API page; this one is the shortcut.
npm install @ingram-tech/ai-sdk-adapter ai
The provider
createIngramCloud() returns a normal AI SDK provider. The token names the
smith (the end-user's instance), and the agent is the one that smith runs.
The model id is the upstream inference LLM: pass "" to use the agent's configured
model, or a model id like gpt-5.5 to override the LLM for that call.
import { createIngramCloud } from "@ingram-tech/ai-sdk-adapter";
import { streamText } from "ai";
// A per-smith token already names one smith — browser-safe.
const ingram = createIngramCloud({ apiKey: SMITH_TOKEN });
const result = streamText({ model: ingram(""), prompt: "Reset my password?" });
for await (const delta of result.textStream) process.stdout.write(delta);
It composes with everything in the SDK — streamText, generateText, structured
output, agents — because it is just @ai-sdk/openai-compatible with Ingram
Cloud's defaults (the IC-Api-Version header, identity, and memory) wired in.
On the wire, ingram("") is exactly:
# Authorization: smith token (browser-safe, scoped to one smith)
curl https://api.cloud.ingram.tech/v1/chat/completions \
  -H "Authorization: Bearer $IC_SMITH_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -d '{ "model": "", "stream": true,
        "messages": [{ "role": "user", "content": "Reset my password?" }] }'
In the console
A streaming chat route plus a browser hook is the whole integration. Your server holds the token and picks the smith; the browser is plain AI SDK pointed at that route.
The route (app/api/chat/route.ts) — resolve the smith from the session,
never from the client:
import { createIngramCloud } from "@ingram-tech/ai-sdk-adapter";
import { convertToModelMessages, streamText } from "ai";
export async function POST(req: Request) {
  const { messages, conversationId } = await req.json();
  const { smithId } = await resolveVisitor(req); // your auth → smt_…
  const ingram = createIngramCloud({
    apiKey: process.env.IC_TOKEN!, // tenant-admin token, server-side only
    smithId, // sent as IC-Smith-Id
    threadId: `chat_${conversationId}`, // sent as IC-Thread-Id (memory)
  });
  const result = streamText({
    model: ingram(""),
    messages: convertToModelMessages(messages),
  });
  return result.toUIMessageStreamResponse();
}
The browser — the transport points at your route; approvalsSettled makes a
paused turn resume on its own once every approval has a decision:
"use client";
import { useChat } from "@ai-sdk/react";
import { ingramCloudTransport, approvalsSettled } from "@ingram-tech/ai-sdk-adapter/react";
export function Chat() {
  const { messages, sendMessage } = useChat({
    transport: ingramCloudTransport({ api: "/api/chat" }),
    sendAutomaticallyWhen: approvalsSettled,
  });
  // …render messages, call sendMessage({ text })
}
Memory
Plain Chat Completions is stateless: the messages you send are the whole
context. Pass a threadId and Ingram Cloud holds the conversation server-side
(the IC-Thread-Id header) — you send only the new turn and
memory is in play, exactly like a native run's thread_id. One
thread per conversation; reuse it across turns.
const ingram = createIngramCloud({ apiKey: SMITH_TOKEN, threadId: `chat_${id}` });
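To make the stateless versus threaded difference concrete, here is a minimal sketch of what a caller on the raw wire might send in each case. The helpers `messagesForTurn` and `threadHeaders` are hypothetical and not part of the adapter, which may already handle this for you. The sketch assumes the caller trims the history itself when a thread is in use.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: choose the `messages` payload for one turn.
function messagesForTurn(history: ChatMessage[], threadId?: string): ChatMessage[] {
  // No thread: the request is stateless, so the full history is the context.
  if (threadId === undefined) return history;
  // Thread set: the server holds earlier turns, so only the newest is sent.
  return history.slice(-1);
}

// Hypothetical helper: the extra header that carries the thread.
function threadHeaders(threadId?: string): Record<string, string> {
  return threadId === undefined ? {} : { "IC-Thread-Id": threadId };
}

const sampleHistory: ChatMessage[] = [
  { role: "user", content: "Reset my password?" },
  { role: "assistant", content: "Sure, which account?" },
  { role: "user", content: "The billing one." },
];

// "chat_42" is a placeholder thread id following the chat_${id} pattern.
const threadedTurn = messagesForTurn(sampleHistory, "chat_42");
console.log(threadedTurn.length, threadHeaders("chat_42"));
```

Reusing the same thread id on every turn of a conversation is what lets the second form work; a fresh id each turn would start with empty memory.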
Approvals
A tool the agent marks destructiveHint pauses the run for a human decision. On
this surface the pause is a normal tool call whose id is
"<run_id>::<tool_call_id>", and the turn ends with finish_reason: "tool_calls". Pull the pending approvals off the result and resume by appending
a decision:
import { getApprovalRequests, approvalToolResult } from "@ingram-tech/ai-sdk-adapter";
import { generateText } from "ai";
const first = await generateText({ model: ingram(""), messages });
const approvals = getApprovalRequests(first.toolCalls);
if (approvals.length) {
  const decided = await askTheHuman(approvals); // your UI or policy
  await generateText({
    model: ingram(""),
    messages: [
      ...messages,
      ...decided.map((d) => approvalToolResult(d.request, d.ok ? "approve" : "reject")),
    ],
  });
}
On approve, Ingram Cloud runs the tool itself and continues; on reject, the
run completes with stop_reason: "approval_rejected" and nothing executes. In a
useChat UI the same pause shows up as a tool call awaiting approval; answer it
and approvalsSettled resubmits the decision for you. This is the same approval a
deployment confirmation or /submit drives — one mechanism, several
front doors.
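The composite tool call id described above can be taken apart with a few lines of string handling. This is an illustrative sketch only: `parseApprovalId` is a hypothetical helper, not the adapter's `getApprovalRequests`, and it assumes the run id itself never contains `::`.

```typescript
interface ApprovalRef {
  runId: string;
  toolCallId: string;
}

// Hypothetical helper: split "<run_id>::<tool_call_id>" into its two parts.
// Returns null for ids that do not follow that shape (ordinary tool calls).
function parseApprovalId(id: string): ApprovalRef | null {
  const sep = id.indexOf("::");
  if (sep <= 0) return null; // no separator, or an empty run id
  const runId = id.slice(0, sep);
  const toolCallId = id.slice(sep + 2);
  if (toolCallId.length === 0) return null; // empty tool call id
  return { runId, toolCallId };
}

// "run_1" and "call_9" are made-up placeholder ids.
console.log(parseApprovalId("run_1::call_9"));
console.log(parseApprovalId("call_9"));
```

A check like this is one way a client could tell a paused approval apart from a client-side tool call when both arrive as tool calls in the same turn.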
Tools
Two models, both standard:
- Client-side tools: define them with the AI SDK's tool() and pass them to streamText / generateText exactly as with any provider. They're sent on the request; the model's calls come back as tool calls for you to run, and the SDK loops by re-sending the conversation. Ingram Cloud executes nothing; your loop owns it. This is the literal OpenAI function-call contract; no Ingram-specific setup.
import { streamText, tool } from "ai";
import { z } from "zod";
const result = streamText({
  model: ingram(""),
  messages,
  tools: { get_weather: tool({ description: "…", inputSchema: z.object({ city: z.string() }) }) },
});
A turn that sends tools runs only those client tools (the agent still supplies instructions, but its server-side tools/memory sit out that turn).
- Server-side tools (MCP): Ingram Cloud calls your MCP server and runs the tools for you, with approval gating (see Approvals). Don't pass tools; register the server once and it's available to the smith automatically. Use this for shared/remote tools you don't want to run in the client.
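For the client-side model, the "your loop owns it" step can be sketched against the OpenAI Chat Completions shapes directly: each tool call the model returns is run locally, and its output goes back as a `role: "tool"` message on the next request. `runClientTools` and the sample `get_weather` handler are hypothetical illustrations; the AI SDK performs the equivalent step for you when a tool has an `execute` function.

```typescript
// Tool call as it appears in a Chat Completions assistant message.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// Tool result message appended to the conversation before re-sending.
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type Handler = (args: Record<string, unknown>) => unknown;

// Hypothetical helper: run each requested tool locally and build the replies.
function runClientTools(calls: ToolCall[], handlers: Record<string, Handler>): ToolMessage[] {
  return calls.map((call): ToolMessage => {
    const handler = handlers[call.function.name];
    const output = handler
      ? handler(JSON.parse(call.function.arguments) as Record<string, unknown>)
      : { error: `unknown tool: ${call.function.name}` };
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(output) };
  });
}

// Placeholder handler; a real one would call a weather service.
const sampleHandlers: Record<string, Handler> = {
  get_weather: (args) => ({ city: args["city"], forecast: "sunny" }),
};

const replies = runClientTools(
  [{ id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Oslo"}' } }],
  sampleHandlers,
);
console.log(replies);
```

Server-side (MCP) tools need none of this, since Ingram Cloud runs them and the client only sees the final answer or an approval pause.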
Identity & tokens
| Token | Use | How the smith is chosen |
|---|---|---|
| Smith token (sub = "<tenant>:<smith>") | browser-safe; the default | the token is the smith |
| Tenant-admin token | server-side only | pass smithId (sent as IC-Smith-Id) |
The agent is the one the smith runs — chosen by the smith, never by an argument.
The model argument is the upstream inference LLM: pass "" to use the agent's
configured model, or a model id (e.g. gpt-5.5) to override the LLM for that call.
Without a resolvable smith the call returns 400 smith_unresolved.
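The two rows of the table translate into a small rule for which headers identify the smith. The sketch below is a hypothetical helper, not the adapter's code: it assumes you want to fail fast on the client instead of waiting for the 400 smith_unresolved response.

```typescript
type Credentials =
  | { kind: "smith"; token: string }
  | { kind: "tenant-admin"; token: string; smithId?: string };

// Hypothetical helper: build the identity headers for either token type.
function identityHeaders(c: Credentials): Record<string, string> {
  const headers: Record<string, string> = { Authorization: `Bearer ${c.token}` };
  if (c.kind === "tenant-admin") {
    // A tenant-admin token names no smith, so the id must be supplied.
    if (!c.smithId) {
      throw new Error("tenant-admin token needs a smithId (the API would answer 400 smith_unresolved)");
    }
    headers["IC-Smith-Id"] = c.smithId;
  }
  // A smith token already is the smith: no extra header.
  return headers;
}

// "smt_123" is a placeholder smith id.
console.log(identityHeaders({ kind: "tenant-admin", token: "admin-token", smithId: "smt_123" }));
```

Keeping the tenant-admin branch on the server matches the table: only the smith token variant is safe to construct in a browser.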
When you need more
- Raw wire format, the openai SDK, or no extra dependency → OpenAI-compatible API.
- Live tool-progress frames (tool.executing / tool.completed) aren't carried on the standard surface yet. The adapter's opt-in @ingram-tech/ai-sdk-adapter/native entry point parses the native run envelope into a UI message stream for that case. Prefer the standard provider otherwise.