Vercel AI SDK

@ingram-tech/ai-sdk-adapter is the batteries-included way to drive a smith from the Vercel AI SDK. It is a thin, idiomatic extension: a pre-configured provider plus small helpers for the three things Ingram Cloud adds on top — smith identity, server-side memory, and human-in-the-loop approvals.

It stands on the OpenAI-compatible API, so your app speaks the OpenAI Chat Completions wire format end to end. There is no custom SSE envelope to parse and no streamText replacement to learn: the smith looks like any other model. If you want the raw wire format, read the OpenAI-compatible API page; this page is the shortcut.

npm install @ingram-tech/ai-sdk-adapter ai

The provider

createIngramCloud() returns a normal AI SDK provider. The token names the smith (the end-user's instance), and the agent is the one that smith runs. The model id is the upstream inference LLM: pass "" to use the agent's configured model, or a model id like gpt-5.5 to override the LLM for that call.

import { createIngramCloud } from "@ingram-tech/ai-sdk-adapter";
import { streamText } from "ai";

// A per-smith token already names one smith — browser-safe.
const ingram = createIngramCloud({ apiKey: SMITH_TOKEN });

const result = streamText({ model: ingram(""), prompt: "Reset my password?" });
for await (const delta of result.textStream) process.stdout.write(delta);

It composes with everything in the SDK — streamText, generateText, structured output, agents — because it is just @ai-sdk/openai-compatible with Ingram Cloud's defaults (the IC-Api-Version header, identity, and memory) wired in. On the wire, ingram("") is exactly:

# Authorization: smith token (browser-safe, scoped to one smith)
curl https://api.cloud.ingram.tech/v1/chat/completions \
  -H "Authorization: Bearer $IC_SMITH_TOKEN" \
  -H "IC-Api-Version: 2026-05-01" \
  -H "Content-Type: application/json" \
  -d '{ "model": "", "stream": true,
        "messages": [{ "role": "user", "content": "Reset my password?" }] }'
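For readers who prefer to see the same call without curl, here is a minimal sketch that builds the equivalent request for fetch(). The helper name buildChatRequest and the placeholder token are illustrative assumptions, not part of the adapter; the URL, headers, and body fields are copied from the curl call above.

```typescript
// Hypothetical helper (not an adapter export): mirrors the curl call above
// as a { url, init } pair that can be handed to fetch(url, init).
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(smithToken: string, prompt: string, model = ""): ChatRequest {
  return {
    url: "https://api.cloud.ingram.tech/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${smithToken}`, // smith token names the smith
        "IC-Api-Version": "2026-05-01",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model, // "" means: use the agent's configured model
        stream: true,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

Calling fetch(request.url, request.init) with the result would send the request; the sketch stops short of the network call so it can be read and run in isolation.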

In the console

A streaming chat route plus a browser hook is the whole integration. Your server holds the token and picks the smith; the browser is plain AI SDK pointed at that route.

The route (app/api/chat/route.ts) — resolve the smith from the session, never from the client:

import { createIngramCloud } from "@ingram-tech/ai-sdk-adapter";
import { convertToModelMessages, streamText } from "ai";

export async function POST(req: Request) {
  const { messages, conversationId } = await req.json();
  const { smithId } = await resolveVisitor(req); // your auth → smt_…

  const ingram = createIngramCloud({
    apiKey: process.env.IC_TOKEN!,        // tenant-admin token, server-side only
    smithId,                               // sent as IC-Smith-Id
    threadId: `chat_${conversationId}`,    // sent as IC-Thread-Id (memory)
  });

  const result = streamText({
    model: ingram(""),
    messages: convertToModelMessages(messages),
  });
  return result.toUIMessageStreamResponse();
}

The browser — the transport points at your route; approvalsSettled makes a paused turn resume on its own once every approval has a decision:

"use client";
import { useChat } from "@ai-sdk/react";
import { ingramCloudTransport, approvalsSettled } from "@ingram-tech/ai-sdk-adapter/react";

export function Chat() {
  const { messages, sendMessage } = useChat({
    transport: ingramCloudTransport({ api: "/api/chat" }),
    sendAutomaticallyWhen: approvalsSettled,
  });
  // …render messages, call sendMessage({ text })
}

Memory

Plain Chat Completions is stateless: the messages you send are the whole context. Pass a threadId and Ingram Cloud holds the conversation server-side (the IC-Thread-Id header). You then send only the new turn and the earlier turns come from memory, just as with a native run's thread_id. Use one thread per conversation and reuse it across turns.

const ingram = createIngramCloud({ apiKey: SMITH_TOKEN, threadId: `chat_${id}` });
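To make the difference concrete, the sketch below contrasts what a stateless turn and a threaded turn would carry on the wire. It is an illustration of the contract described above, not adapter code: payloadForTurn and the ChatMessage type are hypothetical names, and whether the adapter trims the history for you is not something this page states.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface TurnPayload {
  headers: Record<string, string>;
  messages: ChatMessage[];
}

// Hypothetical helper: decides what one turn needs to send.
function payloadForTurn(
  history: ChatMessage[],
  newTurn: ChatMessage,
  threadId?: string,
): TurnPayload {
  if (threadId === undefined) {
    // Stateless: the request itself is the whole context.
    return { headers: {}, messages: [...history, newTurn] };
  }
  // Threaded: earlier turns live server-side, so only the new turn is sent.
  return { headers: { "IC-Thread-Id": threadId }, messages: [newTurn] };
}
```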

Approvals

A tool that the agent marks with destructiveHint pauses the run for a human decision. On this surface the pause is a normal tool call whose id is "<run_id>::<tool_call_id>", and the turn ends with finish_reason: "tool_calls". Pull the pending approvals off the result and resume by appending a decision:

import { getApprovalRequests, approvalToolResult } from "@ingram-tech/ai-sdk-adapter";
import { generateText } from "ai";

const first = await generateText({ model: ingram(""), messages });
const approvals = getApprovalRequests(first.toolCalls);

if (approvals.length) {
  const decided = await askTheHuman(approvals); // your UI or policy
  await generateText({
    model: ingram(""),
    messages: [
      ...messages,
      ...decided.map((d) => approvalToolResult(d.request, d.ok ? "approve" : "reject")),
    ],
  });
}

On approve, Ingram Cloud runs the tool itself and continues; on reject, the run completes with stop_reason: "approval_rejected" and nothing executes. In a useChat UI the same pause shows up as a tool call awaiting approval; answer it and approvalsSettled resubmits the decision for you. This is the same approval a deployment confirmation or /submit drives — one mechanism, several front doors.
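The composite id format "<run_id>::<tool_call_id>" can be taken apart with a few lines, which is handy for logging or audit trails. This is a sketch only: parseApprovalId is a hypothetical name rather than an adapter export, and it assumes that neither half contains "::" and that ordinary tool-call ids never do.

```typescript
interface ApprovalRef {
  runId: string;
  toolCallId: string;
}

// Hypothetical helper: splits "<run_id>::<tool_call_id>" at the first "::".
// Returns null when the id does not have both halves.
function parseApprovalId(id: string): ApprovalRef | null {
  const sep = id.indexOf("::");
  if (sep <= 0 || sep + 2 >= id.length) return null; // missing run id or tool-call id
  return { runId: id.slice(0, sep), toolCallId: id.slice(sep + 2) };
}
```

For real approval handling, prefer getApprovalRequests as shown above; the parser is only meant to show what the id encodes.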

Tools

Two models, both standard:

  • Client-side tools — define them with the AI SDK's tool() and pass them to streamText/generateText exactly as with any provider. They're sent on the request; the model's calls come back as tool calls for you to run, and the SDK loops by re-sending the conversation. Ingram Cloud executes nothing — your loop owns it. This is the literal OpenAI function-call contract; no Ingram-specific setup.

    import { tool } from "ai";
    import { z } from "zod";
    
    const result = streamText({
      model: ingram(""),
      messages,
      tools: { get_weather: tool({ description: "…", inputSchema: z.object({ city: z.string() }) }) },
    });
    

    A turn that sends tools runs only those client tools (the agent still supplies instructions, but its server-side tools/memory sit out that turn).

  • Server-side tools (MCP) — Ingram Cloud calls your MCP server and runs the tools for you, with approval gating (see Approvals). Don't pass tools; register the server once and it's available to the smith automatically. Use this for shared/remote tools you don't want to run in the client.

Identity & tokens

Token                                     Use                          How the smith is chosen
Smith token (sub = "<tenant>:<smith>")    browser-safe; the default    the token is the smith
Tenant-admin token                        server-side only             pass smithId (sent as IC-Smith-Id)

The agent is the one the smith runs — chosen by the smith, never by an argument. The model argument is the upstream inference LLM: pass "" to use the agent's configured model, or a model id (e.g. gpt-5.5) to override the LLM for that call. Without a resolvable smith the call returns 400 smith_unresolved.
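The two token kinds can be modeled as a discriminated union so that a tenant-admin call without a smithId fails at compile time instead of coming back as 400 smith_unresolved. The names IngramAuth and identityHeaders are hypothetical, and sending the tenant-admin token as a Bearer credential is an assumption carried over from the smith-token curl example.

```typescript
// Hypothetical types: one variant per token kind described above.
type IngramAuth =
  | { kind: "smith"; token: string } // the token itself names the smith
  | { kind: "tenant-admin"; token: string; smithId: string }; // smith passed explicitly

// Builds the identity-related headers for a request.
function identityHeaders(auth: IngramAuth): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${auth.token}`, // assumed scheme for both token kinds
  };
  if (auth.kind === "tenant-admin") {
    headers["IC-Smith-Id"] = auth.smithId; // server-side only
  }
  return headers;
}
```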

When you need more

  • Raw wire format, the openai SDK, or no extra dependency: see the OpenAI-compatible API page.
  • Live tool-progress frames (tool.executing / tool.completed) aren't carried on the standard surface yet. The adapter's opt-in @ingram-tech/ai-sdk-adapter/native entry point parses the native run envelope into a UI message stream for that case. Prefer the standard provider otherwise.