Documentation

Get started in minutes.

Forge is a drop-in replacement for the OpenAI and Anthropic APIs. If you've used either SDK, you already know how to use Forge.

1. Get your API key

Sign up and generate an API key from your dashboard. Keys start with the rrt-burst- prefix.

2. Point your SDK at Forge

Python (OpenAI SDK)
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

response = client.chat.completions.create(
    model="auto",  # or "fast", "gpt-4o", "claude-sonnet-4-20250514", etc.
    messages=[{"role": "user", "content": "Hello from Forge!"}],
)

print(response.choices[0].message.content)
cURL
curl https://forge-api.lanaai.io/v1/chat/completions \
  -H "Authorization: Bearer $FORGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
Anthropic SDK
# pip install anthropic
import anthropic

client = anthropic.Anthropic(
    # The Anthropic SDK appends /v1/messages to base_url, so point it at the host root
    base_url="https://forge-api.lanaai.io",
    api_key="your-forge-api-key",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello from Forge!"}],
)

print(message.content[0].text)

3. That's it.

Forge handles routing, cost optimization, and failover automatically. Monitor your usage from the dashboard.

API Endpoints

POST /v1/chat/completions
OpenAI-compatible chat completions. Supports streaming.

POST /v1/messages
Anthropic Messages API compatible. Full streaming SSE support.

POST /v1/embeddings
Generate embeddings with the same interface as OpenAI.

GET /v1/models
List available models and their capabilities.

GET /dashboard/usage
Your token usage, cost breakdown, and forecasted spend.
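
For example, you can list the models your key can reach through the SDK, and poll usage with any HTTP client. A minimal sketch; it assumes /dashboard/usage accepts the same Bearer key as the inference endpoints:

Python (models & usage)
# pip install openai httpx
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# GET /v1/models -- list available models and their capabilities
for model in client.models.list():
    print(model.id)

# GET /dashboard/usage -- token usage and cost breakdown
usage = httpx.get(
    "https://forge-api.lanaai.io/dashboard/usage",
    headers={"Authorization": "Bearer your-forge-api-key"},
)
print(usage.json())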

Model Aliases

Let Forge pick the best model for you, or request a specific one.

Alias        Routes to                  Best for
"auto"       Cost-optimized pick        General use; the cheapest model that works
"fast"       Smallest, fastest model    Low latency, simple tasks
"reasoning"  Strongest available model  Complex analysis, coding, research
"embedding"  Embedding model            Vector embeddings for RAG

You can also use specific model names like "gpt-4o", "claude-sonnet-4-20250514", "llama-3.1-70b", etc.
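
Since aliases are ordinary model names, switching routing strategies is a one-word change. A short sketch:

Python (model aliases)
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Same request, three routing strategies; only the model name changes
for alias in ["auto", "fast", "reasoning"]:
    response = client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": "Summarize the CAP theorem in one line."}],
    )
    print(f"{alias}: {response.choices[0].message.content}")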

Implementation Guides

Step-by-step examples for common production use cases.

Streaming Chat Responses

Stream tokens as they're generated for real-time chat UIs. Works identically to the OpenAI streaming API.

Python — Streaming
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Stream tokens as they generate
stream = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in 3 paragraphs"},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js / Next.js API Route

Add AI features to your web app with a Next.js API route. Works with Express, Fastify, or any Node.js framework.

Next.js — app/api/chat/route.ts
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://forge-api.lanaai.io/v1",
    apiKey: process.env.FORGE_API_KEY,
});

export async function POST(req: Request) {
    const { messages } = await req.json();

    const stream = await client.chat.completions.create({
        model: "auto",
        messages,
        stream: true,
    });

    // toReadableStream() emits newline-delimited JSON chunks (not SSE);
    // the browser can consume them with ChatCompletionStream.fromReadableStream()
    return new Response(stream.toReadableStream());
}

RAG with Embeddings

Build retrieval-augmented generation using Forge embeddings. Embed your documents, store in a vector database, and query with context.

Python — Embeddings + RAG
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Step 1: Embed your documents
docs = ["Forge is an AI gateway", "It supports sovereign mode"]
embeddings = client.embeddings.create(
    model="embedding",
    input=docs,
)
vectors = [e.embedding for e in embeddings.data]
# Store vectors in your database (Pinecone, Weaviate, pgvector, etc.)

# Step 2: Query with context
query = "What is sovereign mode?"
query_vec = client.embeddings.create(model="embedding", input=query).data[0].embedding
# Search your vector DB for the most similar docs (see the sketch below)
relevant_docs = ["It supports sovereign mode"]  # from vector search

# Step 3: Generate answer with retrieved context
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": f"Answer based on: {relevant_docs}"},
        {"role": "user", "content": query},
    ],
)
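
The vector-search step is elided above. As an illustrative stand-in for a real vector database, here is a brute-force cosine-similarity search over the Step 1 embeddings (numpy only; vectors, docs, and query_vec come from the snippet above):

Python (in-memory vector search)
# pip install numpy
import numpy as np

def cosine_sim(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank the Step 1 docs by similarity to the query embedding
scores = [cosine_sim(query_vec, v) for v in vectors]
top = np.argsort(scores)[::-1][:1]  # keep the single best match
relevant_docs = [docs[i] for i in top]
print(relevant_docs)  # ['It supports sovereign mode']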

Tool Calling (Function Calling)

Let the model call your functions. Forge supports OpenAI-compatible tool calling for building AI agents and structured data extraction.

Python — Tool Calling
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# The model returns a tool call; execute it and send the result back (see below)
tool_call = response.choices[0].message.tool_calls[0]
print(f"Call: {tool_call.function.name}({tool_call.function.arguments})")

Sovereign Mode

Force all inference to self-hosted models. No data leaves LANA infrastructure. Available on Pro plans and above.

Per-request sovereign mode
# Option 1: Per-request via header
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this NDA"}],
    extra_headers={"X-Sovereign": "true"},
)

# Option 2: Org-level — enable in Settings > Data Sovereignty
# Once enabled, ALL requests from your org are automatically sovereign.
# No header needed.

What happens in sovereign mode: Forge routes exclusively to self-hosted models (Qwen3-VL-32B). If the self-hosted model is unavailable, Forge returns a 503 error instead of falling back to a third-party provider. Your data never leaves LANA infrastructure.
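
Since sovereign requests fail closed rather than falling back, decide up front how you want to handle a 503. A minimal sketch that retries briefly and then surfaces the error; whether to re-send without the header is a policy call for your org:

Python (handling sovereign 503s)
# pip install openai
import time
from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

def sovereign_chat(messages, retries=2):
    for attempt in range(retries + 1):
        try:
            return client.chat.completions.create(
                model="auto",
                messages=messages,
                extra_headers={"X-Sovereign": "true"},
            )
        except APIStatusError as e:
            if e.status_code == 503 and attempt < retries:
                time.sleep(2 ** attempt)  # brief backoff before retrying
                continue
            raise  # out of retries, or a different error: surface it

response = sovereign_chat([{"role": "user", "content": "Summarize this NDA"}])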

LangChain Integration

Use Forge as the LLM backend in your LangChain pipelines. Works with chains, agents, and retrieval.

Python — LangChain
# pip install langchain-openai
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Use Forge as your LLM
llm = ChatOpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
    model="auto",
)

# Use Forge for embeddings too
embeddings = OpenAIEmbeddings(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
    model="embedding",
)

# Works with any LangChain chain
response = llm.invoke("What is Forge?")
print(response.content)

Error Handling

Forge returns standard HTTP error codes. Handle them the same way you would with OpenAI.

Python — Error Handling
from openai import OpenAI, APIStatusError, RateLimitError, AuthenticationError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

try:
    response = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    # 429 covers both per-minute rate limits and monthly quota;
    # if the quota is exhausted, upgrading your plan is the fix
    print(f"Rate limited. Retry after: {e.response.headers.get('Retry-After')}s")
except APIStatusError as e:
    if e.status_code == 503:
        print("Sovereign mode: self-hosted model unavailable")
    else:
        print(f"API error {e.status_code}: {e.message}")

Code  Meaning                                      Action
401   Invalid API key                              Check your API key in the dashboard
429   Rate limit or quota exceeded                 Wait for the Retry-After header, or upgrade your plan
404   Model not found                              Check /v1/models for available models
503   Sovereign mode: no internal model available  Retry, or disable sovereign mode to allow external fallback
502   Upstream provider error                      Forge retries automatically; if it persists, contact support
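
For 429s, the Retry-After header tells you exactly how long to wait. A small helper sketch that honors it:

Python (honoring Retry-After)
# pip install openai
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

def chat_with_retry(messages, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model="auto", messages=messages)
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise  # still rate limited after the final attempt
            # Sleep for the server-suggested interval, defaulting to 1 second
            time.sleep(float(e.response.headers.get("Retry-After", 1)))

response = chat_with_retry([{"role": "user", "content": "Hello"}])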

Response Headers

Forge adds useful headers to every response so you can monitor routing, usage, and limits.

Header                 Description
X-RateLimit-Limit      Your plan's requests-per-minute limit
X-RateLimit-Remaining  Requests remaining in the current minute
X-Request-ID           Unique request ID for debugging and audit-trail lookup
Retry-After            Seconds to wait before retrying (on 429 responses)
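
To read these headers from the Python SDK, use the raw-response wrapper; with_raw_response is part of the standard OpenAI SDK, so this works unchanged against Forge:

Python (reading response headers)
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
)

print("Request ID:", raw.headers.get("X-Request-ID"))
print("Remaining: ", raw.headers.get("X-RateLimit-Remaining"))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)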

Compatible Frameworks

Forge works with anything that supports OpenAI-compatible endpoints. No custom SDK required.

OpenAI SDK (Python & Node.js)
Anthropic SDK (Python & Node.js)
LangChain (Python & JS)
LlamaIndex (Python)
Vercel AI SDK (Next.js / React)
OpenClaw (AI agents)
cURL / HTTP (any language)
Any OpenAI-compatible client (just change base_url)