Documentation

Get started in minutes.

Forge is a drop-in replacement for the OpenAI and Anthropic APIs. If you've used either SDK, you already know how to use Forge.

1. Get your API key

Sign up and generate an API key from your dashboard. Keys start with the rrt-burst- prefix.

2. Point your SDK at Forge

Python (OpenAI SDK)
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

response = client.chat.completions.create(
    model="auto",  # or "fast", "gpt-4o", "claude-sonnet-4-20250514", etc.
    messages=[{"role": "user", "content": "Hello from Forge!"}],
)

print(response.choices[0].message.content)
cURL
curl https://forge-api.lanaai.io/v1/chat/completions \
  -H "Authorization: Bearer $FORGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
Anthropic SDK
# pip install anthropic
import anthropic

client = anthropic.Anthropic(
    # The Anthropic SDK appends /v1/messages to base_url, so point it at the host root
    base_url="https://forge-api.lanaai.io",
    api_key="your-forge-api-key",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello from Forge!"}],
)

print(message.content[0].text)

3. That's it.

Forge handles routing, cost optimization, and failover automatically. Monitor your usage from the dashboard.

API Endpoints

POST /v1/chat/completions
OpenAI-compatible chat completions. Supports streaming.

POST /v1/messages
Anthropic Messages API compatible. Full streaming SSE support.

POST /v1/embeddings
Generate embeddings with the same interface as OpenAI.

GET /v1/models
List available models and their capabilities.

GET /dashboard/usage
Your token usage, cost breakdown, and forecasted spend.
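
For example, you can list the models your key can reach through the SDK, and poll usage with any HTTP client. A minimal sketch; it assumes /dashboard/usage accepts the same Bearer key as the inference endpoints:

Python (models & usage)
# pip install openai httpx
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# GET /v1/models -- list available models and their capabilities
for model in client.models.list():
    print(model.id)

# GET /dashboard/usage -- token usage and cost breakdown
usage = httpx.get(
    "https://forge-api.lanaai.io/dashboard/usage",
    headers={"Authorization": "Bearer your-forge-api-key"},
)
print(usage.json())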

Model Aliases

Let Forge pick the best model for you, or request a specific one.

Alias        Routes to                  Best for
"auto"       Cost-optimized pick        General use; the cheapest model that works
"fast"       Smallest, fastest model    Low latency, simple tasks
"reasoning"  Strongest available model  Complex analysis, coding, research
"embedding"  Embedding model            Vector embeddings for RAG

You can also use specific model names like "gpt-4o", "claude-sonnet-4-20250514", "llama-3.1-70b", etc.
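
Since aliases are ordinary model names, switching routing strategies is a one-word change. A short sketch:

Python (model aliases)
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Same request, three routing strategies; only the model name changes
for alias in ["auto", "fast", "reasoning"]:
    response = client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": "Summarize the CAP theorem in one line."}],
    )
    print(f"{alias}: {response.choices[0].message.content}")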

Implementation Guides

Step-by-step examples for common production use cases.

Streaming Chat Responses

Stream tokens as they're generated for real-time chat UIs. Works identically to the OpenAI streaming API.

Python — Streaming
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Stream tokens as they generate
stream = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in 3 paragraphs"},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js / Next.js API Route

Add AI features to your web app with a Next.js API route. Works with Express, Fastify, or any Node.js framework.

Next.js — app/api/chat/route.ts
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://forge-api.lanaai.io/v1",
    apiKey: process.env.FORGE_API_KEY,
});

export async function POST(req: Request) {
    const { messages } = await req.json();

    const stream = await client.chat.completions.create({
        model: "auto",
        messages,
        stream: true,
    });

    // toReadableStream() emits newline-delimited JSON chunks (not SSE);
    // the browser can consume them with ChatCompletionStream.fromReadableStream()
    return new Response(stream.toReadableStream());
}

RAG with Embeddings

Build retrieval-augmented generation using Forge embeddings. Embed your documents, store in a vector database, and query with context.

Python — Embeddings + RAG
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Step 1: Embed your documents
docs = ["Forge is an AI gateway", "It supports sovereign mode"]
embeddings = client.embeddings.create(
    model="embedding",
    input=docs,
)
vectors = [e.embedding for e in embeddings.data]
# Store vectors in your database (Pinecone, Weaviate, pgvector, etc.)

# Step 2: Query with context
query = "What is sovereign mode?"
query_vec = client.embeddings.create(model="embedding", input=query).data[0].embedding
# Search your vector DB for the most similar docs (see the sketch below)
relevant_docs = ["It supports sovereign mode"]  # from vector search

# Step 3: Generate answer with retrieved context
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": f"Answer based on: {relevant_docs}"},
        {"role": "user", "content": query},
    ],
)
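
The vector-search step is elided above. As an illustrative stand-in for a real vector database, here is a brute-force cosine-similarity search over the Step 1 embeddings (numpy only; vectors, docs, and query_vec come from the snippet above):

Python (in-memory vector search)
# pip install numpy
import numpy as np

def cosine_sim(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank the Step 1 docs by similarity to the query embedding
scores = [cosine_sim(query_vec, v) for v in vectors]
top = np.argsort(scores)[::-1][:1]  # keep the single best match
relevant_docs = [docs[i] for i in top]
print(relevant_docs)  # ['It supports sovereign mode']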

Tool Calling (Function Calling)

Let the model call your functions. Forge supports OpenAI-compatible tool calling for building AI agents and structured data extraction.

Python — Tool Calling
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# The model returns a tool call; execute it and send the result back (see below)
tool_call = response.choices[0].message.tool_calls[0]
print(f"Call: {tool_call.function.name}({tool_call.function.arguments})")

Sovereign Mode

Force all inference to self-hosted models. No data leaves LANA infrastructure. Available on Pro plans and above.

Per-request sovereign mode
# Option 1: Per-request via header
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this NDA"}],
    extra_headers={"X-Sovereign": "true"},
)

# Option 2: Org-level — enable in Settings > Data Sovereignty
# Once enabled, ALL requests from your org are automatically sovereign.
# No header needed.

What happens in sovereign mode: Forge routes exclusively to self-hosted models (Qwen3-VL-32B). If the self-hosted model is unavailable, Forge returns a 503 error instead of falling back to a third-party provider. Your data never leaves LANA infrastructure.
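
Since sovereign requests fail closed rather than falling back, decide up front how you want to handle a 503. A minimal sketch that retries briefly and then surfaces the error; whether to re-send without the header is a policy call for your org:

Python (handling sovereign 503s)
# pip install openai
import time
from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

def sovereign_chat(messages, retries=2):
    for attempt in range(retries + 1):
        try:
            return client.chat.completions.create(
                model="auto",
                messages=messages,
                extra_headers={"X-Sovereign": "true"},
            )
        except APIStatusError as e:
            if e.status_code == 503 and attempt < retries:
                time.sleep(2 ** attempt)  # brief backoff before retrying
                continue
            raise  # out of retries, or a different error: surface it

response = sovereign_chat([{"role": "user", "content": "Summarize this NDA"}])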

LangChain Integration

Use Forge as the LLM backend in your LangChain pipelines. Works with chains, agents, and retrieval.

Python — LangChain
# pip install langchain-openai
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Use Forge as your LLM
llm = ChatOpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
    model="auto",
)

# Use Forge for embeddings too
embeddings = OpenAIEmbeddings(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
    model="embedding",
)

# Works with any LangChain chain
response = llm.invoke("What is Forge?")
print(response.content)

Error Handling

Forge returns standard HTTP error codes. Handle them the same way you would with OpenAI.

Python — Error Handling
from openai import OpenAI, APIStatusError, RateLimitError, AuthenticationError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

try:
    response = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    # 429 covers both per-minute rate limits and monthly quota;
    # if the quota is exhausted, upgrading your plan is the fix
    print(f"Rate limited. Retry after: {e.response.headers.get('Retry-After')}s")
except APIStatusError as e:
    if e.status_code == 503:
        print("Sovereign mode: self-hosted model unavailable")
    else:
        print(f"API error {e.status_code}: {e.message}")

Code  Meaning                                      Action
401   Invalid API key                              Check your API key in the dashboard
429   Rate limit or quota exceeded                 Wait for the Retry-After header, or upgrade your plan
404   Model not found                              Check /v1/models for available models
503   Sovereign mode: no internal model available  Retry, or disable sovereign mode to allow external fallback
502   Upstream provider error                      Forge retries automatically; if it persists, contact support
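
For 429s, the Retry-After header tells you exactly how long to wait. A small helper sketch that honors it:

Python (honoring Retry-After)
# pip install openai
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

def chat_with_retry(messages, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model="auto", messages=messages)
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise  # still rate limited after the final attempt
            # Sleep for the server-suggested interval, defaulting to 1 second
            time.sleep(float(e.response.headers.get("Retry-After", 1)))

response = chat_with_retry([{"role": "user", "content": "Hello"}])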

Response Headers

Forge adds useful headers to every response so you can monitor routing, usage, and limits.

Header                 Description
X-RateLimit-Limit      Your plan's requests-per-minute limit
X-RateLimit-Remaining  Requests remaining in the current minute
X-Request-ID           Unique request ID for debugging and audit-trail lookup
Retry-After            Seconds to wait before retrying (on 429 responses)
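
To read these headers from the Python SDK, use the raw-response wrapper; with_raw_response is part of the standard OpenAI SDK, so this works unchanged against Forge:

Python (reading response headers)
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
)

print("Request ID:", raw.headers.get("X-Request-ID"))
print("Remaining: ", raw.headers.get("X-RateLimit-Remaining"))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)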

Compatible Frameworks

Forge works with anything that supports OpenAI-compatible endpoints. No custom SDK required.

OpenAI SDK (Python & Node.js)
Anthropic SDK (Python & Node.js)
LangChain (Python & JS)
LlamaIndex (Python)
Vercel AI SDK (Next.js / React)
OpenClaw (AI agents)
cURL / HTTP (any language)
Any OpenAI-compatible client (just change base_url)