Documentation
Get started in minutes.
Forge is a drop-in replacement. If you've used the OpenAI or Anthropic API, you already know how to use Forge.
Get your API key
Sign up and generate an API key from your dashboard. Keys start with `rrt-burst-`.
Point your SDK at Forge
```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

response = client.chat.completions.create(
    model="auto",  # or "fast", "gpt-4o", "claude-sonnet-4-20250514", etc.
    messages=[{"role": "user", "content": "Hello from Forge!"}],
)
print(response.choices[0].message.content)
```
```bash
curl https://forge-api.lanaai.io/v1/chat/completions \
  -H "Authorization: Bearer $FORGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```
```python
# pip install anthropic
import anthropic

# Note: the Anthropic SDK appends /v1/... to the base URL itself,
# so omit the /v1 suffix here.
client = anthropic.Anthropic(
    base_url="https://forge-api.lanaai.io",
    api_key="your-forge-api-key",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello from Forge!"}],
)
print(message.content[0].text)
```
That's it.
Forge handles routing, cost optimization, and failover automatically. Monitor your usage from the dashboard.
API Endpoints
| Endpoint | Description |
|---|---|
| `/v1/chat/completions` | OpenAI-compatible chat completions. Supports streaming. |
| `/v1/messages` | Anthropic Messages API compatible. Full streaming SSE support. |
| `/v1/embeddings` | Generate embeddings with the same interface as OpenAI. |
| `/v1/models` | List available models and their capabilities. |
| `/dashboard/usage` | Your token usage, cost breakdown, and forecasted spend. |
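For example, `/v1/models` can be queried with the standard SDK call. A minimal sketch (output fields beyond `id` may vary by plan):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# GET /v1/models: list every model and alias your key can use
for model in client.models.list():
    print(model.id)
```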
Model Aliases
Let Forge pick the best model for you, or request a specific one.
| Alias | Routes to | Best for |
|---|---|---|
| `"auto"` | Cost-optimized pick | General use: the cheapest model that works |
| `"fast"` | Smallest, fastest model | Low latency, simple tasks |
| `"reasoning"` | Strongest available model | Complex analysis, coding, research |
| `"embedding"` | Embedding model | Vector embeddings for RAG |
You can also use specific model names like "gpt-4o", "claude-sonnet-4-20250514", "llama-3.1-70b", etc.
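The same client can switch aliases per request. A minimal sketch (the prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Cheap and low-latency for a trivial task
triage = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Is this email spam? 'WIN A FREE CRUISE!!!'"}],
)

# Strongest available model for a hard task
review = client.chat.completions.create(
    model="reasoning",
    messages=[{"role": "user", "content": "Find the bug in this binary search: ..."}],
)
```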
Implementation Guides
Step-by-step examples for common production use cases.
Streaming Chat Responses
Stream tokens as they generate for real-time chat UIs. Works identically to the OpenAI streaming API.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Stream tokens as they generate
stream = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in 3 paragraphs"},
    ],
    stream=True,
)

for chunk in stream:
    # Guard against chunks with no choices (e.g. trailing usage chunks)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Node.js / Next.js API Route
Add AI features to your web app with a Next.js API route. Works with Express, Fastify, or any Node.js framework.
```typescript
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://forge-api.lanaai.io/v1",
  apiKey: process.env.FORGE_API_KEY,
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await client.chat.completions.create({
    model: "auto",
    messages,
    stream: true,
  });

  // Return as a streaming response
  return new Response(stream.toReadableStream(), {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```
RAG with Embeddings
Build retrieval-augmented generation using Forge embeddings. Embed your documents, store in a vector database, and query with context.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Step 1: Embed your documents
docs = ["Forge is an AI gateway", "It supports sovereign mode"]
embeddings = client.embeddings.create(
    model="embedding",
    input=docs,
)
vectors = [e.embedding for e in embeddings.data]
# Store vectors in your database (Pinecone, Weaviate, pgvector, etc.)

# Step 2: Embed the query with the same model
query = "What is sovereign mode?"
query_vec = client.embeddings.create(
    model="embedding",
    input=query,
).data[0].embedding

# Search your vector DB for similar docs...
relevant_docs = ["It supports sovereign mode"]  # from vector search

# Step 3: Generate an answer with the retrieved context
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": f"Answer based on: {relevant_docs}"},
        {"role": "user", "content": query},
    ],
)
print(response.choices[0].message.content)
```
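If you don't have a vector database wired up yet, the elided "vector search" step can be plain in-memory cosine similarity over the vectors computed above. A minimal sketch with no extra dependencies (`docs`, `vectors`, and `query_vec` come from the previous example):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Rank documents by similarity to the query and keep the best match
ranked = sorted(zip(docs, vectors), key=lambda dv: cosine(query_vec, dv[1]), reverse=True)
relevant_docs = [doc for doc, _ in ranked[:1]]
```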
Tool Calling (Function Calling)
Let the model call your functions. Forge supports OpenAI-compatible tool calling for building AI agents and structured data extraction.
```python
import json  # used when executing the tool call below
from openai import OpenAI

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# The model returns a tool call; execute it and send the result back
tool_call = response.choices[0].message.tool_calls[0]
print(f"Call: {tool_call.function.name}({tool_call.function.arguments})")
```
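The comment above says to execute the call and send the result back; here is a hedged sketch of that round trip, continuing from the variables in the previous example (`get_weather` is a stand-in implementation):

```python
# Execute the function the model asked for (stand-in implementation)
def get_weather(city: str) -> str:
    return f"Sunny, 22°C in {city}"  # replace with a real weather lookup

args = json.loads(tool_call.function.arguments)
result = get_weather(**args)

# Send the tool result back so the model can produce a final answer
followup = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        response.choices[0].message,  # the assistant turn containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": result},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)
```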
Sovereign Mode
Force all inference to self-hosted models. No data leaves LANA infrastructure. Available on Pro plans and above.
```python
# Option 1: Per-request via header
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this NDA"}],
    extra_headers={"X-Sovereign": "true"},
)

# Option 2: Org-level: enable in Settings > Data Sovereignty.
# Once enabled, ALL requests from your org are automatically sovereign;
# no header needed.
```
What happens in sovereign mode: Forge routes exclusively to self-hosted models (Qwen3-VL-32B). If the self-hosted model is unavailable, Forge returns a 503 error instead of falling back to a third-party provider. Your data never leaves LANA infrastructure.
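Since sovereign mode fails closed with a 503 instead of falling back, callers that prefer waiting over external routing can retry. A minimal sketch using the openai SDK's `APIStatusError` (the helper name is ours):

```python
import time
from openai import APIStatusError

def sovereign_chat(client, messages, retries: int = 3):
    """Retry sovereign requests briefly instead of falling back to external models."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="auto",
                messages=messages,
                extra_headers={"X-Sovereign": "true"},
            )
        except APIStatusError as e:
            if e.status_code != 503 or attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, ...
```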
LangChain Integration
Use Forge as the LLM backend in your LangChain pipelines. Works with chains, agents, and retrieval.
```python
# pip install langchain-openai
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Use Forge as your LLM
llm = ChatOpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
    model="auto",
)

# Use Forge for embeddings too
embeddings = OpenAIEmbeddings(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
    model="embedding",
)

# Works with any LangChain chain
response = llm.invoke("What is Forge?")
print(response.content)
```
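The same `llm` drops into an LCEL chain unchanged. A short sketch (uses `langchain-core`, which `langchain-openai` already depends on):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Prompt -> Forge-backed LLM -> plain string
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "Forge routes requests across AI providers."}))
```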
Error Handling
Forge returns standard HTTP error codes. Handle them the same way you would with OpenAI.
```python
from openai import OpenAI, APIStatusError, AuthenticationError, RateLimitError

client = OpenAI(
    base_url="https://forge-api.lanaai.io/v1",
    api_key="your-forge-api-key",
)

try:
    response = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    # 429 covers both per-minute rate limits and monthly quota;
    # the Retry-After header says how long to wait (or upgrade your plan)
    print(f"Rate limited. Retry after: {e.response.headers.get('Retry-After')}s")
except APIStatusError as e:
    if e.status_code == 503:
        print("Sovereign mode: self-hosted model unavailable")
    else:
        print(f"API error: {e}")
```
| Code | Meaning | Action |
|---|---|---|
| 401 | Invalid API key | Check your API key in the dashboard |
| 429 | Rate limit or quota exceeded | Wait for the `Retry-After` header, or upgrade your plan |
| 404 | Model not found | Check `/v1/models` for available models |
| 503 | Sovereign mode: no internal model available | Retry, or disable sovereign mode to allow external fallback |
| 502 | Upstream provider error | Forge retries automatically; if it persists, contact support |
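For 429s specifically, the `Retry-After` header tells you exactly how long to wait, so a retry helper can honor it rather than guessing. A minimal sketch (the helper name is ours):

```python
import time
from openai import RateLimitError

def chat_with_retry(client, messages, retries: int = 3):
    """Retry 429s, waiting as long as the Retry-After header asks."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model="auto", messages=messages)
        except RateLimitError as e:
            if attempt == retries - 1:
                raise
            wait = float(e.response.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
```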
Response Headers
Forge adds useful headers to every response so you can monitor routing, usage, and limits.
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Your plan's requests-per-minute limit |
| `X-RateLimit-Remaining` | Requests remaining in the current minute |
| `X-Request-ID` | Unique request ID for debugging and audit trail lookup |
| `Retry-After` | Seconds to wait before retrying (on 429 responses) |
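From the Python SDK, these headers are reachable through the `with_raw_response` variant, which returns the raw HTTP response alongside the parsed completion. A short sketch, continuing with the `client` from earlier:

```python
raw = client.chat.completions.with_raw_response.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
)

# Raw HTTP headers from Forge
print("Request ID:", raw.headers.get("X-Request-ID"))
print("Remaining this minute:", raw.headers.get("X-RateLimit-Remaining"))

# The parsed completion is still available
response = raw.parse()
print(response.choices[0].message.content)
```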
Compatible Frameworks
Forge works with anything that supports OpenAI-compatible endpoints. No custom SDK required.
| Framework | Support |
|---|---|
| OpenAI SDK | Python & Node.js |
| Anthropic SDK | Python & Node.js |
| LangChain | Python & JS |
| LlamaIndex | Python |
| Vercel AI SDK | Next.js / React |
| OpenClaw | AI Agents |
| cURL / HTTP | Any language |
| Any OpenAI-compatible client | Just change `base_url` |