Pricing

Predictable AI costs.

Flat monthly rate with generous token limits. No per-token billing. No surprises.

Forge is flat-rate $49/mo for 5M tokens - unlimited your first 30 days. The same usage runs roughly $75 to $150/mo on OpenAI GPT-4o, billed per token.

See the full cost breakdown or why teams switch to Forge.

Flat-rate Forge plans

Starter

$49

/month

🎉 Unlimited tokens your first 30 days

Unlimited tokens for 30 days, then 5M/month
60 requests/min
8K context window
OpenAI + Anthropic compatible
Usage dashboard
Audit log egress
Zero-retention mode (optional +$15/mo)

Sovereign inference requires Pro or above

Get Started

Pro

$149

/month

30M tokens/month
200 requests/min
8K context window
$2.00/M overage tokens
Priority routing
Usage dashboard + API
Compliance audit log
Audit log egress
Zero-retention mode
Sovereign mode

Get Started

Scale

$499

/month

150M tokens/month
600 requests/min
16K context window
$1.50/M overage tokens
Higher priority routing
Dedicated support
Sovereign mode
Compliance audit log
Audit log egress
Zero-retention mode

Get Started

Coming Soon

Enterprise

Custom

500M+ tokens/month
3,000+ requests/min
32K context window
$1.00/M overage tokens
Highest priority + SLA
Org-level sovereign lock
Dedicated inference isolation
Audit log egress + zero-retention
Custom retention + SLA

Coming Soon

Frequently asked

What happens if I go over my token limit?

On Pro and above, you can keep going at the overage rate. On Starter, requests are rate-limited until the next billing cycle.

Can I use my existing OpenAI SDK code?

Yes. Just change the base_url to forge-api.lanaai.io/v1 and use your Forge API key. Everything else stays the same.

Which models are available?

One intelligent model interface - you just choose auto (the default), fast, reasoning, or embedding, and Forge routes each request to the best engine for the job. It all runs on our own sovereign infrastructure.

What does "auto" model do?

It lets Forge automatically pick the best model for each request based on speed, cost, and quality - so you never have to manage model selection. It is the default.

Can I upgrade or downgrade at any time?

Yes. Plan changes take effect immediately. When upgrading, you get access to higher limits right away. When downgrading, limits are adjusted at the start of your next billing cycle.

What is sovereign mode?

Sovereign mode ensures your prompts and data never leave LANA-controlled infrastructure. All inference runs on our self-hosted models. No data is sent to OpenAI, Anthropic, or any third-party API. Available on Pro plans and above, org-wide on Enterprise.

Is there an audit trail for compliance?

Yes. Every request is logged with its routing decision, provider used, sovereign enforcement status, and timestamp. Pro plans and above can query the audit log via API. Enterprise plans support custom data retention policies.

What is audit log egress?

Audit log egress forwards your compliance events to your own infrastructure in real-time, via a webhook URL you control. You hold the record, not us. Available on Starter and above.

What is zero-retention mode?

When enabled, no request content, prompts, or response data is stored anywhere on LANA infrastructure. Only billing counters (token counts, cost) are retained. Included on Pro and above, or available as a $15/mo add-on for Starter.

What is dedicated inference isolation?

Enterprise customers can run on their own isolated GPU infrastructure: no shared compute, no noisy neighbors, no cross-tenant exposure. Your requests never touch hardware shared with other organizations.