Pricing
Predictable AI costs.
Flat monthly rate with generous token limits. No per-token billing. No surprises.
Starter
first month
then $49/month
- 5M tokens/month
- 60 requests/min
- 8K context window
- OpenAI + Anthropic compatible
- Usage dashboard
- Audit log egress
- Zero-retention mode (optional +$15/mo)
Sovereign inference requires Pro or above
Pro
/month
- 30M tokens/month
- 200 requests/min
- 8K context window
- $2.00/M overage tokens
- Priority routing
- Usage dashboard + API
- Compliance audit log
- Audit log egress
- Zero-retention mode
- Sovereign mode
Scale
/month
- 150M tokens/month
- 600 requests/min
- 16K context window
- $1.50/M overage tokens
- Higher priority routing
- Dedicated support
- Sovereign mode
- Compliance audit log
- Audit log egress
- Zero-retention mode
Enterprise
contact us
- 500M+ tokens/month
- 3,000+ requests/min
- 32K context window
- $1.00/M overage tokens
- Highest priority + SLA
- Org-level sovereign lock
- Dedicated inference isolation
- Audit log egress + zero-retention
- Custom retention + SLA
Frequently asked questions
What happens if I go over my token limit?
On Pro and above, you can keep going at the overage rate. On Starter, requests are rate-limited until the next billing cycle.
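The overage arithmetic is simple enough to sketch. A hypothetical helper using the Pro numbers above (30M tokens included, $2.00 per additional million):

```python
def monthly_overage_cost(tokens_used: int, included: int, rate_per_million: float) -> float:
    """Cost of tokens beyond the included allotment, billed per million."""
    overage_tokens = max(0, tokens_used - included)
    return (overage_tokens / 1_000_000) * rate_per_million

# Pro plan: 30M tokens included, $2.00 per additional million.
cost = monthly_overage_cost(tokens_used=35_000_000,
                            included=30_000_000,
                            rate_per_million=2.00)
print(f"${cost:.2f}")  # 5M tokens over -> $10.00
```

Usage under the limit simply costs nothing extra: the same call with 20M tokens used returns $0.00.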
Can I use my existing OpenAI SDK code?
Yes. Just change the base_url to forge-api.lanaai.io/v1 and use your Forge API key. Everything else stays the same.
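At the wire level, the compatibility claim can be sketched with only the standard library: the request below follows the usual OpenAI chat-completions shape, and only the base URL and key differ from a stock OpenAI setup. `frg-example-key` is a placeholder, not a real key:

```python
import json
import urllib.request

BASE_URL = "https://forge-api.lanaai.io/v1"  # was https://api.openai.com/v1

def chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request against Forge.
    Only the base URL and the key change; the payload shape is identical."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = chat_request("frg-example-key", "auto", "Hello")
print(req.full_url)  # https://forge-api.lanaai.io/v1/chat/completions
```

With the official `openai` SDK, the equivalent change is passing `base_url` and your Forge key to the client constructor; no other code changes.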
Which models are available?
Self-hosted open-source models running on our sovereign infrastructure, plus commercial providers including OpenAI, Anthropic, Google, DeepSeek, Perplexity, and xAI as intelligent fallbacks.
What does the "auto" model do?
It lets Forge pick the best model based on cost and quality. Most requests go to fast, cheap self-hosted models. Complex requests route to commercial models automatically.
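A toy sketch of such a cost-first routing policy. The threshold, heuristic, and model identifiers below are illustrative assumptions, not Forge's actual logic:

```python
def route(prompt: str, force_sovereign: bool = False) -> str:
    """Toy 'auto' policy: cheap self-hosted model first, commercial
    fallback for long or complex prompts. Purely illustrative."""
    complex_request = len(prompt) > 2000 or "step by step" in prompt.lower()
    if force_sovereign or not complex_request:
        return "self-hosted/llama"   # hypothetical model id
    return "openai/gpt-4o"           # hypothetical fallback id

print(route("What's 2+2?"))              # self-hosted/llama
print(route("Prove this step by step"))  # openai/gpt-4o
```

Note how a sovereign flag would override the fallback entirely, matching the guarantee described under sovereign mode below.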
Can I upgrade or downgrade at any time?
Yes. Plan changes take effect immediately. When upgrading, you get access to higher limits right away. When downgrading, limits are adjusted at the start of your next billing cycle.
What is sovereign mode?
Sovereign mode ensures your prompts and data never leave LANA-controlled infrastructure. All inference runs on our self-hosted models — no data is sent to OpenAI, Anthropic, or any third-party API. Available on Pro plans and above, org-wide on Enterprise.
Is there an audit trail for compliance?
Yes. Every request is logged with its routing decision, provider used, sovereign enforcement status, and timestamp. Pro plans and above can query the audit log via API. Enterprise plans support custom data retention policies.
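The fields listed above can be sketched as a record type. The field names and types here are illustrative, not the actual wire schema:

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    """One entry per request, mirroring the fields the FAQ lists.
    Illustrative only -- not the documented schema."""
    timestamp: str            # when the request was served
    routing_decision: str     # e.g. why "auto" chose this backend
    provider: str             # which provider handled the request
    sovereign_enforced: bool  # whether sovereign mode was in effect

rec = AuditRecord("2025-01-01T00:00:00Z", "cheap-first", "self-hosted", True)
```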
What is audit log egress?
Audit log egress forwards your compliance events in real time to a webhook URL you control, on your own infrastructure. You hold the record, not us. Available on Starter and above.
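A minimal sketch of a receiver you might point the egress webhook at, using only the standard library. The event payload schema is an assumption, and durable storage is stubbed with an in-memory list:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # in practice: append to durable storage you control

def record_event(event: dict) -> None:
    """Persist one audit event on your own infrastructure."""
    RECEIVED.append(event)

class AuditSink(BaseHTTPRequestHandler):
    """Accepts POSTed audit events and acknowledges with 204 No Content."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record_event(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

# To run: HTTPServer(("", 8080), AuditSink).serve_forever()
```

Returning 204 quickly and persisting asynchronously is the usual pattern, so a slow disk never causes the sender to retry or drop events.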
What is zero-retention mode?
When enabled, no request content, prompts, or response data is stored anywhere on LANA infrastructure. Only billing counters (token counts, cost) are retained. Included on Pro and above, or available as a $15/mo add-on for Starter.
What is dedicated inference isolation?
Enterprise customers can run on their own isolated GPU infrastructure — no shared compute, no noisy neighbors, no cross-tenant exposure. Your requests never touch hardware shared with other organizations.