
Anthropic Just Cut Off OpenClaw. Here's the Multi-Provider Fallback We Built in 24 Hours.

Anthropic killed subscription access for OpenClaw on April 4, 2026. Here's the multi-provider fallback chain we built: OpenAI Codex primary, MiniMax budget backstop. The config, the costs, and the tradeoffs.

By Levi Segal

Tomorrow at 12pm PT, Anthropic is cutting off Claude subscriptions for OpenClaw. If you're running your agent on a Claude Pro or Max plan, it will either start billing you per-token at API rates or stop working entirely.

We got the email today at 3:47pm. One day's notice.

I run extraseat — we deploy managed AI assistants for small businesses using OpenClaw. Several of our deployments were on Claude subscriptions, some with always-on cron jobs. Not the kind of infrastructure you can just turn off while you figure out a new plan.

So we spent the last 12 hours figuring out a long-term strategy. The immediate fix was simple (enable extra usage, claim the credit). The interesting part is what comes next: a multi-provider fallback architecture that doesn't care which provider decides to change their terms. Here's the technical breakdown.

The Timeline

This didn't come out of nowhere. The writing was on the wall for months, and we should have moved earlier. We didn't. So we're doing it now, live, and documenting everything.

The Five-Minute Fix

Before anything else: enable Anthropic's "extra usage" on every account. This flips you to pay-as-you-go at API rates. No downtime.

Anthropic is offering a one-time credit equal to your subscription price, redeemable by April 17. We claimed it on all accounts immediately.

If you haven't done this yet, stop reading and do it now. You're bleeding money or your agent is down. Enable extra usage, grab the credit, then come back.

Why This Hurts

Extra usage bills at standard API rates:

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |

But here's what people miss about the math: OpenClaw is an agent, not a chatbot. It doesn't send one message and wait. Each task fires multiple API calls. A medium session burns 50K+ tokens from accumulated context alone. Tool use adds ~346 tokens per call just for the system prompt. And multi-agent setups (which OpenClaw encourages) scale worse than linearly: a 3-agent team consumes roughly 7x the tokens of a single agent, because each agent maintains its own context window.

One deployment runs email triage every 5 minutes. On subscription, that cost $19.47/month. At API rates with Sonnet? $60-100/month for that one cron job alone.

Projected extra-usage costs across our customer base: $200-500/month per customer, up from $20-200 on subscription. For a small business running one AI assistant, that's untenable.
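To sanity-check that cron-job number, here's the back-of-envelope math as a sketch. The tokens-per-run and output-size figures are our assumptions, not measured telemetry:

```python
# Rough per-month cost of a cron-driven agent at API rates.
# Token counts per run are estimates; real sessions vary with context size.

def monthly_cost(runs_per_day, input_tokens, output_tokens,
                 input_price_per_mtok, output_price_per_mtok, days=30):
    """Estimate monthly API spend for a recurring agent task."""
    runs = runs_per_day * days
    input_cost = runs * input_tokens / 1e6 * input_price_per_mtok
    output_cost = runs * output_tokens / 1e6 * output_price_per_mtok
    return input_cost + output_cost

# Email triage every 5 minutes = 288 runs/day. Assume ~2K input tokens
# (context + system prompt) and ~300 output tokens per run, at Sonnet rates.
cost = monthly_cost(runs_per_day=288, input_tokens=2_000, output_tokens=300,
                    input_price_per_mtok=3.00, output_price_per_mtok=15.00)
print(f"${cost:.2f}/month")  # → $90.72/month, squarely in the $60-100 range
```

Change any one assumption (context growth, run frequency, model tier) and the number swings by 2-3x, which is exactly why budgeting agentic work like chat usage goes wrong.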

For reference, here's how the token economics break down at scale:

| Usage profile | Tokens/day | Monthly cost (Sonnet API) | Monthly cost (Opus API) |
|---|---|---|---|
| Light (1-2 sessions/day) | ~100K | $50-100 | $80-150 |
| Medium (3-5 hrs/day) | ~500K | $130-260 | $200-400 |
| Heavy (cron + interactive) | ~2M+ | $400-1,200 | $600-2,000 |
| Autonomous swarms | ~10M+ | $1,000-5,000/day | Don't. |

The "autonomous swarm" tier is what triggered this ban in the first place. Anthropic explicitly cited outsized strain from agentic tools.

The Long-Term Fix: A Multi-Provider Fallback Chain

Extra usage stops the bleeding, but it's not a strategy. You're still single-provider, still subject to Anthropic's pricing and policy decisions.

OpenClaw has a model failover system that most people don't know about. It works in two stages: rotate between auth profiles within a provider, then fall back to the next model in the chain. It's designed for exactly this situation.

Here's what we're deploying:

OpenClaw Multi-Provider Fallback Chain

```json5
{
  agents: {
    defaults: {
      model: {
        primary: "openai-codex/gpt-5.3-codex",
        fallbacks: ["minimax/MiniMax-M2.7"]
      }
    }
  }
}
```

Two providers. Flat-rate primary, dirt-cheap backstop. No Anthropic in the chain.

What triggers failover: 429 rate limits, overloads, auth failures, timeouts, and billing disables. It does not trigger on format errors, context overflow, or cost thresholds. This is purely about availability.

Under the hood, for each model candidate OpenClaw tries all auth profiles (LRU ordering, OAuth before API keys). Rate-limited profiles enter exponential backoff: 1 minute, then 5, then 25, then 1 hour. After one profile rotation on a rate-limit error, it advances to the next model in the fallback array.

Sessions are pinned to one auth profile for cache efficiency. You don't round-robin per request, which would kill prompt caching. The pin holds until the profile enters cooldown, then rotates.
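In miniature, the rotation-and-backoff scheme looks something like this. This is an illustrative Python sketch of the behavior described above, not OpenClaw's actual code, and all names are ours:

```python
# Illustrative sketch of profile rotation with exponential backoff.
# Schedule from the docs: 1 min, 5 min, 25 min, then capped at 1 hour.
BACKOFF_SECONDS = [60, 300, 1500, 3600]

class AuthProfile:
    def __init__(self, name):
        self.name = name
        self.strikes = 0           # consecutive rate-limit hits
        self.cooldown_until = 0.0  # epoch seconds; 0 means available now

    def rate_limited(self, now):
        """Record a 429 and put the profile into backoff."""
        delay = BACKOFF_SECONDS[min(self.strikes, len(BACKOFF_SECONDS) - 1)]
        self.strikes += 1
        self.cooldown_until = now + delay

    def available(self, now):
        return now >= self.cooldown_until

def pick_profile(profiles, pinned, now):
    """Sessions stay pinned to one profile for cache efficiency; we only
    rotate when the pin enters cooldown (profiles kept in LRU order).
    Returning None means every profile is cooling down, so the caller
    advances to the next model in the fallback chain."""
    if pinned is not None and pinned.available(now):
        return pinned
    for profile in profiles:
        if profile.available(now):
            return profile
    return None
```

The key property: a rate-limited profile gets progressively longer timeouts, but a healthy pinned profile is never rotated away from, which is what keeps prompt caching effective.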

Stacking multiple accounts: If one Pro plan isn't enough, add a second. Log in with two ChatGPT accounts and OpenClaw rotates between them automatically:

```shell
openclaw models auth login --provider openai-codex  # → alice@example.com
openclaw models auth login --provider openai-codex  # → bob@example.com
```

Each Pro plan gives 223-1,120 messages per 5-hour window. Two plans effectively double your capacity before the chain advances to the next provider. The Korean developers behind the 100K-star repo run 5 Pro plans this way — $1,000/month for effectively unlimited agentic throughput.

Primary: OpenAI Codex ($200/mo flat)

OpenAI bought OpenClaw in February. They are not going to ban their own tool. ChatGPT Pro gives you 223-1,120 messages per 5-hour window. Flat rate. No per-token billing.

```shell
openclaw onboard --auth-choice openai-codex
```

Customer logs into ChatGPT, authorizes OpenClaw, done.

Alternative primary: Anthropic API with Opus ($5/$25 per MTok)

If agent quality matters more than cost, the community consensus is clear: Opus is still the best model for OpenClaw. Alex Finn (210+ hours/month on OpenClaw): "ChatGPT 5.4 is the best model. But it sucks compared to opus in OpenClaw." Another user: "With Opus the agent does the work. With GPT 5.4 the agent will say 'I'm working on it' and when you ask for progress, it says 'sorry, I lied, I never did it.'"

The catch: Opus via API is $5/$25 per MTok. No flat rate. A heavy agentic workload can hit $400-1,200/month. But the power users aren't paying that. They're running Opus as the orchestrator (planning, reasoning, complex decisions) and routing execution to cheap models (Gemma 4, Qwen 3.5, MiniMax). One user runs 10 agents across 5 models for $5.43/day.

This is more complex to set up than a simple fallback chain. YMMV. But if you've been on Opus and switching to GPT-5.4 feels like a downgrade, this is the path the community is converging on.

Backstop: MiniMax M2.7

This was the real find. $0.30 input / $1.20 output per million tokens. That's 10x cheaper than Sonnet.

OpenClaw's own Anthropic provider docs now have a banner recommending MiniMax as an alternative. Steinberger added first-class MiniMax support in the March releases. He's making it as easy as possible to leave Anthropic from his new seat at OpenAI.

MiniMax uses an Anthropic-compatible API (https://api.minimax.io/anthropic). The integration is three lines of config. At these prices, even heavy fallback usage rounds to zero.
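For illustration, the provider wiring looks something like this. The key names here are our guess at the shape, not the exact schema; check OpenClaw's provider docs before copying:

```json5
// Illustrative only: field names may differ from the real schema.
{
  models: {
    providers: {
      minimax: {
        baseUrl: "https://api.minimax.io/anthropic",  // Anthropic-compatible endpoint
        apiKey: "${MINIMAX_API_KEY}"
      }
    }
  }
}
```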

OpenClaw's Anthropic docs also recommend Qwen/Model Studio and GLM/Z.AI, both of which offer "Coding Plan" subscriptions. But MiniMax has the most mature OpenClaw integration, with the plugin auto-enabled since the March releases.

The Full Provider Landscape

For context, here's where every major provider sits right now:

| Provider | Model | Input/MTok | Output/MTok | Notes |
|---|---|---|---|---|
| MiniMax | M2.7 | $0.30 | $1.20 | 10x cheaper than Sonnet. Anthropic-compatible API. |
| Mistral | Large 3 | $0.50 | $1.50 | Devstral Small 2505 is free. |
| OpenAI | GPT-5.3-Codex | $1.75 | $14.00 | Best coding model. Subscription via Codex OAuth. |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | Flagship. 1.1M context. |
| Anthropic | Sonnet 4.6 | $3.00 | $15.00 | What we're migrating away from. |
| Anthropic | Opus 4.6 | $5.00 | $25.00 | Best reasoning. 6x Opus Fast mode at $30/$150. |
| Google | Gemini Flash-Lite | $0.10 | $0.40 | Free tier available. 2x for >200K context on Pro. |

LLM API prices dropped ~80% between early 2025 and now. What cost $15/MTok input (Opus 3) eighteen months ago now costs $5 for a significantly better model. The budget tier didn't even exist. MiniMax at $0.30 and Gemini Flash-Lite at $0.10 are usable production quality at nearly-free pricing.

What the Numbers Look Like

| Scenario | Light user | Heavy user |
|---|---|---|
| Do nothing (Anthropic extra usage only) | $50-150/mo | $200-500/mo |
| Codex Pro + MiniMax fallback | $200 flat + ~$3 spill | $200 flat + ~$15 spill |

Flat-rate primary absorbs most traffic. MiniMax catches the rest at pennies.

What We Tried and Rejected

Ollama on the Mac Minis

Seemed obvious — we already have dedicated hardware. Put a local model on each Mini as the "never call the cloud" backstop.

Killed the idea in an hour. OpenClaw has multiple open bugs where Ollama's cold-start timeout silently falls through to the next provider in the chain. Your private data, intended for local processing, gets shipped to a cloud API with no warning.

Issue #52818 is literally titled "Ollama cold-start timeout silently exfiltrates data via fallback chain." It's flagged as a security vulnerability, not a feature request.

Until OpenClaw adds per-provider timeout config, local models in the fallback chain are a liability. We tabled it.

But Can Local Models Even Do the Work?

The fallback bugs killed Ollama-in-the-chain, but we still wanted to know: if those bugs got fixed tomorrow, would a local model actually handle what our agents do every day?

Our deployments run email triage via gog CLI, manage Google Calendar and Drive, post digests to Slack, fetch Zoom transcripts, create HubSpot tasks, and handle multi-step chains that combine all of the above. These aren't chatbot conversations — they're structured tool-calling workflows where the model needs to emit the right CLI command with the right flags.

We built a 45-test eval suite modeled on the kinds of tasks our agents handle daily — email search, calendar ops, Drive file management, Slack message composition, multi-tool chains, error handling, and safety constraints — and ran it against 9 models on a Mac Mini M4 16GB via Ollama 0.20 (MLX backend). Every model got the same system prompt, the same tool definitions, and the same scenarios.
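The harness itself is nothing exotic: each case is a prompt plus a pass/fail predicate over the tool call the model emits. Here's a minimal sketch of the idea; the `gog` subcommands and the two cases are illustrative, not lifted from our actual suite:

```python
# Minimal eval-harness sketch: score a model function on tool-call cases.

def expect_command(prefix):
    """Pass if the model's tool call starts with this CLI prefix."""
    return lambda tool_call: tool_call.strip().startswith(prefix)

# Illustrative cases; the real suite has 45 across seven categories.
CASES = [
    ("Find unread emails from my accountant", expect_command("gog gmail search")),
    ("Move tomorrow's 9am standup to 10am", expect_command("gog calendar update")),
]

def run_suite(model_fn, cases=CASES):
    """model_fn maps a prompt to the tool-call string the model produced."""
    passed = sum(1 for prompt, check in cases if check(model_fn(prompt)))
    return passed / len(cases)

# Example with a stub "model" that always searches email:
score = run_suite(lambda prompt: "gog gmail search --query accountant")
print(f"{score:.0%}")  # → 50%
```

The point of grading on the emitted command rather than the chat transcript: a model that explains what it would do, instead of calling the tool, scores zero, which is exactly the failure mode that separates the 8B models from the rest below.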

| Model | Size | Score | tok/s | Verdict |
|---|---|---|---|---|
| Qwen 3 8B | 5.2 GB | 78% | 18.1 | Only genuinely usable model |
| Qwen 3.5 4B | 3.4 GB | 56% | 15.8 | Best tiny model, but drops multi-tool chains |
| Gemma 4 8B | 9.6 GB | 53% | 24.9 | Prefers to explain rather than act |
| Mistral Nemo 12B | 7.1 GB | 44% | 12.1 | Biggest, slowest, middling |
| Phi-4 Mini | 2.5 GB | 29% | 32.1 | Fast but can't do structured tool calls |
| Nemotron Mini | 2.7 GB | 27% | 29.0 | Same tier as Phi-4 |
| Granite 3.1 8B | 5.0 GB | 27% | 17.1 | IBM's "function calling focus" didn't help |
| GLM4 9B | 5.5 GB | 0% | — | Tool calling broken in Ollama |
| Qwen 3.5 9B | 6.6 GB | 0% | — | Tool calling broken in Ollama |

A few things stood out.

Benchmarks lie. Qwen 3.5 4B claims 97.5% on the Berkeley Function Calling Leaderboard. On our eval it scored 56%. BFCL tests isolated, synthetic function calls. Our tests require the model to read a system prompt describing gog CLI tools, decide which tool to call, construct the right command with the correct flags and account references, and handle multi-step chains where the output of one tool feeds the next. That's a different kind of hard.

Size matters more than architecture. The 8B models (Qwen 3, Gemma 4) consistently outperformed the smaller ones on tool calling, regardless of what the model card says about "function calling optimization." Nemotron Mini was explicitly RL-trained for agentic tool use and scored 27%.

The runtime matters as much as the model. GLM4 9B and Qwen 3.5 9B both support tool calling — just not through Ollama. GLM4's Ollama template doesn't include the `{{ .Tools }}` variable. Qwen 3.5 9B has a known bug where tool calls get printed as text. Both models would probably work fine with llama.cpp or vLLM, but if you're deploying for customers, "works out of the box" is the only standard that counts.

Qwen 3 8B is the only real option. 78% with 100% on calendar, multi-tool chains, Slack composition, system ops, context recall, error handling, and domain knowledge. Its weak spots: safety (40% — it occasionally acts without confirmation) and email triage categorization (33%). At 5.2 GB loaded and 18 tok/s on M4, it's responsive enough for interactive use. Not Opus-quality reasoning, but it can keep the lights on.

The bottom line: local models aren't ready to replace cloud providers for serious agentic work, but Qwen 3 8B is close enough that it's worth watching. When OpenClaw fixes the fallback chain bugs, a Qwen 3 8B on localhost as the last entry in model.fallbacks would give you a zero-cost backstop that handles ~80% of tasks. Not there yet. Getting closer.
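When that happens, the chain would only need one more entry. A sketch, with the caveat that the local-provider id here is hypothetical and depends on the fallback bugs actually being fixed:

```json5
// Hypothetical future config; the "ollama/" provider id is illustrative.
{
  agents: {
    defaults: {
      model: {
        primary: "openai-codex/gpt-5.3-codex",
        fallbacks: [
          "minimax/MiniMax-M2.7",
          "ollama/qwen3:8b"  // zero-cost local backstop of last resort
        ]
      }
    }
  }
}
```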

ACP: Claude Code as a Fallback Provider

This is the one I can't stop thinking about.

OpenClaw's Agent Client Protocol can spawn Claude Code CLI as a subprocess. Claude Code brings its own auth. It uses the customer's Claude subscription directly — because Claude Code is Anthropic's first-party tool.

Think about what that means. Anthropic banned OpenClaw from using subscription OAuth. But they can't ban Claude Code — it's their own product. If OpenClaw just orchestrates when to invoke Claude Code, and Claude Code handles its own session auth...

Is that a "third-party harness using the subscription"? Or is it a user running `claude` from a cron job?

There's a WIP PR to add ACP agents to model.fallbacks. When it lands:

```json5
{
  agents: {
    defaults: {
      model: {
        primary: "openai-codex/gpt-5.3-codex",
        fallbacks: [
          "acp/claude-code",
          "acp/codex",
          "minimax/MiniMax-M2.7"
        ]
      }
    }
  }
}
```

Each ACP agent brings its own auth and billing. Your fallback chain becomes a chain of agents, not a chain of API keys that any provider can revoke.

The oh-my-codex developers already do this with Codex. In a recent livestream, one of the developers revealed he runs 5 ChatGPT Pro plans ($1,000/month) through OpenClaw, spawning swarm-mode Codex sessions that auto-coordinate across repositories. They built the entire scaffolding for a 100K-star repo in 3 hours on airplane Wi-Fi, communicating with their agents via OpenClaw on their phones.

The ACP protocol just formalizes what power users are already doing. The PR has 29 tests passing and has been tested locally with Gemini CLI and Claude Code.

Not merged yet, and there are real issues to work through. ACP subprocesses are structurally fragile: gateway restarts kill all running ACP sessions (#52440), dead subprocesses leave permanently broken session state (claude-agent-acp #338), and Gemini CLI's stdout pollution corrupts the JSON-RPC stream (google-gemini #22647). There's also a ~1.5-2x token overhead because each ACP agent maintains its own system prompt and context.

Most of these bugs have been fixed individually, but the pattern reveals structural fragility. This is not production-ready. Still, the direction is right. When it stabilizes, it structurally solves provider lock-in because each agent in the chain owns its own billing relationship.

The Downsides Nobody Talks About

I'd be doing you a disservice if I didn't mention these.

OpenClaw has real security problems. CVE-2026-25253 (CVSS 8.8) is a remote code execution flaw allowing auth token theft. Over 20% of third-party skills on ClawHub have been flagged for malicious code. If you're running this for a business, lock down tools.deny and don't install random skills from the marketplace.

Agents eat tokens. 5-10x more than chat. Every task triggers multiple API calls. Long sessions accumulate context that gets re-sent. Cron jobs compound it. A "$200/mo subscription" does not buy you $200 worth of agentic work. Budget for 3-5x what you'd expect.

GPT-5.4 is not Claude Opus for agentic coding. The benchmarks tell a split story: GPT-5.4 leads on SWE-Bench Pro (57.7% vs ~45%) and OSWorld (75%), but Opus still tops SWE-bench Verified at 80.8% and dominates multi-file refactoring. Alex Finn, who runs OpenClaw 210+ hours/month, put it bluntly: "ChatGPT 5.4 is the best model. But it sucks compared to opus in OpenClaw." His solution: use Opus as the orchestrator (via API, not subscription) and cheap models for execution. We haven't run our own evals yet, so take the benchmark numbers with appropriate skepticism.

This could happen again. Google actually moved first, suspending AI Ultra accounts using OpenClaw's Gemini integration in mid-February. No warnings, no refunds. Anthropic formalized their ban days later. OpenAI could change their mind tomorrow. Subscriptions routed through third-party tools are borrowed time.

The Migration (Copy-Paste)

If you're doing this yourself:

1. Stop the bleeding

Settings > Usage on claude.ai → Enable extra usage → Claim one-time credit

Credit expires April 17.

2. Add Codex as primary

```shell
openclaw onboard --auth-choice openai-codex
```

3. Add MiniMax as backstop

```shell
openclaw onboard --auth-choice minimax-global-api
```

4. Set the fallback chain

```json5
// ~/.openclaw/openclaw.json
{
  agents: {
    defaults: {
      model: {
        primary: "openai-codex/gpt-5.3-codex",
        fallbacks: ["minimax/MiniMax-M2.7"]
      }
    }
  }
}
```

5. Verify

```shell
openclaw models list
```

Failover is automatic. Codex hits rate limits, MiniMax picks up. Your agent stays up.

Where This Is Going

The community is converging fast. One developer went from $200/month on Claude Max to $15/month on Kimi K2.5 + MiniMax. Steinberger put an alternatives banner on the Anthropic docs and shipped first-class MiniMax support. The migration path is paved.

The real lesson isn't about Anthropic. It's that your agent infrastructure can't have a single point of failure at the provider level. We're past the era where you pick one LLM provider and build everything on it. Providers change terms. Prices shift. Models get deprecated. Google suspended Gemini accounts for the same pattern in February.

Multi-provider fallback isn't a nice-to-have. It's infrastructure. Like having more than one DNS provider, or not running your entire stack on one cloud region.

The config is 10 lines. The migration took us 12 hours because we were researching every option. If you're reading this, it should take you about 20 minutes.


I'm Levi Segal. I build extraseat — managed AI assistants on dedicated hardware for small businesses in Los Angeles. If you're local and want one of these running for your team, grab 15 minutes.