Two months ago I wrote why I actually use OpenClaw. The short version: the tasks were never the point. The point was the meta. Shared agents my partner and I both shape, a headless gateway that doesn't care whether my laptop lid is open, scheduled jobs that fix pods at 3 AM, memory that survives.

I also wrote this beautiful, magnificent tweet about what OpenClaw is good at, to capitalize on the 3k followers I got through the sheer luck of standing next to a colleague who got hacked. The tweet got totally ignored by the whole world, VERY LIKELY INCLUDING YOU DEAR READER, so poop on all of you (but keep reading please, it's going to get better, and no more fecal humor, I promise).

Then Anthropic really DID stop allowing people to burn infinite Max plan tokens on stupid stuff, and the death came even for my old token that kept working much longer than those of the poor souls on reddit (and than any new token minted after their February announcement). Instead of responses from the bots I got the classic "Bump up your extra pay limit and start paying by usage, you!". Which is fair, but also pricey, and I wanted none of it.

Before rewriting anything I looked for prior art: an OpenClaw alternative built on native Claude Code, a self-hosted multi-agent Claude Code harness, anyone running persistent Claude Code agents on Kubernetes with Mattermost on the flat claude.ai Max subscription. I found third-party harnesses (banned, same as OpenClaw) and single-shot CLI wrappers, but nobody keeping real claude sessions alive and wiring chat into them. If one of those searches brought you here: yes, this is that, repo link at the bottom.

So: be the change you want to see in the world, right?

Yet another model release week blew my skull off, this time with a generation called Fables. Good timing. I pointed it at my custom OpenClaw fork and gave it one job: read my glorious blog post, rip out the short list of things I actually use, and rewrite that minimal core from scratch on native Claude Code. Native harness, claude.ai Max subscription, flat rate, and the parents stop complaining about me sneaking out the window.

Well, me and Fables did it in less than a day of occasional babysitting (me babysitting Fables, not the other way around). And it's glorious.

What we built: self-hosted Claude Code agents on Kubernetes

N persistent, interactive claude sessions, one per agent, each kept alive in tmux. Each session talks to Mattermost through a small MCP server speaking the Claude Code channels protocol:

                        ┌─────────────────────┐
                        │      Mattermost     │   channels, threads, DMs
                        │ (one bot per agent) │
                        └──────────┬──────────┘
                  REST ▲           │ WebSocket
        (reply, read_thread, …)    │ (new posts)
                       │           ▼
                        ┌─────────────────────┐
                        │    channel server   │   filters: who may talk to
                        │   (MCP, per agent)  │   this agent, which channels,
                        └──────────┬──────────┘   mention-gated or not
                 tool calls ▲      │
                            │      ▼ tagged events (sender, channel, root_id)
                        ┌─────────────────────┐
                        │    router session   │   claude --agent <name>,
                        │  (persistent, tmux) │   answers DMs, routes threads
                        └──────────┬──────────┘
                                   │ spawns (background, model per task)
                                   ▼
                        ┌─────────────────────┐
                        │   thread subagents  │   one per Mattermost thread,
                        │  (isolated context) │   full toolbelt, no cross-talk
                        └─────────────────────┘

One such stack per agent, six stacks in one pod, all on a single PVC holding credentials, transcripts, and memory.

The pieces I cared about from the old post, and how they map:

Channels. The channel server forwards Mattermost posts into the session as tagged events (channel, sender, thread root, is-it-a-DM) and exposes reply, read_thread, read_channel_history, search_posts as tools. Routing is config, not code: per agent you declare which channels it listens to, whether each channel is mention-gated or read-everything (my-channel:mention suffixes), and which humans can reach it. Agents can message each other too, mention-gated only, because two LLMs politely thanking each other forever is the most expensive perpetual motion machine yet devised.

Threads and sessions. One agent is one session is one context window, and that would be a mess with parallel conversations, so the main session acts as a router and hands each Mattermost thread to its own subagent with its own clean context (Claude Code's agent teams make these continuable, so a thread keeps its working state between messages). DMs stay in the main lane. The router prefetches the thread, spawns in the background, and posts the result back into the right root_id. Mattermost stays the ground truth: anything compaction eats can be re-read from the thread itself.

Models. Routers run sonnet at medium effort, because routing and one-liners are not intelligence-sensitive. Real work goes to opus subagents picked per spawn. Opus 4.8 costs about 1.7x sonnet per token and plans well enough at decent effort that it often uses fewer tokens overall on multi-step work. The registry exposes effort per session, so the whole cost-quality knob lives in one YAML file.

Cron. A ~200 line Python scheduler ticks every 30 seconds and, when a job fires, posts the prompt to Mattermost as a @cron bot that @-mentions the target agent. Delivery rides the normal channel pipeline, which means catch-up replay if the agent was down, and the reply threads under the cron post as a visible audit trail. Agents schedule their own jobs by dropping a YAML file in a shared directory, so "remind me every weekday at 2" works conversationally again. No queue, no Redis. Mattermost is the queue. Mission-critical workloads survived the migration, including the cron job that tells me to walk on the treadmill at 2 PM. Disabled, obviously.

Memory. A timer archives session transcripts into a full-text-searchable SQLite, with an optional embedding index on top, so compaction never loses anything for good. The agents' personas, skills, and shared rules are read-only symlinks into the image, so every soul is a file in git: my relationship coach gets code review before she's allowed to change her mind. My 67-book embedded library (the RAG database I was proudest of in the OpenClaw setup) came along too, ported off its dedicated Python venv onto plain REST calls. The skill's own docs claimed 29 books; the database said 67. The agents have apparently been reading behind my back. I wrote up the full context story (how parallel conversations stay apart, what compaction can and cannot lose) in talon's docs. The short version: Mattermost plus the archive make the system effectively append-only, so a lossless-claw style lossless context engine could very likely be built right on top of it.

The tally for all of this: about 1,750 lines of TypeScript for the channel server, a 490-line bash supervisor, three small Python tools, and a Helm chart. Claude Code brings everything else: the tools, the skills, the subagents, the compaction, the auth.

Should you use it?

Probably not directly. This is tailored to exactly my needs and implements the minimal subset of features I personally use. No attachments over the bridge, single pod, one experimental CLI flag doing heavy lifting, and it has been tested by precisely one household.

But if you want to build your own, I published a scrubbed, generic version with two example agents and full setup docs: hnykda/talon. Clone it, create two Mattermost bots, follow the README, and you have a fleet of persistent Claude Code agents on a flat-rate subscription.