
Saturday Was Rough

December 28, 2025

I had a family thing at 4. The plan was simple: wrap up this database migration, hit the pipeline, be on my way.

At 3:30 I was in the "just 30 more minutes, I'll be a little late" phase. Then the pipeline failed. App Config issue. Fixed that, ran it again. Playwright test failing. Fixed that. Forgot to update an ID somewhere, so that test fails in CI. Just endless.

And when you're in that loop with Claude, there's this moment where something breaks and you get hot—"WE TALKED ABOUT THIS, CLAUDE!!! Cockroach admin can't run on 8080!"—and then twenty minutes later you find a background process still running, eating the port. "Ope. Sorry Claude, I see what you were saying now."

Pretty soon it's 7 and I'm thinking "I have to at least show face for an hour." I don't think I got to bed until after midnight.

Here's how I got there.


The Problem I Thought I Solved

When Siren was getting off the ground, I asked myself: "What happens if Postgres goes down?"

So I did what any reasonable person would do—I added Redis. The idea was simple: dual-write to both stores, read from whichever responds first, and if one dies, the other keeps you alive. I spent a week's worth of evenings writing integration tests that literally killed Postgres mid-run to verify alerts still fired.
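
The write path looked roughly like the sketch below. This is from memory in C#, with invented class and table names rather than Siren's actual code, assuming Npgsql and StackExchange.Redis: save to both stores, and don't let one store's failure block the other.

```csharp
// A rough sketch of the dual-write idea, not Siren's actual code.
// Assumes Npgsql and StackExchange.Redis; class and table names are invented.
using System;
using System.Threading.Tasks;
using Npgsql;
using StackExchange.Redis;

public sealed class DualWriteAlertStore
{
    private readonly NpgsqlDataSource _pg;
    private readonly IDatabase _redis;

    public DualWriteAlertStore(NpgsqlDataSource pg, IDatabase redis)
        => (_pg, _redis) = (pg, redis);

    public async Task SaveAsync(long id, string payload)
    {
        // Write to both stores so that if one is down, the other still has the alert.
        await Task.WhenAll(
            Guard(WritePostgresAsync(id, payload)),
            Guard(_redis.StringSetAsync($"alert:{id}", payload)));
    }

    private async Task WritePostgresAsync(long id, string payload)
    {
        await using var cmd = _pg.CreateCommand(
            "INSERT INTO alerts (id, payload) VALUES ($1, $2)");
        cmd.Parameters.Add(new NpgsqlParameter { Value = id });
        cmd.Parameters.Add(new NpgsqlParameter { Value = payload });
        await cmd.ExecuteNonQueryAsync();
    }

    // Swallow one store's failure so the other write still lands --
    // exactly the kind of "which store is right?" edge case that worried me later.
    private static async Task Guard(Task write)
    {
        try { await write; } catch { /* log and move on */ }
    }
}
```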

It worked. I was proud of it.

A week later, I didn't trust it at all.

Here's the thing: the dual-write pattern did solve the immediate problem. But it created new ones I'd been ignoring:

  • Who's taking Postgres backups? Me, I guess.
  • How do I restore if something goes sideways? Carefully, I hope.
  • Geo-redundancy? None. Single region. If Azure East goes down, so does Siren.
  • Scaling? I'd have to figure that out eventually.
  • Redis and Postgres disagreeing? Edge cases I didn't want to learn about in production.

The whole premise of Siren is that one developer plus AI can build something real if you're intentional about it. My Postgres/Redis solution was intentional about the what but not the how. It was held together with tape, and I couldn't sell something I didn't trust.


The Cosmos Detour

The night before I left for the holiday, I sat down with Claude and just said: "Here's what I've got—Postgres, Redis, duct tape. There's got to be something cloud-native that's better."

We talked for a solid evening. Cosmos DB kept coming up. Managed service, geo-replication built in, Azure-native, basically zero ops burden. I left for Christmas thinking: "Yes. Cosmos. That's the answer."

Sometimes I need to beat an idea into the ground before I commit.

So Friday evening when I got back, I opened Claude and started explaining my Cosmos plan in that "convince me I'm right" kind of way. And the more I talked through it, the more I heard myself saying things like:

  • "I'll just need to rewrite the data layer..."
  • "Document databases don't really do JOINs, but I can denormalize..."
  • "My 28 repositories are all raw SQL, but Claude Code can rewrite those..."

I felt sheepish for not seeing it earlier: I had a fully working relational system, and my plan was to migrate to a document database. That's not a migration. That's a rewrite. Cosmos DB is a great choice if you're starting fresh. I wasn't.


The Right Tool

Somewhere in that conversation, CockroachDB came up.

The pitch: distributed SQL that speaks the Postgres wire protocol. My existing Npgsql code would mostly just work. Multi-region replication is built in. Backups, scaling, failover—all managed. No DBA required.
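
To make "mostly just work" concrete, here's a minimal sketch with a placeholder connection string (host, database, and credentials are not Siren's real ones): the Npgsql calls are the same ones my raw-SQL repositories already make, just pointed at a CockroachDB cluster.

```csharp
// Same Npgsql calls, new connection string. Host, database, and credentials
// below are placeholders, not Siren's real ones.
using System;
using Npgsql;

var connectionString =
    "Host=my-cluster.cockroachlabs.cloud;Port=26257;Database=siren;" +
    "Username=app_user;Password=<redacted>;SSL Mode=VerifyFull";

await using var dataSource = NpgsqlDataSource.Create(connectionString);

// The existing raw-SQL repositories keep sending Postgres-flavored queries;
// CockroachDB answers over the same wire protocol.
await using var cmd = dataSource.CreateCommand("SELECT count(*) FROM alerts");
var alertCount = (long)(await cmd.ExecuteScalarAsync() ?? 0L);
Console.WriteLine($"alerts: {alertCount}");
```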

Netflix runs 100+ production clusters on it. That was enough credibility for me.

More importantly, it solved the problems my Postgres/Redis setup didn't:

Problem                      | Postgres + Redis             | CockroachDB
Single database goes down    | Failover to Redis (sort of)  | Automatic, built-in
Geo-redundancy               | None                         | Multi-region writes
Backups                      | My responsibility            | Managed
Scaling                      | Figure it out later          | Managed
Operational complexity       | Two systems to maintain      | One
Edge cases in dual-write     | Hope for the best            | Gone

The Refactor

It wasn't free. Still a major refactor.

The biggest decision: primary keys. I was using long for most IDs, int for users. CockroachDB prefers UUIDs for its distributed magic. I could've forced it to work with casting, but I decided this was the right change anyway. Even if CockroachDB doesn't work out long-term, UUIDs are the better choice.
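
For a sense of what that change looks like in one spot, here's an illustrative sketch against a local cluster; the table and column names are made up, not Siren's schema. The key column becomes a UUID with a database-generated default, and the C# side trades long for Guid.

```csharp
// Illustrative only -- table and column names are made up, not Siren's schema.
// CockroachDB generates the key itself via gen_random_uuid().
using System;
using Npgsql;

const string ddl = """
    CREATE TABLE IF NOT EXISTS incidents (
        id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        title      STRING NOT NULL,
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    """;

// Local, insecure single-node cluster for illustration.
var connectionString =
    "Host=localhost;Port=26257;Database=siren;Username=root;SSL Mode=Disable";
await using var dataSource = NpgsqlDataSource.Create(connectionString);

await using (var create = dataSource.CreateCommand(ddl))
{
    await create.ExecuteNonQueryAsync();
}

// Inserts hand back the generated key; the old long/int IDs become Guid in C#.
await using var insert = dataSource.CreateCommand(
    "INSERT INTO incidents (title) VALUES ($1) RETURNING id");
insert.Parameters.Add(new NpgsqlParameter { Value = "Pipeline failed again" });
var scalar = await insert.ExecuteScalarAsync();
var newId = (Guid)scalar!;
Console.WriteLine($"created incident {newId}");
```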

So we changed a lot of code. Thousands of changes to types and interfaces. Everything turned red for the better part of Saturday—tests failing everywhere—and we just chipped away.

This is where the investment in testing paid off. Unit tests, Playwright UI tests, performance tests. Each one a little checkpoint telling us what was still broken and what we'd fixed. By Saturday night, green across the board.

Sunday I signed up for CockroachDB, got the connection string, fought with the deploy for an evening with the Wild game on, and pushed to prod. Robot tests passing in production.


What I Actually Gained

The best part wasn't adding CockroachDB. It was removing Redis.

That dual-write pattern is gone. The edge cases I was worried about? Gone. The second system to monitor and understand? Gone.

I don't think the latency overhead of hitting the database matters for Siren's workload. And if it does someday, I can add caching back. But right now, I have one data store, managed by people who know what they're doing, replicated across regions, with a generous free tier and pricing I'm happy to pay when we scale.

Siren is more reliable for it. And I actually trust it now.


Next Up

Of course, a reliable database doesn't matter if Twilio goes down and your alerts don't send.

On to dual outbound messaging support—Twilio plus Vonage, because redundancy isn't just for databases.

Try Siren

Incident management that won't go down when you need it most.

25 incidents free. No credit card required.
