Who Keeps the Lights On?
Every so often, someone in the Ruby community will ask,
"So... what does Planet Argon actually do these days?"
Fair question.
We've spent the last decade optimizing for scale. How do we handle more traffic? More users? More engineers? The assumptions were baked in: Growth is coming. Prepare accordingly.
So we split things apart. We mapped services to teams. We built for the org chart we were about to have.
Then 2023 happened. And 2024. And now 2025.
Turns out, the future isn't always bigger.
I've been thinking about what happens when open source organizations hit their breaking point... when funding dries up, relationships fracture, and everyone's scrambling to make sense of what went wrong.
It turns out, the patterns look familiar.
As the opening keynote on Day 2 of Rails World 2025, I had the chance to host a panel with three people who've been shaping the direction of both Ruby and Rails from deep within the internals.
@tenderlove
@hsbt
@byroot
We covered a lot in an hour:
There's even a moment where Aaron and Jean get into a friendly disagreement about performance and priorities. If you enjoy technical nuance and sharp perspectives, you'll appreciate that exchange.
And yes... I asked Aaron about his favorite Regular Expression. His response did not disappoint.
It was a fun, thoughtful, and occasionally surprising conversation, and a reminder that Ruby and Rails continue to evolve in the hands of people who care deeply about their future.
If you weren't in Amsterdam or want to revisit it, the full panel is now available:
Also worth pairing with this interview with Jean on the On Rails podcast, where we dig into IO-bound workloads, misconceptions, and what it's like maintaining Rails at scale.
A solid pairing if you're curious where the ecosystem is headed next.
I've been part of the Ruby on Rails ecosystem for over two decades. I've watched teams adopt Rails with wild enthusiasm... evolve their systems... struggle through growing pains... and eventually find themselves in an uncomfortable position: debating whether to abandon the tools that once brought them so much joy.
I don't think that's necessary... or even wise.
But I do think it's understandable.
After working with and talking to hundreds of teams... many of them using Rails, Laravel, Ember.js, or even React... I've noticed a pattern. A lifecycle of sorts. The way teams internally adopt and evolve their relationship with a technical stack. I've seen it reflected in our consulting clients at Planet Argon, the guests on my podcasts (Maintainable and On Rails), and peers who've been part of the various peak waves of these ecosystems.
And while every team is different, the stages of internal tech stack adoption often follow a similar spiral.
This post is an attempt to describe that spiral.
Not as a fully baked theory, but as a conversation starter. A mirror. And maybe a compass.
Because whether your team is building your core product with Rails, or you're a non-software company maintaining internal tools on Laravel, understanding where you are in this lifecycle might help you understand what comes next.
Before we go deeper, here's a quick overview of the seven stages I've observed. These aren't fixed; your team might skip around or revisit them multiple times. But in general, this is the pattern I've seen:
Adopting: A small group of enthusiastic engineers selects and introduces the stack while building a prototype or MVP.
Expanding: The stack proves useful... so it spreads. More features, more developers, more tooling.
Normalizing: The stack becomes the default. Teams standardize around it. Hiring pipelines and best practices emerge.
Fragmenting: Pain points surface. Teams bolt on new tools or sidestep old ones. Internal consistency erodes.
Drifting: The stack feels sluggish. Upgrades are deferred. The excitement is gone.
Debating: Conversations shift to rewrites or migrations. Confidence is shaken.
Recommitting: Teams pause, reflect, and decide to reinvest in the stack... and their shared future with it.
Again, these stages aren't a ladder; they're a spiral.
And the question your team has to ask is: Are we spiraling upward... or downward?
Because while The Downward Spiral is a great album, it doesn't have to be your trajectory.
It might be tempting to look at this lifecycle and think, "Our goal is to get to the Recommitting stage and stay there forever."
But that's not how this works.
Every team will move through these stages multiple times over the lifespan of their product. Shifting priorities, team turnover, organizational pivots... they all create new dynamics that ripple across your tech stack.
Recommitting isn't a finish line. It's an inflection point. One that clears the fog, sharpens priorities, and invites your team to move forward with intent.
Just don't mistake clarity for comfort... the spiral keeps turning.
Caching in Rails is like duct tape. Sometimes it saves the day. Sometimes it just makes a sticky mess you'll regret later.
Nowhere is this more true than in data-heavy apps... like a custom CRM analytics tool that glues together a few systems. You know the type: dashboards full of metrics, funnel charts, KPIs, and reports that customers swear they need "real-time."
And that's where the caching debates begin.
All valid. All expensive in their own way.
Your SaaS app has grown to 2,000 customers, each with multiple users.
For the overwhelming majority, dashboards load just fine. Nobody complains.
But then your whales log in. The Fortune 500 accounts your sales reps obsess over. Their dashboards pull data from half a dozen APIs, crunch millions of rows, and stitch together a wall of charts. It's not just a page. It's practically a data warehouse in disguise.
These dashboards are slow. Painfully slow. And you hear about it... through support tickets, account managers, and sometimes even a terse email from someone with "Chief" in their title.
So your engineering team digs in. You fire up AppSignal, Datadog, or Sentry and zero in on the slowest dashboard requests. You look at traces, database query timings, and request logs. You chart out the p95 and p99 response times to understand how bad it gets for the biggest customers.
From there, you start experimenting:
You squeeze what you can out of the obvious optimizations. Maybe things improve... but not enough.
So the conversation shifts.
When your product team actually sits down with those whale customers, the conversation shifts.
They start by saying: we need real-time data. But after a little probing, everyone realizes "real-time" doesn't always mean right now this second.
Maybe what they really need is a reliable snapshot of activity as of the end of the previous day. That's good enough for the kinds of decisions their leadership is making in the morning meeting. Nobody is making million-dollar calls based on a lead that just landed five minutes ago.
And your team can remind them: there are other real-time metrics in the system. For example:
So now you've reframed the dashboard story. Instead of one giant "real-time" data warehouse, you split it into two categories:
That's usually enough to recalibrate expectations. Customers feel like they're still getting fresh data, while your app no longer sets itself on fire every time a big account logs in.
Armed with that agreement, your team ships the "reasonable" solution most of us have built at least once:
The next morning, dashboards are instant. Support tickets quiet down. Account reps breathe easier. Everyone celebrates.
But here's the kicker: the whales were the problem. The rest of your customers never needed this optimization in the first place. Their dashboards were already fine.
So now you've turned one customer's problem into everyone's nightly job. And under the hood, you've cranked through hours of CPU, memory, and database load... just to prepare data for customers who won't even log in later today.
Worse, you've stuffed your background job queue with 2,000 little tasks every night. Which means your queue system (whether it's Sidekiq, Solid Queue, or GoodJob) is spending precious time juggling busy work instead of focusing on the jobs that actually matter. And when those queues get stuck, or a worker crashes, you're left wading through a mountain of pending jobs just to catch up.
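To make the anti-pattern concrete, here's a minimal sketch of that kind of nightly precompute, assuming a hypothetical Account model, a DashboardBuilder service, and Active Job workers. None of these names come from a real app; the point is the shape of the work.

```ruby
# Hypothetical sketch: the "cache everything for everyone" nightly job.
# Scheduled for 1 AM via whatever recurring-job mechanism the app already uses.
class NightlyDashboardRefreshJob < ApplicationJob
  queue_as :low_priority

  def perform
    # Fans out one job per account -- all 2,000 of them, every night,
    # regardless of whether anyone will look at that dashboard tomorrow.
    Account.find_each do |account|
      RefreshAccountDashboardJob.perform_later(account.id)
    end
  end
end

class RefreshAccountDashboardJob < ApplicationJob
  queue_as :low_priority

  def perform(account_id)
    account = Account.find(account_id)
    # The heavy part: external API calls, large aggregations, chart data, etc.
    payload = DashboardBuilder.new(account).build
    # Store the result where the dashboard controller can read it cheaply.
    Rails.cache.write("dashboard/#{account.id}", payload, expires_in: 24.hours)
  end
end
```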
This is what I call Cache Pollution: the buildup of unnecessary caching work that bloats your systems, slows down your queues, and leaves your caching strategy with a far bigger carbon footprint than it needs to. Another benefit of tackling Cache Pollution early is future flexibility: you might eventually solve the computation challenges in a different way, and you won't be anchored to big, scary scheduled tasks that churn through all of your customers every night.
Do these reports need to run every single day... or only on weekdays when your customers actually log in?
If your traffic drops on Saturdays and Sundays, consider a lighter schedule. Or even none at all. Because "slow" isn't so slow when almost nobody is around. A BigCorp admin poking the dashboard on Sunday morning might be fine with an on-demand render... especially if the weekday experience is snappy.
And here's another angle: if your scheduled job runs at 1 AM, that means when a BigCorp user logs in later that same day, they're still looking at data that's less than 24 hours old. For most business use cases, that's plenty. You don't need to rerun heavy jobs every few hours just because you can.
This is all about right-sizing frequency:
If your dashboard code doesn't rely on the cache to render, you keep the option to not precompute. That flexibility is where the savings live. As the business grows, the cost of overly eager schedules grows with it... so design for dials, not hard-coded habits.
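Here's one rough way that "dials, not hard-coded habits" idea could look, reusing the hypothetical names from the sketch above. The controller falls back to computing on demand (so skipping a night is safe), and the recurring schedule runs on weekdays only. The sidekiq-cron syntax is just one option; current_account is an assumed helper.

```ruby
# Hypothetical sketch: read the cache if it exists, otherwise compute on demand.
# The dashboard never *requires* a precomputed value.
class DashboardsController < ApplicationController
  def show
    account = current_account # assumed helper returning the signed-in customer's account
    @dashboard = Rails.cache.fetch("dashboard/#{account.id}", expires_in: 24.hours) do
      DashboardBuilder.new(account).build # slower, but still correct, when no cache exists
    end
  end
end

# Hypothetical recurring schedule (sidekiq-cron style): weekdays only, easy to dial.
# "0 1 * * 1-5" runs at 1 AM Monday through Friday instead of every night.
Sidekiq::Cron::Job.create(
  name:  "Refresh whale dashboards - weekdays 1am",
  cron:  "0 1 * * 1-5",
  class: "NightlyDashboardRefreshJob"
)
```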
One more question to ask about recurring scheduled jobs: do you really need to iterate through all users or all organizations?
In many cases, the answer is no. Most customers don't trigger the conditions that require a heavy recompute. Yet teams often design jobs to blast across every top-level object in the database, every night, without discrimination.
Instead, look for signals that help you scope the work down:
By narrowing the set of work each job touches, you cut down on wasted compute, reduce queue congestion, and avoid the kind of Cache Pollution that grows silently as your business scales.
The trick isn't just caching everything for everyone. It's knowing who to cache for and when.
And here's a bonus: if your job fails at 1 AM, re-running it for 50 customers is a whole lot faster than crawling through 2,000.
Extra credit: scope your scheduled tasks so that when a customer crosses a certain threshold (say, user count, dataset size, or request volume), they automatically join the "whale" group. No manual babysitting required.
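As a rough illustration of that scoping, the nightly job from earlier could select only the accounts that are both large enough and recently active. The column names and thresholds below are invented for the example.

```ruby
# Hypothetical sketch: only precompute for accounts that actually need it.
class NightlyDashboardRefreshJob < ApplicationJob
  WHALE_USER_COUNT = 250        # made-up threshold
  WHALE_ROW_COUNT  = 1_000_000  # made-up threshold

  def perform
    whales = Account
      .where("users_count >= ? OR analytics_rows_count >= ?", WHALE_USER_COUNT, WHALE_ROW_COUNT)
      .where("last_active_at >= ?", 7.days.ago) # skip accounts nobody has logged into lately

    # Instead of 2,000 jobs a night, this might enqueue a few dozen.
    whales.find_each do |account|
      RefreshAccountDashboardJob.perform_later(account.id)
    end
  end
end
```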
Not all caching challenges look like dashboards.
Case Study: The Press Release Problem
We once managed a public-facing site for a massive brand. Whenever they dropped a big press release, it spread fast across social media. Traffic would spike within minutes.
Of course, thatâs when the CEO would notice a typo. Or the PR team would need to update a paragraph to reflect a question from the media. Despite their editorial workflows, changes still had to happen after publication.
So we had to get clever. We couldn't cache those fresh pages for hours. Instead, we used a sliding window approach:
- First 5 minutes: cache for 30 seconds at a time.
- After 5 minutes: increase to 1 minute.
- After 10 minutes: increase to 2 minutes.
- After 20 minutes: increase to 5 minutes.
- After 6 hours: safe to cache for an hour.
- After a day: cache for a few hours at a time.
This let us protect our Rails servers from massive traffic spikes when a new article was spreading fast, while still giving editors the ability to push corrections through quickly. Older articles, once stable, could safely sit in Akamai's cache for hours.
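If you wanted to build a similar sliding window today, one minimal sketch is to derive the cache lifetime from the article's age and send it as a Cache-Control header for the CDN to honor. The tiers below mirror the schedule above; the Article model and its published_at column are assumptions for the example.

```ruby
# Hypothetical sketch: the TTL grows as the article ages, mirroring the tiers above.
class ArticlesController < ApplicationController
  # [age threshold, TTL to serve while under that threshold]
  CACHE_WINDOWS = [
    [5.minutes,  30.seconds],
    [10.minutes, 1.minute],
    [20.minutes, 2.minutes],
    [6.hours,    5.minutes],
    [1.day,      1.hour],
  ].freeze

  def show
    @article = Article.find(params[:id])
    expires_in ttl_for(@article), public: true # sets Cache-Control: public, max-age=...
  end

  private

  def ttl_for(article)
    age = Time.current - article.published_at
    _window, ttl = CACHE_WINDOWS.find { |window, _| age < window }
    ttl || 3.hours # older than a day: cache for a few hours at a time
  end
end
```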
At the time, Akamai could take up to seven minutes to guarantee a purge across their global network. Not ideal. We had to plan for that lag. Today, most CDNs can purge instantly, but back then... it was a constraint we had to design around.
A lot of what we've talked about here comes down to avoiding Cache Pollution.
That's the unnecessary churn your system takes on when it generates data nobody asked for. It's the background job queue bloated with thousands of tasks that fight with more important work. It's the 1 AM process chewing through CPU just to prep dashboards for customers who never log in.
Cache Pollution looks like optimization on the surface... but underneath it's just waste.
So before your team spins up the next caching project, stop and ask:
Because the goal isn't just faster dashboards. The goal is to keep your caching strategy lean, resilient, and focused, instead of leaving behind a trail of Cache Pollution that grows with every new customer you add.
(or: How I'm Learning to Step Back So We Can Move Forward)
Here's a pattern I've been contributing to more than I'd like to admit:
Someone on the team proposes a new internal tool. There's a clear need. There's momentum. Conversations start about how we'll build it... what tools we'll use... where it'll live... whether it might become client-facing someday.
It hasn't been built yet, but we're already architecting the scaffolding.
And that's usually when I step in.
Not with a thumbs-up. Not with funding. But with questions. The kind that start with, "What if we didn't build this?"
It's not fun. It's not fair. And it's not how I want to lead.
By that point, people are invested. They've done the thinking. They've shared the idea. They've taken a risk. And now I'm asking them to scale it back, or stop entirely.
This is me taking responsibility for that pattern.
So I did the only thing I know to do in moments like this: I wrote it down.
We now have a model. Internally, we call it The Internal Tooling Maturity Ladder.
Level 0: One-off Manual Script
Something one person runs on their own machine to save time or reduce repetitive work.
Maybe it lives in an unsaved file or as a task in Alfred or Automator.
It's not elegant, but it works.
You run it manually. You copy-paste the result. You feel smug for five minutes.
No repo. No expectations. No long-term promises.
Example: A quick Ruby or Bash script that tallies something from an API and drops it into Slack.
Level 1: Shared Manual Script
You cleaned it up. You wrote a little README. You dropped it in a shared Gist or Google Drive.
It's still manually triggered, but now others can use it too, if they read the instructions.
It's still lightweight. Still safe.
And it's often where great tools should stay.
Example: A command-line tool that a few team members can run locally, maybe to generate a report or fetch usage stats.
Level 2: Scheduled Automation
Now we're automating things.
It runs on a schedule: maybe through Zapier, a GitHub Action, or a scheduled Rake task.
No UI. No buttons. Just automated updates that go where we already spend time.
Slack. Google Sheets. Email.
These tools hum quietly in the background, doing one job well.
Example: A script that posts weekly project stats to a Slack channel every Monday morning.
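As a sketch of how small a Level 2 tool can stay, this is roughly all it takes; the webhook URL, stats, and schedule below are placeholders, not a real integration.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch: post weekly project stats to Slack via an incoming webhook.
# Trigger it from a GitHub Action cron, Heroku Scheduler, or crontab:
#   0 9 * * 1  ruby post_weekly_stats.rb
require "net/http"
require "json"
require "uri"

WEBHOOK_URL = ENV.fetch("SLACK_WEBHOOK_URL") # placeholder -- set in your environment

# Placeholder stats; in practice this might come from your project tracker's API.
stats = {
  tickets_closed: 14,
  prs_merged:     9,
  deploys:        3,
}

message = "Weekly stats: #{stats.map { |k, v| "#{k.to_s.tr('_', ' ')}: #{v}" }.join(', ')}"

# Slack incoming webhooks accept a simple JSON payload with a "text" field.
Net::HTTP.post(
  URI(WEBHOOK_URL),
  { text: message }.to_json,
  "Content-Type" => "application/json"
)
```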
Level 3: Lightweight Internal Service
Now we're getting fancy.
This has a small UI. A form. A dashboard. Maybe some configuration options.
It needs hosting. Credentials. Some thought about security.
It's still simple enough that one person can manage it, but now it's a thing.
And it needs some care.
Example: A mini app that lets the team search across client project docs or surface stale Jira tickets.
Level 4: Fully Hosted Internal Product
This is a real web app.
It's deployed. It has a frontend and a backend. It has users. Sessions. Maybe even tests (hopefully).
It needs to be maintained. Updated. Monitored.
It might solve a meaningful problem, but it's not free.
This is the top of the ladder for a reason.
Example: A polished internal dashboard that's become a critical part of day-to-day operations.
This isn't a blueprint. It's a conversation starter.
The higher you go, the more you commit: time, infrastructure, expectations.
So we're learning to start lower on the ladder.
To earn our way up.
To see if people care before we care too much.
Every internal tool is a promise.
To support it. To upgrade it. To explain it to the next person who inherits it.
And sometimes... the smallest version of the tool is all we need.
A Slack post.
A spreadsheet.
A script that helps one person do their job 10% faster.
Not everything needs a UI.
Not everything needs a repo.
And not everything needs me to be the one who calls time on the project two weeks in.
This post isn't about our internal model. Not really.
It's about building fewer things that trap us.
And creating more space to experiment without regret.
If you've found yourself playing the role of reluctant gatekeeper... you're not alone.
This ladder is helping me find a better way.
One rung at a time.
If you didn't make it to RailsConf this year... or couldn't make it to my talk... I've got good news: the full video is now live.
Watch it here
Preparing for this talk was one of the most nostalgic (and sometimes absurd) research dives I've done in years. I pitched The Features We Loved, Lost, and Laughed At thinking it would be easy to uncover a long list of removed or weird Rails features to poke fun at.
Turns out? They weren't so easy to find.
Rails hasn't just thrown things away. It's looped. It's learned. It's come back to old ideas and made them better.
In the talk, I trace that evolution... using code examples and stories from the early days of ActiveRecord, form builders, observe_field, semicolon routes, and even a few lesser-known misadventures involving matrix parameters.
I touch on features like Observers (invisible glue, invisible bugs) and ActiveResource... which wasn't confusing so much as it was optimistic. It assumed the APIs you were consuming were designed with Rails-like conventions in mind. That was rarely the case.
I also explore what Rails has taught us about developer happiness, what it means to build with care, and what the community keeps refining (and laughing about).
Here's a quick example: I once wrote an InvoiceObserver that did four different things silently... and when it broke, it took hours to even figure out where the logic lived. Magical until it wasn't.
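For anyone who never met Rails Observers (they shipped with Active Record before Rails 4 and were later extracted into the rails-observers gem), here's a rough reconstruction of what that kind of observer looked like. The four side effects are illustrative stand-ins, not the original code.

```ruby
# Hypothetical reconstruction: an observer quietly doing four unrelated things.
# Nothing in the Invoice model itself hints that any of this happens on save.
class InvoiceObserver < ActiveRecord::Observer
  def after_save(invoice)
    InvoiceMailer.receipt(invoice).deliver          # 1. email the customer
    invoice.account.recalculate_balance!            # 2. touch another model
    AuditTrail.record("invoice.saved", invoice.id)  # 3. write an audit entry
    CrmSync.push(invoice)                           # 4. sync to an external system
  end
end

# Observers also had to be registered in config/application.rb:
#   config.active_record.observers = :invoice_observer
```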
With RailsConf coming to a close, it felt like the right moment to reflect not just on the framework... but on how we evolve alongside it.
Rails doesn't just chase trends. It revisits its own decisions and asks: "What still brings us joy?"
That's a rare trait in software. And it's why Rails still feels like home for so many of us.
"Rails doesn't just move forward... it reflects. It loops. It asks: Where's the friction? What can we make effortless again?"
If you're newer to the framework, or just curious what Rails has quietly taught us over the years... I hope you find something here to smile at.
I'm grateful to my Ruby friends... some old, some new... who shared memories, weird bugs, screenshots, mailing list lore, and just the right amount of healthy skepticism while I was putting this together.
Ruby on Rails is often celebrated for how quickly it lets small teams build and ship web applications. I'd go further: it's the best tool for that job.
Rails gives solo developers a powerful framework to bring an idea to life, whether it's a new business venture or a behind-the-scenes app to help a company modernize internal workflows.
You don't need a massive team. In many cases, you don't even need a team.
That's the magic of Rails.
It's why so many companies have been able to start with just one developer. They might hire a freelancer, a consultancy, or bring on a full-time engineer to get something off the ground. And often, they do.
Ideas get shipped. The app goes live. People start using it. The team adds features, fixes bugs, tweaks things here and there. Maybe they've got a Kanban board full of tasks and ideas. Maybe they don't. Either way, the thing mostly works.
Until something breaks.
Someone has to redo work. A weird bug eats some data. A quick patch is deployed. Then someone in management asks the timeless question: "How do we prevent this from happening again?"
Time marches on. Other engineers come and go, but the original developer is still around. Still knows the system inside and out. Still putting out fires.
Eventually, the company stops backfilling roles. There's not quite enough in the backlog to justify it. And besides, everything important seems to be in one person's head. That person becomes both the system's greatest asset and its biggest risk.
This is usually about the time our team at Planet Argon gets a call.
Sometimes, it's the developer who reaches out. They're burned out. They miss collaborating with others. They're tired of carrying the whole thing. Other times, it's leadership. Things are moving too slowly. Tickets aren't getting closed. The bugs they reported last quarter still haven't been addressed. They're worried about what happens if that one dev goes on vacation. Or leaves.
They've tried bringing in outside help... but nothing sticks. The long-term engineer keeps saying new people "don't get it."
By the time we step in, we've seen some version of this story many, many times.
Documentation? Sparse or outdated.
Tests? There are some, but good luck trusting them.
Git commit messages? A series of "fixes" and "WIP".
Hardcoded credentials? Of course.
Onboarding materials? There's nobody to onboard.
Rails upgrades? "We'll get to it eventually... maybe."
Today marks the launch of On Rails, a new podcast produced by the Rails Foundation and hosted by yours truly.
We've recorded the first batch of episodes, and Episode 1 is out now: Rosa Gutiérrez on Solid Queue.
The show dives into technical decision-making in the Ruby on Rails world. Not the shiny trend of the week... but the real conversations teams are having about how to scale, what trade-offs to make, and what long-term maintainability actually looks like.
You'll hear from developers running real apps. Some are building internal tools. Others work on products you've probably used. A few are out there blogging and tweeting... but many are too deep in the day-to-day to stop and write about it. They're just doing the work: shipping, fixing, refactoring, and figuring it out as they go.
The idea for On Rails started with those hallway conversations at conferences. The ones that don't make it into keynotes or blog posts. It grew out of the calls I have with clients at Planet Argon. And, of course, out of years of hosting Maintainable.fm.
You'd think that after recording over 200 episodes of Maintainable, I wouldn't be so nervous to hit record on something new... but here we are. New show jitters are real.
We're approaching this podcast with depth and focus. Fewer episodes. Longer interviews. Conversations that aim to surface lessons learned... and the thinking behind the decisions that shape real systems.
If you're a Rails fan, I hope you'll give it a listen. If subscribing is your thing, you know what to do. And if you've got a story worth sharing, I'd love to hear from you.
Listen to Episode 1: Rosa Gutiérrez: Solid Queue
Browse all episodes: onrails.buzzsprout.com
Official announcement: Ruby on Rails blog
Earlier this year, I dusted off this blog (which I started back in 2005) and found myself reflecting on Maintainable, the podcast I've hosted for the past few years about long-term software health.
At the same time, I was toying with the idea of spinning off something more Rails-focused. A show that could spotlight the kinds of conversations I was already having... with clients, with other devs, and in those casual, between-session moments at conferences.
Right around then, Amanda from the Rails Foundation reached out with a prompt:
"A podcast of Rails devs talking about the nitty-gritty technical decisions they've made along the way."
...which aligned nicely with what I had been ruminating on. The timing was perfect, and we decided to make it happen.
One of the early shifts for me was adapting to a more collaborative production process. I've been running Planet Argon for more than two decades, and I'm used to moving quickly, often without needing to pitch or workshop ideas with others. But with On Rails, I've had the opportunity to work closely with Amanda, the Foundation, and DHH. They've all taken an active interest in shaping the vision, the guests, and the format.
Another early challenge? The first round of guests was pitched to me, which meant jumping into the deep end with folks I hadn't already spoken with. That raised the bar for prep. On Maintainable, I've occasionally relied on some degree of improvisation. Here, I knew I'd need to come in more prepared... and that's been a good thing.
So On Rails was born.
I'll still be hosting Maintainable (though likely on a slower cadence). And I'm excited to run both of these shows side by side, each with its own tone and focus.
Hope you get a chance to give it a listen.
Drowning in technical debt?
It doesn't have to be this way.
Back in September at Rails World 2024, I shared what I've learned from helping teams tack their way out of trouble: less theory, more battle-tested strategies. Lessons from Planet Argon's clients, Maintainable.fm guests, and real-world Rails teams.
Have 25 minutes? Watch it here:
Your future self (and your app) will thank you.
As a consultant, I've looked over a shitload of Rails codebases (how many? probably ~150-200) over the last 12 1/2 years in the Ruby on Rails community. I haven't worked on most of them, but I do get invited to look over, review, audit, and provide feedback on a lot of them.
Over on the Planet Argon blog, I've shared my quick hit list of a few initial things I look for when reviewing an existing codebase.