When Your Cache Has a Bigger Carbon Footprint Than Your Users

Caching in Rails is like duct tape. Sometimes it saves the day. Sometimes it just makes a sticky mess you’ll regret later.

Nowhere is this more true than in data-heavy apps… like a custom CRM analytics tool that glues together a few systems. You know the type: dashboards full of metrics, funnel charts, KPIs, and reports that customers swear they need “real-time.”

And that’s where the caching debates begin. Cache the whole page? Cache fragments of it? Precompute everything overnight and store the rollups?

All valid. All expensive in their own way.


The Dashboard Problem

Your SaaS app has grown to 2,000 customers, each with multiple users.

For the overwhelming majority, dashboards load just fine. Nobody complains.

But then your whales log in. The Fortune 500 accounts your sales reps obsess over. Their dashboards pull data from half a dozen APIs, crunch millions of rows, and stitch together a wall of charts. It’s not just a page. It’s practically a data warehouse in disguise.

These dashboards are slow. Painfully slow. And you hear about it… through support tickets, account managers, and sometimes even a terse email from someone with “Chief” in their title.

So your engineering team digs in. You fire up AppSignal, Datadog, or Sentry and zero in on the slowest dashboard requests. You look at traces, database query timings, and request logs. You chart out the p95 and p99 response times to understand how bad it gets for the biggest customers.

From there, you start experimenting:

  • Are we missing database indexes?
  • Are there N+1 queries lurking in the background?
  • Can we preload or memoize expensive calls?
  • Could a few data points be cached individually, without touching the rest?

You squeeze what you can out of the obvious optimizations. Maybe things improve… but not enough.

So the conversation shifts.


Negotiating “Real-Time” Expectations

When your product team actually sits down with those whale customers, the picture changes.

They start by saying: we need real-time data. But after a little probing, everyone realizes “real-time” doesn’t always mean right now this second.

Maybe what they really need is a reliable snapshot of activity as of the end of the previous day. That’s good enough for the kinds of decisions their leadership is making in the morning meeting. Nobody is making million-dollar calls based on a lead that just landed five minutes ago.

And your team can remind them: there are other real-time metrics in the system. For example:

  • New leads created today.
  • Active users in the past hour.
  • Pipeline changes as they happen.

So now you’ve reframed the dashboard story. Instead of one giant “real-time” data warehouse, you split it into two categories:

  1. Daily rollups. Crunch the heavy stuff once a night. End-of-day data is sufficient, and it’s reliable.
  2. Today’s activity. Show a few real-time metrics that are fast to calculate. Give customers the dopamine hit of “live” data without boiling the ocean.
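That split can be sketched in plain Ruby. The rollups store, the customer id, and the live-metric lambdas below are all illustrative stand-ins, not a real schema:

```ruby
# Two-tier dashboard payload: heavy aggregates come from a nightly
# rollup store; cheap "today" metrics are computed on every request.
def dashboard_payload(customer_id, rollups:, live:)
  {
    daily_rollup: rollups.fetch(customer_id),        # crunched once a night
    leads_today: live[:leads_today].call,            # fast, real-time
    active_users_past_hour: live[:active_users].call # fast, real-time
  }
end
```

The point of the shape: the expensive work is read, not computed, at request time, while the "live" feel comes from a handful of queries cheap enough to run on every page load.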

That’s usually enough to recalibrate expectations. Customers feel like they’re still getting fresh data, while your app no longer sets itself on fire every time a big account logs in.


The 1 AM Job (and Its Hidden Cost)

Armed with that agreement, your team ships the “reasonable” solution most of us have built at least once:

  • Every night at 1 AM, loop through all 2,000 customers.
  • Generate every dashboard report.
  • Cache the results somewhere “safe.”
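In code, the naive version looks something like this. It's a plain-Ruby sketch rather than a real Sidekiq job, with customers reduced to hashes and the report build stubbed out:

```ruby
# The "reasonable" 1 AM job: loop through every customer and
# precompute every dashboard, whether they'll log in or not.
class NightlyDashboardJob
  def initialize(customers, cache)
    @customers = customers
    @cache = cache
  end

  def perform
    @customers.each do |customer|
      @cache[customer[:id]] = build_reports(customer)
    end
  end

  private

  # Stand-in for the expensive multi-API, million-row crunch.
  def build_reports(customer)
    { generated_at: Time.now, report: "dashboard for #{customer[:name]}" }
  end
end
```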

The next morning, dashboards are instant. Support tickets quiet down. Account reps breathe easier. Everyone celebrates.

But here’s the kicker: the whales were the problem. The rest of your customers never needed this optimization in the first place. Their dashboards were already fine.

So now you’ve turned one customer’s problem into everyone’s nightly job. And under the hood, you’ve cranked through hours of CPU, memory, and database load… just to prepare data for customers who won’t even log in later today.

Worse, you’ve stuffed your background job queue with 2,000 little tasks every night. Which means your queue system—whether it’s Sidekiq, Solid Queue, or GoodJob—is spending precious time juggling busy work instead of focusing on the jobs that actually matter. And when those queues get stuck, or a worker crashes, you’re left wading through a mountain of pending jobs just to catch up.

This is what I call Cache Pollution: the buildup of unnecessary caching work that bloats your systems, slows down your queues, and leaves your caching strategy with a far bigger carbon footprint than it needs. Another benefit of tackling Cache Pollution early is future flexibility: you might eventually solve the computation challenges in a different way, and you won't be anchored to big, scary scheduled tasks that churn through all of your customers every night.


Frequency Matters More Than We Admit

Do these reports need to run every single day… or only on weekdays when your customers actually log in?

If your traffic drops on Saturdays and Sundays, consider a lighter schedule. Or even none at all. Because “slow” isn’t so slow when almost nobody is around. A BigCorp admin poking the dashboard on Sunday morning might be fine with an on-demand render… especially if the weekday experience is snappy.

And here’s another angle: if your scheduled job runs at 1 AM, that means when a BigCorp user logs in later that same day, they’re still looking at data that’s less than 24 hours old. For most business use cases, that’s plenty. You don’t need to rerun heavy jobs every few hours just because you can.

This is all about right-sizing frequency:

  • Weekday cadence: nightly rollups for whales; maybe twice a day if usage demands it.
  • Weekend cadence: pause, or run a narrower subset.
  • Holiday mode: same idea… different switch.

If your dashboard code doesn’t rely on the cache to render, you keep the option to not precompute. That flexibility is where the savings live. As the business grows, the cost of overly eager schedules grows with it… so design for dials, not hard-coded habits.
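One way to build those dials is a small schedule policy instead of a hard-coded cron line. A sketch, with an illustrative holiday list:

```ruby
require "date"

# Decides whether heavy rollups should run on a given date:
# weekdays yes, weekends no, holidays switched off entirely.
class RollupSchedule
  def initialize(holidays: [])
    @holidays = holidays
  end

  def run_rollups?(date)
    return false if @holidays.include?(date) # holiday mode
    !(date.saturday? || date.sunday?)        # weekday cadence only
  end
end
```

Your scheduler still fires every night; the policy decides whether any real work happens. Turning a dial means editing one predicate, not rewriting cron entries.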


Other Things to Consider with Recurring Tasks

One more question to ask about recurring scheduled jobs: do you really need to iterate through all users or all organizations?

In many cases, the answer is no. Most customers don’t trigger the conditions that require a heavy recompute. Yet teams often design jobs to blast across every top-level object in the database, every night, without discrimination.

Instead, look for signals that help you scope the work down:

  • Which organizations actually logged in today?
  • Which customers have datasets large enough to need optimization?
  • Which accounts crossed a threshold since the last run?

By narrowing the set of work each job touches, you cut down on wasted compute, reduce queue congestion, and avoid the kind of Cache Pollution that grows silently as your business scales.
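Those signals can be combined into a simple filter before the job fans out. A sketch, where the customer fields (`last_login_on`, `row_count`) and the threshold are hypothetical:

```ruby
# Only precompute for customers who were active today AND have a
# dataset big enough to be slow on demand.
HEAVY_ROW_THRESHOLD = 1_000_000

def customers_needing_rollups(customers, today:)
  customers.select do |c|
    c[:last_login_on] == today && c[:row_count] >= HEAVY_ROW_THRESHOLD
  end
end
```

Whether you combine signals with AND or OR is a product decision; the structural win is that the nightly loop touches a shortlist instead of every row in the customers table.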


Clever Tricks We’ve Seen

The trick isn’t just caching everything for everyone. It’s knowing who to cache for and when.

  • Selective pre-caching. Only build nightly rollups for your whales. Maybe 50 out of 2,000 customers. Everyone else can render on demand, which was fine all along.
  • Cache on login. If you know a user from BigCorp is signing in, enqueue a background job to warm up their dashboard before they hit it. A cookie value on the Sign In page can even tell you who they are, so the warm-up kicks off before they've triggered 1Password or typed their credentials. Even a 10–20 second head start can smooth the experience.
  • Cache on demand… with a fallback. If cached data is missing, build fresh on the spot. Outages happen when teams assume the cache will always be there.
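The fallback pattern is worth spelling out, because it's what keeps a cache miss from becoming an outage. In Rails this is roughly what `Rails.cache.fetch` gives you; here's a minimal plain-Ruby sketch with a Hash standing in for the store:

```ruby
# Serve from cache when possible; build fresh (and store) when not.
# The dashboard never *requires* the cache in order to render.
def fetch_dashboard(cache, customer_id)
  cache[customer_id] ||= build_dashboard(customer_id)
end

# Stand-in for the expensive on-demand build.
def build_dashboard(customer_id)
  { customer_id: customer_id, built_at: Time.now }
end
```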

And here’s a bonus: if your job fails at 1 AM, re-running it for 50 customers is a whole lot faster than crawling through 2,000.

Extra credit: scope your scheduled tasks so that when a customer crosses a certain threshold—say, user count, dataset size, or request volume—they automatically join the “whale” group. No manual babysitting required.
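That threshold check might be nothing more than a predicate the scheduled task filters on. The metrics and limits below are made-up numbers:

```ruby
# A customer joins the "whale" group as soon as any one
# threshold is crossed, with no manual babysitting.
WHALE_THRESHOLDS = {
  user_count:     500,
  row_count:      10_000_000,
  daily_requests: 50_000
}.freeze

def whale?(customer)
  WHALE_THRESHOLDS.any? { |metric, limit| customer.fetch(metric, 0) >= limit }
end
```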


Other Patterns

Not all caching challenges look like dashboards.

Case Study: The Press Release Problem

We once managed a public-facing site for a massive brand. Whenever they dropped a big press release, it spread fast across social media. Traffic would spike within minutes.

Of course, that’s when the CEO would notice a typo. Or the PR team would need to update a paragraph to address a question from the media. Despite their editorial workflows, changes still had to happen after publication.

So we had to get clever. We couldn’t cache those fresh pages for hours. Instead, we used a sliding window approach:

  • First 5 minutes: cache for 30 seconds at a time.
  • After 5 minutes: increase to 1 minute.
  • After 10 minutes: increase to 2 minutes.
  • After 20 minutes: increase to 5 minutes.
  • After 6 hours: safe to cache for an hour.
  • After a day: cache for a few hours at a time.

This let us protect our Rails servers from massive traffic spikes when a new article was spreading fast, while still giving editors the ability to push corrections through quickly. Older articles, once stable, could safely sit in Akamai’s cache for hours.
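The sliding window itself reduces to a tiny lookup from article age to cache TTL. A sketch using the tiers above, with everything in seconds:

```ruby
# [max_age_in_seconds, ttl_in_seconds] tiers, checked in order.
SLIDING_WINDOW = [
  [5 * 60,    30],      # first 5 minutes: 30-second TTL
  [10 * 60,   60],      # up to 10 minutes: 1 minute
  [20 * 60,   2 * 60],  # up to 20 minutes: 2 minutes
  [6 * 3600,  5 * 60],  # up to 6 hours: 5 minutes
  [24 * 3600, 3600]     # up to a day: 1 hour
].freeze

def cache_ttl_for(age_in_seconds)
  SLIDING_WINDOW.each do |max_age, ttl|
    return ttl if age_in_seconds < max_age
  end
  3 * 3600 # older than a day: a few hours at a time
end
```

In practice that TTL would end up in a `Cache-Control`/`s-maxage` header the CDN respects, but the core idea is just this lookup: freshness tracks newsworthiness.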

At the time, Akamai could take up to seven minutes to guarantee a purge across their global network. Not ideal. We had to plan for that lag. Today, most CDNs can purge instantly, but back then… it was a constraint we had to design around.


A Final Challenge

A lot of what we’ve talked about here comes down to avoiding Cache Pollution.

That’s the unnecessary churn your system takes on when it generates data nobody asked for. It’s the background job queue bloated with thousands of tasks that fight with more important work. It’s the 1 AM process chewing through CPU just to prep dashboards for customers who never log in.

Cache Pollution looks like optimization on the surface… but underneath it’s just waste.

So before your team spins up the next caching project, stop and ask:

  • Who really needs this cache?
  • How fresh does it need to be?
  • What happens if the cache isn’t there?
  • Do we need to run it this often, or for this many customers?
  • Could we scale down the busy work instead of scaling it up?

Because the goal isn’t just faster dashboards. The goal is to keep your caching strategy lean, resilient, and focused — instead of leaving behind a trail of Cache Pollution that grows with every new customer you add.

Hi, I'm Robby.

Robby Russell

I run Planet Argon, where we help organizations keep their Ruby on Rails apps maintainable—so they don't have to start over. I created Oh My Zsh to make developers more efficient and host the Maintainable.fm podcast to explore what it takes to build software that lasts.