Building a RAG Tool in Ruby: What Actually Happened

February 26, 2026

I had never personally worked with embeddings, vector databases, or retrieval-augmented generation before this project. I knew the words. I did not know where the sharp edges were. Folks on our team do… but I felt it was time to wrap my own head around it.

What I did have was a real problem, a team that loves Ruby, and enough curiosity to see where things broke.

This is the story of that experiment… what worked, what surprised me, and what I’d tell another Ruby developer who’s considering something similar.

The Problem

At Planet Argon, we manage several client projects. We live in Jira (I know… I know…). We keep decisions in Confluence. We ship code from GitHub. Over years a lot of institutional knowledge piles up across those systems… past bugs, old tradeoffs, and the “we tried that once” stories.

The problem is that nobody remembers all of it. A new ticket comes in: “users can’t export reports to PDF”. Somewhere in Jira there’s a ticket from eight months ago where we debugged a Safari-specific PDF export issue. Of course it was Safari. Somewhere in Confluence there’s a permissions matrix that’s suddenly relevant. If you weren’t assigned to the project back then, you would never know to look.

So we start over. We ask clarifying questions from scratch. We search Slack to see if anyone has asked something like this before. Tickets go into development with vague acceptance criteria, and the back-and-forth that should have happened before coding shows up during code review and/or when we’re QAing on staging instead.

A vague ticket is a polite way to ask engineers to guess. Guessing can be expensive.

I wanted to build something that could surface that historical context automatically. Point it at a ticket and get suggested clarifying questions grounded in what we actually “remember” about this project.

Why Ruby, Why Minimal Dependencies

Ruby is what our team loves working in. If we were going to learn embeddings, vector search, and LLM integration, I wanted everything around those ideas to feel familiar.

I also wanted to keep the dependency footprint deliberately small. This is an internal tool for a small team. Every Ruby gem you add is a gem you maintain. I’ve watched too many internal tools rot after someone pulled in thirty dependencies for a weekend project, then nobody wanted to deal with the upgrade treadmill six months later.

Related: the Internal Tooling Maturity Ladder is an approach I’ve been exploring for our internal tools. The idea is to start with the simplest possible implementation (a script that solves the problem for one person), then evolve it through stages of maturity (CLI tool, shared server, versioned gem) as the need becomes clearer and the team is ready to invest more.

Rather than listing the full Gemfile here, I’ll call out the handful of gems that did the heavy lifting… because that’s the part you can steal directly if you’re building something similar.

The gems that do the real work:

  • thor: CLI framework. Subcommands, flags, help text out of the box.
  • ruby-openai: the workhorse. Handles both embedding generation (text-embedding-3-small) and LLM completions (gpt-4o-mini). One gem, two critical jobs.
  • pinecone: Ruby client for Pinecone, our production vector database.
  • chroma-db: Ruby client for Chroma, a local vector database you can run in Docker.
  • faraday: HTTP client for talking to Jira, Confluence, and GitHub APIs.
  • nokogiri: needed to strip HTML from Confluence page bodies before embedding.
  • concurrent-ruby: thread pools and futures for parallel data ingestion.
  • mcp: Model Context Protocol server for Claude Code integration (this came later, and it changed everything).
  • The tty-* family: progress bars, spinners, colored output, prompts. Not necessary… but nicer when you’re watching a 20-minute ingestion run.

Beyond that, I leaned on Ruby’s standard library wherever possible: JSON, URI, Struct, Set, Time, FileUtils. The instinct to reach for a gem is strong, but for most things the stdlib is genuinely sufficient. The goal is not cleverness. The goal is leverage. I also leaned on minitest for testing, but that’s a story for another post.

Why Not a Server

Early on I made a decision that shaped the whole architecture: no running HTTP server with an endpoint (at least, not yet).

A server is a commitment. Hosting. VPNs. Monitoring. Security reviews. Someone eventually asking, “who owns this?” For an internal experiment that might not pan out, that felt like a lot of ceremony up front.

So I built it as a CLI tool. Each engineer runs it locally on their own machine. The only shared infrastructure is Pinecone, a cloud-hosted vector database. Everyone gets API keys to the same Pinecone index, but each client’s data lives in its own namespace. Engineers use their own Atlassian and GitHub API tokens when they want to run an ingestion.

Here’s what the environment setup looks like:

# .env: each engineer has their own copy
# OpenAI (for embeddings and analysis)
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini

# Atlassian (shared instance, individual tokens)
ATLASSIAN_BASE_URL=https://planetargon.atlassian.net
ATLASSIAN_EMAIL=you@planetargon.com
ATLASSIAN_API_TOKEN=ATATT3x...

# GitHub (individual tokens)
GITHUB_TOKEN=ghp_...

# Pinecone (shared index, namespaces isolate client data)
PINECONE_API_KEY=...
PINECONE_INDEX_NAME=clarion

This kept the experiment low-stakes. No deployment pipeline, no server to maintain, no VPN to configure. If it didn’t work out, there was nothing to decommission. Engineers pull updates from the main branch, run bundle install, and they’re current. It’ll likely become a proper gem we version at some point, but for now the simplicity of “pull main and go” is working fine.

What the CLI Looks Like

The entrypoint is dead simple:

#!/usr/bin/env ruby
require "bundler/setup"
require_relative "../lib/clarion"

Clarion::CLI.start(ARGV)

Here’s the help output:

$ bin/clarion help

Commands:
  clarion analyze TICKET_ID    # Analyze a Jira ticket and suggest clarifications
  clarion help [COMMAND]       # Describe available commands or one specific command
  clarion ingest SUBCOMMAND    # Ingest data from various sources
  clarion ingest_all CLIENT    # Ingest Jira, Confluence, and GitHub data for a client
  clarion mcp                  # Start MCP server (for Claude Code integration)

$ bin/clarion help ingest

Commands:
  clarion ingest confluence    # Ingest Confluence pages for a specific space
  clarion ingest github        # Ingest GitHub repository data
  clarion ingest help          # Describe subcommands or one specific subcommand
  clarion ingest jira          # Ingest Jira tickets for a specific project

The CLI: Thor Subcommands

I chose Thor over raw OptionParser because the tool has several distinct commands with different flag sets. Thor gives you subcommands, required options, type validation, and auto-generated help text with minimal boilerplate.

Here’s the skeleton of the CLI:

module Clarion
  class CLI < Thor
    desc "analyze TICKET_ID", "Analyze a Jira ticket and suggest clarifications"
    option :verbose, type: :boolean, desc: "Enable verbose output"
    def analyze(ticket_id)
      validate_ticket_id!(ticket_id)
      analyzer = Clarion::Analyzer.new(ticket_id, verbose: options[:verbose])
      puts analyzer.analyze
    end

    desc "ingest_all CLIENT", "Ingest Jira, Confluence, and GitHub data for a client"
    option :limit, type: :numeric, default: 100
    option :parallel, type: :boolean, default: true
    def ingest_all(client_name)
      # Looks up client config, dispatches to parallel ingestion
    end

    desc "mcp", "Start MCP server (for Claude Code integration)"
    option :namespace, type: :string, desc: "Client namespace (auto-detected if omitted)"
    def mcp
      Clarion::McpServer.new(namespace: options[:namespace]).run
    end

    # Nested subcommand for individual ingestion
    desc "ingest SUBCOMMAND", "Ingest data from various sources"
    subcommand "ingest", Ingest

    private

    def validate_ticket_id!(ticket_id)
      return if ticket_id =~ /^[A-Z]+-\d+$/
      raise Thor::Error, "Invalid ticket ID format. Expected: PROJECT-123"
    end
  end
end

The Ingest subcommand is its own Thor class, giving us scoped commands for each data source:

# Analyze a ticket
$ bin/clarion analyze WR-123

# Ingest everything for a client (parallel by default)
$ bin/clarion ingest_all waystar --limit=500

# Or ingest individual sources
$ bin/clarion ingest jira --namespace=waystar --project=WR --limit=500
$ bin/clarion ingest confluence --namespace=waystar --space=WR
$ bin/clarion ingest github --namespace=waystar --repo=planetargon/waystar-web

# Start an MCP server for Claude Code
$ bin/clarion mcp --namespace=waystar

Every ingest command requires explicit --namespace and source-specific scoping flags (--project, --space, --repo). This is deliberate. Operations should never run without explicit client scope.

Client Configuration

Each client maps to a namespace, a Jira project, a Confluence space, and optionally GitHub repos:

# config/clients.yml
clients:
  waystar:
    namespace: waystar
    jira_project: WR
    confluence_space: WR
    vector_store: pinecone
    github_repos:
      - planetargon/waystar-web
      - planetargon/waystar-api

  piedpiper:
    namespace: piedpiper
    jira_project: PP
    confluence_space: PP
    vector_store: pinecone
    github_repos:
      - planetargon/piedpiper-app

  pierpoint:
    namespace: pierpoint
    jira_project: PPC
    confluence_space: PPC
    vector_store: chroma    # Local Chroma for testing

Note the per-client vector_store setting. One client can use Pinecone (shared, cloud-hosted) while another uses Chroma (local Docker instance) for development. The tool doesn’t care. The vector store abstraction handles it.
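Loading that YAML is plain stdlib work. Here’s a sketch of how one entry might be wrapped (the ClientConfig struct and load_client helper are my illustration, not the actual code):

```ruby
require "yaml"

# Hypothetical wrapper for one entry in config/clients.yml -- the real tool's
# internals may differ; this just shows the shape of the lookup.
ClientConfig = Struct.new(:namespace, :jira_project, :confluence_space,
                          :vector_store, :github_repos, keyword_init: true)

def load_client(config, name)
  entry = config.fetch("clients").fetch(name)  # raises KeyError for unknown clients
  ClientConfig.new(
    namespace: entry["namespace"],
    jira_project: entry["jira_project"],
    confluence_space: entry["confluence_space"],
    vector_store: entry["vector_store"] || "pinecone",  # sensible default
    github_repos: entry["github_repos"] || []           # repos are optional
  )
end
```

Letting the lookup raise on an unknown client name is deliberate: a typo in a client name should fail loudly, not silently ingest into the wrong namespace.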

Embeddings: Simpler Than I Expected, Until They Weren’t

Here’s what took me a while to internalize: you’re just turning text into a point in a very large space. Similar text ends up near similar points. That’s it. That’s the whole idea.

We use OpenAI’s text-embedding-3-small model, which produces 1,536-dimensional vectors. You send it a string, you get back an array of 1,536 floats. Store those floats alongside the original text, and later you can find “nearby” documents by comparing vectors.

The ruby-openai gem makes the embedding call straightforward:

EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSION = 1536

def generate_embedding(text)
  return Array.new(EMBEDDING_DIMENSION, 0.0) if text.nil? || text.strip.empty?

  response = @openai.embeddings(
    parameters: {
      model: EMBEDDING_MODEL,
      input: text.strip
    }
  )

  response["data"][0]["embedding"]
end

One thing I didn’t appreciate initially is that every embedding call costs money and adds latency. My early version used search, which takes a text string and internally calls OpenAI to generate an embedding before querying Pinecone:

# Before: each search() call generates its own embedding internally
similar  = @vector_store.search(query_text, filter: { source: "jira" })
docs     = @vector_store.search(query_text, filter: { source: ["confluence", "github"] })
resolved = @vector_store.search(query_text, filter: resolved_filter)

That’s three sequential calls to OpenAI’s embedding API for the exact same text, followed by three sequential calls to Pinecone. Six network round-trips, all in series.

Looking at the search method, you can see why. It generates a fresh embedding every time:

def search(query, filter: nil, top_k: 10)
  query_embedding = generate_embedding(query)  # Hits OpenAI every call
  search_by_vector(query_embedding, filter: filter, top_k: top_k)
end

The fix was two things at once: generate the embedding once, then pass that vector directly to search_by_vector (which skips the embedding step). Then run those three Pinecone queries concurrently:

# After: one embedding, three parallel vector searches
query_vector = @search.embed(query_text)

similar  = Thread.new { @search.search_by_vector(query_vector, source: "jira") }
docs     = Thread.new { @search.search_by_vector(query_vector, source: ["confluence", "github"]) }
resolved = Thread.new { @search.search_by_vector(query_vector, resolved_filter) }

The OpenAI embedding calls went from 3 to 1. The Pinecone queries stayed at 3 but now run concurrently instead of sequentially. Two wins from a small refactor.

I also learned about truncation the hard way. Some Jira tickets are enormous… long comment threads, embedded images described in markup, and extensive acceptance criteria. The embedding model has a token limit. We now truncate text at 30,000 characters before sending it for embedding.

Would’ve been nice to learn that from documentation rather than from a production error. Oh well.
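The guard itself is tiny. A sketch of the cutoff (the constant and method names are mine, not the actual code):

```ruby
# Embedding models have token limits; we cap input length well below them.
# 30,000 characters is the cutoff mentioned above.
MAX_EMBED_CHARS = 30_000

def truncate_for_embedding(text)
  return text if text.length <= MAX_EMBED_CHARS
  text[0, MAX_EMBED_CHARS]
end
```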

The Vector Store Abstraction

I didn’t want to be locked into a single vector database, especially early on when I wasn’t sure which one would work best for us. So I built a simple abstraction layer. It’s a factory that returns different backends behind the same interface:

class VectorStore
  def self.new(namespace:, backend: nil)
    backend ||= ENV.fetch("VECTOR_STORE_BACKEND", "memory")
    case backend.downcase
    when "pinecone" then VectorStores::Pinecone.new(namespace: namespace)
    when "chroma"   then VectorStores::Chroma.new(namespace: namespace)
    when "memory"   then VectorStores::Memory.new(namespace: namespace)
    else raise ArgumentError, "Unknown vector store backend: #{backend}"
    end
  end
end

All three backends implement the same base contract:

module VectorStores
  class Base
    attr_reader :namespace

    def initialize(namespace: nil)
      @namespace = namespace
    end

    def upsert(documents)
      raise NotImplementedError, "#{self.class}#upsert must be implemented"
    end

    def search(query, filter: nil, top_k: 10)
      raise NotImplementedError, "#{self.class}#search must be implemented"
    end

    def search_by_vector(vector, filter: nil, top_k: 10)
      raise NotImplementedError, "#{self.class}#search_by_vector must be implemented"
    end

    def embed(text)
      raise NotImplementedError, "#{self.class}#embed must be implemented"
    end

    def delete_all(namespace: nil)
      raise NotImplementedError, "#{self.class}#delete_all must be implemented"
    end

    def stats
      raise NotImplementedError, "#{self.class}#stats must be implemented"
    end
  end
end

Callers just use upsert, search, search_by_vector, stats. They never know or care whether they’re talking to Pinecone, Chroma, or an in-memory hash.

The Pinecone backend stores document text inside the metadata (Pinecone doesn’t have a native text field), then strips it back out on retrieval:

# During upsert: embed text into metadata
metadata = (doc[:metadata] || {}).merge(text: doc[:text])
{ id: doc[:id], values: embedding, metadata: metadata }

# During search: extract text back out, unescape newlines
matches.map do |match|
  result = match.dup
  if result["metadata"] && result["metadata"]["text"]
    text = result["metadata"]["text"]
    result["text"] = text.is_a?(String) ? text.gsub('\\n', "\n") : text
    result["metadata"] = result["metadata"].except("text")
  end
  result
end

This paid off quickly. We started with the in-memory backend (pure Ruby cosine similarity, persists to a JSON file) just to prove the concept worked at all. Then Chroma for local development. You can run it in Docker. No cloud account needed. Then Pinecone for the shared production dataset that the whole team can access.
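For reference, the “pure Ruby cosine similarity” in the memory backend boils down to a few lines of math. This is my sketch of the idea, not the exact implementation:

```ruby
# Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if mag_a.zero? || mag_b.zero?  # guard against zero vectors
  dot / (mag_a * mag_b)
end
```

At 1,536 dimensions and a few thousand documents this is slow but entirely workable for a proof of concept.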

Ingesting Messy Real-World Data

This is where things got messy.

Jira: Flattening the Ticket

Each Jira ticket gets transformed into a document with an ID, a text blob, and structured metadata:

def transform(ticket)
  key = ticket["key"]
  fields = ticket["fields"] || {}

  {
    id: "jira_#{@namespace}_#{key}",   # e.g., "jira_waystar_WR-123"
    text: build_text(key, fields),
    metadata: build_metadata(key, fields)
  }
end

The text blob concatenates everything meaningful about the ticket: the key, summary, description, comments (with author tags), labels, parent/subtask relationships, and any embedded Confluence links.

Jira’s rich text format is a nested JSON tree. Jira uses something called Atlassian Document Format (ADF) for ticket descriptions and comments. It’s not HTML. It’s not Markdown. It’s a deeply nested JSON structure with node types like paragraph, bulletList, taskItem, mention, inlineCard, and emoji. I had to write a recursive parser to walk that tree and flatten it into plain text:

class AdfParser
  def extract_text(adf_doc)
    return "" unless adf_doc.is_a?(Hash)
    extract_blocks(adf_doc).join(" ").strip
  end

  private

  def extract_blocks(adf_doc)
    return [] unless adf_doc["content"].is_a?(Array)
    adf_doc["content"].map { |node| format_block(node) }
  end

  def format_block(node)
    return "" unless node.is_a?(Hash)

    case node["type"]
    when "taskList" then format_task_list(node)
    when "bulletList", "orderedList" then format_list(node)
    else extract_from_node(node)
    end
  end

  def extract_from_node(node)
    case node["type"]
    when "text"      then node["text"] || ""
    when "hardBreak" then "\n"
    when "mention"   then "@#{node.dig('attrs', 'text') || 'user'}"
    when "emoji"     then node.dig("attrs", "shortName") || ""
    when "inlineCard", "blockCard" then node.dig("attrs", "url") || ""
    else inline_text(node)
    end
  end
end

Not complex, but the kind of thing you don’t anticipate until you see your first embedding full of raw JSON nodes. Thankfully, we can task Claude Code with figuring out some of this chaos.

Comment authors matter. We tag each Jira comment as [Team] or [Client] based on the commenter’s email domain:

def determine_author_type(email)
  if email.include?("@planetargon.com")
    "[Team]"
  elsif email.empty?
    ""
  else
    "[Client]"
  end
end

This matters more than I thought it would. The LLM can distinguish between internal engineering discussion and client-facing conversation when generating suggested questions.

Confluence: Chunking HTML

Confluence pages come back as raw HTML. Nokogiri strips the markup, then long pages get chunked into roughly 2,000-character segments with 200 characters of overlap, breaking at sentence boundaries where possible. Each chunk becomes its own document in the vector store. A 10-page Confluence spec might produce five or six chunks, each independently searchable.
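The chunking logic is simple enough to sketch in a few lines. Assume the real implementation differs in details; the sentence-boundary heuristic here is illustrative:

```ruby
CHUNK_SIZE = 2_000   # target characters per chunk
OVERLAP    = 200     # characters shared between adjacent chunks

def chunk_text(text, size: CHUNK_SIZE, overlap: OVERLAP)
  chunks = []
  start = 0
  while start < text.length
    slice = text[start, size]
    # Break at the last sentence boundary inside the slice, if one exists
    if start + slice.length < text.length && (cut = slice.rindex(/[.!?]\s/))
      slice = slice[0..cut]
    end
    chunks << slice.strip
    step = slice.length - overlap
    start += step.positive? ? step : slice.length  # always make forward progress
  end
  chunks
end
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from both sides.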

GitHub: PRs, Issues, Docs, and Code

The GitHub ingester pulls from multiple sources: READMEs and documentation files, pull request descriptions (with merge dates and authors), issues, and source code files. Each becomes a document with source: "github" metadata, so the context builder can query for documentation specifically.

Batch Uploads and Deterministic IDs

Documents get uploaded to the vector store in batches of 20. Errors in one batch don’t abort subsequent batches:

class BatchUploader
  BATCH_SIZE = 20

  def upload(documents)
    documents.each_slice(BATCH_SIZE) do |batch|
      @vector_store.upsert(batch)
      @processed_count += batch.length
    rescue StandardError => e
      @error_count += batch.length
    end
  end
end

Every document gets a deterministic ID based on its source: jira_waystar_WR-123, confluence_waystar_12345_chunk_2, or github_waystar_waystar-web_pr_47. This means re-running ingestion overwrites old documents instead of creating duplicates. Engineers can re-ingest anytime without polluting the dataset.
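The ID scheme is just string composition. A hypothetical helper to show the shape:

```ruby
# Deterministic IDs: the same source document always maps to the same ID,
# so a re-ingested document upserts over its previous version.
def document_id(source, namespace, *parts)
  [source, namespace, *parts].join("_")
end
```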

When a Jira ticket updates, the next ingestion run replaces the old embedding with the new one. Same with Confluence pages and GitHub content. The vector store stays in sync with reality without complex change detection or deletion logic.

The tradeoff: someone needs to remember to run ingestion periodically. But the simplicity is worth it.

Parallel Ingestion with concurrent-ruby

When ingesting all sources for a client, the tool uses concurrent-ruby to run Jira, Confluence, and GitHub ingestions in parallel:

pool = Concurrent::FixedThreadPool.new(3)

futures = []
futures << Concurrent::Future.execute(executor: pool) { ingest_jira }
futures << Concurrent::Future.execute(executor: pool) { ingest_confluence }
github_repos.each do |repo|
  futures << Concurrent::Future.execute(executor: pool) { ingest_github(repo) }
end

# Wait for all to complete
futures.each(&:wait)

Thread-safe state tracking uses Concurrent::Hash:

@results = Concurrent::Hash.new
@timings = Concurrent::Hash.new
@status = Concurrent::Hash.new

After completion, the tool calculates time saved versus sequential execution and reports a speedup factor. In practice, parallel ingestion typically finishes in well under half the time sequential would take, since the API calls to Jira, Confluence, and GitHub can overlap.
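The speedup report is simple arithmetic over the per-source timings. A sketch (method and variable names are mine):

```ruby
# Wall time is the slowest path through the thread pool; "sequential" is what
# the same work would have cost run back to back.
def speedup_report(timings, wall_time)
  sequential = timings.values.sum
  {
    saved:   (sequential - wall_time).round(1),
    speedup: (sequential / wall_time).round(1)
  }
end
```

Feeding it the per-source timings from the sample run below (45.2s, 38.1s, 52.7s against a 58.3s wall time) reproduces the reported 77.7s saved and 2.3x speedup.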

Running an ingestion looks like this:

$ bin/clarion ingest_all waystar --limit=500

════════════════════════════════════════════════════════════
                  COMBINED DATA INGESTION
════════════════════════════════════════════════════════════

ℹ Client: waystar
ℹ Namespace: waystar
ℹ Vector store: pinecone
ℹ Jira project: WR
ℹ Confluence space: WR
ℹ GitHub repos: planetargon/waystar-web
ℹ Limit: 500 items per source
ℹ Mode: Parallel

  ✓ Jira (WR)                     Complete (487/500 processed)
  ✓ Confluence (WR)               Complete (245/500 processed)
  ✓ GitHub: waystar-web           Complete (498/500 processed)

════════════════════════════════════════════════════════════
                    INGESTION RESULTS
════════════════════════════════════════════════════════════

ℹ ✓ Jira: 487 processed, 0 errors (45.2s)
ℹ ✓ Confluence: 245 processed, 0 errors (38.1s)
ℹ ✓ Github Waystar Web: 498 processed, 0 errors (52.7s)

════════════════════════════════════════════════════════════
                   PERFORMANCE SUMMARY
════════════════════════════════════════════════════════════

ℹ Total documents processed: 1230
✓ Total time: 58.3s
ℹ Time saved vs sequential: 77.7s (2.3x speedup)

✓ Client 'waystar' is ready for analysis!

Retrieval and Re-Ranking

Raw cosine similarity gets you most of the way there, but not all the way. The vector search returns the 40 most similar Jira tickets, and some of them are similar for the wrong reasons… same boilerplate language, same component name, but not actually useful context.

The context builder generates one embedding, then runs three concurrent searches. Similar tickets. Resolved tickets filtered by component. Documentation from Confluence and GitHub.

def gather_all_context(ticket, ticket_id, current_key, created_time)
  query = @search.build_query(ticket)
  query_vector = @search.embed(query)

  similar_thread = Thread.new do
    results = @search.search_by_vector(query_vector, { source: "jira" }, 40)
    score_and_limit_results(results, ticket, current_key, created_time, 16)
  end

  resolved_thread = Thread.new do
    results = @search.search_by_vector(query_vector, resolved_filter, 12)
    format_resolved_tickets(results)
  end

  docs_thread = Thread.new do
    results = @search.search_by_vector(query_vector, { source: ["confluence", "github"] }, 32)
    process_and_limit_docs(results, ticket, ticket_id, created_time, 16)
  end

  {
    similar_tickets:  similar_thread.value,
    related_resolved: resolved_thread.value,
    documentation:    docs_thread.value
  }
end

After retrieval, I added two simple re-ranking heuristics that made a noticeable difference:

Relationship boost. If a retrieved ticket is a parent or subtask of the ticket being analyzed, its score gets a 1.5x multiplier:

def apply_relationship_boost(ticket_data, relationship_type)
  ticket_data[:relationship] = relationship_type
  ticket_data[:score] *= 1.5
end

Temporal decay. Tickets created more than 7 days before the one being analyzed get a 0.7x multiplier. More than 30 days before, 0.3x:

def age_adjustment_params(days_before)
  return [0.3, "Created #{days_before} days before ticket"] if days_before > 30
  return [0.7, nil] if days_before > 7
  [nil, nil]
end

These aren’t machine learning models. They’re just multipliers applied after retrieval. I was surprised how much difference they made. A few lines of Ruby math moved the output from “interesting but noisy” to something I’d actually act on.
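Composed, the two heuristics amount to something like this (a hypothetical combined form; the real code applies them separately, as shown above):

```ruby
def adjusted_score(score, relationship: nil, days_before: 0)
  score *= 1.5 if relationship   # parent/subtask relationship boost
  if days_before > 30
    score *= 0.3                 # strong temporal decay
  elsif days_before > 7
    score *= 0.7                 # mild temporal decay
  end
  score
end
```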

It’s still early days; I expect we’ll need to tweak this a bunch as we see more real-world queries and get feedback from engineers.

The Prompting Side

Two things surprised me here.

First: structured JSON output. Huge deal. We set response_format: { type: "json_object" } on the LLM call, which means the response is always valid JSON. No regex parsing, no hoping the model follows your format instructions. The response comes back with a defined structure:

{
  "ticket_type": "feature",
  "clarity_assessment": "needs_clarification",
  "clarifying_questions": [
    {
      "question": "The question to ask the client",
      "rationale": "Why this matters for implementation",
      "reference": "WR-892: similar issue last quarter"
    }
  ],
  "suggested_acceptance_criteria": [
    "User can export all report types to PDF",
    "Export completes within 30 seconds",
    "Error message displays if export fails"
  ],
  "potential_edge_cases": [
    "Special characters in report data",
    "Very large reports (>10,000 rows)"
  ],
  "implementation_notes": "Brief notes on approach"
}

Once you have reliable structure, everything downstream gets simpler.
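“Simpler” here means downstream handling is a single JSON.parse with no defensive regex work. For example (the sample payload is illustrative, mirroring the schema above):

```ruby
require "json"

# Because response_format guarantees valid JSON, parsing is unconditional.
response_text = <<~JSON
  { "clarity_assessment": "needs_clarification",
    "clarifying_questions": [
      { "question": "Which report types need PDF export?",
        "rationale": "Scopes the work",
        "reference": "WR-892" } ] }
JSON

analysis  = JSON.parse(response_text)
questions = analysis["clarifying_questions"].map { |q| q["question"] }
```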

Second: the prompt is where your institutional voice lives. This is the part that can’t be replicated by generic tooling. Our system prompt doesn’t just say “generate clarifying questions”. It encodes how Planet Argon communicates with clients:

Instead of asking open-ended technical questions, frame them as confirmations:

“It sounds like this needs to work in Chrome. Should we also make sure it works in Safari and Firefox?”

Rather than:

“What browsers need to be supported?”

The prompt covers dozens of specific communication scenarios. A few examples from the actual prompt file:

When clients apologize for not being technical:

“No need to apologize. You’re describing exactly what we need to know. The ‘what’s broken’ is your expertise; the ‘why it’s broken’ is ours.”

When scope is creeping:

“There’s a lot of good stuff here. To make sure nothing gets lost, would it help to break this into separate tickets? That way we can track the export fix and the new filter feature independently.”

When clients describe workarounds they’re using:

“Good thinking on the CSV workaround. That’ll keep things moving. We’ll fix the PDF export so you don’t have to keep doing that extra step.”

When something is working as designed:

“So it turns out the system is doing what it was originally built to do, but I hear you that it’s not what you need it to do. Want us to write up a feature request to change this behavior?”

This is the part that makes it ours and not just another RAG wrapper. The vector search finds the history. The prompt makes it sound like us.

We also maintain two separate prompt files. prompts/analyzer_default.md is for open tickets (“what’s unclear?”). prompts/analyzer_completed.md is for closed tickets (retrospective analysis). The tool detects the ticket’s status and selects the right prompt automatically. It’s a small touch, but it means the output is always contextually appropriate.

The MCP Surprise

I didn’t expect this part to become the most useful thing in the whole project.

The tool started as a CLI experiment. Run bin/clarion analyze WR-123 in your terminal, get output, copy what’s useful. It worked, but there was friction: you had to jump between the Jira ticket, your editor, and a terminal, and remember the command syntax.

Having spent a bunch of time recently in Claude Code, I wondered… could we bring this analysis directly into the editor? I think it took me less than two hours to go from “I wonder if this could be an MCP server” to “oh wow, it’s actually working”.

I quickly found the mcp gem, which implements Anthropic’s Model Context Protocol. MCP lets you expose a tool as a server that Claude Code can call directly. Here’s what the server looks like:

class McpServer
  def initialize(namespace: nil, working_directory: Dir.pwd)
    @namespace = namespace
    @working_directory = working_directory
    @client = resolve_client
  end

  def run
    server = build_server
    transport = MCP::Server::Transports::StdioTransport.new(server)
    transport.open
  end

  private

  def build_server
    MCP::Server.new(
      name: "clarion",
      version: Clarion::VERSION,
      tools: [Mcp::AnalyzeTool.build(@client)]
    )
  end
end

The MCP tool itself is built dynamically. The tool description is baked in with the client’s namespace and ticket prefix at startup time, so Claude Code knows exactly what it can do:

module AnalyzeTool
  def self.build(client)
    tool = Class.new(MCP::Tool) do
      tool_name "analyze_ticket"
      description "Analyze a Jira ticket and suggest clarifying questions " \
                  "and acceptance criteria. Scoped to client '#{client.namespace}' " \
                  "(ticket prefix: #{client.ticket_prefix})."

      input_schema(
        properties: {
          ticket_key: {
            type: "string",
            description: "Jira ticket ID (e.g., #{client.ticket_prefix}-123)"
          }
        },
        required: ["ticket_key"]
      )
    end
    # ... wire up call, validation, and analysis methods
    tool
  end
end

Each MCP server instance is scoped to a single client namespace. When an engineer is working in a client’s repository, they drop a small JSON config file at the repo root:

{
  "mcpServers": {
    "clarion": {
      "command": "/path/to/clarion/bin/clarion-mcp",
      "args": ["--namespace=waystar"]
    }
  }
}

The bin/clarion-mcp wrapper is a one-liner. It sets the working directory, then delegates:

#!/bin/bash
cd "$(dirname "$0")/.."
exec bundle exec ruby -Ilib bin/clarion mcp "$@"

Now they can ask Claude Code to “analyze WR-123” and get the full analysis inline. Clarifying questions. Suggested acceptance criteria. Edge cases. Implementation notes. All without leaving their editor.

Auto-detection from git remote. If the client’s repo is configured in clients.yml with its github_repos, you can even skip the --namespace flag. The server shells out to git remote get-url origin, parses the owner/repo slug, and looks it up automatically.
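The slug parsing itself is a one-regex job. A sketch that handles both SSH and HTTPS remotes (my reconstruction, not the exact code):

```ruby
# Extract "owner/repo" from either remote style:
#   git@github.com:planetargon/waystar-web.git
#   https://github.com/planetargon/waystar-web
def parse_repo_slug(remote_url)
  remote_url[%r{github\.com[:/]([^/\s]+/[^/\s.]+)}, 1]
end
```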

One gotcha worth mentioning: TTY output breaks MCP’s stdio transport. All those nice spinners and progress bars and colored output that make the CLI experience polished? They corrupt the MCP response stream. I had to suppress stdout during MCP calls:

def run_analysis(key)
  config = AnalyzerConfig.build(key, result_formatter: PlainTextFormatter.new)
  original_stdout = $stdout
  $stdout = File.open(File::NULL, "w")
  begin
    Analyzer.new(config).analyze
  ensure
    $stdout = original_stdout
  end
end

Small thing, but it would have been confusing to debug without knowing to look for it. We also have a separate PlainTextFormatter that outputs clean text for MCP, versus the ResultFormatter that uses colored boxes and unicode for the CLI.

Where It Gets Really Interesting: MCP in Combination

Clarion as an MCP server is useful on its own. But the thing that got me excited was running it alongside other MCP servers in the same Claude Code session.

Our engineers can have Clarion (our embedded project history), the Atlassian MCP (live read/write access to Jira and Confluence), and the GitHub MCP all connected at once. That combination opens up workflows none of these tools could do alone:

Analysis to action without context-switching. Ask Clarion to analyze a ticket. It surfaces related historical context and suggests clarifying questions. Review the suggestions, adjust the wording, then use the Atlassian MCP to post a comment directly on the Jira ticket. All within Claude Code. The loop from “what should we ask?” to “we asked it” closes in a single session.

Breaking down epics. This is one we’re actively exploring. Point Clarion at an Epic, and it can pull in context from how similar large efforts were structured in the past. What the subtask breakdown looked like. What got missed. Where scope crept. Use that context to draft a breakdown into smaller tickets with clear acceptance criteria on each one. Then use the Atlassian MCP to create those subtasks in Jira, already populated with suggested AC. That’s different from asking a generic LLM to decompose an epic. It’s referencing how this team on this project has handled similar work before.

Cross-source research. An engineer can ask “what do we know about how authentication works in this project?” and get results from Jira tickets where auth bugs were fixed, Confluence pages documenting the auth flow, and GitHub PRs where the auth code was changed. All from one query, all scoped to that single client. With the GitHub MCP also connected, they can then inspect the actual current code to verify whether those docs are still accurate.

Pre-development discovery. Before an engineer, or an AI coding agent, starts building, the ticket should be clear. Clarion sits at that boundary: after the client describes what they want, before anyone writes code. The suggested questions aren’t generic. They’re informed by the specific history of this project. “Last time we did a PDF export on this project, Safari caused problems” is more useful than “have you considered browser compatibility?”.

Multi-Tenant Scoping: The Hard Constraint

One constraint that shaped everything: Planet Argon uses a single Atlassian account across most of our client projects (some clients own their own Atlassian accounts). Same Jira instance, same Confluence instance, one set of API credentials.

That means data isolation has to be enforced in our code, not by infrastructure boundaries. Every operation requires an explicit client namespace. The vector store uses that namespace to partition data. One Pinecone index. Many isolated namespaces. Ticket IDs are validated against the expected prefix before any analysis runs.

Granted, our engineers can reference different clients at the same time through their Atlassian accounts, but the tool itself is always scoped to one client per run. That's the important part.

The validation happens at multiple layers. In the CLI:

def validate_ticket_id!(ticket_id)
  # \A and \z anchor the whole string; in Ruby, ^ and $ only anchor
  # lines, which would let multiline input sneak past the check.
  return if ticket_id =~ /\A[A-Z]+-\d+\z/
  raise Thor::Error, "Invalid ticket ID format. Expected: PROJECT-123"
end

And again in the MCP tool, where it also checks the prefix matches the scoped client:

def validate_ticket_prefix!(key)
  # Whole-string anchors (\A..\z) rather than line-based ^..$.
  unless key.match?(/\A[A-Z]+-\d+\z/)
    raise ArgumentError, "Invalid ticket ID format: #{key}. Expected: PROJECT-123"
  end

  prefix = key.split("-").first
  return if prefix == scoped_client.ticket_prefix

  raise ClientScopeError,
        "Ticket #{key} does not belong to client " \
        "'#{scoped_client.namespace}' (expected prefix: #{scoped_client.ticket_prefix})"
end

If you’re working in the waystar namespace and try to analyze PP-123, you get a clear error: "Ticket PP-123 does not belong to client 'waystar' (expected prefix: WR)". Not results from the wrong client.

It’s a simple system. Namespaces and prefix checks. Again, engineers technically have access to all clients’ data in Atlassian, but the tool enforces discipline. You have to be intentional about which client’s context you’re working in. We don’t want someone accidentally running an analysis against the wrong client’s project and making assumptions based on irrelevant history.
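The namespace partitioning side of that system can be sketched as follows, assuming Pinecone's query API (a POST with `namespace`, `vector`, `topK`, and `includeMetadata` fields). The class and method names are illustrative, not Clarion's actual code:

```ruby
require "json"

# A vector store wrapper that is scoped to exactly one client namespace
# at construction time, so no individual call can widen the search.
class ScopedVectorStore
  def initialize(namespace:)
    @namespace = namespace
  end

  # Every query payload gets the client namespace baked in. One Pinecone
  # index, many namespaces; a caller can't accidentally query across them.
  def query_payload(embedding, top_k: 5)
    {
      namespace: @namespace,
      vector: embedding,
      topK: top_k,
      includeMetadata: true
    }.to_json
  end
end

store = ScopedVectorStore.new(namespace: "waystar")
JSON.parse(store.query_payload([0.1, 0.2, 0.3]))["namespace"] # => "waystar"
```

The design choice worth noting is that the namespace lives on the object, not the call site: once a run is scoped, every retrieval inherits that scope.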

What’s Next

We’re looking at other LLMs. The ruby-openai gem handles everything we need today, but things are moving fast.

Atlassian is building AI features into Jira and Confluence, and some of that will overlap with what we’ve built. But Atlassian’s tooling only knows about what’s inside Atlassian. It can’t see GitHub repos, PR histories, or how past implementations actually played out in code. Our tool bridges that gap — context across all three systems, shaped by how we work.

Our team is also experimenting more with LLM-assisted code generation. But this tool sits deliberately upstream of that. It’s about the collaboration layer. Making sure what we’re about to build is well-understood before anyone writes code. A perfectly generated pull request against a vague ticket is still a miss.

We’ll probably open source this eventually, but the codebase is full of references to real client projects in tests and config. Scrubbing that is on the list. Not the priority right now.

If you’re thinking about building something like this… just start. Ruby has what you need. The gems are there. It’s more approachable than it looks from the outside.

Most Developers Don’t Build New Things

February 18, 2026

The industry tends to celebrate beginnings.

New repositories. Clean architecture diagrams. The excitement of choosing tools before real constraints show up. First commits get attention because they feel like authorship.

Most developers, though, don’t spend their careers starting from scratch. They spend them stepping into systems that already exist, already have users, and already have decisions embedded in them.

You open the repository and there are hundreds of thousands of lines of code waiting for you. Patterns layered over time. Workarounds that solved real problems in earlier moments. Comments that hint at context you weren’t there for. You didn’t choose the framework version. You didn’t pick the database. You didn’t design the way billing logic is structured.

You inherited it.

That inheritance is the work.

The Rails Default Debate

This is why debates about what DHH changed in the default rails new template sometimes feel slightly disconnected from day-to-day reality.

Threads fill up with opinions about JavaScript bundling, database adapters, testing philosophy. The conversation centers on defaults and direction.

Meanwhile, most developers working with Ruby on Rails haven’t typed rails new in years, at least not on anything beyond a side project or experiment.

They’re inside a twelve-year-old system that has been upgraded across versions, shaped by product demands, team turnover, and business pressure. Their attention is on stability. On how to change something without breaking adjacent workflows. On whether a refactor will surface an edge case that only appears in production.

The generator shapes the beginning.

Maintenance shapes everything after that.

Software Has Second Acts

We often talk about software in terms of launch and decline, as if those are the only meaningful chapters. In practice, most valuable products spend the majority of their life in a long second act.

The second act is what happens after product-market fit, after early growth, after the architecture has already been bent once or twice. It’s when the team has turned over. When new engineers are maintaining code they didn’t write. When uptime and predictability matter more than novelty.

It’s the stage where you stop asking, “What should we build?” and start asking, “How do we keep this adaptable without destabilizing it?”

That’s the environment most engineers operate in.

Not at the starting line, but in the middle of a system that already has gravity.

Legacy Code Is Accumulated Context

A mature codebase is accumulated decision-making. Each file reflects what someone believed was reasonable under the constraints they faced at the time.

A couple of years ago you introduced an abstraction that felt thoughtful and clean. It simplified things. It made sense with what you knew then. Today it may feel slightly overbuilt or misaligned with how the system evolved. The code you once defended now requires explanation.

Legacy code is not a moral category.

It is accumulated context.

Most legacy systems were built by people responding to real constraints. That includes you. The second act is less about erasing that history and more about working within it deliberately.

The Rewrite Fantasy

The urge to rewrite usually comes from frustration. We could design this better. We understand the domain more clearly now. The stack feels dated.

The real cost of a rewrite isn’t just risk or budget. It’s attention.

The moment a team starts to think “we’re rewriting this anyway,” the current system quietly becomes second-class. Tests stop improving. Refactors are postponed. Documentation is deferred. People stop investing in clarity because the future version will “fix it.” I’ve written more about this dynamic in The Cost of Leaving a Software Rewrite on the Table, because it shows up more often than teams expect.

Months pass. Sometimes years.

When the rewrite finally ships, it inherits the same domain complexity, the same business pressures, and often a team that hasn’t practiced maintaining what they already had.

Rewrites don’t just reset code. They reset discipline.

That’s the cost teams underestimate.

Stewardship Is the Craft

Most careers unfold inside inherited systems. The skill is learning how to move those systems forward without pretending they began today.

We inherit. We understand. We stabilize. We extend. We improve what we can without destabilizing what already works.

This kind of work rarely attracts attention. It looks like incremental improvement and steady compounding over time.

But if most of your career is going to be spent in the second act, then the real question isn’t whether you get to start something new.

It’s whether what you inherit gets better because you were there.

The Handoff Test

February 16, 2026

I keep hearing the same story in conference hallway tracks. An engineer leans in and tells me their old boss or client still texts them every four to six months with a “quick question.” There’s usually a slight eye roll. They’ve moved on. They’re not being paid anymore. It feels like a boundary issue.

Then I ask what the question was.

It’s rarely random. It’s about the custom annual report they used to run before board meetings… whether it’s safe to delete a specific SPF value in DNS… why three staging hostnames are still sitting in the load balancer and the new DevOps person is afraid to touch them. Sometimes it’s a background job chain that behaves strangely under load. Sometimes it’s a migration that looked simple but has sharp edges nobody remembers.

These aren’t casual interruptions. They’re loose threads that never got tied off.

And that’s when the uncomfortable question shows up…

Did we fully hand off the system before we left?

What We Really Mean by Handoff

In long-lived systems, knowledge doesn’t live only in the codebase. It lives in decisions, tradeoffs, and context. The repository might show what the system does, but it rarely captures why it ended up there or what constraints shaped it.

While you’re still inside the system, none of this feels fragile. You answer quickly. You remember the nuance. You act as the compression layer between complexity and the rest of the team. It feels efficient and, if we’re honest, a little flattering. Being the one who “just knows” carries quiet status.

The risk appears later.

If only one person can safely interpret parts of the system, succession hasn’t happened. When that person leaves, the system doesn’t just lose a contributor. It loses confidence. The code still runs. The infrastructure still exists. But people hesitate before touching it. That hesitation is usually what triggers the text.

The House You Sold

Think about selling a house you’ve lived in for years. You know which breaker controls the backyard outlet. You know the upstairs shower takes a minute before it gets hot. You know where the sprinkler shutoff is hidden and which switch looks functional but does nothing.

You move out without leaving notes.

Three months later, the new owner texts asking about a breaker or a pipe noise. They aren’t trying to pull you back into ownership. They inherited something that works but feels uncertain because the context behind it is invisible.

I’ve bought a house from people who were excellent stewards. They labeled panels, left manuals, and wrote down the quirks. It made a meaningful difference. When I did text them once or twice, I had already flipped switches, checked breakers, and read the binder before reaching out. The message wasn’t laziness. It was caution.

You don’t owe infinite support after you sell a house, but you do owe a clean handoff while you still own it.

Software systems aren’t any different. If someone has to text you to feel safe deleting a DNS record or modifying a report query, the system may function technically… but it wasn’t fully transferred operationally.

The Part That’s Hard to Admit

There’s a pattern underneath those hallway conversations.

You can be the one they still text, or the one they never have to.

Being the one they still text can feel good. It confirms that you were critical. It reinforces the idea that you were the person who truly understood the system. There’s status in being indispensable, even if we don’t consciously chase it.

But indispensability is often just concentrated context, and concentrated context is fragile.

If your absence creates anxiety, you didn’t build resilience. If your absence creates confidence, you did. That difference has nothing to do with generosity after you leave. It has everything to do with whether you treated knowledge as something to hold or something to distribute.

There’s also no harm in changing your phone number. That’s definitely cheaper. It just doesn’t fix the underlying issue.

Run the Handoff Test

Open a blank document and write: “If I left tomorrow…”

Then list what would likely trigger a text four to six months later. Be specific. Would someone hesitate to run the annual board report without your guidance? Would they avoid touching a background job because they don’t fully understand its retry behavior? Would they pause before cleaning up infrastructure because they can’t see what depends on it? Are production accounts or key decisions still too closely tied to your memory?

This list isn’t an accusation. It’s a map of concentrated knowledge.

Each item marks a place where the system depends more on familiarity than on structure. It shows you where succession planning for code ownership hasn’t happened yet. Many engineers quietly wonder how things are going after they leave… did the migration succeed… did the architecture hold up… did the team refactor the risky part? When I ask whether they’ve reached out to find out, most haven’t.

That hesitation tells you something.

This Is Structural, Not Just Personal

Handoff isn’t just an individual virtue. It’s an organizational expectation. Engineering leaders should design for it, and teams should normalize it as part of professional completion. If your culture rewards heroics but never budgets time for transfer, you’ll recreate the same dependency patterns over and over. Engineers will leave, and the same operational uncertainty will resurface.

Succession planning for code ownership isn’t documentation theater. It’s risk management. It ensures systems can evolve without leaning on someone who no longer works there. Engineers need to externalize context. Organizations need to create space for that work.

Replaceability isn’t weakness. It’s maturity.

Redefining Done

Most teams define done as merged, tested, and deployed. That keeps velocity high. For long-lived systems, done should also include transferability. Someone else can run the board report without fear. Someone else can rotate keys confidently. Someone else can remove a staging hostname or refactor a risky job without Slack archaeology.

The real question isn’t whether you’ll ever get that text.

It’s whether you left the system in a state where they needed to send it.

I Didn’t Want AI to Be Good at This

February 12, 2026

Over the past few months, I’ve been begrudgingly coming around to something I didn’t expect to admit publicly: AI is getting legitimately useful at building software. Not magical. Not autonomous. Not “paste in requirements and press BUILD.” We’re far from that. But the tooling has crossed a threshold where it meaningfully lowers friction in ways I can no longer dismiss.

What surprised me most isn’t that it can generate code. It’s that the cost of context has dropped. Exploring an idea. Scaffolding a feature. Writing tests. Refactoring awkward logic. Documenting decisions. Iterating without feeling like you just signed a six-month commitment.

For someone who has spent over two decades maintaining real systems, that shift isn’t abstract.

It changes the calculus.

We’ve Been Here Before

When I first started building web applications in the early 2000s, nearly everything was custom. If your company wanted software that matched how you actually worked, you built it. That was the default.

Then SaaS matured.

Suddenly you didn’t have to maintain your own systems. You could buy something that handled seventy to eighty percent of what you needed. On paper, it was obvious. Lower upfront cost. Someone else handles upgrades. Predictable pricing. Fewer late nights worrying about infrastructure.

For many organizations, that was the right move.

But over time, something subtle happened. Companies began shaping themselves around generic tools instead of shaping tools around what made them different. The software worked… mostly. But it wasn’t built around their secret sauce. It was built for a “company like yours.”

That tradeoff made sense when custom felt heavy and risky.

It feels different now.

The Economics of Context Just Changed

I’ve built a consultancy around the idea that software is heavy. Context is fragile. Continuity takes intention. We’ve helped organizations maintain and evolve applications that other teams walked away from.

So when the cost of building drops, an uncomfortable question shows up: if rebuilding gets cheaper, what does that mean for everything we optimized around maintaining?

Recently, I replaced a small SaaS subscription we were paying about $80 per month for. It was a Slackbot that tracked birthdays and anniversaries for our team and clients. Quiet. Useful. Forgettable.

Rebuilt it in half a day.

It’s small. Focused. It does exactly one thing. The maintenance footprint is tiny. No sprawling feature roadmap. No abstraction layers for hypothetical users we’ll never have.

That constraint is the feature.

And it got me thinking about something bigger.

Build Around Your Secret Sauce

The most interesting software I’ve worked on over the years wasn’t generic. It wasn’t interchangeable. It was built around what made a particular organization different.

Their workflows.
Their judgment.
Their weird edge cases.
Their advantage.

…their secret sauce.

Recently, I’ve been building a personal project for my band. It’s a custom CRM built for a user base of… me and a few of my bandmates.

(Yes, The Mighty Missoula. Shameless, I know.)

The tool tracks relationships with other musicians in the Pacific Northwest, surfaces shows we might want to attend, and nudges us when we haven’t connected in a while. Under the hood, it ingests messy event data from APIs and scraped sites where nothing is structured the same way twice.

Band names aren’t consistent. Sometimes “The” matters. Sometimes it doesn’t. Sometimes a band in Bolivia shares a name with a group we’ve played with a few miles from home. Event data is messy because humans are messy.

That mess is not a bug.

It’s reality.

A few years ago, I would have assumed weeks of plumbing before getting anything usable. Instead, I’ve been iterating quickly on a Rails 8 app, hanging out inside some of my newer CLI workflows, refining heuristics and tightening assumptions as I go.

It isn’t perfect. There are bugs. There are mismatches. But the code being produced and then reviewed through my own workflows is better than a lot of code I’ve seen over the years.

Not the best I’ve ever seen.

But better than the code written by people who cared “just enough” to get it working and move on.

The cost of caring more is cheaper now.

And that changes what’s possible when you’re building around something that actually matters to you.

The Seventy Percent Trap

We’ve been paying for a CRM for years. Thousands per year. And if I’m honest, nobody loves using it. We scatter information across other tools anyway. Conversations live in email. Notes in docs. Context in Slack. The CRM becomes the place we update because we’re supposed to.

It covers maybe seventy percent of what we need.

But our relationships, our way of thinking about clients, our actual differentiators… they don’t live comfortably inside its abstractions.

Do we really need to keep spending that money?

Or have we been tolerating software that flattens our secret sauce because rebuilding felt too heavy?

This isn’t an argument for rewriting everything. Rebuilding generic systems just because you can is ego. Reclaiming the parts of your workflow that actually define you is strategy.

SaaS still makes sense for commoditized systems. Payroll. Accounting. Infrastructure. The boring parts that don’t differentiate you.

But when the workflow itself is your advantage… when it shapes how you think, decide, and relate… settling for seventy percent might be more expensive than it looks.

Maybe this moment isn’t about replacing everything.

Maybe it’s about selectively reclaiming the parts that make you different.

Humans in the Loop

January 20, 2026

The Oh My Zsh core team recently met up in person at GitHub Universe in San Francisco. Getting the maintainers into the same room matters more than most people realize. It strips away the abstractions and forces the conversation back to reality… what’s working, what’s breaking, what’s quietly draining energy, and what’s worth protecting long term.

We spent a good amount of time talking about AI. Not as a culture war or a prediction exercise, but as something already embedded in our day-to-day work. It shows up in our jobs, our creative projects, our personal workflows, and increasingly, in open source contributions to projects like Oh My Zsh. We don’t all share the same worldview on AI usage. That’s fine. Alignment on outcomes matters more than agreement on philosophy.

What we do share is stewardship of a project that millions of people rely on. And stewardship means occasionally slowing down long enough to name a problem instead of pretending it’ll sort itself out.

The pattern we couldn’t ignore

Over the past year, we’ve seen a noticeable increase in contributions that appear to lean heavily on AI tooling. That, by itself, is not a problem. People should use whatever tools help them learn and build. Forks are a playground. Experimentation is healthy.

What changed was the shape and cost of the work landing on maintainers’ desks.

We’re seeing larger pull requests from first-time contributors. Broader scope than necessary. Changes that touch parts of the codebase unrelated to the stated goal. PR descriptions and follow-up comments that feel polished but oddly disconnected from the actual implementation. In some cases, we genuinely can’t tell whether the contributor understands the changes they’re proposing.

That uncertainty matters because review is the bottleneck. Not code generation. Review. When a PR is sprawling, optimistic, and hard to reason about, it doesn’t matter how fast it was produced. It consumes volunteer time. AI doesn’t remove that constraint. In many cases, it amplifies it.

We needed clarity, not vibes

At some point, “we’ll handle it case by case” stops being fair to contributors and exhausting for maintainers. We needed something explicit we could point to. Not a moral stance. Not a ban. Just clarity around expectations and accountability.

The team agreed that I’d take the first pass at researching how other open source projects are approaching AI usage and propose a path forward. We already maintain a private GitHub project where we collaborate on behind-the-scenes decisions… security considerations, moderation questions, and process changes that don’t belong in public threads. That gave us space to pressure-test ideas before bringing anything to the community.

What I found was interesting. Many projects treat AI as a separate category entirely, with standalone policies layered on top of CONTRIBUTING.md. Others get extremely prescriptive, trying to enumerate when AI is allowed, how much is allowed, and under what circumstances.

I understand the impulse. But it also felt like a distraction.

CONTRIBUTING.md already exists to describe how humans contribute responsibly. Tools change. Responsibility doesn’t. Treating AI as something fundamentally different risks avoiding the harder conversation… ownership.

Where does AI start and end anyway?

If your editor suggests a line of code, is that AI? If autocomplete finishes a function, does it count? If a tool rewrites your PR description, is that different from asking a colleague to proofread it? If a Copilot agent updates a handful of links, is that categorically different from doing a global search-and-replace by hand?

Bright lines fall apart quickly. In reality, maintainers rely on judgment, pattern recognition, and experience. We always have. AI just makes it cheaper to submit work that looks plausible without being deeply understood. Volume goes up. Review cost goes up with it.

So we didn’t try to play detective. We’re not interested in policing tooling. We’re interested in accountability.

The approach we took

We chose to integrate guidance about AI-assisted contributions directly into our existing contribution guidelines, instead of treating it as a separate class of work. Concretely, that meant adding a “Working with AI tools” section to our CONTRIBUTING.md and updating our PR templates so that contributors can disclose how they used AI when it’s relevant to their work.

The standard is straightforward: if you submit code to core, you own it. You must understand every line. You must be able to explain what changed and why. You must test what you touched and keep the scope focused. And if something breaks, you should be able to debug it without regenerating your way out of the problem.

That’s not an “AI policy.” That’s basic stewardship.

If you’re curious, the pull request where we proposed and merged these changes is ohmyzsh/ohmyzsh#13520.

A quick story, because this matters

We’ve already had people reach out to say they’re done using Oh My Zsh because I once used GitHub Copilot to help update a few links in the codebase and referred to a class of low-effort contributions as “slop.”

Yes, I could have done those updates manually. I’ve been writing shell scripts for decades. The point wasn’t capability. The point was experimentation.

I opened a GitHub issue, let the Copilot agent propose a change, and then did what maintainers always do: we reviewed it. A human noticed a problem. We modified the PR. We merged it.

Exactly like we always have.

The only difference is that I didn’t need to fire up my editor to replace a handful of URLs. The human checkpoints didn’t disappear. Responsibility didn’t disappear. Review didn’t disappear.

That’s the entire point.

What hasn’t changed

Nothing gets merged without human review. Every approval still represents a maintainer making an informed decision on behalf of the community. AI doesn’t remove that responsibility. It increases it.

We reserve the right to ask contributors to explain their code. To show how they tested it. To narrow scope. To revise. To collaborate. And yes… to decline changes that don’t meet the bar.

Not because we’re precious about the code. Because volunteer time is the most finite resource in open source.

Oh My Zsh exists to make the terminal a little more delightful for humans, keystroke by keystroke. If your contribution moves us in that direction, we’re excited to review it. If it reads like output optimized for confidence instead of clarity, we’ll say no.

We’ll be friendly about it. But we’ll say no.

If you want to follow along, the project lives at github.com/ohmyzsh/ohmyzsh, and the public-facing home for documentation and installation is ohmyz.sh.

Tools evolve. Stewardship remains a human job.

Why So Serious?

December 01, 2025

The question Sheon Han poses — “Is Ruby a serious programming language?” — says a lot about what someone thinks programming is supposed to feel like. For some folks, if a tool feels good to use… that must mean it isn’t “serious.”

Ruby never agreed to that definition. If it did, I missed the memo.

If you arrived late, you missed a chapter when the language felt like a quiet rebellion. The community was small. The energy was playful. Ruby tapped you on the shoulder and asked what would happen if programming didn’t have to feel intimidating… what might be possible if clarity and joy were allowed.

The early skeptics were predictable. Java architects. Enterprise traditionalists. Anyone whose identity depended on programming being a stern activity. They said Ruby was unserious. And the community mostly shrugged… because we were busy building things.

Ruby made programming approachable. Not simplistic… approachable. That distinction matters. It helped beginners see the path forward. It helped small teams build momentum before anxiety caught up. It helped experienced developers rediscover a sense of lightness in their work.

This is why bootcamps embraced it. Why tiny startups found traction with it. Ruby wasn’t trying to win benchmarks… it was trying to keep you moving. When you’re creating something new, that matters far more than the theoretical purity of your type system.

And yes… critics love the Twitter example. But look closer. Ruby carried them further than most companies will ever reach. They outgrew their shoes. That’s not an indictment… that’s success.

In my world… running a software consultancy for a few decades… I’ve never seen a team fail because they chose Ruby. I have seen them fail because they chose complexity. Because they chose indecision. Because they chose “seriousness” over momentum. Ruby just needed to stay out of the way so people could focus on the real work.

And while folks keep debating its “credibility,” the receipts are plain. Shopify moves billions through Ruby. Doximity supports most physicians in the US with Ruby. GitHub held the world’s source code together for years using Ruby. This isn’t sentiment. This is proof.

What outsiders often miss is the culture. Ruby attracts people who care how code feels to write and read. Not because of nostalgia… but because most of our careers are spent living inside someone else’s decisions. Joy isn’t a luxury. It’s how sustainable software gets made.

I don’t know Sheon personally, but I’m guessing we have as much in common in our music tastes as we do on whether _why’s Poignant Guide to Ruby made any sense to them. And that’s fine. That’s actually the point.

And on that note… there’s one thing I genuinely agree with Sheon about. Ruby doesn’t seem to be for them. That’s not a failure of the language. That’s just taste. Some people like jazz. Some like metal. Some prefer the comfort of ceremony. Ruby has never tried to convert anyone. It simply resonates with the people it resonates with.

Since we’re noting taste, I’ll add something of my own. As an atheist, it feels oddly appropriate to mention my lack of religion here… mostly because it mirrors how strangely irrelevant it was for the article to bring up Matz’s religion at all. It didn’t add context. It didn’t deepen the argument. It was just… there. A detail reaching for meaning that wasn’t actually connected to the point.

Sheon mentions approaching Ruby without “the forgiving haze of sentimentality.” Fair enough. But the sentiment wasn’t nostalgia. It was gratitude. Gratitude for a language that centers the human being. Gratitude for a community that believes programming can be expressive. Gratitude for a tool that makes the work feel lighter without making it careless.

But here’s the part the discourse keeps missing… this isn’t just about the past.

The future of programming is fuzzy for everyone. Anyone claiming to have the master recipe for what’s coming is bullshitting you. The future won’t be owned by one paradigm or one language or one ideology. It’ll be a blend… a messy collage of ideas, old and new, borrowed and rediscovered.

And in that future… Ruby’s values aren’t relics. They’re an anchor. Readability will matter more as AI writes more code. Maintainability will matter more as products live longer. Joy will matter more as burnout becomes the default state.

And if you need a reminder that seriousness isn’t the reliable signal people wish it were…

The serious candidate doesn’t always get elected.
The serious musician doesn’t always get signed.
The serious artist doesn’t always sell.
The serious man doesn’t always find a serious relationship.
The serious startup doesn’t always find product-market fit.
The serious engineer doesn’t always write the code that lasts.
The serious rewrite doesn’t always solve the real problem.

Culture doesn’t reliably reward the serious. Neither does business.
They reward the resonant. The clear. The human. The work that connects.

Ruby has always leaned toward that side of the craft. Toward the part of programming that remembers people are involved. Toward the part that says maybe the code should serve the team… not the other way around.

And honestly… I think unserious people will play an important role in the future too. The curious ones. The playful ones. The ones who keep the door propped open instead of guarding it. They’ll keep the industry honest. They’ll keep it human.

So is Ruby “serious”? I still think that’s the wrong question.

A better one is… does Ruby still have something meaningful to contribute to the next chapter of software?

It does.
And if that makes it “unserious”… maybe that’s exactly why it belongs in the conversation.

Who Keeps the Lights On?

October 20, 2025

Every so often, someone in the Ruby community will ask,
“So… what does Planet Argon actually do these days?”

Fair question.

Architecture for Contraction

October 13, 2025

We’ve spent the last decade optimizing for scale. How do we handle more traffic? More users? More engineers? The assumptions were baked in: Growth is coming. Prepare accordingly.

So we split things apart. We mapped services to teams. We built for the org chart we were about to have.

Then 2023 happened. And 2024. And now 2025.

Turns out, the future isn’t always bigger.

Organizations, Like Code, Deserve Refactoring

October 09, 2025

I’ve been thinking about what happens when open source organizations hit their breaking point… when funding dries up, relationships fracture, and everyone’s scrambling to make sense of what went wrong.

It turns out, the patterns look familiar.

Talking Shop with Ruby & Rails Maintainers at Rails World 2025

September 22, 2025

As the opening keynote on Day 2 of Rails World 2025, I had the chance to host a panel with three people who’ve been shaping the direction of both Ruby and Rails from deep within the internals.

We covered a lot in an hour:

  • What they’ve been working on behind the scenes
  • Which areas of Ruby and Rails could use more community support
  • The evolving release process for the language
  • Why Hiroshi’s focused on improving the experience for developers on Windows
  • How security fixes are coordinated across multiple versions
  • Performance work related to YJIT and ZJIT
  • JSON parsing performance and compatibility
  • What keeps them motivated to continue maintaining the ecosystem

There’s even a moment where Aaron and Jean get into a friendly disagreement about performance and priorities. If you enjoy technical nuance and sharp perspectives, you’ll appreciate that exchange.

And yes… I asked Aaron about his favorite Regular Expression. His response did not disappoint.

It was a fun, thoughtful, and occasionally surprising conversation — and a reminder that Ruby and Rails continue to evolve in the hands of people who care deeply about their future.

If you weren’t in Amsterdam or want to revisit it, the full panel is now available:

Also worth pairing with this interview with Jean on the On Rails podcast, where we dig into IO-bound workloads, misconceptions, and what it’s like maintaining Rails at scale.

A solid pairing if you’re curious where the ecosystem is headed next.

7 Stages of Software Tech Stack Adoption (You're Probably in Stage 5)

September 20, 2025

I’ve been part of the Ruby on Rails ecosystem for over two decades. I’ve watched teams adopt Rails with wild enthusiasm… evolve their systems… struggle through growing pains… and eventually find themselves in an uncomfortable position: debating whether to abandon the tools that once brought them so much joy.

I don’t think that’s necessary… or even wise.

But I do think it’s understandable.

After working with and talking to hundreds of teams… many of them using Rails, Laravel, Ember.js, or even React… I’ve noticed a pattern. A lifecycle of sorts. The way teams internally adopt and evolve their relationship with a technical stack. I’ve seen it reflected in our consulting clients at Planet Argon, the guests on my podcasts (Maintainable and On Rails), and peers who’ve been part of the various peak waves of these ecosystems.

And while every team is different, the stages of internal tech stack adoption often follow a similar spiral.

This post is an attempt to describe that spiral.

Not as a fully baked theory, but as a conversation starter. A mirror. And maybe a compass.

Because whether your team is building your core product with Rails, or you’re a non-software company maintaining internal tools on Laravel, understanding where you are in this lifecycle might help you understand what comes next.

🌀 The Spiral of Internal Tech Stack Adoption

Before we go deeper, here’s a quick overview of the seven stages I’ve observed. These aren’t fixed; your team might skip around or revisit them multiple times. But in general, this is the pattern I’ve seen:

  1. Adopting: A small group of enthusiastic engineers selects and introduces the stack while building a prototype or MVP.

  2. Expanding: The stack proves useful… so it spreads. More features, more developers, more tooling.

  3. Normalizing: The stack becomes the default. Teams standardize around it. Hiring pipelines and best practices emerge.

  4. Fragmenting: Pain points surface. Teams bolt on new tools or sidestep old ones. Internal consistency erodes.

  5. Drifting: The stack feels sluggish. Upgrades are deferred. The excitement is gone.

  6. Debating: Conversations shift to rewrites or migrations. Confidence is shaken.

  7. Recommitting: Teams pause, reflect, and decide to reinvest in the stack… and their shared future with it.

Again, these stages aren’t a ladder; they’re a spiral.

And the question your team has to ask is: Are we spiraling upward… or downward?

Because while The Downward Spiral is a great album, it doesn’t have to be your trajectory.


♻️ It’s a Cycle, Not a Ladder

It might be tempting to look at this lifecycle and think, “Our goal is to get to the Recommitting stage and stay there forever.”

But that’s not how this works.

Every team will move through these stages multiple times over the lifespan of their product. Shifting priorities, team turnover, organizational pivots… they all create new dynamics that ripple across your tech stack.

Recommitting isn’t a finish line. It’s an inflection point. One that clears the fog, sharpens priorities, and invites your team to move forward with intent.

Just don’t mistake clarity for comfort… the spiral keeps turning.

When Your Cache Has a Bigger Carbon Footprint Than Your Users

August 28, 2025

Caching in Rails is like duct tape. Sometimes it saves the day. Sometimes it just makes a sticky mess you’ll regret later.

Nowhere is this more true than in data-heavy apps… like a custom CRM analytics tool that glues together a few systems. You know the type: dashboards full of metrics, funnel charts, KPIs, and reports that customers swear they need “real-time.”

And that’s where the caching debates begin.

All valid. All expensive in their own way.


The Dashboard Problem

Your SaaS app has grown to 2,000 customers, each with multiple users.

For the overwhelming majority, dashboards load just fine. Nobody complains.

But then your whales log in. The Fortune 500 accounts your sales reps obsess over. Their dashboards pull data from half a dozen APIs, crunch millions of rows, and stitch together a wall of charts. It’s not just a page. It’s practically a data warehouse in disguise.

These dashboards are slow. Painfully slow. And you hear about it… through support tickets, account managers, and sometimes even a terse email from someone with “Chief” in their title.

So your engineering team digs in. You fire up AppSignal, Datadog, or Sentry and zero in on the slowest dashboard requests. You look at traces, database query timings, and request logs. You chart out the p95 and p99 response times to understand how bad it gets for the biggest customers.

From there, you start experimenting:

  • Are we missing database indexes?
  • Are there N+1 queries lurking in the background?
  • Can we preload or memoize expensive calls?
  • Could a few data points be cached individually, without touching the rest?

You squeeze what you can out of the obvious optimizations. Maybe things improve… but not enough.

So the conversation shifts.


Negotiating “Real-Time” Expectations

When your product team actually sits down with those whale customers, the conversation shifts.

They start by saying: we need real-time data. But after a little probing, everyone realizes “real-time” doesn’t always mean right now this second.

Maybe what they really need is a reliable snapshot of activity as of the end of the previous day. That’s good enough for the kinds of decisions their leadership is making in the morning meeting. Nobody is making million-dollar calls based on a lead that just landed five minutes ago.

And your team can remind them: there are other real-time metrics in the system. For example:

  • New leads created today.
  • Active users in the past hour.
  • Pipeline changes as they happen.

So now you’ve reframed the dashboard story. Instead of one giant “real-time” data warehouse, you split it into two categories:

  1. Daily rollups. Crunch the heavy stuff once a night. End-of-day data is sufficient, and it’s reliable.
  2. Today’s activity. Show a few real-time metrics that are fast to calculate. Give customers the dopamine hit of “live” data without boiling the ocean.

That’s usually enough to recalibrate expectations. Customers feel like they’re still getting fresh data, while your app no longer sets itself on fire every time a big account logs in.


The 1 AM Job (and Its Hidden Cost)

Armed with that agreement, your team ships the “reasonable” solution most of us have built at least once:

  • Every night at 1 AM, loop through all 2,000 customers.
  • Generate every dashboard report.
  • Cache the results somewhere “safe.”
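That brute-force loop, sketched in plain Ruby. The `Customer` struct and `generate_dashboard_report` method here are hypothetical stand-ins, not the real app:

```ruby
# Hypothetical sketch of the brute-force 1 AM job: every customer,
# every night, whether anyone will look at the result or not.
Customer = Struct.new(:id, :name)

def generate_dashboard_report(customer)
  # Stand-in for the expensive part: half a dozen API calls,
  # millions of rows crunched, charts assembled.
  { customer_id: customer.id, generated_at: Time.now }
end

def nightly_rollup(customers, cache = {})
  customers.each do |customer|
    cache[customer.id] = generate_dashboard_report(customer)
  end
  cache
end

customers = (1..2_000).map { |i| Customer.new(i, "Customer #{i}") }
cache = nightly_rollup(customers)
```

Two thousand cache entries a night, most of them never read.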

The next morning, dashboards are instant. Support tickets quiet down. Account reps breathe easier. Everyone celebrates.

But here’s the kicker: the whales were the problem. The rest of your customers never needed this optimization in the first place. Their dashboards were already fine.

So now you’ve turned one customer’s problem into everyone’s nightly job. And under the hood, you’ve cranked through hours of CPU, memory, and database load… just to prepare data for customers who won’t even log in later today.

Worse, you’ve stuffed your background job queue with 2,000 little tasks every night. Which means your queue system—whether it’s Sidekiq, Solid Queue, or GoodJob—is spending precious time juggling busy work instead of focusing on the jobs that actually matter. And when those queues get stuck, or a worker crashes, you’re left wading through a mountain of pending jobs just to catch up.

This is what I call Cache Pollution: the buildup of unnecessary caching work that bloats your systems, slows down your queues, and leaves your caching strategy with a far bigger carbon footprint than it needs. Another benefit of tackling Cache Pollution early is future flexibility — you might eventually solve the computation challenges in a different way, and you won’t be anchored to big, scary scheduled tasks that churn through all of your customers every night.


Frequency Matters More Than We Admit

Do these reports need to run every single day… or only on weekdays when your customers actually log in?

If your traffic drops on Saturdays and Sundays, consider a lighter schedule. Or even none at all. Because “slow” isn’t so slow when almost nobody is around. A BigCorp admin poking the dashboard on Sunday morning might be fine with an on-demand render… especially if the weekday experience is snappy.

And here’s another angle: if your scheduled job runs at 1 AM, that means when a BigCorp user logs in later that same day, they’re still looking at data that’s less than 24 hours old. For most business use cases, that’s plenty. You don’t need to rerun heavy jobs every few hours just because you can.

This is all about right-sizing frequency:

  • Weekday cadence: nightly rollups for whales; maybe twice a day if usage demands it.
  • Weekend cadence: pause, or run a narrower subset.
  • Holiday mode: same idea… different switch.

If your dashboard code doesn’t rely on the cache to render, you keep the option to not precompute. That flexibility is where the savings live. As the business grows, the cost of overly eager schedules grows with it… so design for dials, not hard-coded habits.
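One way to sketch that dial, assuming a hypothetical `run_rollups_on?` policy method rather than anything baked into Rails or your scheduler:

```ruby
require "date"

# Hypothetical cadence dial: nightly rollups on weekdays only.
# Extend with holiday calendars or per-customer overrides as needed.
def run_rollups_on?(date)
  !(date.saturday? || date.sunday?)
end

run_rollups_on?(Date.new(2025, 8, 29)) # Friday: run
run_rollups_on?(Date.new(2025, 8, 30)) # Saturday: skip
```

The scheduler still fires every night; the dial decides whether any real work happens. That keeps the “skip weekends” decision in one obvious place instead of buried in cron syntax.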


Other Things to Consider with Recurring Tasks

One more question to ask about recurring scheduled jobs: do you really need to iterate through all users or all organizations?

In many cases, the answer is no. Most customers don’t trigger the conditions that require a heavy recompute. Yet teams often design jobs to blast across every top-level object in the database, every night, without discrimination.

Instead, look for signals that help you scope the work down:

  • Which organizations actually logged in today?
  • Which customers have datasets large enough to need optimization?
  • Which accounts crossed a threshold since the last run?

By narrowing the set of work each job touches, you cut down on wasted compute, reduce queue congestion, and avoid the kind of Cache Pollution that grows silently as your business scales.
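A sketch of that scoping, with a hypothetical `last_login_on` field standing in for whatever activity signal your schema actually records:

```ruby
require "date"

Organization = Struct.new(:name, :last_login_on)

# Only recompute for organizations that actually showed up today,
# instead of blasting every top-level record in the database.
def active_today(orgs, today = Date.today)
  orgs.select { |org| org.last_login_on == today }
end

today = Date.new(2025, 8, 28)
orgs = [
  Organization.new("BigCorp", today),
  Organization.new("Dormant LLC", today - 90),
]
active_today(orgs, today).map(&:name)
```

In a real app this would be a database query, not an in-memory filter, but the shape is the same: the job asks “who needs this?” before doing any heavy lifting.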


Clever Tricks We’ve Seen

The trick isn’t just caching everything for everyone. It’s knowing who to cache for and when.

  • Selective pre-caching. Only build nightly rollups for your whales. Maybe 50 out of 2,000 customers. Everyone else can render on demand, which was fine all along.
  • Cache on login. If you know a user from BigCorp is signing in, enqueue a background job to warm up their dashboard before they hit it. You can even anticipate who they are based on a cookie value when they land on the Sign In page — before they’ve had a chance to trigger 1Password or type in their credentials, the system is already working behind the scenes to prep their dashboard. Even a 10–20 second head start can smooth the experience.
  • Cache on demand… with a fallback. If cached data is missing, build fresh on the spot. Outages happen when teams assume the cache will always be there.
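The fallback pattern in that last bullet is essentially what `Rails.cache.fetch` gives you; here’s a dependency-free sketch of the same idea (the `DashboardCache` class is hypothetical):

```ruby
# Hypothetical in-memory stand-in for Rails.cache: read the cached
# value if present, otherwise compute it fresh and store it.
class DashboardCache
  def initialize
    @store = {}
  end

  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

cache = DashboardCache.new
first  = cache.fetch(:bigcorp) { { rows: 42 } }        # miss: builds fresh
second = cache.fetch(:bigcorp) { raise "recomputed!" } # hit: no work done
```

Because the block always knows how to build the data from scratch, a missing or wiped cache degrades to “slow” rather than “broken.”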

And here’s a bonus: if your job fails at 1 AM, re-running it for 50 customers is a whole lot faster than crawling through 2,000.

Extra credit: scope your scheduled tasks so that when a customer crosses a certain threshold—say, user count, dataset size, or request volume—they automatically join the “whale” group. No manual babysitting required.
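That auto-promotion can be a simple predicate. The thresholds and field names below are illustrative, not from a real schema:

```ruby
# Hypothetical auto-promotion into the "whale" group: any account
# over either threshold gets nightly pre-caching, no manual list.
Account = Struct.new(:name, :user_count, :dataset_rows)

WHALE_USER_COUNT   = 100
WHALE_DATASET_ROWS = 1_000_000

def whale?(account)
  account.user_count >= WHALE_USER_COUNT ||
    account.dataset_rows >= WHALE_DATASET_ROWS
end

accounts = [
  Account.new("BigCorp", 450, 8_000_000),
  Account.new("Smallco", 6, 12_000),
]
whales = accounts.select { |a| whale?(a) }
```

As customers grow past the threshold they join the nightly rollup on their own, and the job stays proportional to the number of accounts that actually need it.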


Other Patterns

Not all caching challenges look like dashboards.

Case Study: The Press Release Problem

We once managed a public-facing site for a massive brand. Whenever they dropped a big press release, it spread fast across social media. Traffic would spike within minutes.

Of course, that’s when the CEO would notice a typo. Or the PR team would need to update a paragraph to reflect a question from the media. Despite their editorial workflows, changes still had to happen after publication.

So we had to get clever. We couldn’t cache those fresh pages for hours. Instead, we used a sliding window approach:

  • First 5 minutes: cache for 30 seconds at a time.
  • After 5 minutes: increase to 1 minute.
  • After 10 minutes: increase to 2 minutes.
  • After 20 minutes: increase to 5 minutes.
  • After 6 hours: safe to cache for an hour.
  • After a day: cache for a few hours at a time.

This let us protect our Rails servers from massive traffic spikes when a new article was spreading fast, while still giving editors the ability to push corrections through quickly. Older articles, once stable, could safely sit in Akamai’s cache for hours.
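The sliding window is easy to express as a pure function from article age to TTL. This helper mirrors the schedule above; the method name is ours, not from the original codebase:

```ruby
# Map an article's age to a cache TTL (in seconds), mirroring the
# sliding-window schedule above.
def cache_ttl_for(age_in_seconds)
  minutes = age_in_seconds / 60.0
  if    minutes < 5    then 30         # first 5 minutes: 30 seconds
  elsif minutes < 10   then 60         # after 5 minutes: 1 minute
  elsif minutes < 20   then 120        # after 10 minutes: 2 minutes
  elsif minutes < 360  then 300        # after 20 minutes: 5 minutes
  elsif minutes < 1440 then 3600       # after 6 hours: 1 hour
  else                      3 * 3600   # after a day: a few hours
  end
end
```

Each render (or CDN header) asks for the TTL based on the article’s publish time, so freshness decays gracefully as the traffic spike subsides, without anyone babysitting individual pages.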

At the time, Akamai could take up to seven minutes to guarantee a purge across their global network. Not ideal. We had to plan for that lag. Today, most CDNs can purge instantly, but back then… it was a constraint we had to design around.


A Final Challenge

A lot of what we’ve talked about here comes down to avoiding Cache Pollution.

That’s the unnecessary churn your system takes on when it generates data nobody asked for. It’s the background job queue bloated with thousands of tasks that fight with more important work. It’s the 1 AM process chewing through CPU just to prep dashboards for customers who never log in.

Cache Pollution looks like optimization on the surface… but underneath it’s just waste.

So before your team spins up the next caching project, stop and ask:

  • Who really needs this cache?
  • How fresh does it need to be?
  • What happens if the cache isn’t there?
  • Do we need to run it this often, or for this many customers?
  • Could we scale down the busy work instead of scaling it up?

Because the goal isn’t just faster dashboards. The goal is to keep your caching strategy lean, resilient, and focused — instead of leaving behind a trail of Cache Pollution that grows with every new customer you add.