Design Principles

  1. Plain text everywhere. Bash, Python, SQL, Markdown, JSON. If you can read it in a text editor, you can understand what's happening. No magic formats, no vendor lock-in.
  2. Let Claude do the tedious parts. You say what you want to track. Claude writes the scripts, runs the queries, spots the patterns. You stay focused on questions worth asking.
  3. No infrastructure. No Docker, no servers, no queues, no accounts to create. It runs on your laptop. When you're done, delete the folder. That's it.
  4. Reproducible from scratch. Blow away the database, re-run the scripts, get the same result. Every transformation is a script you can read.
  5. Trust but verify. Every file ingestion is logged. Every purge is tracked. When something looks wrong, you can trace it back to the exact source file that caused it.

Mental Model

Rack pulls scattered data into one local database you can actually query.

Here's the problem: your data is everywhere. GitHub has your PRs. Stripe has your revenue. Some competitor's pricing lives behind a login. A spreadsheet someone emailed you. None of it talks to each other.

Rack gives you one place to put it all — a DuckDB file on your machine — and a simple way to keep it fresh. The trick is that data collection, while tedious, is pretty automatable. So Claude writes the fetch scripts. You just say what you care about.

The loop looks like this:

Describe what to track → Claude writes scripts → Refresh runs them → Dashboards appear

Under the hood, three phases run in order:

  1. Fetch — Shell scripts call APIs and dump JSON files
  2. Ingest — Python scripts load those JSON files into DuckDB tables
  3. Derive — Python scripts query the database and spit out Markdown dashboards

All plain text. Open any file and you can see exactly what it does.

Directory Structure

A rack is just a folder with some conventions. If it has a playbook.md, it's a rack.

my-rack/
├── playbook.md           # identifies this as a rack
├── .claude/CLAUDE.md     # Claude reads this for context
├── data/
│   ├── inbox/            # drop files here for manual ingest
│   ├── sources/
│   │   ├── api/          # fetch output (watched by rackd)
│   │   └── browser/      # scrape output (watched by rackd)
│   └── my-rack.duckdb    # the database
├── scripts/
│   ├── fetch/            # pull data (*.sh)
│   ├── ingest/           # load to DuckDB (*.py)
│   ├── derive/           # generate dashboards (*.py)
│   ├── ops/              # helpers (signal, derive, snapshot)
│   └── refresh.sh        # orchestrates everything
├── dashboards/           # markdown output
└── .rack/
    ├── signals.md        # alert log
    ├── activity.log      # run history
    └── snapshots/
        └── temporal/     # for rack query --at

Naming Conventions

Item             Convention                 Example
Rack directory   Prefixed with @            @company-metrics
Database file    {dirname}.duckdb           company-metrics.duckdb
Fetch scripts    {source}.sh                github-prs.sh
Ingest scripts   Match fetch name           github-prs.py
Derive scripts   Dashboard name             pr-velocity.py
JSON output      Script basename            github-prs.json
DuckDB tables    Underscores, not hyphens   github_prs
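In code, the filename-to-table mapping is one line. A minimal sketch (table_name is a hypothetical helper for illustration, not part of rack):

```python
from pathlib import Path

def table_name(source_file: str) -> str:
    """Derive a DuckDB table name from a source filename:
    strip the extension, swap hyphens for underscores."""
    return Path(source_file).stem.replace("-", "_")
```

So table_name("github-prs.json") gives "github_prs", matching the convention above.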

Output Organization

As your rack grows, organize derived outputs beyond flat dashboards:

derived/
├── snapshots/    # Dated dashboard archives
├── deep-dives/   # Detailed segment analyses
├── cohorts/      # User funnel analyses
├── competitive/  # Competitor tracking
└── reports/      # Ad-hoc analyses

Data Flow

Data gets in through two paths:

API Sources (automated)

Most data comes from CLIs and REST APIs — gh, stripe, aws, plain curl. Shell scripts fetch JSON into data/sources/api/. These can run on a cron. No browser needed.

Browser Sources (interactive)

Some data hides behind logins — competitor pricing pages, internal dashboards, that one report only accessible through a web UI. For these, Claude uses Playwright to scrape the page while you watch. Output lands in data/sources/browser/. Screenshots get saved too, so you have evidence of what was seen.

Why Sources Are Immutable

Once a source file is written, we don't overwrite it. Need to re-pull the same day? Append a timestamp. This sounds fussy, but it means you can always trace a weird number back to the exact file that produced it. Silent overwrites are how audit trails die.
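The naming rule can be sketched in a few lines of Python (source_path is a hypothetical helper, shown here just to make the rule concrete):

```python
from datetime import datetime
from pathlib import Path

def source_path(base_dir: str, source: str, when: datetime) -> Path:
    """Pick a non-colliding source filename: if the plain file
    already exists, append a timestamp instead of overwriting."""
    plain = Path(base_dir) / f"{source}.json"
    if not plain.exists():
        return plain
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return Path(base_dir) / f"{source}-{stamp}.json"
```

The second pull of the day lands in something like github-prs-20240115-110000.json; the first file stays untouched.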

How Ingestion Works

Both paths feed into DuckDB the same way:

  • Check schema and required fields
  • Skip files we've already ingested (checksum match)
  • For individual records, only update if something changed
  • Stamp everything with _ingested_at so you can query freshness
  • Log every attempt — row counts, status, what happened
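The checksum-based skip is the interesting part. A minimal sketch of the idea (the actual ingest code isn't shown here; should_ingest and its seen-set storage are assumptions):

```python
import hashlib

def file_checksum(data: bytes) -> str:
    """Content hash used to recognize files we've already ingested."""
    return hashlib.sha256(data).hexdigest()

def should_ingest(data: bytes, seen: set) -> bool:
    """Skip the file if its checksum was already recorded;
    otherwise record it and signal the caller to ingest."""
    digest = file_checksum(data)
    if digest in seen:
        return False
    seen.add(digest)
    return True
```

Re-dropping the same file into inbox/ is then a no-op, while a file with even one changed byte gets ingested.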

The Pipeline

Run rack refresh and it executes scripts/refresh.sh, which just runs the three phases in order: fetch, ingest, derive. Nothing fancy.

Fetch Scripts

Live in scripts/fetch/*.sh. Output JSON to data/sources/.

Prefer CLIs when they exist (gh, stripe, aws) — they handle auth and pagination. Fall back to curl. Use Playwright only when you have to log in through a browser.

Complete fetch example
#!/bin/bash
# scripts/fetch/github-prs.sh
# Fetches open PRs from a GitHub repo

gh pr list --repo owner/repo \
  --state all \
  --limit 100 \
  --json number,title,author,createdAt,mergedAt,state \
  > data/sources/api/github-prs.json

Ingest Scripts

Live in scripts/ingest/*.py. Read JSON with DuckDB's read_json_auto(), write to a table with CREATE OR REPLACE TABLE.

Naming follows a pattern: github-prs.py reads github-prs.json and creates table github_prs. (DuckDB tables use underscores, not hyphens. One of those things.)

Complete ingest example
#!/usr/bin/env python3
# scripts/ingest/github-prs.py
# Loads GitHub PRs into DuckDB

import duckdb
from pathlib import Path

DB = Path(__file__).parent.parent.parent / "data" / "my-rack.duckdb"
SOURCE = Path(__file__).parent.parent.parent / "data" / "sources" / "api" / "github-prs.json"

conn = duckdb.connect(str(DB))
conn.execute("""
    CREATE OR REPLACE TABLE pull_requests AS
    SELECT * FROM read_json_auto(?)
""", [str(SOURCE)])

print(f"Loaded {conn.execute('SELECT COUNT(*) FROM pull_requests').fetchone()[0]} PRs")

Derive Scripts

Live in scripts/derive/*.py. Query the database, write Markdown to dashboards/.

Some helpers exist in scripts/ops/derive_helper.py for common patterns. See Dashboard Helpers for the full API.

Helper                   Purpose               Output
bar(val, max, width)     Inline bar chart      ████▎
sparkline(values)        Trend visualization   ▁▂▅█▆▁
trend_arrow(curr, prev)  Delta indicator       ▲ +23%
signal(title, details)   Emit alert            Appends to .rack/signals.md

Complete derive example
#!/usr/bin/env python3
# scripts/derive/pr-velocity.py
# Generates weekly PR velocity dashboard

import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / "ops"))

from derive_helper import bar, sparkline, trend_arrow
import duckdb
from datetime import datetime

DB = Path(__file__).parent.parent.parent / "data" / "my-rack.duckdb"
OUT = Path(__file__).parent.parent.parent / "dashboards" / "pr-velocity.md"

conn = duckdb.connect(str(DB), read_only=True)

weekly = conn.execute("""
    SELECT DATE_TRUNC('week', createdAt)::DATE as week,
           COUNT(*) as opened,
           COUNT(*) FILTER (WHERE mergedAt IS NOT NULL) as merged
    FROM pull_requests
    WHERE createdAt >= CURRENT_DATE - INTERVAL '8 weeks'
    GROUP BY 1 ORDER BY 1
""").fetchall()

# Build dashboard
lines = [
    "---",
    "title: PR Velocity",
    f"generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}",
    "---",
    "",
    "[[← index]]",
    "",
    "# PR Velocity",
    "",
]

if weekly:
    opened_vals = [r[1] for r in weekly]
    merged_vals = [r[2] for r in weekly]
    max_opened = max(opened_vals)

    lines.extend([
        f"**{sum(opened_vals)}** PRs opened, **{sum(merged_vals)}** merged over 8 weeks.",
        "",
        f"Trend: `{sparkline(opened_vals)}`",
        "",
        "| Week | Opened | | Merged |",
        "|------|-------:|---|-------:|",
    ])

    for week, opened, merged in weekly:
        lines.append(f"| {week} | {opened} | `{bar(opened, max_opened, 12)}` | {merged} |")

lines.extend([
    "",
    "---",
    "*Auto-generated. Interpretations are hypotheses.*",
])

OUT.write_text("\n".join(lines) + "\n")
print(f"Generated {OUT.name}")

Dashboard Helpers

Professional-grade dashboard formatting lives in scripts/ops/derive_helper.py. These helpers make your dashboards look polished and consistent.

Color Palette

A consistent color scheme for dashboard elements:

Color    Hex       Purpose
Cyan     #00d4ff   Primary metrics, headers
Green    #7ee787   Positive deltas, growth
Red      #f85149   Negative deltas, alerts
Yellow   #d29922   Warnings, neutral changes
Muted    #8b949e   Secondary text, labels

Visualization Functions

Function               Purpose                   Example Output
c(text, color)         Wrap text in color span   <span style="color:#7ee787">+12%</span>
muted(text)            Dim secondary text        <span style="color:#8b949e">vs last week</span>
sparkline(values)      Trend visualization       ▁▂▅█▆▁
bar(val, max, width)   Inline bar chart          ████▎
colored_spark(values)  Sparkline with color      Green/red based on trend direction
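The real implementations live in scripts/ops/derive_helper.py; a minimal sketch of how sparkline and bar could work (assumed implementation, not the actual helper code):

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value onto one of eight block characters,
    scaled between the min and max of the series."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on flat series
    return "".join(BLOCKS[round((v - lo) / span * 7)] for v in values)

def bar(val, max_val, width=12):
    """Inline bar chart: filled blocks proportional to val/max_val."""
    filled = round(val / max_val * width) if max_val else 0
    return "█" * filled
```

Both return plain strings, so they drop straight into Markdown table cells.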

Delta Formatting

Two ways to show change, depending on your metric type:

Function               Use When                      Example
delta_pct(curr, prev)  Showing % change of a value   +23% (green) or -15% (red)
delta_pp(curr, prev)   Comparing percentages         +2.1pp (e.g., conversion 5.1% → 7.2%)

Use delta_pct for "revenue grew 23%". Use delta_pp for "conversion rate improved from 5% to 7%" (that's +2 percentage points, not +40%).
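The distinction in code, as a bare-bones sketch (the real helpers also apply color; this shows only the arithmetic):

```python
def delta_pct(curr: float, prev: float) -> str:
    """Relative change: how much the value itself grew or shrank."""
    change = (curr - prev) / prev * 100
    return f"{change:+.0f}%"

def delta_pp(curr: float, prev: float) -> str:
    """Absolute change between two rates, in percentage points."""
    return f"{(curr - prev) * 100:+.1f}pp"
```

For a conversion rate moving from 5% to 7%: delta_pct(7, 5) says +40%, delta_pp(0.07, 0.05) says +2.0pp. Same data, very different headline.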

Structure Helpers

Functions that add consistent structure to dashboards:

Function                       Purpose
frontmatter(title, generated)  YAML header with title and timestamp
tldr(bullet_points)            Executive summary at top of dashboard
data_source_callout(sources)   Shows where data comes from
caveat_callout(text)           Adds warning box for data limitations

Example: Using structure helpers
from derive_helper import frontmatter, tldr, data_source_callout

lines = []
lines.extend(frontmatter("Weekly Growth"))
lines.extend(tldr([
    "Signups up 12% week-over-week",
    "Conversion holding steady at 4.2%",
    "Revenue on track for monthly goal",
]))
lines.extend(data_source_callout(["Stripe", "Segment"]))

Playbook System

The playbook.md file isn't just a marker — it's instructions for Claude. When you open the rack in Claude Code and ask a question, Claude reads the playbook first to understand what this rack is about.

What to Put in It

  • Focus: What you're tracking and why you care
  • Sources: Where the data comes from — which APIs, what browser sessions
  • Questions: What you actually want to know
  • Metrics: The numbers that matter
  • Browser targets: URLs and login requirements for anything that needs scraping

Why This Works

Claude doesn't know your domain. It doesn't know that "velocity" means PRs merged per week, or that your competitor's pricing is on a particular URL. The playbook tells it. Then when you ask "why did velocity drop?" it has context to write the right query and make sense of the answer.

Example playbook.md
# Company Metrics Rack

## Focus
Track engineering velocity and shipping cadence.

## Sources
- GitHub PRs (via gh CLI)
- Linear issues (via Linear API)
- Deploy events (browser scrape from internal dashboard)

## Questions
- How many PRs ship per week?
- What's our mean time to merge?
- Which areas of the codebase are most active?

## Metrics
- PRs opened/merged per week
- Cycle time (PR open to merge)
- Review turnaround time

Commands

The main things you'll run:

Command            Purpose                         Key flags
rack init [name]   Create new rack                 --template, --focus
rack refresh       Run full pipeline
rack query "SQL"   Direct SQL access               --at DATE for temporal
rack check         Show inbox, DB size, warnings
rack lineage       Dependency visualization        --format json
rack snapshot      Save current DB state
rack snapshots     List available snapshots
rack diff          Compare pre/post refresh
rack signals       Show recent alerts              -n 20, --clear
rack mcp serve     Start MCP server                --allow-refresh
rack mcp install   Configure AI tools              Claude, Cursor, VS Code

Lineage System

Run rack lineage to see how data flows through your scripts. It parses the code and figures out what reads what.

Detection Patterns

Phase    Input Detection                    Output Detection
Fetch    N/A                                > file, -o, --output
Ingest   read_json_auto('file'), SOURCE=    CREATE TABLE, INSERT INTO
Derive   FROM table, JOIN table             OUT=, write_text()
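The lineage parser itself isn't shown in these docs, but the fetch-phase detection amounts to pattern matching over script text. A hedged sketch of the idea:

```python
import re

# Patterns mirroring the fetch-phase output detection above: a shell
# redirect (> file) or an explicit -o/--output flag.
REDIRECT = re.compile(r">\s*(\S+\.json)")
OUTPUT_FLAG = re.compile(r"(?:-o|--output)[= ](\S+\.json)")

def detect_fetch_output(script_text: str):
    """Return the first JSON output path a fetch script writes to,
    or None if nothing matches."""
    for pattern in (REDIRECT, OUTPUT_FLAG):
        m = pattern.search(script_text)
        if m:
            return m.group(1)
    return None
```

detect_fetch_output is a hypothetical name; the real tool builds a full graph, but each edge starts with a match like this.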

Normalization Rules

  • Fetch outputs normalized to data/sources/api/ or data/sources/browser/
  • Ingest outputs suffixed as tablename (table)
  • Derive outputs default to dashboards/{scriptname}.md if not detected
  • Reserved SQL keywords excluded from table detection

Output Formats

Default text output (grouped by phase):

FETCH
  github-prs.sh
    → data/sources/api/github-prs.json

INGEST
  github-prs.py
    ← data/sources/api/github-prs.json
    → pull_requests (table)

DERIVE
  pr-velocity.py
    ← pull_requests (table)
    → dashboards/pr-velocity.md

JSON output (--format json) returns full graph structure for programmatic use.

Temporal Queries

Sometimes you need to know what the data looked like last Tuesday. Rack takes snapshots, so you can:

# Save current state
rack snapshot

# Query a specific date
rack query --at 2024-01-10 "SELECT * FROM users"

# See what changed since last refresh
rack diff

Snapshot Types

Type              Location                                           Purpose
Temporal          .rack/snapshots/temporal/YYYY-MM-DD-HHMMSS.duckdb  Historical queries via --at
Pre/Post refresh  .rack/snapshots/pre.json, post.json                Diff captures state before/after each refresh

How Diff Works

Before each refresh, we snapshot row counts. After refresh, snapshot again. rack diff compares them. Simple, but useful — you can see at a glance that you added 47 new PRs and updated 3 users.
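The comparison step is just dictionary arithmetic. A minimal sketch (diff_counts is a hypothetical helper; the real diff also tracks updated rows):

```python
def diff_counts(pre: dict, post: dict) -> dict:
    """Compare row counts captured before and after a refresh.
    Positive numbers mean rows were added; unchanged tables
    are omitted from the result."""
    tables = set(pre) | set(post)
    return {t: post.get(t, 0) - pre.get(t, 0)
            for t in tables
            if post.get(t, 0) != pre.get(t, 0)}
```

Feeding it pre={"pull_requests": 120} and post={"pull_requests": 167} yields {"pull_requests": 47}: the 47 new PRs at a glance.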

Signal System

The problem with dashboards is you have to look at them. Signals flip that around — when something interesting happens, the data tells you.

Your derive scripts can emit signals when thresholds cross: churn above 5%, a big revenue spike, whatever matters. Run rack signals and see what's happened since you last looked. It's like a changelog for your metrics.

from signal_helper import signal

if churn_rate > 0.05:
    signal("churn crossed 5%", details=f"Current: {churn_rate:.1%}")

Signal Format

Signals are stored in Markdown with timestamps:

## 2024-01-15 10:23
churn crossed 5%
Current: 5.2%

## 2024-01-14 09:15
revenue spike detected
$8,400 from 3 annual plans

Commands

Command               Purpose
rack signals          Show 10 most recent
rack signals -n 20    Show 20 most recent
rack signals --clear  Reset signal log

Lifecycle Operations

How Data Accumulates

On a typical day:

  1. API fetches run — maybe on a cron, maybe you triggered rack refresh
  2. Browser scrapes run when Claude's active and the data is stale
  3. New files in sources/ get ingested
  4. Same file twice? Skipped (checksum match)
  5. Same record twice? Only updated if it changed
  6. Metrics recomputed only for affected dates

Everything Must Be Re-runnable

Scripts should be idempotent. Use CREATE OR REPLACE TABLE, not CREATE TABLE. That way, if something fails mid-run, you just run it again. No cleanup required.
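In DuckDB, CREATE OR REPLACE TABLE gives you this in one statement. The sqlite3 sketch below shows the same re-runnable shape (sqlite3 stands in for DuckDB so the example has no dependencies, and since sqlite lacks CREATE OR REPLACE TABLE it uses DROP TABLE IF EXISTS instead):

```python
import sqlite3

def ingest(conn, rows):
    """Idempotent load: rebuild the table from scratch every run,
    so a repeated or resumed run never leaves duplicates behind."""
    conn.execute("DROP TABLE IF EXISTS pull_requests")
    conn.execute("CREATE TABLE pull_requests (number INTEGER, title TEXT)")
    conn.executemany("INSERT INTO pull_requests VALUES (?, ?)", rows)
    conn.commit()
```

Run it once or ten times: the end state is identical.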

When Bad Data Gets In

It happens. When it does:

  • Purge by source file: Delete everything that came from a specific file
  • Purge by date range: Wipe records in a time window
  • Always see row counts before confirming
  • Always take a snapshot first
  • Every purge gets logged
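A purge-by-source-file sketch, with the count surfaced before anything is deleted (sqlite3 stands in for DuckDB here, and the _source_file column is an assumption; the docs only promise that every row is traceable to its source file):

```python
import sqlite3

def purge_by_source(conn, table: str, source_file: str) -> int:
    """Count the rows that came from one source file, delete them,
    and return the count so the caller can log it and confirm."""
    n = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE _source_file = ?",
        (source_file,)).fetchone()[0]
    conn.execute(f"DELETE FROM {table} WHERE _source_file = ?",
                 (source_file,))
    conn.commit()
    return n
```

In the real flow you'd also snapshot first and append the count to the activity log.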

Error Handling

Things will fail. The goal is to fail gracefully — don't block the whole pipeline, don't corrupt good data, make it easy to pick up where you left off.

When Scrapes Fail

  • Screenshot whatever state you reached — helpful for debugging
  • Save partial data with an "error" field so it's obvious something went wrong
  • Log it, but don't crash the session
  • Since everything's idempotent, you can just try again later

When Ingestion Fails

  • Log status='failed' so it's visible
  • Don't partially commit — the whole file either makes it or it doesn't
  • Surface the error for a human to look at

A failed fetch shouldn't stop other fetches. A failed ingest shouldn't touch existing tables. Partial success is fine — better to have 90% of the data than 0%.
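The all-or-nothing ingest maps onto a database transaction. A sketch of the shape (sqlite3 stands in for DuckDB; ingest_file and the prs table are illustrative names):

```python
import sqlite3

def ingest_file(conn, records):
    """All-or-nothing ingest: if any record fails, roll back so the
    table is untouched rather than half-loaded."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany("INSERT INTO prs VALUES (?, ?)", records)
        return "ok"
    except sqlite3.Error:
        return "failed"  # log status='failed'; existing rows untouched
```

A bad file reports "failed" and leaves every previously ingested row exactly as it was.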

MCP Server

MCP (Model Context Protocol) is how AI tools talk to external data sources. Rack includes an MCP server so Claude Code, Cursor, or VS Code can query your rack directly.

Quick Setup

# Auto-configure Claude Code, Cursor, or VS Code
rack mcp install

# Manual start (for debugging)
rack mcp serve /path/to/@my-rack

Available Tools

Tool     Input              Output                         Safe
query    SQL (SELECT only)  Results with auto-LIMIT        Yes
tables   None               Schema via information_schema  Yes
lineage  None               Dependency graph JSON          Yes
refresh  None               Pipeline output                Requires --allow-refresh

Resources

URI              Content
schema://tables  Full schema JSON with columns/types
docs://playbook  Raw playbook.md content

Safety Limits

Limit               Default  Flag
Max rows returned   1,000    --max-rows
Max output size     100 KB   --max-output-kb
Non-SELECT queries  Blocked

Forgot to add LIMIT to your query? The server adds one for you. Output too big? It truncates. Hard to shoot yourself in the foot.
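The guardrails boil down to two checks. A rough sketch of the idea, not the server's actual implementation (enforce_limit is a hypothetical name, and real SQL validation needs a proper parser):

```python
def enforce_limit(sql: str, max_rows: int = 1000) -> str:
    """Reject non-SELECT statements and append a LIMIT when the
    caller forgot one."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    if " limit " not in stripped.lower():
        stripped += f" LIMIT {max_rows}"
    return stripped
```

So "SELECT * FROM users" goes out as "SELECT * FROM users LIMIT 1000", and "DELETE FROM users" never reaches the database.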

Advanced Patterns

Production racks develop patterns beyond basic fetch/ingest/derive. These are battle-tested approaches from racks with 12+ dashboards and months of history.

Dashboard Archive Pattern

Before each refresh, archive the current dashboard state. This lets you track how your metrics reporting evolves, not just the metrics themselves.

#!/bin/bash
# Archive dashboards before refresh
DATE=$(date +%Y-%m-%d)
ARCHIVE=derived/snapshots/$DATE

mkdir -p "$ARCHIVE"
cp dashboards/*.md "$ARCHIVE/" 2>/dev/null || true

# Then run refresh
./scripts/refresh.sh

Over time, you build a changelog of your dashboards — useful for understanding how your analysis evolved.

Competitive Intelligence

Track competitors with a structured registry. A competitors.yaml file keeps metadata organized:

# data/competitors.yaml
competitors:
  - name: Acme Corp
    website: https://acme.com
    signals:
      - type: pricing
        url: https://acme.com/pricing
        frequency: weekly
      - type: hiring
        url: https://acme.com/careers
        frequency: weekly
      - type: news
        query: "Acme Corp funding OR launch"
        frequency: daily

Signal types you might track:

Signal Type        Source                   Typical Frequency
Pricing changes    Pricing page scrape      Weekly
Hiring velocity    Careers page, LinkedIn   Weekly
App store metrics  App Store / Play Store   Daily
News mentions      News API, Google Alerts  Daily
Product changes    Changelog, Twitter       Daily

Emit a signal when data goes stale:

if days_since_last_scrape > 14:
    signal(f"Stale: {competitor} pricing data is {days} days old")

Qualitative Data

Numbers alone miss the story. Blend qualitative insights with quantitative data:

  • Interview notes — Store as structured markdown with tags
  • Customer feedback — Sentiment alongside NPS scores
  • Support tickets — Theme categorization with volume data

Example: correlating interview themes with funnel data.

# data/sources/manual/interviews/2024-01-15-user-a.md
---
date: 2024-01-15
user_segment: power_user
themes: [onboarding, pricing, mobile]
---

## Key Quotes
- "I almost churned because the mobile app was so bad"
- "Pricing makes sense once you understand it"

## Action Items
- [ ] Investigate mobile drop-off in funnel

Your derive script can then pull themes alongside conversion data to spot patterns humans would miss.
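Getting the themes out of the note is the easy half. A naive frontmatter parser like this might be all a derive script needs (parse_frontmatter is a hypothetical helper; it handles only the simple key: value and key: [a, b] forms shown above, unlike a real YAML library):

```python
def parse_frontmatter(text: str) -> dict:
    """Pull keys like themes: [onboarding, pricing] out of a
    note's YAML frontmatter. Naive: no nesting, no quoting."""
    meta = {}
    parts = text.split("---")
    if len(parts) < 3:
        return meta  # no frontmatter block
    for line in parts[1].strip().splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",")]
        meta[key.strip()] = value
    return meta
```

Join the resulting themes against funnel tables by date or segment and the correlations fall out of a single query.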

Quirks & Edge Cases

Some things that might trip you up, collected here so you don't have to discover them the hard way:

Behavior              Details
Rack detection        Must have playbook.md and scripts/refresh.sh
Directory naming      Prefix with @ (e.g., @metrics). Spaces become hyphens.
Database naming       {directory-name}.duckdb (@ prefix stripped)
Script discovery      Any *.sh in fetch/, *.py in ingest/derive/
Execution order       Alphabetical within each phase
Idempotency           Scripts must be re-runnable (use CREATE OR REPLACE)
Duplicate prevention  Executor blocks concurrent runs of same rack
Execution timeout     5 minutes (configurable in rackd)
Watcher debounce      2 seconds (prevents refresh thrashing)
Watcher paths         data/sources/api/ and data/sources/browser/
Cross-rack queries    Use ATTACH via ~/.rack/racks.json registry

Common Gotchas

Table names: DuckDB doesn't like hyphens in table names. Use underscores. github-prs.py creates github_prs, not github-prs.
Path resolution: Use Path(__file__) to find files relative to the script, not os.getcwd(). The working directory isn't always what you think.
Snapshot size: Temporal snapshots are full copies of the database. If your DB is 500MB, each snapshot is 500MB. Plan accordingly.

Configuration

Most of the time you don't need to configure anything. But when you do, here's where things live:

Global Files

File                    Purpose
~/.rack/racks.json      Registry for cross-rack queries
~/.rack/schedules.json  Cron-style schedules

MCP Config Locations

Tool         Config File
Claude Code  ~/.claude.json
Cursor       .cursor/mcp.json
VS Code      .vscode/mcp.json

Per-Rack Locations

Path                       Purpose
.rack/signals.md           Alert log
.rack/activity.log         Execution history
.rack/annotations.yaml     Timeline annotations
.rack/snapshots/temporal/  Historical database copies
.rack/launchd.log          Scheduler logs (macOS)

Schedule Format

~/.rack/schedules.json example:

[
  {
    "rack_name": "metrics",
    "rack_path": "/absolute/path",
    "target": "refresh",
    "schedule": "every 1h",
    "created": "2024-01-15T10:00:00Z"
  }
]

Schedule patterns: every 15m, every 1h, daily at 19:00
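The docs don't show how these patterns are parsed; a sketch covering the three documented forms (parse_schedule is a hypothetical name, and the real parser presumably validates more):

```python
import re

def parse_schedule(pattern: str):
    """Turn 'every 15m' / 'every 1h' into an interval in seconds,
    and 'daily at 19:00' into an (hour, minute) tuple."""
    m = re.fullmatch(r"every (\d+)([mh])", pattern)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return n * 60 if unit == "m" else n * 3600
    m = re.fullmatch(r"daily at (\d{1,2}):(\d{2})", pattern)
    if m:
        return (int(m.group(1)), int(m.group(2)))
    raise ValueError(f"unrecognized schedule: {pattern}")
```

"every 15m" becomes 900 seconds; "daily at 19:00" becomes (19, 0) for the scheduler to fire once a day.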

rackd Daemon

If you have multiple racks, rackd watches them all and auto-refreshes when source files change. You drop a new JSON file in, it notices and runs the pipeline.

What It Does

  • Watches data/sources/api/ and data/sources/browser/ across all your racks
  • Triggers refresh when JSON files appear or change
  • Runs refreshes in a worker pool (configurable concurrency)
  • Exposes a REST API on port 9876 for status and manual triggers

Configuration

rackd.yaml example:

log:
  level: info          # debug, info, warn, error
  format: text         # or json

server:
  port: 9876           # API server port

watcher:
  root_path: ~/Documents   # Where to scan for @racks
  debounce: 2s             # Coalesce file events
  patterns:
    - "*.json"              # Watch for JSON changes

executor:
  timeout: 5m          # Max execution time
  concurrency: 1       # Parallel racks

health:
  interval: 30s        # Health check frequency

API Endpoints

Method  Endpoint                       Purpose
GET     /health                        Server liveness
GET     /api/v1/racks                  List all discovered racks
GET     /api/v1/racks/{name}           Get single rack status
POST    /api/v1/racks/{name}/refresh   Queue refresh (202 if queued, 409 if running)
GET     /api/v1/racks/{name}/logs      Fetch execution history

Key Limits

Limit                  Value
Queue capacity         100 racks
Execution timeout      5 minutes
Logs per rack          Last 100 executions
Debounce window        2 seconds
Health check interval  30 seconds

Startup Flow

  1. Load config (rackd.yaml)
  2. Scan for @name directories
  3. Start executor worker goroutines
  4. Begin file system watching
  5. Start health checks
  6. Listen on API port
  7. Wait for SIGINT/SIGTERM for graceful shutdown