Design Principles

  1. Plain text everywhere. Bash, Python, SQL, Markdown, JSON. If you can read it in a text editor, you can understand what's happening. No magic formats, no vendor lock-in.
  2. Let Claude do the tedious parts. You say what you want to track. Claude writes the scripts, runs the queries, spots the patterns. You stay focused on questions worth asking.
  3. No infrastructure. No Docker, no servers, no queues, no accounts to create. It runs on your laptop. When you're done, delete the folder. That's it.
  4. Reproducible from scratch. Blow away the database, re-run the scripts, get the same result. Every transformation is a script you can read.
  5. Trust but verify. Every file ingestion is logged. Every purge is tracked. When something looks wrong, you can trace it back to the exact source file that caused it.

Mental Model

Rack pulls scattered data into one local database you can actually query.

Here's the problem: your data is everywhere. GitHub has your PRs. Stripe has your revenue. Some competitor's pricing lives behind a login. A spreadsheet someone emailed you. None of it talks to each other.

Rack gives you one place to put it all — a DuckDB file on your machine — and a simple way to keep it fresh. The trick is that data collection, while tedious, is pretty automatable. So Claude writes the fetch scripts. You just say what you care about.

The loop looks like this:

Describe what to track → Claude writes scripts → Refresh runs them → Dashboards appear

Under the hood, three phases run in order:

  1. Fetch — Shell scripts call APIs and dump JSON files
  2. Ingest — Python scripts load those JSON files into DuckDB tables
  3. Derive — Python scripts query the database and spit out Markdown dashboards

All plain text. Open any file and you can see exactly what it does.

Directory Structure

A rack is just a folder with some conventions. If it has a playbook.md, it's a rack.

my-rack/
├── playbook.md           # identifies this as a rack
├── .claude/CLAUDE.md     # Claude reads this for context
├── data/
│   ├── inbox/            # drop files here for manual ingest
│   ├── sources/
│   │   ├── api/          # fetch output (watched by rackd)
│   │   └── browser/      # scrape output (watched by rackd)
│   └── my-rack.duckdb    # the database
├── scripts/
│   ├── fetch/            # pull data (*.sh)
│   ├── ingest/           # load to DuckDB (*.py)
│   ├── derive/           # generate dashboards (*.py)
│   ├── ops/              # helpers (signal, derive, snapshot)
│   └── refresh.sh        # orchestrates everything
├── dashboards/           # markdown output
└── .rack/
    ├── signals.md        # alert log
    ├── activity.log      # run history
    └── snapshots/
        └── temporal/     # for rack query --at

Naming Conventions

Item             Convention                 Example
Rack directory   Prefixed with @            @company-metrics
Database file    {dirname}.duckdb           company-metrics.duckdb
Fetch scripts    {source}.sh                github-prs.sh
Ingest scripts   Match fetch name           github-prs.py
Derive scripts   Dashboard name             pr-velocity.py
JSON output      Script basename            github-prs.json
DuckDB tables    Underscores, not hyphens   github_prs
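In code, the filename-to-table mapping is one line. A minimal sketch (table_name is a hypothetical helper for illustration, not part of rack):

```python
from pathlib import Path

def table_name(source_file: str) -> str:
    """Derive a DuckDB table name from a source filename:
    strip the extension, swap hyphens for underscores."""
    return Path(source_file).stem.replace("-", "_")
```

So table_name("github-prs.json") gives "github_prs", matching the convention above.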

Output Organization

As your rack grows, organize derived outputs beyond flat dashboards:

derived/
├── snapshots/    # Dated dashboard archives
├── deep-dives/   # Detailed segment analyses
├── cohorts/      # User funnel analyses
├── competitive/  # Competitor tracking
└── reports/      # Ad-hoc analyses

Data Flow

Data gets in through two paths:

API Sources (automated)

Most data comes from CLIs and REST APIs — gh, stripe, aws, plain curl. Shell scripts fetch JSON into data/sources/api/. These can run on a cron. No browser needed.

Browser Sources (interactive)

Some data hides behind logins — competitor pricing pages, internal dashboards, that one report only accessible through a web UI. For these, Claude uses Playwright to scrape the page while you watch. Output lands in data/sources/browser/. Screenshots get saved too, so you have evidence of what was seen.

Why Sources Are Immutable

Once a source file is written, we don't overwrite it. Need to re-pull the same day? Append a timestamp. This sounds fussy, but it means you can always trace a weird number back to the exact file that produced it. Silent overwrites are how audit trails die.
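The naming rule can be sketched in a few lines of Python (source_path is a hypothetical helper, shown here just to make the rule concrete):

```python
from datetime import datetime
from pathlib import Path

def source_path(base_dir: str, source: str, when: datetime) -> Path:
    """Pick a non-colliding source filename: if the plain file
    already exists, append a timestamp instead of overwriting."""
    plain = Path(base_dir) / f"{source}.json"
    if not plain.exists():
        return plain
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return Path(base_dir) / f"{source}-{stamp}.json"
```

The second pull of the day lands in something like github-prs-20240115-110000.json; the first file stays untouched.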

How Ingestion Works

Both paths feed into DuckDB the same way:

  • Check schema and required fields
  • Skip files we've already ingested (checksum match)
  • For individual records, only update if something changed
  • Stamp everything with _ingested_at so you can query freshness
  • Log every attempt — row counts, status, what happened
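The checksum-based skip is the interesting part. A minimal sketch of the idea (the actual ingest code isn't shown here; should_ingest and its seen-set storage are assumptions):

```python
import hashlib

def file_checksum(data: bytes) -> str:
    """Content hash used to recognize files we've already ingested."""
    return hashlib.sha256(data).hexdigest()

def should_ingest(data: bytes, seen: set) -> bool:
    """Skip the file if its checksum was already recorded;
    otherwise record it and signal the caller to ingest."""
    digest = file_checksum(data)
    if digest in seen:
        return False
    seen.add(digest)
    return True
```

Re-dropping the same file into inbox/ is then a no-op, while a file with even one changed byte gets ingested.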

The Pipeline

Run rack refresh and it executes scripts/refresh.sh, which just runs the three phases in order: fetch, ingest, derive. Nothing fancy.

Fetch Scripts

Live in scripts/fetch/*.sh. Output JSON to data/sources/.

Prefer CLIs when they exist (gh, stripe, aws) — they handle auth and pagination. Fall back to curl. Use Playwright only when you have to log in through a browser.

Complete fetch example
#!/bin/bash
# scripts/fetch/github-prs.sh
# Fetches open PRs from a GitHub repo

gh pr list --repo owner/repo \
  --state all \
  --limit 100 \
  --json number,title,author,createdAt,mergedAt,state \
  > data/sources/api/github-prs.json

Ingest Scripts

Live in scripts/ingest/*.py. Read JSON with DuckDB's read_json_auto(), write to a table with CREATE OR REPLACE TABLE.

Naming follows a pattern: github-prs.py reads github-prs.json and creates table github_prs. (DuckDB tables use underscores, not hyphens. One of those things.)

Complete ingest example
#!/usr/bin/env python3
# scripts/ingest/github-prs.py
# Loads GitHub PRs into DuckDB

import duckdb
from pathlib import Path

DB = Path(__file__).parent.parent.parent / "data" / "my-rack.duckdb"
SOURCE = Path(__file__).parent.parent.parent / "data" / "sources" / "api" / "github-prs.json"

conn = duckdb.connect(str(DB))
conn.execute("""
    CREATE OR REPLACE TABLE pull_requests AS
    SELECT * FROM read_json_auto(?)
""", [str(SOURCE)])

print(f"Loaded {conn.execute('SELECT COUNT(*) FROM pull_requests').fetchone()[0]} PRs")

Derive Scripts

Live in scripts/derive/*.py. Query the database, write Markdown to dashboards/.

Some helpers exist in scripts/ops/derive_helper.py for common patterns. See Dashboard Helpers for the full API.

Helper                   Purpose               Output
bar(val, max, width)     Inline bar chart      ████▎
sparkline(values)        Trend visualization   ▁▂▅█▆▁
trend_arrow(curr, prev)  Delta indicator       ▲ +23%
signal(title, details)   Emit alert            Appends to .rack/signals.md

Complete derive example
#!/usr/bin/env python3
# scripts/derive/pr-velocity.py
# Generates weekly PR velocity dashboard

import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / "ops"))

from derive_helper import bar, sparkline, trend_arrow
import duckdb
from datetime import datetime

DB = Path(__file__).parent.parent.parent / "data" / "my-rack.duckdb"
OUT = Path(__file__).parent.parent.parent / "dashboards" / "pr-velocity.md"

conn = duckdb.connect(str(DB), read_only=True)

weekly = conn.execute("""
    SELECT DATE_TRUNC('week', createdAt)::DATE as week,
           COUNT(*) as opened,
           COUNT(*) FILTER (WHERE mergedAt IS NOT NULL) as merged
    FROM pull_requests
    WHERE createdAt >= CURRENT_DATE - INTERVAL '8 weeks'
    GROUP BY 1 ORDER BY 1
""").fetchall()

# Build dashboard
lines = [
    "---",
    "title: PR Velocity",
    f"generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}",
    "---",
    "",
    "[[← index]]",
    "",
    "# PR Velocity",
    "",
]

if weekly:
    opened_vals = [r[1] for r in weekly]
    merged_vals = [r[2] for r in weekly]
    max_opened = max(opened_vals)

    lines.extend([
        f"**{sum(opened_vals)}** PRs opened, **{sum(merged_vals)}** merged over 8 weeks.",
        "",
        f"Trend: `{sparkline(opened_vals)}`",
        "",
        "| Week | Opened | | Merged |",
        "|------|-------:|---|-------:|",
    ])

    for week, opened, merged in weekly:
        lines.append(f"| {week} | {opened} | `{bar(opened, max_opened, 12)}` | {merged} |")

lines.extend([
    "",
    "---",
    "*Auto-generated. Interpretations are hypotheses.*",
])

OUT.write_text("\n".join(lines) + "\n")
print(f"Generated {OUT.name}")

Dashboard Helpers

Professional-grade dashboard formatting lives in scripts/ops/derive_helper.py. These helpers make your dashboards look polished and consistent.

Color Palette

A consistent color scheme for dashboard elements:

Color    Hex       Purpose
Cyan     #00d4ff   Primary metrics, headers
Green    #7ee787   Positive deltas, growth
Red      #f85149   Negative deltas, alerts
Yellow   #d29922   Warnings, neutral changes
Muted    #8b949e   Secondary text, labels

Visualization Functions

Function               Purpose                   Example Output
c(text, color)         Wrap text in color span   <span style="color:#7ee787">+12%</span>
muted(text)            Dim secondary text        <span style="color:#8b949e">vs last week</span>
sparkline(values)      Trend visualization       ▁▂▅█▆▁
bar(val, max, width)   Inline bar chart          ████▎
colored_spark(values)  Sparkline with color      Green/red based on trend direction
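The real implementations live in scripts/ops/derive_helper.py; a minimal sketch of how sparkline and bar could work (assumed implementation, not the actual helper code):

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value onto one of eight block characters,
    scaled between the min and max of the series."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on flat series
    return "".join(BLOCKS[round((v - lo) / span * 7)] for v in values)

def bar(val, max_val, width=12):
    """Inline bar chart: filled blocks proportional to val/max_val."""
    filled = round(val / max_val * width) if max_val else 0
    return "█" * filled
```

Both return plain strings, so they drop straight into Markdown table cells.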

Delta Formatting

Two ways to show change, depending on your metric type:

Function               Use When                      Example
delta_pct(curr, prev)  Showing % change of a value   +23% (green) or -15% (red)
delta_pp(curr, prev)   Comparing percentages         +2.1pp (e.g., conversion 5.1% → 7.2%)

Use delta_pct for "revenue grew 23%". Use delta_pp for "conversion rate improved from 5% to 7%" (that's +2 percentage points, not +40%).
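The distinction in code, as a bare-bones sketch (the real helpers also apply color; this shows only the arithmetic):

```python
def delta_pct(curr: float, prev: float) -> str:
    """Relative change: how much the value itself grew or shrank."""
    change = (curr - prev) / prev * 100
    return f"{change:+.0f}%"

def delta_pp(curr: float, prev: float) -> str:
    """Absolute change between two rates, in percentage points."""
    return f"{(curr - prev) * 100:+.1f}pp"
```

For a conversion rate moving from 5% to 7%: delta_pct(7, 5) says +40%, delta_pp(0.07, 0.05) says +2.0pp. Same data, very different headline.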

Structure Helpers

Functions that add consistent structure to dashboards:

Function                       Purpose
frontmatter(title, generated)  YAML header with title and timestamp
tldr(bullet_points)            Executive summary at top of dashboard
data_source_callout(sources)   Shows where data comes from
caveat_callout(text)           Adds warning box for data limitations

Example: Using structure helpers
from derive_helper import frontmatter, tldr, data_source_callout

lines = []
lines.extend(frontmatter("Weekly Growth"))
lines.extend(tldr([
    "Signups up 12% week-over-week",
    "Conversion holding steady at 4.2%",
    "Revenue on track for monthly goal",
]))
lines.extend(data_source_callout(["Stripe", "Segment"]))

Playbook System

The playbook.md file isn't just a marker — it's instructions for Claude. When you open the rack in Claude Code and ask a question, Claude reads the playbook first to understand what this rack is about.

What to Put in It

  • Focus: What you're tracking and why you care
  • Sources: Where the data comes from — which APIs, what browser sessions
  • Questions: What you actually want to know
  • Metrics: The numbers that matter
  • Browser targets: URLs and login requirements for anything that needs scraping

Why This Works

Claude doesn't know your domain. It doesn't know that "velocity" means PRs merged per week, or that your competitor's pricing is on a particular URL. The playbook tells it. Then when you ask "why did velocity drop?" it has context to write the right query and make sense of the answer.

Example playbook.md
# Company Metrics Rack

## Focus
Track engineering velocity and shipping cadence.

## Sources
- GitHub PRs (via gh CLI)
- Linear issues (via Linear API)
- Deploy events (browser scrape from internal dashboard)

## Questions
- How many PRs ship per week?
- What's our mean time to merge?
- Which areas of the codebase are most active?

## Metrics
- PRs opened/merged per week
- Cycle time (PR open to merge)
- Review turnaround time

Commands

The main things you'll run:

Command            Purpose                         Key flags
rack init [name]   Create new rack                 --template, --focus
rack refresh       Run full pipeline
rack query "SQL"   Direct SQL access               --at DATE for temporal
rack check         Show inbox, DB size, warnings
rack lineage       Dependency visualization        --format json
rack snapshot      Save current DB state
rack snapshots     List available snapshots
rack diff          Compare pre/post refresh
rack signals       Show recent alerts              -n 20, --clear
rack mcp serve     Start MCP server                --allow-refresh
rack mcp install   Configure AI tools              Claude, Cursor, VS Code

Lineage System

Run rack lineage to see how data flows through your scripts. It parses the code and figures out what reads what.

Detection Patterns

Phase    Input Detection                    Output Detection
Fetch    N/A                                > file, -o, --output
Ingest   read_json_auto('file'), SOURCE=    CREATE TABLE, INSERT INTO
Derive   FROM table, JOIN table             OUT=, write_text()
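The lineage parser itself isn't shown in these docs, but the fetch-phase detection amounts to pattern matching over script text. A hedged sketch of the idea:

```python
import re

# Patterns mirroring the fetch-phase output detection above: a shell
# redirect (> file) or an explicit -o/--output flag.
REDIRECT = re.compile(r">\s*(\S+\.json)")
OUTPUT_FLAG = re.compile(r"(?:-o|--output)[= ](\S+\.json)")

def detect_fetch_output(script_text: str):
    """Return the first JSON output path a fetch script writes to,
    or None if nothing matches."""
    for pattern in (REDIRECT, OUTPUT_FLAG):
        m = pattern.search(script_text)
        if m:
            return m.group(1)
    return None
```

detect_fetch_output is a hypothetical name; the real tool builds a full graph, but each edge starts with a match like this.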

Normalization Rules

  • Fetch outputs normalized to data/sources/api/ or data/sources/browser/
  • Ingest outputs suffixed as tablename (table)
  • Derive outputs default to dashboards/{scriptname}.md if not detected
  • Reserved SQL keywords excluded from table detection

Output Formats

Default text output (grouped by phase):

FETCH
  github-prs.sh
    → data/sources/api/github-prs.json

INGEST
  github-prs.py
    ← data/sources/api/github-prs.json
    → pull_requests (table)

DERIVE
  pr-velocity.py
    ← pull_requests (table)
    → dashboards/pr-velocity.md

JSON output (--format json) returns full graph structure for programmatic use.

Temporal Queries

Sometimes you need to know what the data looked like last Tuesday. Rack takes snapshots, so you can:

# Save current state
rack snapshot

# Query a specific date
rack query --at 2024-01-10 "SELECT * FROM users"

# See what changed since last refresh
rack diff

Snapshot Types

Type              Location                                           Purpose
Temporal          .rack/snapshots/temporal/YYYY-MM-DD-HHMMSS.duckdb  Historical queries via --at
Pre/Post refresh  .rack/snapshots/pre.json, post.json                Diff captures state before/after each refresh

How Diff Works

Before each refresh, we snapshot row counts. After refresh, snapshot again. rack diff compares them. Simple, but useful — you can see at a glance that you added 47 new PRs and updated 3 users.
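The comparison step is just dictionary arithmetic. A minimal sketch (diff_counts is a hypothetical helper; the real diff also tracks updated rows):

```python
def diff_counts(pre: dict, post: dict) -> dict:
    """Compare row counts captured before and after a refresh.
    Positive numbers mean rows were added; unchanged tables
    are omitted from the result."""
    tables = set(pre) | set(post)
    return {t: post.get(t, 0) - pre.get(t, 0)
            for t in tables
            if post.get(t, 0) != pre.get(t, 0)}
```

Feeding it pre={"pull_requests": 120} and post={"pull_requests": 167} yields {"pull_requests": 47}: the 47 new PRs at a glance.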

Signal System

The problem with dashboards is you have to look at them. Signals flip that around — when something interesting happens, the data tells you.

Your derive scripts can emit signals when thresholds cross: churn above 5%, a big revenue spike, whatever matters. Run rack signals and see what's happened since you last looked. It's like a changelog for your metrics.

from signal_helper import signal

if churn_rate > 0.05:
    signal("churn crossed 5%", details=f"Current: {churn_rate:.1%}")

Signal Format

Signals are stored in Markdown with timestamps:

## 2024-01-15 10:23
churn crossed 5%
Current: 5.2%

## 2024-01-14 09:15
revenue spike detected
$8,400 from 3 annual plans

Commands

Command               Purpose
rack signals          Show 10 most recent
rack signals -n 20    Show 20 most recent
rack signals --clear  Reset signal log

Lifecycle Operations

How Data Accumulates

On a typical day:

  1. API fetches run — maybe on a cron, maybe you triggered rack refresh
  2. Browser scrapes run when Claude's active and the data is stale
  3. New files in sources/ get ingested
  4. Same file twice? Skipped (checksum match)
  5. Same record twice? Only updated if it changed
  6. Metrics recomputed only for affected dates

Everything Must Be Re-runnable

Scripts should be idempotent. Use CREATE OR REPLACE TABLE, not CREATE TABLE. That way, if something fails mid-run, you just run it again. No cleanup required.
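In DuckDB, CREATE OR REPLACE TABLE gives you this in one statement. The sqlite3 sketch below shows the same re-runnable shape (sqlite3 stands in for DuckDB so the example has no dependencies, and since sqlite lacks CREATE OR REPLACE TABLE it uses DROP TABLE IF EXISTS instead):

```python
import sqlite3

def ingest(conn, rows):
    """Idempotent load: rebuild the table from scratch every run,
    so a repeated or resumed run never leaves duplicates behind."""
    conn.execute("DROP TABLE IF EXISTS pull_requests")
    conn.execute("CREATE TABLE pull_requests (number INTEGER, title TEXT)")
    conn.executemany("INSERT INTO pull_requests VALUES (?, ?)", rows)
    conn.commit()
```

Run it once or ten times: the end state is identical.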

When Bad Data Gets In

It happens. When it does:

  • Purge by source file: Delete everything that came from a specific file
  • Purge by date range: Wipe records in a time window
  • Always see row counts before confirming
  • Always take a snapshot first
  • Every purge gets logged
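A purge-by-source-file sketch, with the count surfaced before anything is deleted (sqlite3 stands in for DuckDB here, and the _source_file column is an assumption; the docs only promise that every row is traceable to its source file):

```python
import sqlite3

def purge_by_source(conn, table: str, source_file: str) -> int:
    """Count the rows that came from one source file, delete them,
    and return the count so the caller can log it and confirm."""
    n = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE _source_file = ?",
        (source_file,)).fetchone()[0]
    conn.execute(f"DELETE FROM {table} WHERE _source_file = ?",
                 (source_file,))
    conn.commit()
    return n
```

In the real flow you'd also snapshot first and append the count to the activity log.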

Error Handling

Things will fail. The goal is to fail gracefully — don't block the whole pipeline, don't corrupt good data, make it easy to pick up where you left off.

When Scrapes Fail

  • Screenshot whatever state you reached — helpful for debugging
  • Save partial data with an "error" field so it's obvious something went wrong
  • Log it, but don't crash the session
  • Since everything's idempotent, you can just try again later

When Ingestion Fails

  • Log status='failed' so it's visible
  • Don't partially commit — the whole file either makes it or it doesn't
  • Surface the error for a human to look at

A failed fetch shouldn't stop other fetches. A failed ingest shouldn't touch existing tables. Partial success is fine — better to have 90% of the data than 0%.
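The all-or-nothing ingest maps onto a database transaction. A sketch of the shape (sqlite3 stands in for DuckDB; ingest_file and the prs table are illustrative names):

```python
import sqlite3

def ingest_file(conn, records):
    """All-or-nothing ingest: if any record fails, roll back so the
    table is untouched rather than half-loaded."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany("INSERT INTO prs VALUES (?, ?)", records)
        return "ok"
    except sqlite3.Error:
        return "failed"  # log status='failed'; existing rows untouched
```

A bad file reports "failed" and leaves every previously ingested row exactly as it was.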

MCP Server

MCP (Model Context Protocol) is how AI tools talk to external data sources. Rack includes an MCP server so Claude Code, Cursor, or VS Code can query your rack directly.

Quick Setup

# Auto-configure Claude Code, Cursor, or VS Code
rack mcp install

# Manual start (for debugging)
rack mcp serve /path/to/@my-rack

Available Tools

Tool     Input              Output                         Safe
query    SQL (SELECT only)  Results with auto-LIMIT        Yes
tables   None               Schema via information_schema  Yes
lineage  None               Dependency graph JSON          Yes
refresh  None               Pipeline output                Requires --allow-refresh

Resources

URI              Content
schema://tables  Full schema JSON with columns/types
docs://playbook  Raw playbook.md content

Safety Limits

Limit               Default  Flag
Max rows returned   1,000    --max-rows
Max output size     100 KB   --max-output-kb
Non-SELECT queries  Blocked

Forgot to add LIMIT to your query? The server adds one for you. Output too big? It truncates. Hard to shoot yourself in the foot.
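The guardrails boil down to two checks. A rough sketch of the idea, not the server's actual implementation (enforce_limit is a hypothetical name, and real SQL validation needs a proper parser):

```python
def enforce_limit(sql: str, max_rows: int = 1000) -> str:
    """Reject non-SELECT statements and append a LIMIT when the
    caller forgot one."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    if " limit " not in stripped.lower():
        stripped += f" LIMIT {max_rows}"
    return stripped
```

So "SELECT * FROM users" goes out as "SELECT * FROM users LIMIT 1000", and "DELETE FROM users" never reaches the database.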

Advanced Patterns

Production racks develop patterns beyond basic fetch/ingest/derive. These are battle-tested approaches from racks with 12+ dashboards and months of history.

Dashboard Archive Pattern

Before each refresh, archive the current dashboard state. This lets you track how your metrics reporting evolves, not just the metrics themselves.

#!/bin/bash
# Archive dashboards before refresh
DATE=$(date +%Y-%m-%d)
ARCHIVE=derived/snapshots/$DATE

mkdir -p "$ARCHIVE"
cp dashboards/*.md "$ARCHIVE/" 2>/dev/null || true

# Then run refresh
./scripts/refresh.sh

Over time, you build a changelog of your dashboards — useful for understanding how your analysis evolved.

Competitive Intelligence

Track competitors with a structured registry. A competitors.yaml file keeps metadata organized:

# data/competitors.yaml
competitors:
  - name: Acme Corp
    website: https://acme.com
    signals:
      - type: pricing
        url: https://acme.com/pricing
        frequency: weekly
      - type: hiring
        url: https://acme.com/careers
        frequency: weekly
      - type: news
        query: "Acme Corp funding OR launch"
        frequency: daily

Signal types you might track:

Signal Type        Source                   Typical Frequency
Pricing changes    Pricing page scrape      Weekly
Hiring velocity    Careers page, LinkedIn   Weekly
App store metrics  App Store / Play Store   Daily
News mentions      News API, Google Alerts  Daily
Product changes    Changelog, Twitter       Daily

Emit a signal when data goes stale:

if days_since_last_scrape > 14:
    signal(f"Stale: {competitor} pricing data is {days} days old")

Qualitative Data

Numbers alone miss the story. Blend qualitative insights with quantitative data:

  • Interview notes — Store as structured markdown with tags
  • Customer feedback — Sentiment alongside NPS scores
  • Support tickets — Theme categorization with volume data

Example: correlating interview themes with funnel data.

# data/sources/manual/interviews/2024-01-15-user-a.md
---
date: 2024-01-15
user_segment: power_user
themes: [onboarding, pricing, mobile]
---

## Key Quotes
- "I almost churned because the mobile app was so bad"
- "Pricing makes sense once you understand it"

## Action Items
- [ ] Investigate mobile drop-off in funnel

Your derive script can then pull themes alongside conversion data to spot patterns humans would miss.
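Getting the themes out of the note is the easy half. A naive frontmatter parser like this might be all a derive script needs (parse_frontmatter is a hypothetical helper; it handles only the simple key: value and key: [a, b] forms shown above, unlike a real YAML library):

```python
def parse_frontmatter(text: str) -> dict:
    """Pull keys like themes: [onboarding, pricing] out of a
    note's YAML frontmatter. Naive: no nesting, no quoting."""
    meta = {}
    parts = text.split("---")
    if len(parts) < 3:
        return meta  # no frontmatter block
    for line in parts[1].strip().splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",")]
        meta[key.strip()] = value
    return meta
```

Join the resulting themes against funnel tables by date or segment and the correlations fall out of a single query.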

Quirks & Edge Cases

Some things that might trip you up, collected here so you don't have to discover them the hard way:

Behavior              Details
Rack detection        Must have playbook.md and scripts/refresh.sh
Directory naming      Prefix with @ (e.g., @metrics). Spaces become hyphens.
Database naming       {directory-name}.duckdb (@ prefix stripped)
Script discovery      Any *.sh in fetch/, *.py in ingest/derive/
Execution order       Alphabetical within each phase
Idempotency           Scripts must be re-runnable (use CREATE OR REPLACE)
Duplicate prevention  Executor blocks concurrent runs of same rack
Execution timeout     5 minutes (configurable in rackd)
Watcher debounce      2 seconds (prevents refresh thrashing)
Watcher paths         data/sources/api/ and data/sources/browser/
Cross-rack queries    Use ATTACH via ~/.rack/racks.json registry

Common Gotchas

Table names: DuckDB doesn't like hyphens in table names. Use underscores. github-prs.py creates github_prs, not github-prs.
Path resolution: Use Path(__file__) to find files relative to the script, not os.getcwd(). The working directory isn't always what you think.
Snapshot size: Temporal snapshots are full copies of the database. If your DB is 500MB, each snapshot is 500MB. Plan accordingly.

Configuration

Most of the time you don't need to configure anything. But when you do, here's where things live:

Global Files

File                    Purpose
~/.rack/racks.json      Registry for cross-rack queries
~/.rack/schedules.json  Cron-style schedules

MCP Config Locations

Tool         Config File
Claude Code  ~/.claude.json
Cursor       .cursor/mcp.json
VS Code      .vscode/mcp.json

Per-Rack Locations

Path                       Purpose
.rack/signals.md           Alert log
.rack/activity.log         Execution history
.rack/annotations.yaml     Timeline annotations
.rack/snapshots/temporal/  Historical database copies
.rack/launchd.log          Scheduler logs (macOS)

Schedule Format

~/.rack/schedules.json example:

[
  {
    "rack_name": "metrics",
    "rack_path": "/absolute/path",
    "target": "refresh",
    "schedule": "every 1h",
    "created": "2024-01-15T10:00:00Z"
  }
]

Schedule patterns: every 15m, every 1h, daily at 19:00
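The docs don't show how these patterns are parsed; a sketch covering the three documented forms (parse_schedule is a hypothetical name, and the real parser presumably validates more):

```python
import re

def parse_schedule(pattern: str):
    """Turn 'every 15m' / 'every 1h' into an interval in seconds,
    and 'daily at 19:00' into an (hour, minute) tuple."""
    m = re.fullmatch(r"every (\d+)([mh])", pattern)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return n * 60 if unit == "m" else n * 3600
    m = re.fullmatch(r"daily at (\d{1,2}):(\d{2})", pattern)
    if m:
        return (int(m.group(1)), int(m.group(2)))
    raise ValueError(f"unrecognized schedule: {pattern}")
```

"every 15m" becomes 900 seconds; "daily at 19:00" becomes (19, 0) for the scheduler to fire once a day.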

rackd Daemon

If you have multiple racks, rackd watches them all and auto-refreshes when source files change. You drop a new JSON file in, it notices and runs the pipeline.

What It Does

  • Watches data/sources/api/ and data/sources/browser/ across all your racks
  • Triggers refresh when JSON files appear or change
  • Runs refreshes in a worker pool (configurable concurrency)
  • Exposes a REST API on port 9876 for status and manual triggers

Configuration

rackd.yaml example:

log:
  level: info          # debug, info, warn, error
  format: text         # or json

server:
  port: 9876           # API server port

watcher:
  root_path: ~/Documents   # Where to scan for @racks
  debounce: 2s             # Coalesce file events
  patterns:
    - "*.json"              # Watch for JSON changes

executor:
  timeout: 5m          # Max execution time
  concurrency: 1       # Parallel racks

health:
  interval: 30s        # Health check frequency

API Endpoints

Method  Endpoint                       Purpose
GET     /health                        Server liveness
GET     /api/v1/racks                  List all discovered racks
GET     /api/v1/racks/{name}           Get single rack status
POST    /api/v1/racks/{name}/refresh   Queue refresh (202 if queued, 409 if running)
GET     /api/v1/racks/{name}/logs      Fetch execution history

Key Limits

Limit                  Value
Queue capacity         100 racks
Execution timeout      5 minutes
Logs per rack          Last 100 executions
Debounce window        2 seconds
Health check interval  30 seconds

Startup Flow

  1. Load config (rackd.yaml)
  2. Scan for @name directories
  3. Start executor worker goroutines
  4. Begin file system watching
  5. Start health checks
  6. Listen on API port
  7. Wait for SIGINT/SIGTERM for graceful shutdown