Rack v0.3.0
How it works
Design Principles
- Plain text everywhere. Bash, Python, SQL, Markdown, JSON. If you can read it in a text editor, you can understand what's happening. No magic formats, no vendor lock-in.
- Let Claude do the tedious parts. You say what you want to track. Claude writes the scripts, runs the queries, spots the patterns. You stay focused on questions worth asking.
- No infrastructure. No Docker, no servers, no queues, no accounts to create. It runs on your laptop. When you're done, delete the folder. That's it.
- Reproducible from scratch. Blow away the database, re-run the scripts, get the same result. Every transformation is a script you can read.
- Trust but verify. Every file ingestion is logged. Every purge is tracked. When something looks wrong, you can trace it back to the exact source file that caused it.
Mental Model
Rack pulls scattered data into one local database you can actually query.
Here's the problem: your data is everywhere. GitHub has your PRs. Stripe has your revenue. Some competitor's pricing lives behind a login. A spreadsheet someone emailed you. None of it talks to each other.
Rack gives you one place to put it all — a DuckDB file on your machine — and a simple way to keep it fresh. The trick is that data collection, while tedious, is pretty automatable. So Claude writes the fetch scripts. You just say what you care about.
Under the hood, three phases run in order:
- Fetch — Shell scripts call APIs and dump JSON files
- Ingest — Python scripts load those JSON files into DuckDB tables
- Derive — Python scripts query the database and spit out Markdown dashboards
All plain text. Open any file and you can see exactly what it does.
Directory Structure
A rack is just a folder with some conventions. If it has a playbook.md and a scripts/refresh.sh, it's a rack.
Naming Conventions
| Item | Convention | Example |
|---|---|---|
| Rack directory | Prefixed with @ | @company-metrics |
| Database file | {dirname}.duckdb | company-metrics.duckdb |
| Fetch scripts | {source}.sh | github-prs.sh |
| Ingest scripts | Match fetch name | github-prs.py |
| Derive scripts | Dashboard name | pr-velocity.py |
| JSON output | Script basename | github-prs.json |
| DuckDB tables | Underscores, not hyphens | github_prs |
Output Organization
As your rack grows, you can organize derived outputs beyond a flat dashboards/ folder, e.g. into dated archives like the derived/snapshots/ pattern described under Advanced Patterns.
Data Flow
Data gets in through two paths:
API Sources (automated)
Most data comes from CLIs and REST APIs — gh, stripe, aws, plain curl. Shell scripts fetch JSON into data/sources/api/. These can run on a cron. No browser needed.
Browser Sources (interactive)
Some data hides behind logins — competitor pricing pages, internal dashboards, that one report only accessible through a web UI. For these, Claude uses Playwright to scrape the page while you watch. Output lands in data/sources/browser/. Screenshots get saved too, so you have evidence of what was seen.
Why Sources Are Immutable
Once a source file is written, we don't overwrite it. Need to re-pull the same day? Append a timestamp. This sounds fussy, but it means you can always trace a weird number back to the exact file that produced it. Silent overwrites are how audit trails die.
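As a sketch of the convention (the github-prs filename is just the running example):

```shell
#!/bin/bash
# Immutable sources: if today's file already exists, write a
# timestamped sibling instead of overwriting it.
OUT=data/sources/api/github-prs.json
if [ -e "$OUT" ]; then
  OUT="data/sources/api/github-prs-$(date +%Y%m%d-%H%M%S).json"
fi
echo "writing to $OUT"
```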
How Ingestion Works
Both paths feed into DuckDB the same way:
- Check schema and required fields
- Skip files we've already ingested (checksum match)
- For individual records, only update if something changed
- Stamp everything with _ingested_at so you can query freshness
- Log every attempt — row counts, status, what happened
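The checksum skip can be sketched like this; the ingest_log table and its columns are illustrative, not Rack's actual schema:

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """SHA-256 of the raw bytes; identical files hash identically."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def already_ingested(conn, checksum: str) -> bool:
    """True if a prior successful ingest logged this checksum."""
    row = conn.execute(
        "SELECT 1 FROM ingest_log WHERE checksum = ? AND status = 'ok'",
        [checksum],
    ).fetchone()
    return row is not None
```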
The Pipeline
Run rack refresh and it executes scripts/refresh.sh, which just runs the three phases in order: fetch, ingest, derive. Nothing fancy.
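A plausible minimal shape for scripts/refresh.sh (the real script may add logging and error handling):

```shell
#!/bin/bash
# Sketch: run the three phases in order. Within a phase,
# glob expansion gives the alphabetical execution order.
for s in scripts/fetch/*.sh;  do [ -e "$s" ] || continue; bash "$s";    done
for s in scripts/ingest/*.py; do [ -e "$s" ] || continue; python3 "$s"; done
for s in scripts/derive/*.py; do [ -e "$s" ] || continue; python3 "$s"; done
echo "refresh complete"
```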
Fetch Scripts
Live in scripts/fetch/*.sh. Output JSON to data/sources/.
Prefer CLIs when they exist (gh, stripe, aws) — they handle auth and pagination. Fall back to curl. Use Playwright only when you have to log in through a browser.
Complete fetch example
#!/bin/bash
# scripts/fetch/github-prs.sh
# Fetches open PRs from a GitHub repo
gh pr list --repo owner/repo \
--state all \
--limit 100 \
--json number,title,author,createdAt,mergedAt,state \
> data/sources/api/github-prs.json
Ingest Scripts
Live in scripts/ingest/*.py. Read JSON with DuckDB's read_json_auto(), write to a table with CREATE OR REPLACE TABLE.
Naming follows a pattern: github-prs.py reads github-prs.json and creates table github_prs. (DuckDB tables use underscores, not hyphens. One of those things.)
Complete ingest example
#!/usr/bin/env python3
# scripts/ingest/github-prs.py
# Loads GitHub PRs into DuckDB
import duckdb
from pathlib import Path
DB = Path(__file__).parent.parent.parent / "data" / "my-rack.duckdb"
SOURCE = Path(__file__).parent.parent.parent / "data" / "sources" / "api" / "github-prs.json"
conn = duckdb.connect(str(DB))
conn.execute(f"""
CREATE OR REPLACE TABLE pull_requests AS
SELECT * FROM read_json_auto('{SOURCE}')
""")
print(f"Loaded {conn.execute('SELECT COUNT(*) FROM pull_requests').fetchone()[0]} PRs")
Derive Scripts
Live in scripts/derive/*.py. Query the database, write Markdown to dashboards/.
Some helpers exist in scripts/ops/derive_helper.py for common patterns. See Dashboard Helpers for the full API.
| Helper | Purpose | Output |
|---|---|---|
| bar(val, max, width) | Inline bar chart | ████▎ |
| sparkline(values) | Trend visualization | ▁▂▅█▆▁ |
| trend_arrow(curr, prev) | Delta indicator | ▲ +23% |
| signal(title, details) | Emit alert | Appends to .rack/signals.md |
Complete derive example
#!/usr/bin/env python3
# scripts/derive/pr-velocity.py
# Generates weekly PR velocity dashboard
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / "ops"))
from derive_helper import bar, sparkline, trend_arrow
import duckdb
from datetime import datetime
DB = Path(__file__).parent.parent.parent / "data" / "my-rack.duckdb"
OUT = Path(__file__).parent.parent.parent / "dashboards" / "pr-velocity.md"
conn = duckdb.connect(str(DB), read_only=True)
weekly = conn.execute("""
SELECT DATE_TRUNC('week', createdAt)::DATE as week,
COUNT(*) as opened,
COUNT(*) FILTER (WHERE mergedAt IS NOT NULL) as merged
FROM pull_requests
WHERE createdAt >= CURRENT_DATE - INTERVAL '8 weeks'
GROUP BY 1 ORDER BY 1
""").fetchall()
# Build dashboard
lines = [
"---",
"title: PR Velocity",
f"generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}",
"---",
"",
"[[← index]]",
"",
"# PR Velocity",
"",
]
if weekly:
    opened_vals = [r[1] for r in weekly]
    merged_vals = [r[2] for r in weekly]
    max_opened = max(opened_vals)
    lines.extend([
        f"**{sum(opened_vals)}** PRs opened, **{sum(merged_vals)}** merged over 8 weeks.",
        "",
        f"Trend: `{sparkline(opened_vals)}`",
        "",
        "| Week | Opened | | Merged |",
        "|------|-------:|---|-------:|",
    ])
    for week, opened, merged in weekly:
        lines.append(f"| {week} | {opened} | `{bar(opened, max_opened, 12)}` | {merged} |")
lines.extend([
    "",
    "---",
    "*Auto-generated. Interpretations are hypotheses.*",
])
OUT.write_text("\n".join(lines) + "\n")
print(f"Generated {OUT.name}")
Dashboard Helpers
Professional-grade dashboard formatting lives in scripts/ops/derive_helper.py. These helpers make your dashboards look polished and consistent.
Color Palette
A consistent color scheme for dashboard elements:
| Color | Hex | Purpose |
|---|---|---|
| Cyan | #00d4ff | Primary metrics, headers |
| Green | #7ee787 | Positive deltas, growth |
| Red | #f85149 | Negative deltas, alerts |
| Yellow | #d29922 | Warnings, neutral changes |
| Muted | #8b949e | Secondary text, labels |
Visualization Functions
| Function | Purpose | Example Output |
|---|---|---|
| c(text, color) | Wrap text in color span | `<span style="color:#7ee787">+12%</span>` |
| muted(text) | Dim secondary text | `<span style="color:#8b949e">vs last week</span>` |
| sparkline(values) | Trend visualization | ▁▂▅█▆▁ |
| bar(val, max, width) | Inline bar chart | ████▎ |
| colored_spark(values) | Sparkline with color | Green/red based on trend direction |
Delta Formatting
Two ways to show change, depending on your metric type:
| Function | Use When | Example |
|---|---|---|
| delta_pct(curr, prev) | Showing % change of a value | +23% (green) or -15% (red) |
| delta_pp(curr, prev) | Comparing percentages | +2.1pp (e.g., conversion 5.1% → 7.2%) |
Use delta_pct for "revenue grew 23%". Use delta_pp for "conversion rate improved from 5% to 7%" (that's +2 percentage points, not +40%).
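In arithmetic terms (a sketch; the real helpers also add color and arrows):

```python
def delta_pct(curr: float, prev: float) -> str:
    """Relative change: conversion going 5% -> 7% reads as +40%."""
    return f"{(curr - prev) / prev:+.0%}"

def delta_pp(curr: float, prev: float) -> str:
    """Absolute change in percentage points: 5% -> 7% reads as +2.0pp."""
    return f"{(curr - prev) * 100:+.1f}pp"

print(delta_pct(0.07, 0.05))  # +40%
print(delta_pp(0.07, 0.05))   # +2.0pp
```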
Structure Helpers
Functions that add consistent structure to dashboards:
| Function | Purpose |
|---|---|
| frontmatter(title, generated) | YAML header with title and timestamp |
| tldr(bullet_points) | Executive summary at top of dashboard |
| data_source_callout(sources) | Shows where data comes from |
| caveat_callout(text) | Adds warning box for data limitations |
Example: Using structure helpers
from derive_helper import frontmatter, tldr, data_source_callout
lines = []
lines.extend(frontmatter("Weekly Growth"))
lines.extend(tldr([
"Signups up 12% week-over-week",
"Conversion holding steady at 4.2%",
"Revenue on track for monthly goal",
]))
lines.extend(data_source_callout(["Stripe", "Segment"]))
Playbook System
The playbook.md file isn't just a marker — it's instructions for Claude. When you open the rack in Claude Code and ask a question, Claude reads the playbook first to understand what this rack is about.
What to Put in It
- Focus: What you're tracking and why you care
- Sources: Where the data comes from — which APIs, what browser sessions
- Questions: What you actually want to know
- Metrics: The numbers that matter
- Browser targets: URLs and login requirements for anything that needs scraping
Why This Works
Claude doesn't know your domain. It doesn't know that "velocity" means PRs merged per week, or that your competitor's pricing is on a particular URL. The playbook tells it. Then when you ask "why did velocity drop?" it has context to write the right query and make sense of the answer.
Example playbook.md
# Company Metrics Rack
## Focus
Track engineering velocity and shipping cadence.
## Sources
- GitHub PRs (via gh CLI)
- Linear issues (via Linear API)
- Deploy events (browser scrape from internal dashboard)
## Questions
- How many PRs ship per week?
- What's our mean time to merge?
- Which areas of the codebase are most active?
## Metrics
- PRs opened/merged per week
- Cycle time (PR open to merge)
- Review turnaround time
Commands
The main things you'll run:
| Command | Purpose | Key flags |
|---|---|---|
| rack init [name] | Create new rack | --template, --focus |
| rack refresh | Run full pipeline | — |
| rack query "SQL" | Direct SQL access | --at DATE for temporal |
| rack check | Show inbox, DB size, warnings | — |
| rack lineage | Dependency visualization | --format json |
| rack snapshot | Save current DB state | — |
| rack snapshots | List available snapshots | — |
| rack diff | Compare pre/post refresh | — |
| rack signals | Show recent alerts | -n 20, --clear |
| rack mcp serve | Start MCP server | --allow-refresh |
| rack mcp install | Configure AI tools | Claude, Cursor, VS Code |
Lineage System
Run rack lineage to see how data flows through your scripts. It parses the code and figures out what reads what.
Detection Patterns
| Phase | Input Detection | Output Detection |
|---|---|---|
| Fetch | N/A | > file, -o, --output |
| Ingest | read_json_auto('file'), SOURCE= | CREATE TABLE, INSERT INTO |
| Derive | FROM table, JOIN table | OUT=, write_text() |
Normalization Rules
- Fetch outputs normalized to data/sources/api/ or data/sources/browser/
- Ingest outputs suffixed as tablename (table)
- Derive outputs default to dashboards/{scriptname}.md if not detected
- Reserved SQL keywords excluded from table detection
Output Formats
Default text output (grouped by phase):
FETCH
  github-prs.sh
    → data/sources/api/github-prs.json

INGEST
  github-prs.py
    ← data/sources/api/github-prs.json
    → pull_requests (table)

DERIVE
  pr-velocity.py
    ← pull_requests (table)
    → dashboards/pr-velocity.md
JSON output (--format json) returns full graph structure for programmatic use.
Temporal Queries
Sometimes you need to know what the data looked like last Tuesday. Rack takes snapshots, so you can:
# Save current state
rack snapshot
# Query a specific date
rack query --at 2024-01-10 "SELECT * FROM users"
# See what changed since last refresh
rack diff
Snapshot Types
| Type | Location | Purpose |
|---|---|---|
| Temporal | .rack/snapshots/temporal/YYYY-MM-DD-HHMMSS.duckdb | Historical queries via --at |
| Pre/Post refresh | .rack/snapshots/pre.json, post.json | Diff captures state before/after each refresh |
How Diff Works
Before each refresh, we snapshot row counts. After refresh, snapshot again. rack diff compares them. Simple, but useful — you can see at a glance that you added 47 new PRs and updated 3 users.
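The comparison itself is a per-table subtraction; a sketch, assuming pre.json and post.json hold table-to-row-count maps:

```python
def diff_counts(pre: dict, post: dict) -> dict:
    """Row-count delta per table between two snapshots."""
    tables = set(pre) | set(post)
    return {t: post.get(t, 0) - pre.get(t, 0) for t in sorted(tables)}

pre = {"pull_requests": 120, "users": 500}
post = {"pull_requests": 167, "users": 503}
print(diff_counts(pre, post))  # {'pull_requests': 47, 'users': 3}
```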
Signal System
The problem with dashboards is you have to look at them. Signals flip that around — when something interesting happens, the data tells you.
Your derive scripts can emit signals when thresholds cross: churn above 5%, a big revenue spike, whatever matters. Run rack signals and see what's happened since you last looked. It's like a changelog for your metrics.
from signal_helper import signal
if churn_rate > 0.05:
    signal("churn crossed 5%", details=f"Current: {churn_rate:.1%}")
Signal Format
Signals are stored in Markdown with timestamps:
## 2024-01-15 10:23
churn crossed 5%
Current: 5.2%
## 2024-01-14 09:15
revenue spike detected
$8,400 from 3 annual plans
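A sketch of what signal() plausibly does under the hood; the real helper lives in scripts/ops and may differ:

```python
from datetime import datetime
from pathlib import Path

SIGNALS = Path(".rack") / "signals.md"

def signal(title: str, details: str = "") -> None:
    """Append a timestamped entry to the signal log."""
    SIGNALS.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = f"## {stamp}\n{title}\n"
    if details:
        entry += f"{details}\n"
    with SIGNALS.open("a") as f:
        f.write(entry + "\n")
```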
Commands
| Command | Purpose |
|---|---|
| rack signals | Show 10 most recent |
| rack signals -n 20 | Show 20 most recent |
| rack signals --clear | Reset signal log |
Lifecycle Operations
How Data Accumulates
On a typical day:
- API fetches run — maybe on a cron, maybe you triggered rack refresh
- Browser scrapes run when Claude's active and the data is stale
- New files in sources/ get ingested
- Same file twice? Skipped (checksum match)
- Same record twice? Only updated if it changed
- Metrics recomputed only for affected dates
Everything Must Be Re-runnable
Scripts should be idempotent. Use CREATE OR REPLACE TABLE, not CREATE TABLE. That way, if something fails mid-run, you just run it again. No cleanup required.
When Bad Data Gets In
It happens. When it does:
- Purge by source file: Delete everything that came from a specific file
- Purge by date range: Wipe records in a time window
- Always see row counts before confirming
- Always take a snapshot first
- Every purge gets logged
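A sketch of the preview-then-confirm flow. It uses stdlib sqlite3 so the example is self-contained, and it assumes rows carry a _source_file provenance column; neither detail is confirmed by Rack itself:

```python
import sqlite3

def purge_by_source(conn, table: str, source_file: str, confirm: bool = False) -> int:
    """Show the row count for a source file; delete only when confirmed.
    Assumes a _source_file provenance column (hypothetical schema)."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE _source_file = ?", [source_file]
    ).fetchone()
    print(f"{count} rows came from {source_file}")
    if confirm and count:
        conn.execute(f"DELETE FROM {table} WHERE _source_file = ?", [source_file])
        conn.commit()
    return count
```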
Error Handling
Things will fail. The goal is to fail gracefully — don't block the whole pipeline, don't corrupt good data, make it easy to pick up where you left off.
When Scrapes Fail
- Screenshot whatever state you reached — helpful for debugging
- Save partial data with an "error" field so it's obvious something went wrong
- Log it, but don't crash the session
- Since everything's idempotent, you can just try again later
When Ingestion Fails
- Log status='failed' so it's visible
- Don't partially commit — the whole file either makes it or it doesn't
- Surface the error for a human to look at
A failed fetch shouldn't stop other fetches. A failed ingest shouldn't touch existing tables. Partial success is fine — better to have 90% of the data than 0%.
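The all-or-nothing property is just a transaction. Sketched with stdlib sqlite3 to stay self-contained (a rack would use duckdb, and the prs table here is hypothetical):

```python
import sqlite3

def ingest_all_or_nothing(conn, rows) -> bool:
    """Commit every row or none: one bad record rolls the whole file back."""
    try:
        with conn:  # connection context manager: commit on success, rollback on exception
            for r in rows:
                conn.execute("INSERT INTO prs VALUES (?, ?)", r)
        return True
    except Exception:
        return False
```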
MCP Server
MCP is how AI tools talk to external data sources. Rack includes an MCP server so Claude Code, Cursor, or VS Code can query your rack directly.
Quick Setup
# Auto-configure Claude Code, Cursor, or VS Code
rack mcp install
# Manual start (for debugging)
rack mcp serve /path/to/@my-rack
Available Tools
| Tool | Input | Output | Safe |
|---|---|---|---|
| query | SQL (SELECT only) | Results with auto-LIMIT | Yes |
| tables | None | Schema via information_schema | Yes |
| lineage | None | Dependency graph JSON | Yes |
| refresh | None | Pipeline output | Requires --allow-refresh |
Resources
| URI | Content |
|---|---|
| schema://tables | Full schema JSON with columns/types |
| docs://playbook | Raw playbook.md content |
Safety Limits
| Limit | Default | Flag |
|---|---|---|
| Max rows returned | 1,000 | --max-rows |
| Max output size | 100 KB | --max-output-kb |
| Non-SELECT queries | Blocked | — |
Advanced Patterns
Production racks develop patterns beyond basic fetch/ingest/derive. These are battle-tested approaches from racks with 12+ dashboards and months of history.
Dashboard Archive Pattern
Before each refresh, archive the current dashboard state. This lets you track how your metrics reporting evolves, not just the metrics themselves.
#!/bin/bash
# Archive dashboards before refresh
DATE=$(date +%Y-%m-%d)
ARCHIVE=derived/snapshots/$DATE
mkdir -p "$ARCHIVE"
cp dashboards/*.md "$ARCHIVE/" 2>/dev/null || true
# Then run refresh
./scripts/refresh.sh
Over time, you build a changelog of your dashboards — useful for understanding how your analysis evolved.
Competitive Intelligence
Track competitors with a structured registry. A competitors.yaml file keeps metadata organized:
# data/competitors.yaml
competitors:
  - name: Acme Corp
    website: https://acme.com
    signals:
      - type: pricing
        url: https://acme.com/pricing
        frequency: weekly
      - type: hiring
        url: https://acme.com/careers
        frequency: weekly
      - type: news
        query: "Acme Corp funding OR launch"
        frequency: daily
Signal types you might track:
| Signal Type | Source | Typical Frequency |
|---|---|---|
| Pricing changes | Pricing page scrape | Weekly |
| Hiring velocity | Careers page, LinkedIn | Weekly |
| App store metrics | App Store / Play Store | Daily |
| News mentions | News API, Google Alerts | Daily |
| Product changes | Changelog, Twitter | Daily |
Emit a signal when data goes stale:
if days_since_last_scrape > 14:
    signal(f"Stale: {competitor} pricing data is {days_since_last_scrape} days old")
Qualitative Data
Numbers alone miss the story. Blend qualitative insights with quantitative data:
- Interview notes — Store as structured markdown with tags
- Customer feedback — Sentiment alongside NPS scores
- Support tickets — Theme categorization with volume data
Example: correlating interview themes with funnel data.
# data/sources/manual/interviews/2024-01-15-user-a.md
---
date: 2024-01-15
user_segment: power_user
themes: [onboarding, pricing, mobile]
---
## Key Quotes
- "I almost churned because the mobile app was so bad"
- "Pricing makes sense once you understand it"
## Action Items
- [ ] Investigate mobile drop-off in funnel
Your derive script can then pull themes alongside conversion data to spot patterns humans would miss.
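A sketch of pulling themes out of that frontmatter; it hand-rolls the parsing so the example has no dependencies (a real script might use PyYAML):

```python
def parse_themes(md_text: str) -> list[str]:
    """Extract the themes list from simple YAML frontmatter like
    'themes: [onboarding, pricing, mobile]'."""
    for line in md_text.splitlines():
        if line.startswith("themes:"):
            inner = line.split(":", 1)[1].strip().strip("[]")
            return [t.strip() for t in inner.split(",") if t.strip()]
    return []

doc = """---
date: 2024-01-15
themes: [onboarding, pricing, mobile]
---
"""
print(parse_themes(doc))  # ['onboarding', 'pricing', 'mobile']
```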
Quirks & Edge Cases
Some things that might trip you up, collected here so you don't have to discover them the hard way:
| Behavior | Details |
|---|---|
| Rack detection | Must have playbook.md and scripts/refresh.sh |
| Directory naming | Prefix with @ (e.g., @metrics). Spaces become hyphens. |
| Database naming | {directory-name}.duckdb (@ prefix stripped) |
| Script discovery | Any *.sh in fetch/, *.py in ingest/derive/ |
| Execution order | Alphabetical within each phase |
| Idempotency | Scripts must be re-runnable (use CREATE OR REPLACE) |
| Duplicate prevention | Executor blocks concurrent runs of same rack |
| Execution timeout | 5 minutes (configurable in rackd) |
| Watcher debounce | 2 seconds (prevents refresh thrashing) |
| Watcher paths | data/sources/api/ and data/sources/browser/ |
| Cross-rack queries | Use ATTACH via ~/.rack/racks.json registry |
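Cross-rack queries boil down to DuckDB ATTACH statements built from the registry. A sketch; the registry shape shown here is an assumption, not the documented format:

```python
import json
from pathlib import Path

def attach_statements(registry_path: Path) -> list[str]:
    """Build DuckDB ATTACH statements from a racks.json registry.
    Assumed registry shape: {"racks": [{"name": ..., "db": ...}]}."""
    registry = json.loads(registry_path.read_text())
    return [
        # Hyphens become underscores so the alias is a valid identifier
        f"ATTACH '{r['db']}' AS {r['name'].replace('-', '_')} (READ_ONLY);"
        for r in registry["racks"]
    ]
```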
Common Gotchas
- github-prs.py creates table github_prs, not github-prs.
- Use Path(__file__) to find files relative to the script, not os.getcwd(). The working directory isn't always what you think.
Configuration
Most of the time you don't need to configure anything. But when you do, here's where things live:
Global Files
| File | Purpose |
|---|---|
| ~/.rack/racks.json | Registry for cross-rack queries |
| ~/.rack/schedules.json | Cron-style schedules |
MCP Config Locations
| Tool | Config File |
|---|---|
| Claude Code | ~/.claude.json |
| Cursor | .cursor/mcp.json |
| VS Code | .vscode/mcp.json |
Per-Rack Locations
| Path | Purpose |
|---|---|
| .rack/signals.md | Alert log |
| .rack/activity.log | Execution history |
| .rack/annotations.yaml | Timeline annotations |
| .rack/snapshots/temporal/ | Historical database copies |
| .rack/launchd.log | Scheduler logs (macOS) |
Schedule Format
~/.rack/schedules.json example:
[
  {
    "rack_name": "metrics",
    "rack_path": "/absolute/path",
    "target": "refresh",
    "schedule": "every 1h",
    "created": "2024-01-15T10:00:00Z"
  }
]
Schedule patterns: every 15m, every 1h, daily at 19:00
rackd Daemon
If you have multiple racks, rackd watches them all and auto-refreshes when source files change. You drop a new JSON file in, it notices and runs the pipeline.
What It Does
- Watches data/sources/api/ and data/sources/browser/ across all your racks
- Triggers refresh when JSON files appear or change
- Runs refreshes in a worker pool (configurable concurrency)
- Exposes a REST API on port 9876 for status and manual triggers
Configuration
rackd.yaml example:
log:
  level: info               # debug, info, warn, error
  format: text              # or json
server:
  port: 9876                # API server port
watcher:
  root_path: ~/Documents    # Where to scan for @racks
  debounce: 2s              # Coalesce file events
  patterns:
    - "*.json"              # Watch for JSON changes
executor:
  timeout: 5m               # Max execution time
  concurrency: 1            # Parallel racks
health:
  interval: 30s             # Health check frequency
API Endpoints
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /health | Server liveness |
| GET | /api/v1/racks | List all discovered racks |
| GET | /api/v1/racks/{name} | Get single rack status |
| POST | /api/v1/racks/{name}/refresh | Queue refresh (202 if queued, 409 if running) |
| GET | /api/v1/racks/{name}/logs | Fetch execution history |
Key Limits
| Limit | Value |
|---|---|
| Queue capacity | 100 racks |
| Execution timeout | 5 minutes |
| Logs per rack | Last 100 executions |
| Debounce window | 2 seconds |
| Health check interval | 30 seconds |
Startup Flow
- Load config (rackd.yaml)
- Scan for @name directories
- Start executor worker goroutines
- Begin file system watching
- Start health checks
- Listen on API port
- Wait for SIGINT/SIGTERM for graceful shutdown