ClosedLoop.AI

Claude Code Expert Training Module

Teach Me Something New About Claude Code in 15 Minutes

A ClosedLoop.AI pre-read chapter for engineers learning the Claude Code primitives that compound speed, cost efficiency, and first-pass code quality.

Module Chapters

Give Claude Better Context
Burn Less Tokens Today
Land Better Code The First Time

Chapter 1: Give Claude Better Context

Problem: Your engineers are typing the same instructions repeatedly, losing context mid-session, and getting inconsistent results across the team.

Objective: This lesson covers the three mechanisms that fix that — persistent memory, sub-agent delegation, and strategic compaction.

Part 1: The Memory System (~5 min)

Most users know about CLAUDE.md — the project instructions file checked into your repo. What they don't know is that Claude Code has a four-layer memory hierarchy and an auto-memory system that learns from your sessions.

The Four Layers

Memory loads from broadest to narrowest scope, each layer adding context:

Layer	Location	Who writes it	Scope
Managed policy	/Library/Application Support/ClaudeCode/CLAUDE.md (macOS path)	IT/Platform team	Org-wide, cannot be excluded
User	~/.claude/CLAUDE.md	Individual developer	All their projects
Project	./CLAUDE.md or ./.claude/CLAUDE.md	Team	Checked into version control
Local	./CLAUDE.local.md	Individual developer	.gitignore'd, personal preferences

The org implication: That managed policy layer means your platform team can enforce standards — security rules, code style, compliance constraints — across a large engineering organization without relying on anyone reading a wiki. Claude loads it at session start, every time, and it cannot be excluded.
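To make the load order concrete, here is a minimal Python sketch that lists the locations from the table, broadest scope first, and reports which ones exist on disk. It is an illustration only, not Claude Code's own loader: the function names are hypothetical, and the managed-policy entry uses the single path shown in the table.

```python
from pathlib import Path


def memory_layers(project_root: Path, home: Path) -> list[tuple[str, Path]]:
    """Return (layer, path) pairs in the order the table lists them."""
    return [
        # Managed policy path exactly as given in the table above.
        ("managed", Path("/Library/Application Support/ClaudeCode/CLAUDE.md")),
        ("user", home / ".claude" / "CLAUDE.md"),
        # The project layer may live in either of two locations.
        ("project", project_root / "CLAUDE.md"),
        ("project", project_root / ".claude" / "CLAUDE.md"),
        ("local", project_root / "CLAUDE.local.md"),
    ]


def present_layers(project_root: Path, home: Path) -> list[str]:
    """Names of the layers whose file actually exists on disk."""
    return [
        name
        for name, path in memory_layers(project_root, home)
        if path.is_file()
    ]


if __name__ == "__main__":
    for name, path in memory_layers(Path.cwd(), Path.home()):
        status = "found" if path.is_file() else "missing"
        print(f"{name:8} {status:8} {path}")
```

Running it from a repo root gives a quick view of which layers a new session would have available.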

Auto-Memory: Claude Learns From Your Sessions

Run /memory to see what Claude has learned about you. It stores insights in ~/.claude/projects/<project>/memory/ as plain markdown files across four types:

User memories — your role, expertise, preferences ("senior Go engineer, new to React")
Feedback memories — corrections and confirmed approaches ("don't mock the database in integration tests")
Project memories — ongoing work context ("merge freeze starts June 5 for mobile release")
Reference memories — pointers to external systems ("pipeline bugs tracked in Linear project INGEST")

This is not a black box. These are markdown files you can read, edit, and delete. The MEMORY.md index is capped at 200 lines and loads every session. Topic files load on-demand.
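Because these are ordinary files, they can be audited with ordinary tooling. The sketch below is a hypothetical helper, not part of Claude Code: it counts the lines in MEMORY.md against the 200-line cap mentioned above and lists the other markdown files in the same directory. It assumes topic files sit alongside MEMORY.md, which may not match every installation.

```python
from pathlib import Path

MEMORY_INDEX_CAP = 200  # line cap for MEMORY.md as stated in the text


def audit_memory_dir(memory_dir: Path) -> dict:
    """Summarize a memory directory: index size and topic files."""
    index = memory_dir / "MEMORY.md"
    index_lines = 0
    if index.is_file():
        index_lines = len(index.read_text(encoding="utf-8").splitlines())
    # Assumption: every other .md file in the directory is a topic file.
    topics = sorted(
        p.name for p in memory_dir.glob("*.md") if p.name != "MEMORY.md"
    )
    return {
        "index_lines": index_lines,
        "over_cap": index_lines > MEMORY_INDEX_CAP,
        "topic_files": topics,
    }
```

Pointing it at a project's memory directory shows at a glance whether the index is approaching the cap and needs pruning.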

Why this matters at scale: When an engineer corrects Claude once — "we use Zod for validation, not manual type checks" — that correction persists across sessions. Multiply that across many engineers and teams teaching Claude their local conventions, and you've got an institutional knowledge system that reduces repeated mistakes org-wide.

Path-Scoped Rules: Conditional Context

Most users put everything in one CLAUDE.md. The better pattern is .claude/rules/:

# .claude/rules/api-security.md
---
paths:
  - "src/api/**/*.ts"
  - "src/middleware/**/*.ts"
---

All endpoints must validate input with Zod schemas.
Never return raw database errors to clients.
Use the Result<T> error model.

These rules only load when Claude reads matching files. Your API security rules don't burn tokens when someone's editing a React component. For a large monorepo, this is the difference between 200 lines of always-loaded context and targeted injection of relevant standards.
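To see which files a paths list would cover, the following Python sketch approximates the matching with a small glob-to-regex translation. Claude Code's actual matcher may treat edge cases differently; the helper names are hypothetical, and the semantics assumed here are the common ones ("**/" spans zero or more directories, "*" stays within one path segment).

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with ** support into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")  # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def rule_applies(patterns: list[str], file_path: str) -> bool:
    """True if file_path fully matches any of the rule's path globs."""
    return any(glob_to_regex(p).fullmatch(file_path) for p in patterns)


API_SECURITY_PATHS = ["src/api/**/*.ts", "src/middleware/**/*.ts"]

if __name__ == "__main__":
    for candidate in ["src/api/v1/users.ts", "src/components/Button.tsx"]:
        print(candidate, rule_applies(API_SECURITY_PATHS, candidate))
```

A check like this is useful before committing a rule file: it confirms the globs cover the intended directories and nothing else.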

The @ Import System

CLAUDE.md supports file imports:

@docs/coding-standards.md
@packages/database/CLAUDE.md

These resolve relative to the importing file, support up to 4 levels of nesting, and expand at session start. This lets you modularize instructions instead of maintaining one massive file.
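The resolution rules just described can be sketched in a few lines of Python. This is an illustrative model, not Claude Code's implementation: the function name, the token pattern, and the choice to leave unresolvable tokens untouched are all assumptions. It resolves each import relative to the importing file and stops expanding past four levels of nesting.

```python
import re
from pathlib import Path

MAX_IMPORT_DEPTH = 4  # nesting limit stated in the text
# Assumption: an import is an @ at the start of a whitespace-delimited token.
IMPORT_TOKEN = re.compile(r"(?<!\S)@(\S+)")


def expand_imports(path: Path, depth: int = 0) -> str:
    """Return the file's text with @imports expanded in place."""
    text = path.read_text(encoding="utf-8")
    if depth >= MAX_IMPORT_DEPTH:
        return text  # deeper imports are left unexpanded

    def replace(match: re.Match) -> str:
        # Resolve relative to the importing file, not the working directory.
        target = (path.parent / match.group(1)).resolve()
        if not target.is_file():
            return match.group(0)  # not a file: keep the token as written
        return expand_imports(target, depth + 1)

    return IMPORT_TOKEN.sub(replace, text)
```

Calling expand_imports(Path("CLAUDE.md")) approximates what a session would see after expansion, which helps when estimating how many tokens a modular instruction set really costs.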

Part 2: Sub-Agent Delegation (~5 min)

This is the feature most power users underutilize. When you ask Claude to "search the codebase for all authentication patterns," it reads dozens of files — and every one of those files stays in your context window, consuming tokens on every subsequent turn.

Sub-agents solve this. They run in isolated context windows, do the heavy lifting, and return only a summary to your main session.

How It Works

Claude spawns a sub-agent with its own system prompt, tool restrictions, and context. The sub-agent explores, analyzes, or implements — then sends back a concise result. The verbose exploration output (file contents, grep results, error traces) stays in the sub-agent's context, not yours.

Built-In Sub-Agents

Explore (runs on Haiku — fast/cheap): File discovery and codebase analysis. Three thoroughness levels: quick, medium, very thorough.
Plan (inherits your model): Read-only codebase research before planning changes.
General-purpose: Full tool access for complex multi-step tasks.

Creating Custom Sub-Agents

This is where it gets interesting at enterprise scale. Run /agents → Library → Create New, or create a markdown file:

# .claude/agents/security-reviewer.md
---
name: security-reviewer
description: Review code changes for security vulnerabilities. Use proactively.
tools: Read, Grep, Glob, Bash
model: sonnet
maxTurns: 10
---

You are a security specialist. For every code change:
1. Check for injection vulnerabilities (SQL, XSS, command)
2. Verify input validation at system boundaries
3. Check for exposed secrets or API keys
4. Verify authentication/authorization checks

Report findings by severity: Critical, High, Medium, Low.

Drop this in .claude/agents/ and it's available to every engineer on the project. The description field includes "Use proactively", which encourages Claude to delegate security reviews on its own initiative rather than waiting to be asked.

Prompt Best Practices for Sub-Agents

The biggest mistake engineers make with sub-agents: writing vague prompts. Sub-agents start with zero context from your conversation. Brief them like a colleague who just walked into the room:

Bad: "Review the auth changes"
Good: "Review the changes in src/auth/middleware.ts for OWASP Top 10 vulnerabilities. The middleware handles JWT validation and session management. We recently added refresh token rotation — focus on whether the old token is properly invalidated. Report findings only, don't fix."

Key principles:

Explain what and why — the sub-agent doesn't know your task
Specify the output format — "report in under 200 words" or "list findings by severity"
State whether to read or write — "research only, don't edit files"
Use foreground when you need results before proceeding; background for parallel work

Parallel Sub-Agents

You can launch multiple sub-agents simultaneously. This is transformative for investigation work:

"Research the authentication module, the database schema, and the API routing layer in parallel using separate sub-agents. For each, report: what patterns are used, what tests exist, and any inconsistencies."

Three sub-agents run concurrently. You get three focused reports. Your main context stays clean.

Part 3: Strategic Compaction (~5 min)

Every message in your conversation costs tokens — not just when it's sent, but on every subsequent turn. A 100-message session means Claude re-reads all 100 messages every time you ask a question. This is where /compact earns its keep.

What /compact Actually Does

Claude generates a summary of your entire conversation
The summary replaces your full message history
Full history is preserved in the session transcript (Claude can reference it if needed)
All CLAUDE.md files and auto-memory reload from disk
Every subsequent turn now processes the shorter summary instead of the full history

/compact vs /clear

	/compact	/clear
History	Summarized, retained as context	Deleted entirely
Session	Continues	Resets
Use when	Between tasks in a long session	Starting fresh work

Guided Compaction

This is the feature most people miss. You can tell /compact what to prioritize:

  /compact focusing on the database migration decisions and the three failing test cases

This shapes the summary to preserve what you'll need next, instead of Claude guessing what matters.

Auto-Compaction

At ~95% context capacity, Claude auto-compacts. You can tune the threshold:

  CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50  # Compact at 50% instead of 95%

For long coding sessions, setting this lower means you hit compaction earlier but each turn is cheaper. Across a large engineering organization, this becomes a meaningful cost lever.

The Compaction Workflow

The optimal pattern for long sessions:

Start task — investigate, implement, test
Finish task — /compact focusing on [decisions made, patterns established]
Start next task — clean context, relevant history preserved
Repeat

Each compaction resets your per-turn token cost. Without it, per-turn cost grows linearly with conversation length, so the cumulative cost of a session grows roughly quadratically.
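A small worked model shows why the workflow above pays off. The Python sketch below assumes every turn adds a fixed number of tokens and that the whole context is processed again on each turn; it ignores prompt caching and other pricing details, and all numbers in the example are illustrative rather than measured.

```python
from typing import Optional


def cumulative_input_tokens(
    turns: int,
    tokens_per_turn: int,
    compact_every: Optional[int] = None,
    summary_tokens: int = 0,
) -> int:
    """Total tokens processed across a session under a simple growth model."""
    total = 0
    context = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn
        total += context  # the full context is processed on this turn
        if compact_every and turn % compact_every == 0:
            context = summary_tokens  # the summary replaces the history
    return total


if __name__ == "__main__":
    # Hypothetical session: 30 turns, 1,000 new tokens per turn.
    print(cumulative_input_tokens(30, 1_000))  # 465000
    # Same session, compacting to a 500-token summary every 10 turns.
    print(cumulative_input_tokens(30, 1_000, compact_every=10, summary_tokens=500))  # 175000
```

Under these assumptions, compacting after each ten-turn task cuts the tokens processed over the session by more than half.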

What Doesn't Survive Compaction

Instructions you typed in conversation (put them in CLAUDE.md instead)
Nested CLAUDE.md files from subdirectories (reload when Claude reads matching files again)
Exact file contents (Claude re-reads files it needs)

Key takeaway: Anything you want Claude to remember across compactions belongs in CLAUDE.md or auto-memory — not in conversation messages.

Chapter 2: Burn Less Tokens Today

Problem: Token costs scale with adoption. At enterprise scale, even a modest reduction in per-session token usage saves real money.

Objective: This lesson covers four techniques that reduce token consumption without reducing capability.

Part 1: RTK — The Transparent Token Optimizer (~4 min)

RTK (Rust Token Killer) is a CLI proxy that sits between Claude Code and your shell commands. It compresses command output before Claude sees it, achieving 60-90% token savings on common dev operations.

How It Works

RTK installs as a Claude Code hook. When Claude runs git status, the hook transparently rewrites it to rtk git status. RTK runs the real command, strips noise (ANSI codes, verbose formatting, repetitive output), and returns a compressed result.

You and your engineers type commands normally. The compression is invisible.
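To give a feel for what "stripping noise" can mean, here is a toy Python filter. It is not RTK's algorithm, which is not described in this chapter; it simply removes ANSI color codes, drops blank lines, and collapses consecutive duplicate lines into one line with a repeat count.

```python
import re

# Matches common ANSI escape sequences such as color codes.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def compress_output(raw: str) -> str:
    """Strip ANSI codes and blank lines; collapse consecutive duplicates."""
    kept: list[str] = []
    counts: list[int] = []
    for line in ANSI_ESCAPE.sub("", raw).splitlines():
        line = line.rstrip()
        if not line:
            continue  # blank lines carry no information
        if kept and kept[-1] == line:
            counts[-1] += 1  # same as the previous kept line
        else:
            kept.append(line)
            counts.append(1)
    return "\n".join(
        text if count == 1 else f"{text}  [x{count}]"
        for text, count in zip(kept, counts)
    )


if __name__ == "__main__":
    sample = "\x1b[32mok\x1b[0m\n\nwarn: x\nwarn: x\nwarn: x\ndone\n"
    print(compress_output(sample))
```

Even a filter this simple shortens typical build and test logs noticeably, which is the intuition behind running command output through a compression step before it reaches the model.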

Setup

RTK integrates via a PreToolUse hook in your settings.json. Once configured, every Bash command Claude runs goes through RTK automatically. No workflow changes required.

Measuring Savings

  rtk gain              # Cumulative token savings this session
  rtk gain --history    # Per-command breakdown with savings
  rtk discover          # Analyze past sessions for missed RTK opportunities

rtk discover is particularly valuable during rollout: it reads your Claude Code history and identifies commands that could have been compressed but weren't, helping you tune coverage.

Why This Matters at Scale

Consider: a large engineering org × repeated git/build commands × hundreds of working days. If each command returns 500 tokens of output and RTK compresses that by 70%, the annual savings can quickly reach hundreds of millions of tokens on command output alone. That's before counting the compounding effect — those saved tokens don't get re-read on every subsequent turn either.
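The estimate above can be worked through explicitly. In the Python sketch below, the 500 tokens per command and the 70% compression come from the text; the headcount, commands per day, and working days are hypothetical values chosen only to show the arithmetic.

```python
def annual_tokens_saved(
    engineers: int,
    commands_per_day: int,
    working_days: int,
    tokens_per_command: int,
    compression_pct: int,
) -> int:
    """Tokens of command output avoided per year, using integer math."""
    raw = engineers * commands_per_day * working_days * tokens_per_command
    return raw * compression_pct // 100


if __name__ == "__main__":
    saved = annual_tokens_saved(
        engineers=100,          # hypothetical org size
        commands_per_day=30,    # hypothetical git/build/test commands
        working_days=220,       # hypothetical working year
        tokens_per_command=500, # from the text
        compression_pct=70,     # from the text
    )
    print(f"{saved:,} tokens saved per year")  # 231,000,000 tokens saved per year
```

With these inputs the first-order saving is 231 million tokens a year, before counting the turns on which those tokens would otherwise have been re-read.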

Part 2: Pointers Instead of Full Text (~4 min)

The most expensive habit in Claude Code is reading entire files when you need three lines. Every token Claude reads stays in context for the rest of the session (or until compaction). Teaching your engineers to think in "pointers" reduces this dramatically.

Technique 1: Targeted File Reading

The Read tool accepts offset and limit parameters:

  Read file.ts from line 50, 20 lines only

Instead of loading a 2000-line file (burning ~8000 tokens that persist all session), you load 20 lines (~80 tokens). That's a 99% reduction on a single read.

Practical tip: Use Grep first to find the relevant line numbers, then Read with offset/limit. Two cheap operations instead of one expensive one.
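The grep-then-read pattern is easy to picture in code. The following Python sketch is an analogy for what the tools do, not their implementation: one helper finds matching line numbers, the other reads a small window around them. The helper names are hypothetical, and treating the offset as a 1-based line number is an assumption.

```python
from itertools import islice
from pathlib import Path


def find_lines(path: Path, needle: str) -> list[int]:
    """Return the 1-based numbers of lines containing needle."""
    with path.open(encoding="utf-8") as fh:
        return [n for n, line in enumerate(fh, start=1) if needle in line]


def read_window(path: Path, offset: int, limit: int) -> list[str]:
    """Return up to limit lines starting at 1-based line offset."""
    start = max(offset - 1, 0)
    with path.open(encoding="utf-8") as fh:
        # islice stops reading once the window is filled.
        return [line.rstrip("\n") for line in islice(fh, start, start + limit)]


def read_around(path: Path, needle: str, context: int = 2) -> list[str]:
    """Read a small window around the first match, or nothing if no match."""
    hits = find_lines(path, needle)
    if not hits:
        return []
    return read_window(path, hits[0] - context, 2 * context + 1)
```

The point of the pattern is that the second call only ever touches a handful of lines, no matter how large the file is.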

Technique 2: Search Before Read

  Grep("handleAuth", glob="src/**/*.ts", output_mode="files_with_matches")

This returns a list of file paths (~50 tokens) instead of file contents (~5000+ tokens). Claude then reads only the relevant file, at the relevant lines.

The anti-pattern is asking Claude to "look through the codebase for authentication logic" without any targeting. That triggers sequential reads of dozens of files. Instead: "grep for handleAuth in src/, then read the top result."

Technique 3: @ Imports in CLAUDE.md

Instead of pasting your coding standards into every CLAUDE.md:

  # CLAUDE.md
  @docs/coding-standards.md
  @docs/api-conventions.md

The content loads once at session start. It doesn't get re-imported or duplicated. And if coding-standards.md changes, every engineer's next session picks it up automatically.

Technique 4: Sub-Agent Isolation for Exploration

When Claude needs to explore broadly (reading 20+ files to understand a system), delegate to a sub-agent. The exploration output stays in the sub-agent's context. Your main session gets a 200-word summary instead of 20 files worth of content.

This is especially impactful for onboarding — new engineers ask broad "how does X work?" questions that trigger extensive exploration. With sub-agents, that exploration is contained.

Part 3: Skills Over Long-Winded Prompts (~4 min)

Every time an engineer types a 500-word prompt describing their deployment checklist, that's roughly 650 tokens consumed (at about 1.3 tokens per English word). If they do it daily, that's 100,000+ tokens/year on prompt text alone, per person.

Skills encapsulate these repeated workflows into single commands.

What Skills Are

A skill is a SKILL.md markdown file, with frontmatter and instructions, in its own directory under .claude/skills/. When invoked with /skill-name, the full content loads into the session. When not invoked, only the one-line description loads (cheap).

# .claude/skills/deploy/SKILL.md
---
name: deploy
description: Walk through deployment checklist for stage and prod environments
---

## Pre-Deploy
1. Run `pnpm test` and verify all pass
2. Run `pnpm lint` and verify clean
3. Check `git status` for uncommitted changes
4. Verify environment variables in `.env.local`

## Stage Deploy
1. Push to stage branch
2. Wait for CI green
3. Verify staging environment

## Prod Deploy
1. Merge main → production
2. Monitor deploy-production.yml
3. Verify Slack notification

Now every engineer types /deploy instead of remembering (or misremembering) the checklist.

Skills vs CLAUDE.md

	CLAUDE.md	Skills
Token cost	Always loaded, every turn	Description only until invoked
Best for	Facts, conventions, rules	Procedures, workflows, checklists
Scope	Passive context	Active invocation

The rule of thumb: If it's a fact Claude should always know → CLAUDE.md. If it's a procedure Claude should execute on demand → skill.

Auto-Invocation

Skills with disable-model-invocation: false (the default) can be triggered automatically when Claude determines they're relevant. A skill described as "security review before merging" will fire when an engineer says "I'm ready to merge."

Skills with disable-model-invocation: true only fire on explicit /skill-name invocation. Use this for destructive operations (deploys, database migrations) where you want the engineer to opt in.
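A quick way to audit which skills can fire on their own is to read the frontmatter directly. The Python sketch below is a hypothetical helper, not a Claude Code feature: it uses a deliberately minimal key: value parser (not a full YAML parser) and treats a missing disable-model-invocation key as false, matching the default described above.

```python
from pathlib import Path


def read_frontmatter(skill_file: Path) -> dict[str, str]:
    """Parse simple key: value pairs between the first two --- lines."""
    lines = skill_file.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def skill_inventory(skills_dir: Path) -> list[dict]:
    """List each skill with whether Claude may invoke it automatically."""
    inventory = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file)
        manual_only = meta.get("disable-model-invocation", "false").lower() == "true"
        inventory.append(
            {
                "name": meta.get("name", skill_file.parent.name),
                "description": meta.get("description", ""),
                "auto_invocable": not manual_only,
            }
        )
    return inventory
```

Running skill_inventory(Path(".claude/skills")) in a repo gives a list to review for destructive skills that are still auto-invocable.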

Organizational Impact

Check in skills to .claude/skills/ in your repo. Every engineer on the project gets them. Update the skill once, everyone's next session has the update. Compare this to a Confluence page that 30% of the team has bookmarked and 5% have read recently.

For a distributed engineering organization, skills become your executable runbooks. The deploy procedure isn't a document someone might follow — it's a command that Claude follows exactly.

Part 4: The Compound Effect (~3 min)

These techniques compound. Consider a typical 45-minute coding session:

Without optimization:

3 broad codebase explorations: ~50,000 tokens each = 150,000
15 command outputs (git, test, build): ~500 tokens each = 7,500
If all of those tokens are re-read on each of ~30 turns (an upper bound, since context accumulates gradually): 157,500 × 30 = 4,725,000
Plus the engineer's repeated prompt patterns: ~5,000

With RTK + pointers + sub-agents + skills:

Explorations via sub-agents (only summaries in main context): ~2,000 each = 6,000
RTK-compressed command output: ~150 tokens each = 2,250
Strategic compaction after exploration phase resets context
Skills replace repeated prompts: ~50 tokens per invocation

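By the figures above, main-context tokens drop from 157,500 to 8,250. Sub-agents still consume tokens in their own contexts, so the whole-session saving is smaller than that ratio suggests; expect something in the 60-80% range. Across a broad engineering population and a full working year, the math becomes significant.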
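The main-context arithmetic in the two lists above can be checked with a few lines of Python. The inputs are the chapter's own illustrative figures; the sketch counts only what lands in the main session, so tokens spent inside sub-agents and the repeated-prompt line items are left out.

```python
TURNS = 30  # approximate turns in the example session


def main_context_tokens(
    explorations: int,
    tokens_per_exploration: int,
    commands: int,
    tokens_per_command: int,
) -> int:
    """Tokens added to the main context by explorations and command output."""
    return explorations * tokens_per_exploration + commands * tokens_per_command


baseline = main_context_tokens(3, 50_000, 15, 500)   # without optimization
optimized = main_context_tokens(3, 2_000, 15, 150)   # sub-agents + RTK

if __name__ == "__main__":
    print(f"baseline main context:  {baseline:,}")            # 157,500
    print(f"optimized main context: {optimized:,}")           # 8,250
    print(f"baseline re-read bound: {baseline * TURNS:,}")    # 4,725,000
    print(f"optimized re-read bound: {optimized * TURNS:,}")  # 247,500
```

These are upper bounds on re-read cost, since real context builds up over the session rather than being present from the first turn.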

Chapter 3: Land Better Code The First Time

Problem: The most expensive bug is the one that makes it to production.

Objective: This lesson covers how Claude Code's model selection, review tooling, and execution modes help engineers write correct code on the first pass.

Part 1: Model Selection — The Right Brain for the Job (~4 min)

Claude Code supports three model tiers. Most engineers default to whatever's configured and never think about it. That's leaving capability and cost on the table.

The Three Models

Model	Strength	Default Effort	When to Use
Opus 4.8	Deepest reasoning, best for architectural decisions	high	Complex refactors, system design, multi-file changes with dependencies
Sonnet 4.6	Best balance of speed and capability	high	Day-to-day coding, tests, reviews, most tasks
Haiku 4.5	Fast, cheap, good for scoped work	N/A (no effort control)	Exploration, simple edits, quick lookups

The opusplan Shortcut

Run /model opusplan and you get the best of both worlds: Opus for planning, automatic switch to Sonnet for execution. This means your architect-level reasoning happens at Opus quality, but the actual code generation runs at Sonnet cost.

This is particularly effective for tasks where getting the approach right matters more than the implementation mechanics — refactoring a state machine, redesigning an API surface, restructuring a database schema.

Sub-Agent Model Overrides

You can assign different models to different sub-agents:

# .claude/agents/explorer.md
---
name: explorer
description: Fast, read-only codebase exploration and file discovery.
model: haiku
tools: Read, Grep, Glob
---

Haiku for exploration (fast, cheap), Sonnet for implementation (balanced), Opus for review (thorough). Your team can standardize these patterns across the org.

Part 2: Effort Levels — Depth on Demand (~3 min)

Effort level controls how much Claude thinks on each step. This is different from model selection — it's about reasoning depth within a given model.

The Spectrum

Level	Behavior	Use Case
low	Minimal reasoning, fast	Trivial edits, formatting, renames
medium	Moderate reasoning	Standard bug fixes, feature adds
high	Default. Full reasoning	Most development work
xhigh	Deep analysis	Complex debugging, security review
max	No token constraint on thinking	Architecture decisions, novel problems (session-only, can't persist)

Set with /effort or --effort at startup. You can also set effortLevel in project settings to standardize across the team.

ultracode — Dynamic Effort

The ultracode setting sends xhigh effort to the model and automatically orchestrates dynamic workflows for substantive tasks. It's the "I need Claude to be thorough and proactive" mode — useful for complex multi-file changes where Claude needs to investigate before implementing.

Part 3: /fast Mode — Speed Without Sacrifice (~2 min)

This is the most misunderstood feature. /fast mode does not change the model or reduce quality. It's the same Opus model with a different API configuration that prioritizes output speed.

Roughly 2.5x faster output
Identical quality — same model, same reasoning
Higher per-token cost (roughly 3-10x depending on model version)

Toggle with /fast. Use it during:

Live debugging sessions where you're iterating rapidly
Demos and pair programming where latency breaks flow
Time-sensitive incident response

It shares a rate limit pool, so when you hit the ceiling, it falls back to standard speed automatically (indicated by a gray icon) and re-enables when the cooldown expires.

The cost tradeoff is explicit: you're paying more per token for lower latency. At enterprise scale, you'd gate this to specific use cases — incident response, time-critical demos — rather than enabling it org-wide.

Part 4: Automated Code Reviews (~4 min)

Claude Code has two review mechanisms, and most users only know about one.

GitHub App Reviews (Automated)

When configured, Claude automatically reviews every PR with multiple specialist agents running in parallel:

Multiple agents analyze different aspects (security, logic, style) concurrently
A verification step checks for false positives
Results are deduplicated and ranked
Findings post as inline comments on changed lines

Severity levels:

Important (red) — bugs that should be fixed before merge
Nit (yellow) — minor issues, worth fixing but not blocking
Pre-existing (purple) — bugs in the codebase that weren't introduced by this PR

The pre-existing category is particularly valuable — it surfaces tech debt contextually, at the moment someone is already working in that area of code.

Trigger with @claude review on any PR, or configure for automatic review on every push.
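The deduplicate-and-rank step described above can be pictured with a short Python sketch. This is a conceptual model, not the GitHub App's implementation: the Finding type, the rule that two findings are duplicates when file, line, and message all match, and the ordering of the three severities are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed ranking: lower number sorts first.
SEVERITY_ORDER = {"important": 0, "nit": 1, "pre-existing": 2}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    message: str


def dedupe_and_rank(findings: list[Finding]) -> list[Finding]:
    """Merge duplicate findings from parallel agents, then sort by severity."""
    best: dict[tuple[str, int, str], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.message)
        current = best.get(key)
        # When agents disagree on severity, keep the more severe report.
        if current is None or (
            SEVERITY_ORDER[finding.severity] < SEVERITY_ORDER[current.severity]
        ):
            best[key] = finding
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.line),
    )
```

The design point is that parallel reviewers will overlap, so a merge step is what keeps the PR from filling with repeated comments on the same line.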

Local CLI Reviews

Run /code-review (or the skill shorthand) for a review without the GitHub App:

  /code-review high          # Thorough local review
  /code-review --fix         # Review and auto-fix findings
  /code-review --comment     # Review and post as GitHub PR comments

Effort levels apply here too — low catches obvious issues quickly, high does deep analysis. The --fix flag is powerful: Claude identifies the issue and applies the fix in one pass. For straightforward findings (unused imports, missing null checks, inconsistent naming), this eliminates the review-fix-re-review cycle.

Custom Review Rules

Two files shape review behavior:

CLAUDE.md — project conventions. Violations flagged as nits.
REVIEW.md — review-specific rules with highest priority. Put your "never merge without" rules here.

For a distributed engineering organization, REVIEW.md becomes your enforceable code quality standard. Not a style guide people might read — a review rule Claude applies on every PR.

/code-review --fix: The Unified Review Pass

After implementation, run /code-review --fix. It reviews your changes for correctness bugs and applies cleanup fixes — code reuse, unnecessary complexity, and efficiency improvements — all in one pass. The effort level tunes depth:

  /code-review --fix        # Full review + auto-fix at default effort
  /code-review high --fix   # Deep analysis + auto-fix
  /code-review              # Review only (report findings, no fixes)

For a quicker cleanup-only pass that skips bug-hunting, the /simplify command still runs a focused review covering reuse, simplification, efficiency, and altitude — and applies those fixes without the full correctness audit.

Part 5: Multi-Model Workflows in Practice (~2 min)

Pulling it all together, here's how a senior engineer might handle a complex task:

Plan with Opus — /model opusplan, describe the task. Opus reasons through the architecture, identifies risks, proposes an approach.
Implement with Sonnet — opusplan auto-switches. Sonnet generates the code efficiently.
Explore with Haiku sub-agents — Delegate codebase research to Haiku agents running in parallel. Fast, cheap, keeps main context clean.
Review with /code-review high — Thorough analysis of the changes.
Finalize with /code-review --fix — Apply fixes and cleanup in one pass (or /simplify for cleanup only).
Fast-iterate with /fast: if the review surfaces a bug, toggle fast mode for rapid fix-test cycles.

Each step uses the right model at the right effort level. The engineer doesn't pay Opus prices for boilerplate implementation or Sonnet prices for quick file lookups.

The org-level insight: Standardize these patterns in your CLAUDE.md and custom agents. When teams follow the same plan-implement-review workflow with appropriate model selection, you get more consistent code quality at optimized cost. The patterns are checked into the repo, not locked in anyone's head.