Teach Me Something New About Claude Code in 15 Minutes
A ClosedLoop.AI pre-read chapter for engineers learning the Claude Code primitives that compound speed, cost efficiency, and first-pass code quality.
Module Chapters
- Give Claude Better Context
- Burn Less Tokens Today
- Land Better Code The First Time
Chapter 1: Give Claude Better Context
Objective: This lesson covers three mechanisms for giving Claude better context: persistent memory, sub-agent delegation, and strategic compaction.
Part 1: The Memory System (~5 min)
Most users know about CLAUDE.md — the project instructions file checked into your repo. What they don't know is that Claude Code has a four-layer memory hierarchy and an auto-memory system that learns from your sessions.
The Four Layers
Memory loads from broadest to narrowest scope, each layer adding context:
| Layer | Location | Who writes it | Scope |
|---|---|---|---|
| Managed policy | /Library/Application Support/ClaudeCode/CLAUDE.md (macOS path) | IT/Platform team | Org-wide, cannot be excluded |
| User | ~/.claude/CLAUDE.md | Individual developer | All their projects |
| Project | ./CLAUDE.md or ./.claude/CLAUDE.md | Team | Checked into version control |
| Local | ./CLAUDE.local.md | Individual developer | .gitignore'd, personal preferences |
The org implication: That managed policy layer means your platform team can enforce standards — security rules, code style, compliance constraints — across a large engineering organization without relying on anyone reading a wiki. Claude loads it at session start, every time, and it cannot be excluded.
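To make the load order concrete, here is a minimal sketch that lists the four layers from the table, broadest first. It only mirrors the table; the function names are hypothetical and this is not Claude Code's actual loader. The managed-policy path is the one shown in the table.

```python
from pathlib import Path

def memory_layers(project_root: Path, home: Path) -> list[tuple[str, Path]]:
    """Return (layer, candidate path) pairs, broadest scope first, per the table."""
    return [
        ("managed", Path("/Library/Application Support/ClaudeCode/CLAUDE.md")),
        ("user", home / ".claude" / "CLAUDE.md"),
        ("project", project_root / "CLAUDE.md"),
        ("project", project_root / ".claude" / "CLAUDE.md"),
        ("local", project_root / "CLAUDE.local.md"),
    ]

def existing_layers(project_root: Path, home: Path) -> list[tuple[str, Path]]:
    """Keep only the candidate files that are actually present on disk."""
    return [(name, p) for name, p in memory_layers(project_root, home) if p.is_file()]
```

Running `existing_layers(Path.cwd(), Path.home())` in a repo is a quick way to see which layers a session there would have available.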
Auto-Memory: Claude Learns From Your Sessions
Run /memory to see what Claude has learned about you. It stores insights in ~/.claude/projects/<project>/memory/ as plain markdown files across four types:
- User memories — your role, expertise, preferences ("senior Go engineer, new to React")
- Feedback memories — corrections and confirmed approaches ("don't mock the database in integration tests")
- Project memories — ongoing work context ("merge freeze starts June 5 for mobile release")
- Reference memories — pointers to external systems ("pipeline bugs tracked in Linear project INGEST")
This is not a black box. These are markdown files you can read, edit, and delete. The MEMORY.md index is capped at 200 lines and loads every session. Topic files load on-demand.
Why this matters at scale: When an engineer corrects Claude once — "we use Zod for validation, not manual type checks" — that correction persists across sessions. Multiply that across many engineers and teams teaching Claude their local conventions, and you've got an institutional knowledge system that reduces repeated mistakes org-wide.
Path-Scoped Rules: Conditional Context
Most users put everything in one CLAUDE.md. The better pattern is .claude/rules/:
# .claude/rules/api-security.md
---
paths:
- "src/api/**/*.ts"
- "src/middleware/**/*.ts"
---
All endpoints must validate input with Zod schemas.
Never return raw database errors to clients.
Use the Result<T> error model.
These rules only load when Claude reads matching files. Your API security rules don't burn tokens when someone's editing a React component. For a large monorepo, this is the difference between 200 lines of always-loaded context and targeted injection of relevant standards.
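The matching behaviour can be illustrated with a small sketch. It assumes conventional glob semantics (`*` stays within one path segment, `**/` spans zero or more directories); the translator and the `RULES` mapping are hypothetical and are not how Claude Code implements rule loading.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with ** support into an anchored regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")           # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")        # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")

# Hypothetical in-memory view of .claude/rules/: rule file -> its `paths` globs.
RULES = {
    "api-security.md": ["src/api/**/*.ts", "src/middleware/**/*.ts"],
}

def rules_for(file_path: str, rules: dict = RULES) -> list[str]:
    """Return the rule files whose path globs match the file being read."""
    return [
        name
        for name, globs in rules.items()
        if any(glob_to_regex(g).match(file_path) for g in globs)
    ]
```

Under these assumptions, `rules_for("src/api/users/get.ts")` selects the API security rule, while `rules_for("src/components/Button.tsx")` selects nothing, so the React edit carries no security-rule tokens.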
The @ Import System
CLAUDE.md supports file imports:
@docs/coding-standards.md
@packages/database/CLAUDE.md
These resolve relative to the importing file, can nest recursively up to a maximum depth of five hops, and expand at session start. This lets you modularize instructions instead of maintaining one massive file.
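The resolution rule (relative to the importing file, bounded nesting) can be sketched as a small recursive expander. This is an illustration only: treating a line made up solely of `@` tokens as an import line is an assumption of the sketch, and the nesting limit is passed in as `max_depth` rather than hard-coded.

```python
from pathlib import Path

def expand_imports(path: Path, max_depth: int, depth: int = 0) -> str:
    """Inline @path references, resolving each relative to the importing file."""
    out = []
    for line in path.read_text().splitlines():
        words = line.split()
        # Assumption: a line consisting only of @tokens is an import line.
        if words and all(w.startswith("@") and len(w) > 1 for w in words):
            for word in words:
                target = (path.parent / word[1:]).resolve()
                if depth < max_depth and target.is_file():
                    out.append(expand_imports(target, max_depth, depth + 1))
                else:
                    out.append(word)  # too deep or missing: leave the reference as-is
        else:
            out.append(line)
    return "\n".join(out)
```

Note that a nested import such as `@extra.md` inside `docs/coding-standards.md` is looked up in `docs/`, not in the directory of the top-level CLAUDE.md.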
Part 2: Sub-Agent Delegation (~5 min)
This is the feature most power users underutilize. When you ask Claude to "search the codebase for all authentication patterns," it reads dozens of files — and every one of those files stays in your context window, consuming tokens on every subsequent turn.
Sub-agents solve this. They run in isolated context windows, do the heavy lifting, and return only a summary to your main session.
How It Works
Claude spawns a sub-agent with its own system prompt, tool restrictions, and context. The sub-agent explores, analyzes, or implements — then sends back a concise result. The verbose exploration output (file contents, grep results, error traces) stays in the sub-agent's context, not yours.
Built-In Sub-Agents
- Explore (runs on Haiku — fast/cheap): File discovery and codebase analysis. Three thoroughness levels: quick, medium, very thorough.
- Plan (inherits your model): Read-only codebase research before planning changes.
- General-purpose: Full tool access for complex multi-step tasks.
Creating Custom Sub-Agents
This is where it gets interesting at enterprise scale. Run /agents → Library → Create New, or create a markdown file:
# .claude/agents/security-reviewer.md
---
name: security-reviewer
description: Review code changes for security vulnerabilities. Use proactively.
tools: Read, Grep, Glob, Bash
model: sonnet
maxTurns: 10
---
You are a security specialist. For every code change:
1. Check for injection vulnerabilities (SQL, XSS, command)
2. Verify input validation at system boundaries
3. Check for exposed secrets or API keys
4. Verify authentication/authorization checks
Report findings by severity: Critical, High, Medium, Low.
Drop this in .claude/agents/ and it's available to every engineer on the project. The description field includes "Use proactively", which encourages Claude to delegate security reviews on its own initiative instead of waiting to be asked.
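The file has two parts: the frontmatter configures the sub-agent (name, tools, model), and everything after it becomes the system prompt. A rough sketch of that split follows. It is a flat `key: value` reader written for illustration, not a YAML parser and not Claude Code's own loader; `split_agent_file` and `allowed_tools` are hypothetical names.

```python
def split_agent_file(text: str) -> tuple[dict[str, str], str]:
    """Split an agent markdown file into (frontmatter fields, system prompt)."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "---":
        raise ValueError("expected a leading '---' frontmatter delimiter")
    end = lines.index("---", 1)  # closing delimiter; raises ValueError if missing
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    prompt = "\n".join(lines[end + 1:]).strip()
    return fields, prompt

def allowed_tools(fields: dict[str, str]) -> list[str]:
    """Turn the comma-separated tools field into a list of tool names."""
    return [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]

EXAMPLE = """\
---
name: security-reviewer
tools: Read, Grep, Glob, Bash
model: sonnet
---
You are a security specialist.
"""

fields, prompt = split_agent_file(EXAMPLE)
```

Here `fields` holds the configuration and `prompt` holds the instructions the sub-agent starts from, which is all the context it has until you brief it.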
Prompt Best Practices for Sub-Agents
The biggest mistake engineers make with sub-agents: writing vague prompts. Sub-agents start with zero context from your conversation. Brief them like a colleague who just walked into the room:
Bad: "Review the auth changes"
Good: "Review the changes in src/auth/middleware.ts for OWASP Top 10 vulnerabilities. The middleware handles JWT validation and session management. We recently added refresh token rotation — focus on whether the old token is properly invalidated. Report findings only, don't fix."
Key principles:
- Explain what and why — the sub-agent doesn't know your task
- Specify the output format — "report in under 200 words" or "list findings by severity"
- State whether to read or write — "research only, don't edit files"
- Use foreground when you need results before proceeding; background for parallel work
Parallel Sub-Agents
You can launch multiple sub-agents simultaneously. This is transformative for investigation work:
"Research the authentication module, the database schema, and the API routing layer in parallel using separate sub-agents. For each, report: what patterns are used, what tests exist, and any inconsistencies."
Three sub-agents run concurrently. You get three focused reports. Your main context stays clean.
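The fan-out and fan-in shape of this pattern resembles ordinary concurrent code. The sketch below is an analogy, not Claude Code's API: `run_subagent` is a hypothetical stand-in whose verbose working data stays local to the call, and only a one-line summary is returned to the caller.

```python
import asyncio

async def run_subagent(topic: str) -> str:
    """Hypothetical stand-in for one sub-agent run."""
    # Verbose working context: it lives and dies inside this call.
    scratch = [f"{topic}: contents of file {n}" for n in range(50)]
    await asyncio.sleep(0)  # yield control, as a real I/O-bound task would
    return f"{topic}: reviewed {len(scratch)} files"

async def research(topics: list[str]) -> list[str]:
    """Run one sub-agent per topic concurrently and collect the summaries."""
    return await asyncio.gather(*(run_subagent(t) for t in topics))

summaries = asyncio.run(
    research(["auth module", "database schema", "API routing"])
)
```

The caller ends up holding three short strings, in the order the topics were given, and none of the 150 intermediate entries.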
Part 3: Strategic Compaction (~5 min)
Every message in your conversation costs tokens — not just when it's sent, but on every subsequent turn. A 100-message session means Claude re-reads all 100 messages every time you ask a question. This is where /compact earns its keep.
What /compact Actually Does
- Claude generates a summary of your entire conversation
- The summary replaces your full message history
- Full history is preserved in the session transcript (Claude can reference it if needed)
- All CLAUDE.md files and auto-memory reload from disk
- Every subsequent turn now processes the shorter summary instead of the full history
/compact vs /clear
| | /compact | /clear |
|---|---|---|
| History | Summarized, retained as context | Deleted entirely |
| Session | Continues | Resets |
| Use when | Between tasks in a long session | Starting fresh work |
Guided Compaction
This is the feature most people miss. You can tell /compact what to prioritize:
/compact focusing on the database migration decisions and the three failing test cases
This shapes the summary to preserve what you'll need next, instead of Claude guessing what matters.
Auto-Compaction
At ~95% context capacity, Claude auto-compacts. You can tune the threshold:
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50 # Compact at 50% instead of 95%
For long coding sessions, setting this lower means you hit compaction earlier but each turn is cheaper. Across a large engineering organization, this becomes a meaningful cost lever.
The Compaction Workflow
The optimal pattern for long sessions:
- Start task — investigate, implement, test
- Finish task — /compact focusing on [decisions made, patterns established]
- Start next task — clean context, relevant history preserved
- Repeat
Each compaction resets your token cost per turn back down. Without it, per-turn cost grows linearly with conversation length, which means the cumulative cost of the session grows roughly quadratically.
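A small model makes the effect of the workflow visible. It assumes every turn adds a fixed number of tokens and that each turn re-processes the whole context; the specific figures (500 tokens per turn, a 1,000-token summary, compaction every 25 turns) are illustrative assumptions, not measurements.

```python
from typing import Optional

def cumulative_tokens(turns: int, tokens_per_turn: int,
                      compact_every: Optional[int] = None,
                      summary_tokens: int = 0) -> int:
    """Total tokens processed over a session where each turn re-reads the context."""
    context = 0
    total = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn    # the new message joins the context
        total += context              # the whole context is processed this turn
        if compact_every and turn % compact_every == 0:
            context = summary_tokens  # history is replaced by a summary
    return total

without_compaction = cumulative_tokens(100, 500)
with_compaction = cumulative_tokens(100, 500, compact_every=25, summary_tokens=1000)
```

Under these assumptions a 100-turn session processes 2,525,000 tokens without compaction and 725,000 with it, roughly 71% fewer.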
What Doesn't Survive Compaction
- Instructions you typed in conversation (put them in CLAUDE.md instead)
- Nested CLAUDE.md files from subdirectories (reload when Claude reads matching files again)
- Exact file contents (Claude re-reads files it needs)
Key takeaway: Anything you want Claude to remember across compactions belongs in CLAUDE.md or auto-memory — not in conversation messages.
Chapter 2: Burn Less Tokens Today
Objective: This lesson covers four techniques that reduce token consumption without reducing capability.
Part 1: RTK — The Transparent Token Optimizer (~4 min)
RTK (Rust Token Killer) is a CLI proxy that sits between Claude Code and your shell commands. It compresses command output before Claude sees it, achieving 60-90% token savings on common dev operations.
How It Works
RTK installs as a Claude Code hook. When Claude runs git status, the hook transparently rewrites it to rtk git status. RTK runs the real command, strips noise (ANSI codes, verbose formatting, repetitive output), and returns a compressed result.
You and your engineers type commands normally. The compression is invisible.
Setup
RTK integrates via a PreToolUse hook in your settings.json. Once configured, every Bash command Claude runs goes through RTK automatically. No workflow changes required.
Measuring Savings
rtk gain # Cumulative token savings this session
rtk gain --history # Per-command breakdown with savings
rtk discover # Analyze past sessions for missed RTK opportunities
rtk discover is particularly valuable during rollout — it reads your Claude Code history and identifies commands that could have been compressed but weren't, helping you tune coverage.
Why This Matters at Scale
Consider: a large engineering org × repeated git/build commands × hundreds of working days. If each command returns 500 tokens of output and RTK compresses that by 70%, the annual savings can quickly reach hundreds of millions of tokens on command output alone. That's before counting the compounding effect — those saved tokens don't get re-read on every subsequent turn either.
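The estimate can be worked through explicitly. The 500-token output size and 70% compression come from the paragraph above; the headcount, commands per day, and working days are hypothetical placeholders to replace with your own numbers.

```python
def annual_tokens_saved(engineers: int, commands_per_day: int, working_days: int,
                        tokens_per_output: int, compression: float) -> int:
    """Tokens removed from command output per year, before any re-read effect."""
    raw_tokens = engineers * commands_per_day * working_days * tokens_per_output
    return round(raw_tokens * compression)

# Hypothetical org: 200 engineers, 20 commands each per day, 200 working days.
saved = annual_tokens_saved(
    engineers=200,
    commands_per_day=20,
    working_days=200,
    tokens_per_output=500,
    compression=0.70,
)
```

With those placeholder inputs the result is 280,000,000 tokens a year, and that counts each output only once rather than on every later turn that would have re-read it.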
Part 2: Pointers Instead of Full Text (~4 min)
The most expensive habit in Claude Code is reading entire files when you need three lines. Every token Claude reads stays in context for the rest of the session (or until compaction). Teaching your engineers to think in "pointers" reduces this dramatically.
Technique 1: Targeted File Reading
The Read tool accepts offset and limit parameters:
Read file.ts from line 50, 20 lines only
Instead of loading a 2000-line file (burning ~8000 tokens that persist all session), you load 20 lines (~80 tokens). That's a 99% reduction on a single read.
Practical tip: Use Grep first to find the relevant line numbers, then Read with offset/limit. Two cheap operations instead of one expensive one.
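The grep-then-read pattern is the same two-step you would write by hand: locate the line, then slice a window around it. The sketch below works on an in-memory list of lines; the helper names and the 1-based `offset` convention are assumptions of this example, not the Read tool's actual signature.

```python
import re

def grep_line_numbers(lines: list[str], pattern: str) -> list[int]:
    """Return 1-based line numbers whose text matches the pattern."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if rx.search(line)]

def read_window(lines: list[str], offset: int, limit: int) -> list[str]:
    """Return `limit` lines starting at 1-based line `offset`."""
    return lines[offset - 1: offset - 1 + limit]

# Hypothetical 2000-line file with one function of interest on line 1200.
source = [f"const filler{n} = {n};" for n in range(1, 2001)]
source[1199] = "export function handleAuth(req) {"

hits = grep_line_numbers(source, r"handleAuth")
window = read_window(source, offset=hits[0] - 5, limit=20)
```

The search returns a single line number and the read returns 20 lines, so the other 1,980 lines never enter the context.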
Technique 2: Search Before Read
Grep("handleAuth", glob="src/**/*.ts", output_mode="files_with_matches")
This returns a list of file paths (~50 tokens) instead of file contents (~5000+ tokens). Claude then reads only the relevant file, at the relevant lines.
The anti-pattern is asking Claude to "look through the codebase for authentication logic" without any targeting. That triggers sequential reads of dozens of files. Instead: "grep for handleAuth in src/, then read the top result."
Technique 3: @ Imports in CLAUDE.md
Instead of pasting your coding standards into every CLAUDE.md:
# CLAUDE.md
@docs/coding-standards.md
@docs/api-conventions.md
The content loads once at session start. It doesn't get re-imported or duplicated. And if coding-standards.md changes, every engineer's next session picks it up automatically.
Technique 4: Sub-Agent Isolation for Exploration
When Claude needs to explore broadly (reading 20+ files to understand a system), delegate to a sub-agent. The exploration output stays in the sub-agent's context. Your main session gets a 200-word summary instead of 20 files worth of content.
This is especially impactful for onboarding — new engineers ask broad "how does X work?" questions that trigger extensive exploration. With sub-agents, that exploration is contained.
Part 3: Skills Over Long-Winded Prompts (~4 min)
Every time an engineer types a 500-word prompt describing their deployment checklist, those 500 words are consumed as input tokens. Done daily, that adds up to 100,000+ tokens/year on prompt text alone, per person.
Skills encapsulate these repeated workflows into single commands.
What Skills Are
A skill is a SKILL.md markdown file, stored in its own directory under .claude/skills/, with frontmatter and instructions. When invoked with /skill-name, the full content loads into the session. When not invoked, only the one-line description loads (cheap).
# .claude/skills/deploy/SKILL.md
---
name: deploy
description: Walk through deployment checklist for stage and prod environments
---
## Pre-Deploy
1. Run `pnpm test` and verify all pass
2. Run `pnpm lint` and verify clean
3. Check `git status` for uncommitted changes
4. Verify environment variables in `.env.local`
## Stage Deploy
1. Push to stage branch
2. Wait for CI green
3. Verify staging environment
## Prod Deploy
1. Merge main → production
2. Monitor deploy-production.yml
3. Verify Slack notification
Now every engineer types /deploy instead of remembering (or misremembering) the checklist.
Skills vs CLAUDE.md
| | CLAUDE.md | Skills |
|---|---|---|
| Token cost | Always loaded, every turn | Only when invoked |
| Best for | Facts, conventions, rules | Procedures, workflows, checklists |
| Scope | Passive context | Active invocation |
The rule of thumb: If it's a fact Claude should always know → CLAUDE.md. If it's a procedure Claude should execute on demand → skill.
Auto-Invocation
Skills with disable-model-invocation: false (the default) can be triggered automatically when Claude determines they're relevant. A skill described as "security review before merging" can fire when an engineer says "I'm ready to merge."
Skills with disable-model-invocation: true only fire on explicit /skill-name invocation. Use this for destructive operations (deploys, database migrations) where you want the engineer to opt in.
Organizational Impact
Check in skills to .claude/skills/ in your repo. Every engineer on the project gets them. Update the skill once, everyone's next session has the update. Compare this to a Confluence page that 30% of the team has bookmarked and 5% have read recently.
For a distributed engineering organization, skills become your executable runbooks. The deploy procedure isn't a document someone might follow — it's a command that Claude follows exactly.
Part 4: The Compound Effect (~3 min)
These techniques compound. Consider a typical 45-minute coding session:
Without optimization:
- 3 broad codebase explorations: ~50,000 tokens each = 150,000
- 15 command outputs (git, test, build): ~500 tokens each = 7,500
- Those tokens re-read on every turn (~30 turns), worst case with everything loaded from turn one: 157,500 × 30 = 4,725,000
- Plus the engineer's repeated prompt patterns: ~5,000
With RTK + pointers + sub-agents + skills:
- Explorations via sub-agents (only summaries in main context): ~2,000 each = 6,000
- RTK-compressed command output: ~150 tokens each = 2,250
- Strategic compaction after exploration phase resets context
- Skills replace repeated prompts: ~50 tokens per invocation
The per-session savings are 60-80%. Across a broad engineering population and a full working year, the math becomes significant.
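The itemized figures can be checked with a few lines of arithmetic. This covers only the exploration and command-output items listed above and treats all of them as present from the first turn; the conversation itself, the system prompt, and loaded memory are not itemized, so a whole-session saving will be lower than the reduction computed on these items alone.

```python
TURNS = 30

# Main-context tokens from the itemized lists above.
baseline_context = 3 * 50_000 + 15 * 500   # explorations + raw command output
optimized_context = 3 * 2_000 + 15 * 150   # sub-agent summaries + compressed output

# Worst case: the full amount is re-read on every one of the 30 turns.
baseline_reread = baseline_context * TURNS
optimized_reread = optimized_context * TURNS

# Reduction on the itemized tokens only.
itemized_reduction = 1 - optimized_context / baseline_context
```

The itemized context drops from 157,500 to 8,250 tokens, and the worst-case re-read total from 4,725,000 to 247,500.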
Chapter 3: Land Better Code The First Time
Objective: This lesson covers how Claude Code's model selection, review tooling, and execution modes help engineers write correct code on the first pass.
Part 1: Model Selection — The Right Brain for the Job (~4 min)
Claude Code supports three model tiers. Most engineers default to whatever's configured and never think about it. That leaves both capability and cost savings on the table.
The Three Models
| Model | Strength | Default Effort | When to Use |
|---|---|---|---|
| Opus 4.8 | Deepest reasoning, best for architectural decisions | high | Complex refactors, system design, multi-file changes with dependencies |
| Sonnet 4.6 | Best balance of speed and capability | high | Day-to-day coding, tests, reviews, most tasks |
| Haiku 4.5 | Fast, cheap, good for scoped work | N/A (no effort control) | Exploration, simple edits, quick lookups |
The opusplan Shortcut
Run /model opusplan and you get the best of both worlds: Opus for planning, automatic switch to Sonnet for execution. This means your architect-level reasoning happens at Opus quality, but the actual code generation runs at Sonnet cost.
This is particularly effective for tasks where getting the approach right matters more than the implementation mechanics — refactoring a state machine, redesigning an API surface, restructuring a database schema.
Sub-Agent Model Overrides
You can assign different models to different sub-agents:
# .claude/agents/explorer.md
---
name: explorer
model: haiku
tools: Read, Grep, Glob
---
Haiku for exploration (fast, cheap), Sonnet for implementation (balanced), Opus for review (thorough). Your team can standardize these patterns across the org.
Part 2: Effort Levels — Depth on Demand (~3 min)
Effort level controls how much Claude thinks on each step. This is different from model selection — it's about reasoning depth within a given model.
The Spectrum
| Level | Behavior | Use Case |
|---|---|---|
| low | Minimal reasoning, fast | Trivial edits, formatting, renames |
| medium | Moderate reasoning | Standard bug fixes, feature adds |
| high | Default. Full reasoning | Most development work |
| xhigh | Deep analysis | Complex debugging, security review |
| max | No token constraint on thinking | Architecture decisions, novel problems (session-only, can't persist) |
Set with /effort or --effort at startup. You can also set effortLevel in project settings to standardize across the team.
ultracode — Dynamic Effort
The ultracode setting sends xhigh effort to the model and automatically orchestrates dynamic workflows for substantive tasks. It's the "I need Claude to be thorough and proactive" mode — useful for complex multi-file changes where Claude needs to investigate before implementing.
Part 3: /fast Mode — Speed Without Sacrifice (~2 min)
This is the most misunderstood feature. /fast mode does not change the model or reduce quality. It's the same Opus model with a different API configuration that prioritizes output speed.
- 2.5x faster response latency
- Identical quality — same model, same reasoning
- Higher per-token cost (roughly 3-10x depending on model version)
Toggle with /fast. Use it during:
- Live debugging sessions where you're iterating rapidly
- Demos and pair programming where latency breaks flow
- Time-sensitive incident response
It shares a rate limit pool, so when you hit the ceiling, it falls back to standard speed automatically (indicated by a gray icon) and re-enables when the cooldown expires.
The cost tradeoff is explicit: you're paying more per token for lower latency. At enterprise scale, you'd gate this to specific use cases — incident response, time-critical demos — rather than enabling it org-wide.
Part 4: Automated Code Reviews (~4 min)
Claude Code has two review mechanisms, and most users only know about one.
GitHub App Reviews (Automated)
When configured, Claude automatically reviews every PR with multiple specialist agents running in parallel:
- Multiple agents analyze different aspects (security, logic, style) concurrently
- A verification step checks for false positives
- Results are deduplicated and ranked
- Findings post as inline comments on changed lines
Severity levels:
- Important (red) — bugs that should be fixed before merge
- Nit (yellow) — minor issues, worth fixing but not blocking
- Pre-existing (purple) — bugs in the codebase that weren't introduced by this PR
The pre-existing category is particularly valuable — it surfaces tech debt contextually, at the moment someone is already working in that area of code.
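The deduplicate-and-rank step can be pictured as a merge over per-agent result lists. This is a sketch of the idea only: the `Finding` shape, the duplicate key, and the ranking order (Important, then Nit, then Pre-existing) are assumptions of the example, and the verification step for false positives is left out.

```python
from dataclasses import dataclass

# Assumed ranking: lower number sorts first.
SEVERITY_ORDER = {"important": 0, "nit": 1, "pre-existing": 2}

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    message: str

def merge_findings(per_agent: list[list[Finding]]) -> list[Finding]:
    """Drop duplicate findings reported by several agents, then rank by severity."""
    seen = set()
    merged = []
    for findings in per_agent:
        for f in findings:
            key = (f.file, f.line, f.message)  # same place, same complaint
            if key not in seen:
                seen.add(key)
                merged.append(f)
    return sorted(merged, key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.line))
```

If a security agent and a logic agent both flag the same line with the same message, the merged list carries it once, ahead of any nits.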
Trigger with @claude review on any PR, or configure for automatic review on every push.
Local CLI Reviews
Run /code-review (or the skill shorthand) for a review without the GitHub App:
/code-review high # Thorough local review
/code-review --fix # Review and auto-fix findings
/code-review --comment # Review and post as GitHub PR comments
Effort levels apply here too — low catches obvious issues quickly, high does deep analysis. The --fix flag is powerful: Claude identifies the issue and applies the fix in one pass. For straightforward findings (unused imports, missing null checks, inconsistent naming), this eliminates the review-fix-re-review cycle.
Custom Review Rules
Two files shape review behavior:
- CLAUDE.md — project conventions. Violations flagged as nits.
- REVIEW.md — review-specific rules with highest priority. Put your "never merge without" rules here.
For a distributed engineering organization, REVIEW.md becomes your enforceable code quality standard. Not a style guide people might read — a review rule Claude applies on every PR.
/code-review --fix — The Unified Review Pass
After implementation, run /code-review --fix. It reviews your changes for correctness bugs and applies cleanup fixes — code reuse, unnecessary complexity, and efficiency improvements — all in one pass. The effort level tunes depth:
/code-review --fix # Full review + auto-fix at default effort
/code-review high --fix # Deep analysis + auto-fix
/code-review # Review only (report findings, no fixes)
For a quicker cleanup-only pass that skips bug-hunting, the /simplify command still runs a focused review covering reuse, simplification, efficiency, and altitude — and applies those fixes without the full correctness audit.
Part 5: Multi-Model Workflows in Practice (~2 min)
Pulling it all together, here's how a senior engineer might handle a complex task:
- Plan with Opus — /model opusplan, describe the task. Opus reasons through the architecture, identifies risks, proposes an approach.
- Implement with Sonnet — opusplan auto-switches. Sonnet generates the code efficiently.
- Explore with Haiku sub-agents — Delegate codebase research to Haiku agents running in parallel. Fast, cheap, keeps main context clean.
- Review with /code-review high — Thorough analysis of the changes.
- Finalize with /code-review --fix — Apply fixes and cleanup in one pass (or /simplify for cleanup only).
- Fast-iterate with /fast — If there's a bug in the review findings, toggle fast mode for rapid fix-test cycles.
Each step uses the right model at the right effort level. The engineer doesn't pay Opus prices for boilerplate implementation or Sonnet prices for quick file lookups.
The org-level insight: Standardize these patterns in your CLAUDE.md and custom agents. When teams follow the same plan-implement-review workflow with appropriate model selection, you get more consistent code quality at optimized cost. The patterns are checked into the repo, not locked in anyone's head.