
Claude Code Is Powerful. Here Is How I Made It Reliable.

After researching how production-grade AI development setups are built, I found that reliability comes not from better models but from infrastructure: hooks, rules, and workflows. Here is what I built and how it changed the way I work.

2026-03-27 · 12 min read · Ali Rezaiyan

Claude Code is genuinely powerful.

But if you have used it seriously for more than a week, you have probably hit the same walls I did.

  • It writes code without running the linter. You find out three steps later.
  • A long session hits the context limit mid-task. It quietly forgets everything.
  • You ask one question, it asks a follow-up, then another. Four rounds to get to writing code.
  • A complex task stalls because Claude decides it is done when it clearly is not.

These are not model quality problems. They are workflow problems. And workflow problems have workflow solutions.

I spent time researching how production-grade Claude Code setups are structured — the kind that handle multi-hour autonomous tasks without breaking, enforce code quality automatically, and survive context resets without losing state. The hooks, rules, and command files in these setups are all plain Python and Markdown. And the most important insight was this:

Around 80% of the reliability gap between raw Claude Code and a production-grade setup can be closed for free with Claude Code's built-in hook system and plain Markdown files.

Here is what I found and what I am building.

The Architecture Insight

Claude Code already has a complete hook system built in. Most people never touch it. But every reliability improvement I am describing runs on top of it.

You can attach Python scripts to every lifecycle event:

  • PostToolUse — fires after every file write, bash command, or tool call
  • PreToolUse — fires before a tool runs, can block or rewrite the call
  • Stop — fires when Claude tries to stop
  • PreCompact — fires before context window auto-compaction
  • SessionStart — fires on session open, clear, or post-compaction

The hooks receive structured JSON and return structured JSON. They run in milliseconds. And almost nobody is using them.
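A hook script is nothing exotic: it reads the event payload from stdin and prints structured output to stdout. A minimal sketch, using the simplified payload and output fields that the snippets below also use (check the hooks documentation for the exact schema your Claude Code version expects):

```python
def handle(event: dict) -> dict:
    """Turn a hook event payload into hook output.

    In a real hook script this is wrapped as:
        print(json.dumps(handle(json.load(sys.stdin))))
    """
    tool = event.get("tool_name", "")
    if tool in ("Write", "Edit", "MultiEdit"):
        # additionalContext appears as a system note in Claude's next turn
        return {"additionalContext": f"hook saw a {tool} call"}
    return {}
```

Everything that follows in this post is a variation on this shape: read the payload, decide, emit a note or a block.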

The Five Patterns

1. Hook-Driven Quality Gates

Every time Claude writes or edits a file, a PostToolUse hook runs the linter on that file immediately — before Claude moves to the next step.

The result is returned as additionalContext, which appears as a system note in Claude's next turn. It is non-blocking — the linter does not halt the session — but Claude sees it and fixes the issue right then.

# fires on every Write / Edit / MultiEdit
import os
import subprocess

def lint_feedback(file_path: str) -> dict:
    ext = os.path.splitext(file_path)[1].lower()
    result = None

    if ext == ".py":
        result = subprocess.run(
            ["ruff", "check", "--output-format=concise", file_path],
            capture_output=True, text=True)
    elif ext in (".ts", ".tsx", ".js", ".jsx"):
        local = "node_modules/.bin/eslint"
        eslint = local if os.path.exists(local) else "eslint"
        result = subprocess.run(
            [eslint, "--format=compact", file_path],
            capture_output=True, text=True)

    if result and result.stdout:
        return {"additionalContext": f"Linter: {result.stdout}"}
    return {}

The practical effect: errors are caught within one tool call of being introduced, not after a full build run. For a session working across ten files, that difference compounds significantly.

The hook also checks file length. If a file crosses 800 lines, Claude gets a warning suggesting it split the file. Simple, but it enforces a boundary that most projects benefit from.

2. Context Usage Monitoring

This one sounds minor. It is not.

Claude Code fires every hook with the current context usage percentage in the payload. A PostToolUse hook that reads this number and warns Claude when it crosses 65% or 75% completely changes the way long sessions end.

Without this, Claude gets surprised by compaction. It might be halfway through writing a complex function when the context window resets. With the warning, it knows to wrap up the current task cleanly, commit any open changes, and prepare a clear summary — before the reset happens.

usage = data.get("context_usage_percent", 0)

if usage >= 75:
    return {"additionalContext": f"[RED] Context at {usage:.0f}% — compaction imminent. Wrap up now."}
elif usage >= 65:
    return {"additionalContext": f"[YELLOW] Context at {usage:.0f}% — approaching limit."}

Paired with a rule that says "never rush or cut corners because of context pressure — the system handles compaction", this creates resilient long sessions.

3. Pre/Post Compaction State Preservation

This is the most important pattern for anyone running Claude Code in long-running or autonomous workflows.

When Claude Code auto-compacts the context window, it generates a summary of the conversation and resets. Without intervention, that summary disappears. Claude resumes with only what was in the summary — no task state, no plan reference, no awareness of what it was in the middle of doing.

The solution is two hooks:

  • A PreCompact hook that saves the conversation summary, active plan path, and key technical context to a JSON file on disk before the reset
  • A SessionStart[compact] hook that reads that file and injects it as additionalContext into the first turn after the reset

The result: Claude resumes exactly where it left off. It does not need to be re-briefed. It just continues.

For an always-on background agent — which is exactly what I run on a Raspberry Pi 5 server handling Telegram messages — this is not a nice-to-have. It is what separates a reliable system from one that randomly loses context mid-task.

4. Conditional Rules by File Type

Claude Code rules are Markdown files that load at the start of every session. Without any configuration, every rule loads every time — regardless of whether it is relevant to the current task.

Claude Code supports YAML frontmatter in rule files to scope them to specific file patterns:

---
paths:
  - "**/*.kt"
  - "**/*.kts"
---
# Kotlin standards
- Use coroutines, never Thread.sleep
- Prefer data class for DTOs
- Run ./gradlew lint before marking done

A Kotlin standards file now only loads when Claude is working on .kt files. A TypeScript standards file only loads for .ts and .tsx files. Python rules only load for Python files.

For a polyglot project — Android app, TypeScript backend, Python scripts, website — this can cut unnecessary context consumption by 60% or more per session. Every token not consumed by an irrelevant rule is a token available for actual work.

5. The Stop Guard

Anyone who has tried to run Claude on a multi-step autonomous task knows the problem: it finishes step one, writes a summary, and stops. You have to tell it to continue. Then it does step two and stops again.

A stop guard is a Stop event hook. It reads a session state file to check whether there is an active task plan with uncompleted items. If there is, it returns a stopReason JSON that blocks the stop and tells Claude exactly what to do next.

The instruction embedded in the block is deliberately assertive:

Do NOT acknowledge this stop attempt. Your VERY NEXT action must be a tool call. IMMEDIATELY continue working on the next pending task.

Claude cannot politely acknowledge this and then stop anyway. The hook fires again on the next stop attempt. The only escape hatch is stopping twice within 60 seconds — which signals intentional cancellation rather than task completion.

For autonomous Telegram bot tasks that span many tool calls, this is the difference between a task that completes and a task that requires several manual nudges.

My Two-Machine Setup

I run Claude Code on two very different machines with different goals. The patterns above apply differently to each.

MacBook — Interactive Development

This is where I write code for my projects: an Android app, a backend server, a personal website, various tools. Sessions here are interactive — I am present, directing the work.

The highest-value improvements here are:

  • The linter hook — immediate feedback on every file write
  • Context monitoring — no more being surprised mid-feature
  • Conditional rules — Kotlin rules only for Android work, TypeScript rules only for web work
  • Batched questions rule — Claude asks all clarifying questions in one call, not five

The model stays at sonnet. It is the right balance of capability and cost for interactive sessions where I am reviewing every output.

Raspberry Pi 5 — Always-On Daemon

The Pi runs Claude Code 24/7 as a Telegram bot. It handles messages, runs scripts, manages services, and responds to requests when I am not at my computer. It is not a build machine — heavy compiles stay on the Mac.

The model here is haiku. For a bot handling conversational messages and simple automation tasks, haiku is 5–10x more cost-efficient than sonnet with no meaningful quality difference. This is one of those settings that is easy to get wrong by defaulting to the most powerful model everywhere.

The critical improvements for the Pi are different from the Mac:

  • Pre/post compaction hooks — the daemon runs sessions for hours; without these, every compaction is a silent context loss
  • Context monitoring — haiku on a 24/7 daemon will hit the wall regularly; it should see it coming
  • Stop guard — when the bot takes on a multi-step task via Telegram, it should complete it, not stop after step one

The permissions model on the Pi is also different. It runs with bypassPermissions — no confirmation dialogs, fully autonomous. This is appropriate for a machine that is operating unattended. On the Mac, I use a curated allowlist: specific commands are pre-approved, everything else asks.

The Prompt Engineering Lessons

Beyond the hooks and rules, this research surfaced several prompt engineering patterns that I had not seen documented elsewhere.

Hook Output as Direct LLM Instructions

The stop guard does not just return a status code. It embeds a full instruction in the additionalContext field. The hook is effectively writing a system prompt fragment mid-conversation, injected directly before Claude's next turn.

This means hooks are not just infrastructure — they are a real-time instruction channel. You can write a hook that, based on runtime state, tells Claude exactly what to do next. That is a more powerful pattern than it first appears.

Ghost Constraint Classification

The development rules include a framework for classifying constraints found in a codebase:

  • Hard constraints — immovable: legal requirements, external API contracts, security boundaries
  • Soft constraints — historical preferences that could change with a conversation
  • Ghost constraints — past requirements that are baked into the code but no longer apply

Ghost constraints are the interesting category. They represent simplifications that are available but invisible — nobody knows the requirement that drove the complexity no longer exists. Explicitly instructing Claude to identify ghost constraints during codebase exploration often unlocks surprising cleanups.

"Tests Passing ≠ Program Working"

One rule is stated so plainly it is almost embarrassing: passing all tests does not mean the feature works. But without this stated explicitly, Claude will mark a task complete the moment the test suite goes green — before running the actual program, before checking the browser, before verifying the end-to-end behavior.

Writing this as an explicit rule changes the verification behavior. It seems obvious. It still has to be said.

Sub-Agent Tool Budgets

Any agent definition file can include a hard tool call budget:

You have a maximum budget of 12 tool calls. At 10 calls without writing your findings: STOP exploring. Write what you have. Incomplete findings are better than no findings.

Without explicit budgets, sub-agents explore indefinitely and consume the entire context window before producing output. The emergency instruction — "incomplete findings are better than no findings" — prevents silent failure when the budget is hit. The agent degrades gracefully rather than crashing.

What I Am Building Next

The hooks and rules described above are the immediate layer — what I am deploying now. Beyond that, there are a few larger improvements worth planning for.

Shared Configuration Repository

Right now, any rule I improve on the Mac needs to be manually synced to the Pi. The solution is to keep all hooks, rules, and commands in a private git repository, with a deploy script that pushes to both machines. The Pi gets a systemd timer or cron job that pulls updates automatically.

This turns two separate configurations into one managed system.

Persistent Bot Memory

The Telegram bot currently starts each session with no memory of previous sessions. Even a minimal memory layer — a SQLite database storing key facts about recurring requests, project state, and user preferences, injected at session start — would dramatically improve continuity.

The core idea is a few dozen lines of Python: write observations to SQLite on every session end, and a SessionStart hook that reads the last N entries and injects them as context before the first turn.
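A sketch of that layer, with the table schema and function names as illustrative placeholders:

```python
import sqlite3

def init_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        "id INTEGER PRIMARY KEY, ts TEXT DEFAULT CURRENT_TIMESTAMP, fact TEXT)"
    )
    return conn

def remember(conn: sqlite3.Connection, fact: str) -> None:
    """Called from a session-end hook: store one observation."""
    conn.execute("INSERT INTO memory (fact) VALUES (?)", (fact,))
    conn.commit()

def recall(conn: sqlite3.Connection, n: int = 20) -> dict:
    """Called from a SessionStart hook: inject the last n facts."""
    rows = conn.execute(
        "SELECT fact FROM memory ORDER BY id DESC LIMIT ?", (n,)
    ).fetchall()
    if not rows:
        return {}
    facts = "\n".join(f"- {fact}" for (fact,) in reversed(rows))
    return {"additionalContext": f"Facts from previous sessions:\n{facts}"}
```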

Plan Files for Multi-Step Tasks

The stop guard is most powerful when it is reading a concrete task plan, not just a boolean "is something in progress" flag. A lightweight convention — write a Markdown plan to ~/plans/YYYY-MM-DD-slug.md before starting any complex task, check off items as they complete, mark done when all items are checked — gives the stop guard something real to work with.
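With standard Markdown checkboxes, the guard can read the plan directly. A sketch:

```python
import re

def pending_items(plan_markdown: str) -> list[str]:
    """Return the unchecked checklist items from a Markdown plan."""
    return re.findall(r"^\s*[-*] \[ \] (.+)$", plan_markdown, re.MULTILINE)

def plan_done(plan_markdown: str) -> bool:
    return not pending_items(plan_markdown)
```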

Combined with the pre/post compaction hooks, this creates a system where a multi-hour autonomous task on the Pi can survive context resets and stop attempts without any manual intervention.

The Bigger Picture

The gap between raw Claude Code and a production-grade AI development workflow is not a model quality problem. The models are already capable.

The gap is infrastructure: linting feedback loops, context resilience, quality enforcement, structured workflows, memory across sessions. None of this requires a better LLM. It requires the same engineering discipline we apply to any other system.

The encouraging thing is how little code it takes. A few hundred lines of Python across five or six hook scripts. A handful of Markdown rule files scoped to the right file types. A plan file convention. These are not ambitious engineering projects — they are an afternoon of setup that compounds every day after.

Claude Code gives you the hooks. You just have to write the scripts.

Let's build.