
Engineering notes from the agent era

Most posts here are about what changes when the agent becomes a first-class collaborator: local code intelligence, testing systems that analyze behavior over time, knowledge tooling that compounds. The rest of the career (cloud, low-latency, mobile) shows up when something's worth writing down.

Apr 25, 2026 · 20 min read

Backtesting AI Agents: Replay to Catch Regressions

54% of enterprises ship AI agents in production. Most cannot tell when a CLAUDE.md edit silently regresses behavior. Backtesting is the missing discipline.

Apr 25 · 18 min read

Context Engineering in Practice: Where Does Each Piece Go?

Context engineering became the #1 skill shift of 2026. Anthropic's research notes that context exhibits n² pairwise token relationships. Here's the per-surface decision framework.

  • context engineering
  • Claude Code
  • MCP
  • AI agents
  • developer productivity
Apr 18 · 31 min read

Treat AI as a Team Member, Not a Chat Window

84% of developers use AI, yet 46% distrust it. The right scaffolding (constitution, skills, memory, MCP, subagents) turns an assistant into a team member.

  • AI agents
  • developer productivity
  • Claude Code
  • MCP
  • team workflows
Apr 16 · 8 min read

How to Track Claude Code 5-Hour Window Usage

40.8% of devs use Claude Code, but the 5-hour window is opaque. Build a local dashboard that parses transcripts and estimates your token budget.

  • claude-code
  • token-usage
  • developer-tools
  • ai-coding
  • cost-tracking
Apr 9 · 31 min read

Your AI Agent Is Flying Blind Without Local Code Intelligence

84% of developers use AI tools but 46% distrust the output. Three on-device models, 32 MCP tools, 9.93/10 relevance, and zero source code leaving your machine.

  • local code intelligence
  • AI agents
  • MCP
  • code search
  • developer tools
Apr 8 · 11 min read

Building an LLM Wiki: From Karpathy's Gist to a Working CLI

I turned Andrej Karpathy's LLM wiki concept into a Bun CLI (~500 lines of TypeScript) that automatically builds a persistent knowledge base from Claude Code sessions, files, and URLs.

  • llm
  • cli
  • knowledge-management
  • claude-code
  • bun
Apr 2 · 15 min read

How Do You Test Systems That Analyze Behavior Over Time?

Backtesting borrows from quant finance to catch temporal bugs unit tests miss. Poor US software quality costs $2.41T per year. Here's the technique.

  • backtesting
  • software-engineering
  • data-pipelines
  • temporal-data
  • regression-testing
  • synthetic-data
  • developer-tooling
