Day 21 Week 3 — Implementation

Week 3 Review + Hands-On

Week 3 wrapped up agents in implementation mode: building loops from scratch, debugging thought traces, testing evaluations, adding RAG, and locking down safety. Today is the integration checkpoint — you don't just review, you build something real.


Why This Day Matters

Theory compounds slowly. Muscle memory compounds fast.

You've covered the full implementation layer — code generation agents, debugging patterns, eval frameworks, RAG pipelines, and security guardrails. If you haven't wired any of this together into a real agent yet, today is the forcing function. If you have, today is about hardening what exists: one useful tool, proper error handling, and logging you'll actually read when things go wrong.

Senior engineers don't need more concepts. They need clean systems they trust.


Core Concepts Recap

1. The Minimal Viable Agent is 50 Lines

Week 3, Day 15 showed that the core loop is embarrassingly small:

async function runAgent(userInput) {
  const messages = [{ role: "user", content: userInput }];

  while (true) {
    const response = await llm.chat({ messages, tools });

    // Record the assistant turn first — tool results must follow the
    // assistant message that requested them, or the next API call fails.
    messages.push({ role: "assistant", content: response.content });

    if (response.stop_reason === "end_turn") {
      return response.content;
    }

    // Execute tool calls, push results back
    for (const toolCall of response.tool_calls) {
      const result = await executeTool(toolCall);
      messages.push({ role: "tool", content: result, tool_use_id: toolCall.id });
    }
  }
}

That loop is the skeleton. Everything else — error handling, logging, retries, memory — is flesh on that skeleton. Don't let frameworks obscure what's happening here.
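One piece of that flesh, retries, can be sketched as a small wrapper around the LLM call. This is a generic backoff helper, not tied to any particular SDK; the `llm.chat` usage in the comment assumes the client from the loop above.

```javascript
// Sketch: retry transient LLM failures with exponential backoff.
// Rethrows the last error once maxRetries is exhausted.
async function withRetries(fn, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt === maxRetries) break;
      // Backoff doubles each attempt: 500ms, 1s, 2s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}

// Usage inside the loop:
// const response = await withRetries(() => llm.chat({ messages, tools }));
```

Keep retries at the transport layer only; semantic failures (bad tool args, wrong answers) belong to the agent, not the retry loop.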


2. One Tool Done Right Beats Five Done Poorly

If you're adding tools to your agent today, resist the urge to add many. Pick one tool that solves a real problem, and do it properly:

const readFileTool = {
  name: "read_file",
  description: "Read the contents of a file at the given path. Returns file content as a string. Returns an error message if the file doesn't exist or is not readable.",
  input_schema: {
    type: "object",
    properties: {
      path: {
        type: "string",
        description: "Absolute or relative path to the file"
      }
    },
    required: ["path"]
  }
};

async function executeReadFile({ path }) {
  try {
    const content = await fs.readFile(path, "utf-8");
    return { success: true, content };
  } catch (err) {
    // Return structured error — never throw. The agent needs to read this.
    return { success: false, error: err.message };
  }
}

Key principle: tool errors should return, not throw. The agent needs to read the error and decide what to do. An uncaught exception breaks the loop; a structured error message lets the agent retry, reframe, or escalate.
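You can enforce that principle in one place rather than in every handler. Here is a sketch of a generic dispatcher; `handlers` is a hypothetical registry mapping tool names to async functions, and the shape of the returned object follows the `{ success, ... }` convention above.

```javascript
// Sketch: a dispatcher that enforces return-don't-throw for every tool,
// so a single forgotten try/catch can't kill the agent loop.
function makeToolExecutor(handlers) {
  return async function executeTool(toolCall) {
    const handler = handlers[toolCall.name];
    if (!handler) {
      // Unknown tool name (possibly hallucinated): also a structured error
      return { success: false, error: `Unknown tool: ${toolCall.name}` };
    }
    try {
      const result = await handler(toolCall.input);
      return { success: true, result };
    } catch (err) {
      // Catch whatever the handler throws and hand it back to the agent
      return { success: false, error: err.message };
    }
  };
}
```

With this in place, individual handlers can throw freely; the dispatcher normalizes everything into a result the agent can read.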


3. Logging You'll Actually Use

Debugging agents without traces is archaeology. Add structured logging at the points that matter:

function logAgentEvent(event) {
  const entry = {
    ts: new Date().toISOString(),
    type: event.type,           // "llm_call" | "tool_call" | "tool_result" | "loop_end"
    data: event.data,
  };
  console.log(JSON.stringify(entry));
  // Or: append to a file, send to Langfuse, etc.
}

// Usage in the loop:
logAgentEvent({ type: "llm_call", data: { messageCount: messages.length } });
logAgentEvent({ type: "tool_call", data: { name: toolCall.name, input: toolCall.input } });
logAgentEvent({ type: "tool_result", data: { name: toolCall.name, success: result.success } });

You want to answer these questions from logs alone:

  • What tools did it call, in what order?
  • What inputs did it pass?
  • Did the tool succeed or fail?
  • How many LLM turns did the run take?

If you can't answer those from your logs, your logging is incomplete.
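As a quick check that your logging clears that bar, here is a sketch that answers the four questions from a run's JSON-lines output. It assumes the entry shape emitted by `logAgentEvent` above; `summarizeRun` is an illustrative name.

```javascript
// Sketch: reconstruct a run from JSON-lines logs.
// `lines` is an array of strings, one logAgentEvent entry per line.
function summarizeRun(lines) {
  const events = lines.map((line) => JSON.parse(line));
  const toolCalls = events.filter((e) => e.type === "tool_call");
  const toolResults = events.filter((e) => e.type === "tool_result");
  return {
    // What tools did it call, in what order, with what inputs?
    toolSequence: toolCalls.map((e) => ({ name: e.data.name, input: e.data.input })),
    // Which tools failed?
    failures: toolResults.filter((e) => !e.data.success).map((e) => e.data.name),
    // How many LLM turns did the run take?
    llmTurns: events.filter((e) => e.type === "llm_call").length,
  };
}
```

If any field in this summary comes back empty or ambiguous for a real run, that tells you exactly which log event you forgot to emit.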


4. Error Handling Taxonomy

Not all agent errors are equal. Here are the three classes from Day 12:

Class              Example                                 Strategy
Tool failure       File not found, API 500                 Return error to agent; let it retry with different params
LLM misbehavior    Hallucinated tool args, infinite loop   Detect via schema validation or turn counter; break + escalate
Systemic failure   Network down, token budget exceeded     Hard stop, persist state if possible, notify human
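For the LLM-misbehavior class, hallucinated tool args are caught by validating input against the tool's `input_schema` before execution. A real implementation would use a JSON Schema library such as Ajv; this hand-rolled sketch checks only required fields and primitive types.

```javascript
// Sketch: minimal validation of tool input against an input_schema,
// catching hallucinated or malformed args before the tool runs.
// Checks required fields and primitive types only.
function validateToolInput(schema, input) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in input)) errors.push(`Missing required field: ${field}`);
  }
  for (const [field, spec] of Object.entries(schema.properties ?? {})) {
    if (field in input && typeof input[field] !== spec.type) {
      errors.push(`Field ${field} should be ${spec.type}, got ${typeof input[field]}`);
    }
  }
  return { valid: errors.length === 0, errors };
}

// On failure, feed the errors back to the agent like any other tool error,
// so it can correct its arguments on the next turn:
// if (!valid) return { success: false, error: errors.join("; ") };
```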

Add a turn counter as your last line of defense:

let turns = 0;
const MAX_TURNS = 20;

while (true) {
  if (++turns > MAX_TURNS) {
    throw new Error(`Agent exceeded ${MAX_TURNS} turns — likely stuck in a loop`);
  }
  // ... rest of loop
}

Try This Today

Build or extend an agent with these three things working together:

  1. One real tool — something that touches your actual stack (read a file, query a DB, call an API you own)
  2. Error handling — tool errors return structured { success, error }, never throw
  3. Structured logging — every LLM call, every tool call, every result logged as JSON

Run it on a real task. Then open the logs and trace exactly what happened. Can you reconstruct the agent's "reasoning" from the log alone? If yes, your observability is solid.

Bonus: add the turn counter guard and intentionally trigger it by giving the agent an impossible task. Confirm it exits cleanly instead of running forever.
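The bonus exercise can be sketched without touching a real model: stub out the LLM so it never says `end_turn`, and confirm the guard trips. Names here (`runGuardedLoop`, `fakeLlm`) are illustrative; the guard logic is the same turn counter from above.

```javascript
// Sketch: a stripped-down loop with the turn guard, driven by a stubbed
// LLM, to confirm it exits cleanly instead of running forever.
async function runGuardedLoop(fakeLlm, maxTurns = 20) {
  let turns = 0;
  while (true) {
    if (++turns > maxTurns) {
      throw new Error(`Agent exceeded ${maxTurns} turns — likely stuck in a loop`);
    }
    const response = await fakeLlm();
    if (response.stop_reason === "end_turn") return response.content;
  }
}
```

The same stub technique works for testing retry logic and error paths: swap the model for a deterministic function and assert on what the loop does.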



Week 4 starts tomorrow: observability at scale, cost management, human-in-the-loop patterns, and production deployment. The shift is from "does it work" to "can I trust it in production."