## Sources

1. [Everyone’s an Engineer Now](https://www.oreilly.com/radar/everyones-an-engineer-now/)
2. [AI Code Review Only Catches Half of Your Bugs](https://www.oreilly.com/radar/ai-code-review-only-catches-half-of-your-bugs/)

---

### **"AI Code Review Only Catches Half of Your Bugs" by Andrew Stellman**

*   **The Flaw in AI Code Generation:** AI tools are exceptionally good at writing syntactically "correct" code that completely misses the user's actual goal [1]. Because an AI only builds exactly what it is instructed to build, a simple misunderstanding of intent can lead to flawless logic that operates under the wrong parameters, such as a transit app fetching data for buses driving in the wrong direction [1-3].
*   **The "Intent Ceiling" in Structural Analysis:** Structural analysis—which includes traditional static analyzers, linters, and current AI code review tools—suffers from an "intent ceiling" [4]. These tools analyze what the code does, but they are entirely blind to what the developer actually *intended* the code to do [4]. Without a clear specification of the software's intent, the AI cannot ask "does this do what it’s supposed to do?" [4].
*   **A 50% Plateau for Detecting Security Bugs:** Research stretching back two decades, including NIST evaluations and a 2024 ISSTA study, demonstrates that static analysis tools plateau at detecting roughly 50-60% of security vulnerabilities [5]. Roughly half of all security defects are implementation bugs (like buffer overflows or SQL injection) that tools can spot, while the other half are design flaws or intent violations [5, 6]. For example, a missing authorization check (CWE-862) is not a coding error but a missing requirement, making it completely invisible to standard AI code reviews [6, 7].
*   **Limitations of Spec-Driven Development (SDD):** While SDD is highly popular and improves AI output by having developers clearly define *how* something should be implemented, it fails to capture the *why* [8]. Verifying code quality requires the AI to understand the core purpose and the edge cases associated with that purpose [9].
*   **The Power of the "Quality Playbook":** To address these gaps, the author developed the "Quality Playbook," an open-source tool that forces AI to derive and verify behavioral requirements [10, 11]. By analyzing community issue discussions to extract intent, this tool successfully caught and patched a long-standing bug in Google’s widely used Gson library, where duplicate keys were silently accepted if the first value was null [9, 12]. 
*   **Actionable Strategies for Developers:** 
    *   **Document Guarantees:** Developers must explicitly state what their software is meant to guarantee, including the reasons why it matters and who depends on it [13].
    *   **Share Intent, Not Just Code:** Feed AI assistants the chat logs, support tickets, and design discussions that contain the crucial *why* behind architectural decisions [13].
    *   **Define Negative Requirements:** Specify what the software must *never* do (e.g., "unauthenticated users must not be able to delete data"), as these boundaries are impossible for structural reviewers to infer on their own [13].
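
The missing-authorization example (CWE-862) and the negative-requirement strategy above can be made concrete. The sketch below is hypothetical (the `User` type, `delete_record`, and the in-memory `DB` are invented for illustration): the unguarded function is structurally clean and would pass a linter or line-by-line review, yet it violates the stated negative requirement that unauthenticated users must never delete data. Only a test derived from intent catches the gap.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    authenticated: bool = False

DB = {"rec1": "payload"}  # toy in-memory data store

def delete_record(user: User, record_id: str) -> bool:
    """Structurally valid and lint-clean, but the authorization check
    the spec requires (CWE-862) is simply absent -- nothing in the code
    itself signals that a requirement is missing."""
    return DB.pop(record_id, None) is not None

def delete_record_checked(user: User, record_id: str) -> bool:
    """The intent-level requirement stated as code: no auth, no delete."""
    if not user.authenticated:
        raise PermissionError("unauthenticated users must not delete data")
    return DB.pop(record_id, None) is not None

# Negative-requirement test: encodes what the software must NEVER do.
guest = User("guest", authenticated=False)
try:
    delete_record_checked(guest, "rec1")
    violated = True
except PermissionError:
    violated = False

assert violated is False   # the guarded version enforces the boundary
assert "rec1" in DB        # and the data survived
```

The point of the sketch is that both functions are equally "correct" to a structural reviewer; only the test that encodes the never-do requirement distinguishes them.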

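The Gson duplicate-key bug described above can also be sketched in Python. This is an illustrative reconstruction of the bug class, not Gson's actual code: a parser that rejects duplicate JSON keys, except that a null first value slips through the check silently because "absent" and "present-but-null" look the same to a truthiness-style guard.

```python
def parse_object(pairs):
    """Toy duplicate-key check illustrating the bug class described:
    duplicates are rejected, UNLESS the first value was None (null)."""
    obj = {}
    for key, value in pairs:
        # Buggy guard: `obj.get(key) is not None` is False both when the
        # key is absent AND when it is present with a null value, so a
        # null first value lets the duplicate through silently.
        if obj.get(key) is not None:
            raise ValueError(f"duplicate key: {key}")
        obj[key] = value
    return obj

def parse_object_fixed(pairs):
    """Correct version: membership test, not value truthiness."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key}")
        obj[key] = value
    return obj

# The intent ("duplicate keys are an error") is violated only on the
# null-first-value path -- exactly the kind of behavioral requirement a
# structural review never sees.
assert parse_object([("a", None), ("a", 1)]) == {"a": 1}  # bug: accepted
try:
    parse_object_fixed([("a", None), ("a", 1)])
    raised = False
except ValueError:
    raised = True
assert raised
```

Note that the buggy version is perfectly idiomatic-looking code; without the derived requirement "duplicate keys must always be rejected," there is nothing for a reviewer to flag.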

### **"Everyone’s an Engineer Now" by Tim O’Reilly**

*   **Pervasive AI Integration at Anthropic:** Based on a fireside chat with Anthropic's Cat Wu, the article notes that 90% of Anthropic's code is now written by its AI tool, Claude Code [14]. The tool grew organically from a side project to full-company adoption within two months [15].
*   **Automated and Tight Feedback Loops:** Anthropic uses an internal Slack channel to gather constant feedback on Claude Code [15]. The feedback loop is so rapid that it functions like continuous integration for product quality; in some cases, scheduled AI agents actually scan the channel, identify user issues, and autonomously write and merge fixes before humans can get to them [16].
*   **The Bottleneck has Shifted to Code Review:** Because engineers are now producing 200% more code than they were a year ago, generating code is no longer the bottleneck—reviewing it is [17]. 
*   **Heavyweight AI Code Review:** To combat the review bottleneck, Anthropic employs a highly robust review system where 5 to 10 agents run in parallel with slightly different tasks to achieve maximum recall [18, 19]. This thorough tracing routinely catches obscure, adjacent bugs—such as cache invalidation issues or unintended side effects—that human reviewers would likely miss [18].
*   **A Crucial Cultural Shift to "Full Ownership":** Relying on AI output necessitated a shift in engineering culture. Code authors are now expected to own their Pull Requests end-to-end, including post-deployment bugs, and must thoroughly understand every line of AI-generated code [19, 20]. This prevents situations where senior engineers are overwhelmed by verifying lightly tested code generated by juniors [19].
*   **The Rise of Personal Software:** With tools like Cowork making agentic software accessible to nontechnical users, there is a rising trend of people easily building bespoke, single-use tools—such as family expense trackers—that would never justify professional development costs [20, 21].
*   **"Product Taste" is the New Core Skill:** Since AI has essentially commoditized the ability to implement a basic spec, the most valuable engineering skill today is "product taste" [22, 23]. This means possessing the intuition to understand complex user needs, deciding exactly what features to build, and setting a high quality bar for the AI's output [22].
*   **Bidirectional Leveling Up:** Junior engineers are encouraged to use AI agents like interns: first asking them questions to understand a codebase, verifying those answers with senior staff, and then updating the AI's core instructions (like a CLAUDE.md file) so the AI continuously learns from the humans [23, 24].
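
The instruction-file update loop in the last bullet might look like the following. The file contents are hypothetical, a pattern rather than Anthropic's actual conventions: after a senior engineer corrects the agent's answer, the correction is written back into the project's CLAUDE.md so every future session starts from the verified knowledge.

```markdown
# CLAUDE.md (hypothetical project instructions)

## Architecture notes learned from review
- The billing service is the source of truth for invoices; never write
  invoice rows from the orders service (correction confirmed by senior
  review).

## Conventions
- Every new endpoint requires an authorization check before any write.
```

This is the "intern" loop from the article in miniature: ask, verify with a human, then persist the verified answer where the AI will read it next time.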
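
The parallel-review setup described above can be sketched generically. The reviewer functions and finding labels below are stand-ins, not Anthropic's actual agents: the pattern is simply to run several reviewers concurrently, each with a slightly different focus, and take the union of their findings, trading redundant compute for recall.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in "agents": each focuses on a different bug class, so individual
# recall is partial but the union of findings is broad.
def security_reviewer(diff):
    return {"sql-injection"} if "query(" in diff else set()

def cache_reviewer(diff):
    return {"stale-cache"} if "cache[" in diff else set()

def side_effect_reviewer(diff):
    return {"mutates-global"} if "global " in diff else set()

REVIEWERS = [security_reviewer, cache_reviewer, side_effect_reviewer]

def review(diff: str) -> set:
    """Run every reviewer in parallel and union the results --
    the maximum-recall strategy the article describes."""
    with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
        results = pool.map(lambda r: r(diff), REVIEWERS)
        return set().union(*results)

diff = "global cache\ncache[key] = query(user_input)"
findings = review(diff)
assert findings == {"sql-injection", "stale-cache", "mutates-global"}
```

In a real system each reviewer would be an LLM call with a differently focused prompt rather than a substring check; the aggregation logic is the transferable part.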