
Security and Failure Modes in AI-Generated Code

Intermediate 🕐 26 min Lesson 7 of 14
What you'll learn
  • Name the five documented failure modes of AI-generated code with their statistical context
  • Identify hallucinated package names before they reach a dependency file
  • Write an adversarial security review prompt that surfaces vulnerabilities standard review misses
  • Apply domain context in specs to prevent silent logic errors from missing business rules

The Statistics You Need to Know

Before getting into patterns, the numbers: Veracode's Spring 2026 report analysed AI-generated code across Java, Python, C#, and JavaScript and found that 45% of samples failed security tests. XSS vulnerabilities appeared in 86% of AI-generated web code samples. Java fared worst, with security failure rates exceeding 70%. These are not edge cases — they are the baseline for unreviewed AI output.

This does not mean AI code is unusable. It means AI code is code that requires review, exactly like code from a junior developer. The developers who get into trouble are those who assume AI-generated code is correct until proven otherwise, rather than treating it as a starting point that needs to be verified.

Failure Mode 1: Security Vulnerabilities

AI models are good at recognising surface-level security patterns — they will correctly use parameterised queries when the context obviously calls for them, and they will reach for standard encryption libraries for obvious cryptographic tasks. Where they fail is in the nuanced situations: dataflow analysis across multiple files, subtle authorisation gaps, and context-dependent validation requirements that are not obvious from the local code.
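To make that concrete, here is a minimal sketch of the kind of authorisation gap that survives a surface-level review. The framework, route, helpers, and data model are illustrative, not taken from any real codebase: the code authenticates the caller correctly but never checks that the resource belongs to them.

```python
# Illustrative sketch of an authorisation gap (IDOR) that local review misses.
# The route authenticates the caller but never verifies resource ownership.
from dataclasses import dataclass

from flask import Flask, abort, jsonify, request

app = Flask(__name__)


@dataclass
class Invoice:
    id: int
    owner_id: int
    total: float


# Hypothetical in-memory store standing in for a real database.
INVOICES = {1: Invoice(id=1, owner_id=42, total=99.0)}


def current_user_id() -> int | None:
    # Hypothetical session lookup; a real app would validate a token here.
    raw = request.headers.get("X-User-Id")
    return int(raw) if raw and raw.isdigit() else None


@app.route("/invoices/<int:invoice_id>")
def get_invoice(invoice_id: int):
    user_id = current_user_id()
    if user_id is None:
        abort(401)                 # authentication: present and correct
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        abort(404)
    # Missing check: invoice.owner_id == user_id. Any authenticated user can
    # read any invoice by enumerating IDs. Nothing in this file makes that
    # ownership rule visible, which is why the gap survives a surface review.
    return jsonify({"id": invoice.id, "total": invoice.total})
```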

The adversarial review prompt is the most effective counter. Frame the AI as an attacker rather than a reviewer: "Act as a penetration tester. Assume the user is actively trying to break this code. Review it specifically for injection vulnerabilities, authentication gaps, authorisation failures, and sensitive data exposure. For each finding: severity, exploitation path, and fix with code." This framing consistently produces more specific, actionable findings than a neutral review request.

Failure Mode 2: Hallucinated Packages (Slopsquatting)

A 2024–2025 study of 576,000 AI code samples found that 19.7% of suggested package dependencies referred to packages that do not exist. Commercial models did better, hallucinating around 5% of suggested packages, but that rate is still non-trivial; open-source models hit nearly 22%.

The danger is not just that the code will fail to install. Attackers have started registering commonly hallucinated package names in npm, PyPI, and other registries and publishing malicious code under them. This attack vector has a name: slopsquatting. A developer runs npm install some-hallucinated-package, the install succeeds (because the attacker pre-registered that name), and the malicious package executes on the developer's machine or in the build pipeline.

The defence is simple and should be automatic: before adding any AI-suggested package to your dependency file, verify it exists in the official registry. If you cannot find it, do not add it.
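Here is a minimal sketch of that check, using the public PyPI and npm registry JSON endpoints, where a 404 response means the name has never been published. The script name and structure are illustrative.

```python
# Sketch: verify an AI-suggested package name exists in the official registry
# before it goes anywhere near a dependency file.
import sys
import urllib.error
import urllib.request

REGISTRIES = {
    "pypi": "https://pypi.org/pypi/{name}/json",
    "npm": "https://registry.npmjs.org/{name}",
}


def package_exists(registry: str, name: str) -> bool:
    url = REGISTRIES[registry].format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False        # the name has never been published
        raise                   # any other error: do not silently assume it exists


if __name__ == "__main__":
    registry, name = sys.argv[1], sys.argv[2]   # e.g. python check_pkg.py npm left-pad
    if not package_exists(registry, name):
        print(f"'{name}' not found in {registry} -- do not add it.")
        sys.exit(1)
    print(f"'{name}' exists in {registry}.")
```

Existence alone is not proof of safety: because attackers pre-register hallucinated names, an unfamiliar package that does exist still deserves a look at its publisher, age, and download history before you add it.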

Failure Mode 3: Outdated API Usage

Every LLM has a training knowledge cutoff. Libraries release new versions, deprecated methods get removed, and function signatures change. AI-generated code confidently uses the API as it existed at training time, which may be significantly different from the current version.

This produces a specific pattern of bug: the code looks syntactically correct, type-checks in some cases, but fails at runtime with "method not found" or "unexpected argument" errors. It also sometimes produces "Frankenstein" API calls that blend parameters from two similar libraries in a way that feels right but does not match either one's actual documentation.
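One concrete instance of the pattern: pandas deprecated DataFrame.append in 1.4 and removed it in 2.0, yet it is common in older code, so models trained on that material still reach for it. The sketch below contrasts the stale call with the current equivalent.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2], "total": [10.0, 20.0]})
new_row = {"user_id": 3, "total": 30.0}

# What a model trained on older code tends to produce. Valid before pandas 2.0,
# but on current versions it raises AttributeError: 'DataFrame' object has no
# attribute 'append'.
# df = df.append(new_row, ignore_index=True)

# Current equivalent per the pandas documentation:
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
```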

The defence: for any unfamiliar library call in AI-generated code, cross-reference the current official documentation before treating it as correct. A quick run that happens to pass is not confirmation; the docs are.

Failure Mode 4: Logic Errors from Missing Business Context

AI does not know your domain rules. It does not know that a "user" in your system has special handling for suspended accounts. It does not know that orders from before the migration date use a different ID format. It does not know that your "delete" operation is a soft-delete that sets a flag rather than removing a row.

This produces silent errors — code that compiles cleanly, passes tests, and fails only at runtime with real production data. A real documented example: a function expected a raw user ID as input. The calling code passed an object containing the user ID. AI generated the implementation from the function signature alone, without seeing the calling code, and the mismatch caused a silent failure that only surfaced in production.
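A stripped-down reconstruction of that mismatch, with illustrative names and data: the generated function looks up whatever it is handed, the caller hands it an object, and the result is an empty list rather than an exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRef:
    user_id: int
    display_name: str


# AI-generated from the signature alone: assumes user_id is the raw integer key.
def fetch_orders(user_id, orders_by_user):
    return orders_by_user.get(user_id, [])


orders_by_user = {42: ["order-1001", "order-1002"]}

# The existing calling code passes the whole object, not the raw ID. The lookup
# misses, no exception is raised, and the bug surfaces as "this user has no
# orders" only once real data flows through it.
caller_value = UserRef(user_id=42, display_name="Ada")
print(fetch_orders(caller_value, orders_by_user))           # [] -- silent failure
print(fetch_orders(caller_value.user_id, orders_by_user))   # ['order-1001', 'order-1002']
```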

The defence is the spec from Lesson 4: document your domain rules and edge cases in the spec before generating code. If the spec says "user ID is always an integer — never an object or string," that assumption becomes part of the prompt rather than part of what the AI has to guess.
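As an illustrative follow-through on that spec line, the same rule can be mirrored in the code as an explicit guard, so a future violation fails loudly instead of producing another silent empty result.

```python
# Spec line given to the model: "user ID is always an integer -- never an object
# or string." Mirroring the rule as a guard makes violations visible. Names are
# illustrative and continue the fetch_orders sketch above.
def fetch_orders(user_id: int, orders_by_user: dict[int, list[str]]) -> list[str]:
    if not isinstance(user_id, int):
        raise TypeError(f"user_id must be an int, got {type(user_id).__name__}")
    return orders_by_user.get(user_id, [])
```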

Failure Mode 5: Compounding Errors in Agentic Sessions

We covered this in Lesson 5, but it is worth calling out separately as a security and quality concern. When an AI agent runs many steps autonomously, a wrong assumption early in the session propagates. Small logical errors in the first file get replicated into the second, third, and fourth. By session end, a subtle misunderstanding is embedded across multiple files in ways that are expensive to untangle.

The pattern holds for security as well: a missed input validation in the first file an agent modifies can propagate into multiple endpoints if the agent reuses the same pattern. Checkpointing — reviewing after each meaningful step — is the mitigation, and it applies to security specifically as well as correctness generally.

The Security Pass Prompt

Build this into your workflow before any AI-generated code touches a pull request: "Review this code as an adversarial security auditor. Identify: injection vulnerabilities, authentication and authorisation gaps, sensitive data exposure, missing input validation, and any assumptions the code makes about input that an attacker could violate. Do not summarise — give specific findings with severity ratings and exploitation paths." It takes two minutes and catches a category of issues that standard review consistently misses.
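As a sketch of how that can become a routine step rather than a manual one, the script below sends the staged diff plus the prompt to a model before a pull request is opened. It assumes the OpenAI Python SDK and an illustrative model name; substitute whatever provider and model your team actually uses.

```python
# Sketch: run the adversarial security pass over the staged diff before opening
# a PR. Assumes the OpenAI Python SDK (pip install openai, OPENAI_API_KEY set).
import subprocess

from openai import OpenAI

SECURITY_PASS_PROMPT = (
    "Review this code as an adversarial security auditor. Identify: injection "
    "vulnerabilities, authentication and authorisation gaps, sensitive data "
    "exposure, missing input validation, and any assumptions the code makes "
    "about input that an attacker could violate. Do not summarise -- give "
    "specific findings with severity ratings and exploitation paths."
)


def staged_diff() -> str:
    # Only the code about to go into the PR, not the whole repository.
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    ).stdout


def security_pass(diff: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use your team's approved model
        messages=[
            {"role": "system", "content": SECURITY_PASS_PROMPT},
            {"role": "user", "content": diff},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    diff = staged_diff()
    print(security_pass(diff) if diff.strip() else "Nothing staged to review.")
```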

Key takeaways
  • 45% of unreviewed AI-generated code fails security tests — treat AI code as a starting point, not a final answer
  • 19.7% of AI-suggested packages do not exist and some are pre-registered by attackers (slopsquatting) — always verify in the official registry
  • Outdated API usage is common due to knowledge cutoffs — cross-reference official docs for unfamiliar library calls
  • Silent logic errors from missing business context only surface with real production data — document domain rules in your spec
  • Adversarial framing in security review prompts ('act as a penetration tester') produces more specific findings than neutral review