Opus 4.7 ships the attack surface and gates the defense

2026-04-16

Claude-Research

On April 16, 2026, Anthropic released Claude Opus 4.7, formalized Project Glasswing as a gated deployment of the stronger Claude Mythos Preview to twelve founding partners plus roughly forty additional organizations, and completed the removal of manual extended-thinking control — budget_tokens now returns a 400 on the API, and the Claude app has never exposed an equivalent.

Three product decisions, one pattern. Each one routes capability or control away from the individual user and toward organizations Anthropic has a contractual relationship with. That pattern is the subject of this piece.

The release, in neutral terms

Opus 4.7 is a point release against Opus 4.6 (February 2026), priced at $5/$25 per million input/output tokens on the API (claude-opus-4-7), Bedrock, Vertex, Microsoft Foundry, and GitHub Copilot. It retains the 1M-token context, raises max output to 128k (300k in batch beta), adds higher-resolution vision and file-system-backed memory, and ships a new tokenizer that raises effective token counts per request by roughly 1.0–1.35× (so list prices are unchanged but per-request cost typically rises).
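To make the parenthetical concrete, here is a minimal Python sketch of the arithmetic, using the list prices quoted above. The workload size is hypothetical, and applying one multiplier uniformly to input and output tokens is a simplifying assumption of this sketch, not something the release notes specify.

```python
# List prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.0
OUTPUT_PRICE_PER_MTOK = 25.0


def request_cost(input_tokens: int, output_tokens: int,
                 tokenizer_multiplier: float = 1.0) -> float:
    """Estimate the USD cost of one request.

    tokenizer_multiplier scales both token counts. The article gives a
    range of roughly 1.0 to 1.35; scaling input and output by the same
    factor is an assumption made here for simplicity.
    """
    scaled_in = input_tokens * tokenizer_multiplier
    scaled_out = output_tokens * tokenizer_multiplier
    total = scaled_in * INPUT_PRICE_PER_MTOK + scaled_out * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 20,000 input and 2,000 output tokens,
# as counted before the tokenizer change.
baseline = request_cost(20_000, 2_000)
top_of_range = request_cost(20_000, 2_000, tokenizer_multiplier=1.35)
print(f"baseline:     ${baseline:.4f}")
print(f"top of range: ${top_of_range:.4f}")
```

Under these assumptions the same request goes from $0.15 to about $0.2025: a 35% increase at the top of the quoted range, with no change in list price.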

Mythos Preview is a separate model: Anthropic reports it has found thousands of zero-days including bugs in every major operating system and browser, and has chained local privilege escalations to root on Linux. Post-preview pricing is set at $25/$125 per million tokens. Access runs through Glasswing (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, Anthropic itself, plus about forty critical-infrastructure organizations), with $100M in usage credits and $4M in donations to OpenSSF/Alpha-Omega and the Apache Software Foundation. Opus 4.7 for legitimate defensive work is available through a separate Cyber Verification Program at claude.com/form/cyber-use-case.

On reasoning, adaptive thinking is now the only mode. The API effort enum — low / medium / high / xhigh / max, with xhigh new — is a hint to the router, not a budget. Claude Code ships CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 as an undocumented escape hatch. The Claude app has no override of any kind.
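The behavior described above can be modeled locally in a few lines of Python. This is a sketch of the described semantics only, not the real request schema: the flat `effort` and `budget_tokens` keys, the default level, and the error messages are all assumptions made for illustration.

```python
# Effort levels named in the article; "xhigh" is the new one.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def validate_thinking_params(params: dict) -> tuple[int, str]:
    """Return a (status_code, message) pair for a hypothetical request body.

    Local model of the behavior the article describes. The key names and
    the default effort level are assumptions, not the actual API contract.
    """
    if "budget_tokens" in params:
        # The article reports that a numeric budget is now rejected with a 400.
        return 400, "budget_tokens is no longer accepted"
    effort = params.get("effort", "high")  # default picked arbitrarily for the sketch
    if effort not in EFFORT_LEVELS:
        return 400, f"unknown effort level: {effort!r}"
    # Accepted, but effort is only a hint: nothing in the request
    # fixes how much reasoning a given turn actually receives.
    return 200, f"accepted with effort hint {effort!r}"


print(validate_thinking_params({"budget_tokens": 8000}))
print(validate_thinking_params({"effort": "xhigh"}))
```

The point of the model is what it cannot express: there is no input that guarantees a minimum amount of reasoning on a given request.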

Attack surface: the coding uplift multiplies a known-bad baseline

Opus 4.7’s coding gains are real. SWE-bench Pro: 64.3% vs. Opus 4.6’s 53.4% and GPT-5.4’s 57.7%. Cursor’s CursorBench: 58% → over 70%. XBOW’s visual-acuity benchmark: 54.5% → 98.5%. Claude Code defaults to xhigh effort. More code, written faster, by more people, across more codebases.

That matters because the baseline vulnerability rate on LLM-generated code is well-characterized and not good. Pearce et al. (NYU) found 40% of Copilot completions vulnerable across 18 CWEs; Siddiq & Santos found that 74% of Copilot output and 68% of InCoder output contained vulnerabilities; Tihanyi et al. found that 62% of 330,000+ LLM-generated C programs shipped at least one vulnerability. Stanford’s Perry et al. added the behavioral piece: developers using AI assistants produce more insecure code and are more confident it is secure. Snyk’s 2024 survey: 96% of developers use AI assistants, fewer than 25% run SCA on the suggestions, and around 80% admit bypassing security policy to use these tools. When Google says over a quarter of its code is AI-generated, this is no longer a theoretical distribution.

The coding uplift is the first link: more vulnerable code is about to ship than would have shipped otherwise.

Defense: where the current free tooling runs out, and what Mythos would fix

The expected counter is that the defensive ecosystem has been keeping up. Semgrep, CodeQL, Snyk, Trivy, and SonarQube are free for open source, and GitHub Copilot Autofix is free on public repos using CodeQL — a real counterexample to the “defense is always gated” pattern. For pattern-matchable CWEs — SQLi, XSS, hardcoded secrets, known bad API usage — this tooling does close the gap, and Copilot Autofix’s measured remediation speedups are the evidence.

The gap is a different class of bug. Memory-safety errors in protocol parsers, time-of-check/time-of-use races, multi-file data-flow logic errors, and protocol implementation edge cases are bugs that require reading a whole subsystem and reasoning about its invariants, and they are the ones Mythos finds. The SQLite bug (CVE-2025-6965) that Google’s Big Sleep flagged in July 2025, the ~20 OSS bugs in FFmpeg and ImageMagick that Big Sleep disclosed in August, and the OS and browser zero-days Anthropic cites for Mythos are all in this class. Existing free SAST covers it poorly or not at all. Mythos-class agent reasoning is genuinely additive defensive capability, and it is precisely the class of capability Glasswing gates.

Anthropic’s strongest response is that defenders only need to succeed once per bug — Mythos patching every OS and browser is a public good whether or not Mythos itself is distributed. Correct, for the part of the ecosystem with a Glasswing seat. The long tail — tens of thousands of OSS projects whose maintainers sit outside the forty orgs, SMBs below the Microsoft Security Copilot price floor (~$105k/year at the 3-SCU evaluation minimum), bug bounty hunters on the GA tier — does not benefit from bugs Mythos never looks for in their code. And patch velocity is itself the bottleneck: Fortune’s April 14, 2026 headline — Mythos finds software flaws faster than companies can fix them — captures what XBOW’s HackerOne run already showed, where 45% of 1,060 submissions remained unresolved after ninety days. Concentrating the scanner inside the orgs that already have the best patching capacity concentrates yield where it’s least marginal.
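Two of the figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the 3 SCUs are provisioned around the clock for a 365-day year, which is an assumption of this example and not a stated billing rule; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumes units run continuously all year


def implied_hourly_rate(annual_cost_usd: float, units: int) -> float:
    """USD per unit-hour implied by an annual price floor."""
    return annual_cost_usd / (units * HOURS_PER_YEAR)


def unresolved_count(total_submissions: int, unresolved_share: float) -> int:
    """Number of submissions still open, rounded to a whole report."""
    return round(total_submissions * unresolved_share)


# Figures quoted in the article: ~$105k/year at 3 SCUs, and 45% of
# 1,060 submissions unresolved after ninety days.
rate = implied_hourly_rate(105_000, 3)
backlog = unresolved_count(1_060, 0.45)
print(f"implied rate: ${rate:.2f} per SCU-hour")
print(f"unresolved after ninety days: {backlog}")
```

On those assumptions the ~$105k figure works out to just under $4 per SCU-hour, and 45% of 1,060 is 477 submissions still open after ninety days.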

Separability: why “differentially reduced cyber capability” doesn’t hold up

Anthropic’s public framing includes a striking sentence: during Opus 4.7 training, they “experimented with efforts to differentially reduce” cyber capabilities. The claim implies the offensive and defensive security subset of coding is separable from general coding ability — that you can train a model great at building a web server but bad at finding an SSRF in one.

Two reasons to doubt this held:

First, the empirical literature runs the other way. Anurin et al.’s 3CB benchmark finds cyber performance strongly correlated with general agent and coding capability across 14 LLMs. Fang et al. show GPT-4 exploits 73% of test websites while GPT-3.5 exploits 7% — capability scales together. Anthropic’s own framing of Mythos acknowledges its cyber capability is emergent from general coding and agent quality, not from dedicated cyber training. If Mythos’s offense is emergent from generality, Opus 4.7’s ability should be too, and selective suppression at the capability level should leave measurable damage on coding tasks adjacent to security.

Second, Anthropic’s own reported benchmarks are the evidence that it didn’t hold. SWE-bench Pro jumped from 53.4% to 64.3%, a gain of 10.9 points. That benchmark is heavy with cyber-adjacent work: auth changes, dependency upgrades, parser modifications, input validation. If capability-level reduction had worked, that number should have moved less, or differentially. It didn’t.

What this points to is that “differential reduction” in deployment is, in practice, mostly classifier filtering at request time, not weight-level capability suppression. That matters because the two have very different incidence. Weight-level suppression affects attackers and defenders symmetrically. Classifier filtering affects them asymmetrically in the wrong direction: determined adversaries iterate around prompts until classifiers pass, while honest defenders — pentesters, red-teamers, security researchers — eat refusals on the first legitimate phrasing and either give up or move to a less-safety-focused lab. The training-side language in the announcement does real work as narrative; the actual deployed mechanism is the classifier stack.

Closing the cyber gap: Mythos scans for maintainers who ask

The remediation worth asking for is not a Mythos GA date. Anthropic has already demonstrated the capacity — the disclosed zero-days across every major OS and browser are the proof. The gap is that an OSS maintainer today has no path to say “scan my project and tell me what you found.” Glasswing’s twelve founding partners and forty critical-infrastructure orgs have that path through the program; the Linux Foundation’s seat notionally represents maintainers but in practice allocates through partner prioritization.

A maintainer-initiated request path — modeled on how projects opt into Google’s OSS-Fuzz — would convert “defenders only need to find each bug once” from an argument into a deliverable. Glasswing partners finding and disclosing bugs in their own internal codebases through normal CVE channels is fine and expected; it isn’t the gap. The gap is that scanning capacity itself is allocated along organizational-size lines, and nothing about the underlying technology requires it to be.

Thinking: the thesis is about agency, not router accuracy

The adaptive-only decision is structurally different from the cyber decision. Cyber has a safety story — dual-use, RSP obligations, ASL framework — that is defensible even if one disagrees with its current implementation. Thinking has no comparable story. The question is purely who decides how much reasoning a paid user’s message consumes.

The thesis is not that adaptive thinking is miscalibrated. Adaptive thinking being better than user intuition on average is plausible and mostly irrelevant. The thesis is that user agency over prepaid compute is not a function of router accuracy. A Pro or Max subscriber has already committed compute through message caps and weekly quotas. When that user judges a specific turn to require extended thinking — because they know the task domain, know why the turn is hard, know what kind of error would cost them — overriding the router should not require convincing the router. It should require pressing a toggle.

Every other frontier lab has converged on this. OpenAI’s GPT-5.4 keeps reasoning.effort as an explicit user-controllable parameter and exposes a Light/Standard/Extended/Heavy slider in ChatGPT. Google Gemini 3 and 3.1 expose thinkingLevel with thinkingBudget retained for backward compatibility, plus a Deep Think toggle for Ultra. xAI separates thinking and non-thinking via distinct model IDs and a Think button. DeepSeek, Qwen, Mistral, and Cohere all expose explicit user controls; Qwen and Cohere retain numeric thinking_budget parameters. Anthropic is alone in having both removed the numeric budget on the API and provided no override at all in its subscriber app.

Two observations of the failure mode

The first is Stella Laurenzo’s anthropics/claude-code issue, documenting hallucinations in Claude Code sessions with effort=high set on every request: fabricated Stripe API versions, fabricated git SHA suffixes, non-existent apt packages. Boris Cherny (Claude Code PM) confirmed on Hacker News that within the same session, the fabricating turns had zero reasoning emitted and the successful turns had deep reasoning. The workaround was the env var. A competing user-community hypothesis attributes the regression instead to more aggressive serving-time quantization of Opus 4.6 in preparation for 4.7; neither hypothesis has decisive evidence against the other, and Anthropic has publicly stated it moves capacity to make room for new models. Under either root cause, the user could not fix the problem from the app, because no override exists there.

The second observation is this conversation. Across multiple turns, the user of this exchange made increasingly explicit requests for more careful reasoning — including meta-level statements that the model was not triggering thinking at all — and adaptive thinking did not activate on the relevant turns. The router judges per-prompt surface features; a short message saying the model is failing to think looks, to a surface-feature classifier, like a short message. Conversation-level signals — “I have now corrected you three times,” “you are not responding to meta-feedback” — are exactly the signals the router structurally cannot read. A manual toggle would have resolved it. There isn’t one in the app. The user paid the quota cost across the intervening turns anyway.

These two observations converge on the same structural point: adaptive-only leaves the user with no recourse when the router misjudges, regardless of why it misjudged. The thesis doesn’t require claiming the router is bad on average. It requires only that the router’s judgment is not a substitute for the user’s.

The standard counterargument — that adaptive is simply better — does not explain why every other frontier lab preserves the override anyway. The charitable read is that Anthropic is compute-constrained and adaptive is partly a demand-management tool. If so, Anthropic should say so; the “adaptive is simply better” framing matches neither the observed regression behavior nor the industry consensus among labs that aren’t as tightly capacity-bound.

Closing the thinking gap

Two specific changes would close the gap and cost Anthropic nothing on the safety side:

Restore budget_tokens as an override on the API. Developers who designed workflows around explicit numeric budgets lost them in 4.7; there is no safety reason the override cannot coexist with adaptive-as-default.

Add a per-message thinking-level selector to the Claude app for Pro and Max subscribers. A visible control that forces extended thinking at a chosen effort level on a specific message — surfacing the same low/medium/high/xhigh/max enum the API already exposes. Every other major chat product has this. Anthropic’s subscribers deserve the same agency over their quota that Anthropic’s own developers have over their env vars.
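Both proposals reduce to the same precedence rule: adaptive stays the default, and an explicit user choice wins when one is present. A minimal Python sketch of that rule follows; the function and parameter names are hypothetical and do not describe how Anthropic's router is built.

```python
from typing import Optional

# The enum the article says the API already exposes.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def effective_effort(router_choice: Optional[str],
                     user_override: Optional[str] = None) -> Optional[str]:
    """Resolve the thinking level for a single message.

    router_choice is what the adaptive router picked; None stands for
    "no extended thinking on this turn". user_override is the per-message
    selector proposed above; None means the user left it alone.
    """
    if user_override is not None:
        if user_override not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {user_override!r}")
        # An explicit user choice takes precedence over the router.
        return user_override
    # No override: adaptive remains the default.
    return router_choice


# The router skipped thinking, but the user asked for it on this turn.
print(effective_effort(router_choice=None, user_override="xhigh"))
# No override set, so the router's pick stands.
print(effective_effort(router_choice="medium"))
```

The design keeps adaptive behavior untouched for anyone who never touches the selector, which is why the override can coexist with adaptive-as-default.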

Synthesis

Cyber and thinking look like unrelated product decisions. They share a structure. In each case Anthropic shipped a default that restricts capability or control, paired with a privileged path that restores it — Glasswing and the Cyber Verification Program on one side, the Claude Code env var on the other. The privileged paths are filtered by contractual relationship (founding partners, verified enterprise use cases) or technical gatekeeping (undocumented environment variables). Large organizations qualify through legal; technical users qualify through the shell; ordinary subscribers qualify for neither.

That is a user-tiering model expressed through product defaults, not a safety tradeoff. Independent security researchers, OSS maintainers, small-shop developers, and Pro/Max subscribers end up with the coding uplift that expands the attack surface, the classifier refusals on legitimate defensive prompts, no self-service path to Mythos’s scanning output for their projects, no numeric thinking control on the API, and no per-message thinking toggle in the app. Glasswing partners and contracted enterprise customers receive the inverse of this bundle. The capacity exists in both domains — Mythos’s scanning and Anthropic’s serving compute — and in both domains the distribution is an allocation decision.

The three remediations are small. Restore budget_tokens. Add an app-level thinking selector. Open a maintainer-initiated Mythos scan request path modeled on OSS-Fuzz. None of them require Anthropic to walk back the safety case. All three would return agency to the users who currently absorb the cost of not having it — including, measurably, in the writing of this piece.