
AI News Today - May 31, 2026: 11 Biggest Stories

May 31, 2026
21 min read

AI News Today -- May 31, 2026: Claude Opus 4.8, Microsoft Goes Independent, and a Month That Changed Everything

May 2026 ends the way it started: with a major AI release nobody expected this fast. Claude Opus 4.8 launched on May 28, 2026, just 41 days after Opus 4.7 -- Anthropic's fastest version cadence ever. Same price. Better benchmarks across the board. 4x more honest about its own code bugs. And a new orchestration mode called Dynamic Workflows that lets Claude Code spin up 1,000 parallel subagents for repository-scale migrations.

The same day, Reuters (citing The Information) reported that Microsoft will unveil homegrown MAI AI models at Build 2026, which opens June 2, including a coding model meant to reclaim ground GitHub Copilot has lost to Claude Code, now the dominant developer AI tool. Anthropic also confirmed Claude Mythos Preview models will be publicly available "in the coming weeks." And the cybersecurity world spent the week processing Nightmare Eclipse, the rogue security researcher who released 6 Windows exploits in 6 weeks before being banned from GitHub and then GitLab.

Here are the 11 stories worth reading on the last day of May 2026.

1. Claude Opus 4.8: Same Price, Better Benchmarks, 4x More Honest About Its Own Bugs

Anthropic released Claude Opus 4.8 on May 28, 2026, just 41 days after Claude Opus 4.7. The fastest version cadence Anthropic has ever run. The headline positioning: "a modest but tangible improvement" -- Anthropic's own language, which is unusually candid for a model launch.

The core facts: standard pricing is unchanged at $5 per million input tokens and $25 per million output tokens. The new Fast mode runs at approximately 2.5x the speed of standard Opus inference for $10/$50 per million tokens, one third of the $30/$150 that Opus 4.7's Fast mode cost. The Batch API is 50% off on both input and output. The context window remains 1 million tokens on the API, Amazon Bedrock, and Google Vertex AI.
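To see what the rate card means for an actual job, here is a minimal cost estimator built on the per-million-token prices reported above. It is a sketch, not an official billing formula: it assumes cost is simply tokens times the listed rate, ignores prompt caching or any other adjustments, and applies the Batch discount only to standard pricing, because the launch coverage does not say whether it stacks with Fast mode. The workload sizes are hypothetical.

```python
# Opus 4.8 per-million-token prices (USD) as reported in this article.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}
BATCH_DISCOUNT = 0.50  # Batch API: 50% off both input and output


def job_cost(input_tokens: int, output_tokens: int,
             mode: str = "standard", batch: bool = False) -> float:
    """Estimate the USD cost of one request from its token counts."""
    if batch and mode != "standard":
        # Assumption: the Batch discount is only modeled for standard pricing.
        raise ValueError("Batch discount is only modeled for standard mode")
    rate = PRICES[mode]
    cost = ((input_tokens / 1_000_000) * rate["input"]
            + (output_tokens / 1_000_000) * rate["output"])
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return round(cost, 4)


# Hypothetical workload: 200k input tokens, 20k output tokens.
print(job_cost(200_000, 20_000))              # → 1.5
print(job_cost(200_000, 20_000, "fast"))      # → 3.0
print(job_cost(200_000, 20_000, batch=True))  # → 0.75
```

At these rates Fast mode costs exactly twice standard per token in exchange for roughly 2.5x throughput, so it pays off when latency, not budget, is the binding constraint.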

The alignment improvement is the part of this release that is genuinely new. Anthropic's internal benchmarks show Opus 4.8 is four times less likely than Opus 4.7 to let a code flaw pass without flagging it. The model scores 0% on "uncritically reporting flawed results" -- a metric Anthropic tracks specifically for agentic code review reliability. For teams running Opus in autonomous coding loops where the model reviews its own output, this is not a benchmark footnote. It is a production reliability change that compounds across every agent run.
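The "compounds across every agent run" point can be made concrete with basic probability. This minimal sketch assumes each review run contains one flaw and that runs are independent; the 8% baseline miss rate and the 50-run pipeline are hypothetical placeholders, and only the 4x ratio reflects Anthropic's reported result.

```python
def p_any_flaw_missed(p_miss: float, runs: int) -> float:
    """Probability that at least one flawed result slips through `runs` independent reviews."""
    return 1 - (1 - p_miss) ** runs


def expected_flaws_missed(p_miss: float, runs: int) -> float:
    """Expected number of flaws passed without a flag, assuming one flaw per run."""
    return p_miss * runs


# Hypothetical per-run miss rates: only the 4x ratio comes from Anthropic's claim.
P_MISS_47 = 0.08
P_MISS_48 = P_MISS_47 / 4
RUNS = 50

for label, p in (("Opus 4.7", P_MISS_47), ("Opus 4.8", P_MISS_48)):
    print(f"{label}: P(any miss) = {p_any_flaw_missed(p, RUNS):.1%}, "
          f"expected misses = {expected_flaws_missed(p, RUNS):.1f}")
```

Under these assumptions the expected number of silently passed flaws drops from 4 to 1 over the same 50 runs, which is why a per-run improvement matters more in long autonomous loops than in one-off reviews.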

Reception in the developer community is mixed. TechCrunch noted that Opus 4.7 received a "chilly reception" because benchmark gains were modest relative to price expectations. Opus 4.8 avoids that critique by holding price flat while delivering gains: developers already on the Opus 4.7 rate card pay nothing extra per token to upgrade.

For the full benchmark breakdown including SWE-bench Pro, GDPval-AA Elo, and the Terminal-Bench comparison against GPT-5.5, our dedicated Claude Opus 4.8 Review: Benchmarks, Dynamic Workflows, and Price has the complete analysis.

2. Dynamic Workflows: Claude Code Now Orchestrates 1,000 Parallel Subagents

The most practically significant feature in the Opus 4.8 launch is Dynamic Workflows in Claude Code. It is in research preview, available now for Max, Team, and Enterprise plans, and enabled by default for Max and Team. Enterprise admins must explicitly enable it.

Dynamic Workflows lets Claude Code fan a hard problem out across hundreds -- up to 1,000 -- parallel subagents, then verify their work and synthesize results before reporting back. The canonical demo case comes from Bun creator Jarred Sumner: Dynamic Workflows migrated approximately 750,000 lines of Rust code in 11 days. That is the kind of task that previously required assembling a team of engineers, planning a multi-month migration project, and accepting significant regression risk throughout.
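The fan-out, verify, and synthesize loop described above can be sketched with Python's asyncio. This is an illustrative pattern only, not Claude Code's implementation or API: `run_subagent` and `verify` are hypothetical stand-ins (a real subagent would call a model, and a real verifier would build and test the migrated code), and the 1,000 cap simply mirrors the figure reported for Dynamic Workflows.

```python
import asyncio
from dataclasses import dataclass

MAX_SUBAGENTS = 1000  # mirrors the reported "up to 1,000" parallel subagents


@dataclass
class ShardResult:
    shard: str
    output: str
    ok: bool


async def run_subagent(shard: str) -> ShardResult:
    """Hypothetical subagent that migrates one slice of the repository."""
    await asyncio.sleep(0)  # placeholder for real model calls
    # Toy rule so the example has one failure: "legacy" shards fail.
    return ShardResult(shard, f"migrated:{shard}", ok=not shard.endswith("legacy"))


def verify(result: ShardResult) -> bool:
    """Hypothetical check, e.g. 'does this shard compile and pass its tests?'"""
    return result.ok and result.output.startswith("migrated:")


async def orchestrate(shards: list[str]) -> dict:
    limit = asyncio.Semaphore(MAX_SUBAGENTS)

    async def bounded(shard: str) -> ShardResult:
        async with limit:  # never more than MAX_SUBAGENTS in flight
            return await run_subagent(shard)

    # Fan out: one concurrent task per shard.
    results = await asyncio.gather(*(bounded(s) for s in shards))
    # Verify each result, then synthesize a single report.
    passed = [r.shard for r in results if verify(r)]
    failed = [r.shard for r in results if not verify(r)]
    return {"total": len(results), "passed": passed, "failed": failed}


report = asyncio.run(orchestrate(["src/a", "src/b", "src/legacy"]))
print(report)
```

The semaphore is the design choice that matters: it lets the orchestrator enqueue any number of shards while bounding how many subagents run at once.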

Three operational additions shipped alongside Dynamic Workflows: mid-task system messages on the Messages API (inject new instructions into a running long-context task without restarting), an effort control slider on claude.ai and Cowork (explicitly tell Claude how much effort to apply: standard, high, xhigh, max), and Claude Code terminal version 2.1.154, which picks up Opus 4.8 as the default model automatically.

The implication for enterprise developers: parallel subagent orchestration at this scale crosses a qualitative threshold. Claude Code is no longer a coding assistant. It is a distributed engineering system where Claude orchestrates other Claude instances to solve repository-scale problems. The latency and cost math changes when you can parallelize the hard part across 1,000 subagents simultaneously.
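The claim that the latency and cost math changes can be made concrete with an idealized calculation. This sketch assumes independent, equally sized tasks that run in waves of at most `workers` subagents, followed by one verification pass; the task count, durations, and token figures are hypothetical placeholders, not numbers from Anthropic.

```python
import math


def wall_clock_minutes(tasks: int, minutes_per_task: float, workers: int,
                       verify_minutes: float = 0.0) -> float:
    """Idealized elapsed time: tasks run in waves of `workers`, then one verification pass."""
    waves = math.ceil(tasks / workers)
    return waves * minutes_per_task + verify_minutes


def total_tokens(tasks: int, tokens_per_task: int) -> int:
    """Token spend does not shrink with parallelism: every task still has to run."""
    return tasks * tokens_per_task


# Hypothetical workload: 2,000 independent migration tasks, 6 minutes each.
serial = wall_clock_minutes(2_000, 6, workers=1)
parallel = wall_clock_minutes(2_000, 6, workers=1_000, verify_minutes=30)
print(serial, parallel)             # elapsed minutes
print(total_tokens(2_000, 50_000))  # same token bill either way
```

Under these assumptions parallelism collapses elapsed time from 200 hours to under an hour, while the token bill stays flat; real runs add orchestration and verification tokens on top, so the change is mainly in latency, not in cost per task.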

For production agent pipeline patterns using Claude Code and the Managed Agents API, the gen-ai-experiments repository has working notebooks you can run immediately.

3. Claude Opus 4.8 vs GPT-5.5: The Benchmark Head-to-Head That Matters for Builders

The competitive picture from the Opus 4.8 launch is more nuanced than "Anthropic wins." On the benchmarks that developers run agentic coding workloads on, Opus 4.8 leads clearly. On terminal-agent tasks specifically, GPT-5.5 holds a small lead.

SWE-bench Verified: Opus 4.8 scores 88.6%, up from 87.6% on 4.7. GPT-5.5 trails at approximately 78-79%.

SWE-bench Pro: Opus 4.8 scores 69.2%, up from 64.3% on 4.7. This is 10 percentage points ahead of GPT-5.5 (approximately 59%) on the harder evaluation. Gemini 3.1 Pro is roughly tied with Opus 4.7 on this metric.

GDPval-AA Elo: Opus 4.8 scores 1890, 121 Elo points ahead of GPT-5.5 on real-world agentic task evaluation.

GPQA Diamond: Opus 4.8 scores 93.6%, placing it in the upper tier of science reasoning alongside Claude Mythos Preview (94.6%).

USAMO 2026 math: Opus 4.8 jumps from 69.3% to 96.7% -- a 27-point improvement in mathematical reasoning in 41 days.

Terminal-Bench 2.1: GPT-5.5 leads here. Opus 4.8 scores 74.6%, which is above Opus 4.7 but below GPT-5.5's score. This is the one clear area where GPT-5.5 has a performance advantage.
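The deltas quoted above can be checked directly from the reported scores. A minimal sketch: the numbers are the ones cited in this article, and the GPT-5.5 SWE-bench Pro figure is the approximate 59% given above.

```python
# Scores (percent) as reported in this article.
OPUS_47 = {"SWE-bench Verified": 87.6, "SWE-bench Pro": 64.3, "USAMO 2026": 69.3}
OPUS_48 = {"SWE-bench Verified": 88.6, "SWE-bench Pro": 69.2, "USAMO 2026": 96.7}
GPT_55_SWE_PRO_APPROX = 59.0  # approximate, per the article


def deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Point change per benchmark, rounded to one decimal to hide float noise."""
    return {name: round(after[name] - before[name], 1) for name in after}


print(deltas(OPUS_47, OPUS_48))
print(round(OPUS_48["SWE-bench Pro"] - GPT_55_SWE_PRO_APPROX, 1))
```

The output confirms the roughly 27-point USAMO jump (27.4) and the roughly 10-point SWE-bench Pro lead over GPT-5.5 (10.2), alongside a much smaller 1-point gain on SWE-bench Verified.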

The practical developer takeaway: if you are building agentic coding pipelines, running long-horizon repository-scale tasks, or need honest code review in autonomous loops, Opus 4.8 is the clear choice at current pricing. If you are running terminal-agent workloads or latency-sensitive command execution, GPT-5.5 remains competitive.

For the competitive model landscape including DeepSeek V3.2 as the cost-efficiency alternative at $0.28/$0.42 per million tokens, our Best AI Models April 2026: Ranked by Benchmarks provides the full pricing-adjusted comparison.

4. Claude Mythos Preview Public Release: Coming in Weeks

Buried in the Opus 4.8 release coverage but significant: Anthropic confirmed in the release materials and in developer communications that Claude Mythos Preview -- the restricted model powering Project Glasswing -- will be publicly available "in the coming weeks." This is the first explicit timeline commitment Anthropic has made for public Mythos access.

The Opus 4.8 review on Codersera explicitly states: "Anthropic also confirmed Mythos-class models -- the restricted Project Glasswing model that found 23,019 vulnerabilities in its first month -- will be generally available 'in the coming weeks.'" The 23,019 figure represents the updated vulnerability count from Glasswing's first month, which appears to have grown from the initial 10,000+ disclosure on May 22.

What public Mythos availability means in practice: Claude Mythos Preview is currently accessible only to approximately 50 Project Glasswing partner organizations (AWS, Apple, Google, Microsoft, IBM, Cloudflare, Mozilla, and others). A public release, even if initially limited to enterprise API access, would make its autonomous vulnerability discovery capabilities available to any organization with enterprise-tier Claude API access.

For context: Claude Mythos is the model the UK's AI Security Institute (AISI) evaluated as capable of completing a 32-step simulated corporate network attack in 3 out of 10 attempts. It is the model that identified a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg. If the schedule holds, public availability within weeks would be the most consequential AI deployment news of the summer.

For the full Mythos capability profile, Project Glasswing structure, and DoD lawsuit context, our Claude Mythos: Release Date, Access, and What Comes Next is the definitive reference.

5. Microsoft MAI: Homegrown AI Models to Power GitHub Copilot at Build 2026

Reuters (citing The Information) reported on May 28, 2026, that Microsoft will unveil a suite of homegrown AI models at Build 2026 on June 2, including a dedicated coding model for GitHub Copilot. Microsoft shares rose approximately 3% on the report. Microsoft declined to comment.

The full lineup per The Information: a new model class in multiple sizes, a coding-specialized model, a new in-house agent, and a new reasoning-specialized model. The team behind these models is Microsoft's internal AI division (MAI) led by Mustafa Suleyman. Suleyman's team had been restricted from training top-tier foundation models under the original OpenAI partnership agreement; those restrictions were renegotiated in April 2026, clearing the path for this announcement.

The strategic framing from Cybernews: "The push reflects Microsoft's effort to reduce dependence on OpenAI, whose partnership terms have been renegotiated, as rivals' coding tools -- notably Anthropic's Claude Code -- have pulled ahead." This is unusually direct: the reporting explicitly names Claude Code as the reason GitHub Copilot needs a new model.

For context: GitHub Copilot was the pioneering commercial AI coding tool, launched in 2021, and it led the market for three years. By 2026, Microsoft's own internal telemetry reportedly showed Claude Code overtaking Copilot in enterprise developer adoption. The decision to build homegrown models, rather than continuing to rely exclusively on OpenAI and Anthropic models, is the direct strategic response to that market reversal.

6. GitHub Copilot Already Writes 46 Percent of Code on the Platform

A data point confirmed by Microsoft's own Build 2026 pre-conference communications: GitHub Copilot writes 46 percent of the code committed by developers on the GitHub platform, based on Microsoft telemetry. The same figure was cited at 40 percent in November 2025, a 6-point rise in roughly six months.

For enterprise developers making tool decisions: with 46 percent of the code on the world's largest software development platform now AI-generated, AI-assisted coding has crossed the threshold from "nice to have" to "default workflow" for a large share of active GitHub developers. The question is no longer whether to use AI for coding. It is which AI, at what price, running which underlying model.

Microsoft's timing is deliberate. Announcing the 46 percent telemetry figure the week before Build, alongside the MAI coding model news, creates a specific narrative: GitHub Copilot already writes nearly half the code on GitHub, and on June 2 Microsoft reveals the model meant to take that to 60 or 70 percent. Whether the MAI coding model can compete with the 69.2% SWE-bench Pro score of Claude Opus 4.8, now Claude Code's default model, is the benchmark question Build will answer.

7. Nightmare Eclipse: The Rogue Researcher Who Released 6 Windows Exploits in 6 Weeks, Then Got Banned From Everywhere

The security story that has been building for six weeks came to a head this week. Nightmare Eclipse, a pseudonymous security researcher, released six working critical Windows exploits over six weeks, including vulnerabilities providing full SYSTEM access and BitLocker encryption bypass on fully patched Windows systems. The stated motivation was a vendetta against Microsoft.

GitHub terminated Nightmare Eclipse's account on May 23, 2026. The researcher immediately migrated to GitLab and continued posting. GitLab banned the account on May 26, just three days later. Both platforms wiped all published repositories.

In a threatening post before the bans, Nightmare Eclipse said: "I will make sure your bones are shattered that day," referencing July 14, 2026 as a potential future event date. The researcher also claimed to have deployed a "Dead man's switch" that would automatically release additional exploits if something happened to them. Microsoft is patching the disclosed vulnerabilities in the June 9, 2026 Patch Tuesday cycle.

The Nightmare Eclipse situation reveals a gap in the security disclosure ecosystem. A motivated researcher with working zero-day exploit code can now publish to a global audience within minutes of discovery, stay ahead of platform moderation by migrating between hosts, and create enormous enterprise risk in the weeks before patches arrive. The Windows exploit releases exposed Microsoft's enterprise customers to real attack risk, regardless of the researcher's stated intent. The fact that curbing the distribution took weeks and two platform bans is the structural problem.

8. OnlyFans 340 Million Record "Mega Leak" -- The Real Story Is More Complicated

A claim circulating since May 24 -- that a hacker was selling 340 million OnlyFans user records for 0.313 BTC (approximately $76,000) -- has been examined by security outlets and researchers, including Hackread and Troy Hunt of Have I Been Pwned. The conclusion: this is almost certainly not a breach of OnlyFans itself.

When Hackread contacted the seller directly via Telegram, the seller admitted: "We didn't breach or hack OnlyFans. We used existing breaches and leaks databases and matched with users of the OnlyFans platform." What they built was a correlation database -- taking leaked data from Twitter, Instagram, Spotify, and other platforms, matching email addresses and usernames to OnlyFans accounts, and assembling a dataset that links real identities to OnlyFans profiles.

This distinction matters enormously for privacy impact. The specific harm of this dataset is identity correlation, not credential theft. OnlyFans is a platform where many users, including content creators and subscribers, maintain anonymity for safety, professional, or personal reasons. A dataset that links OnlyFans usernames to real-world email addresses and social media accounts breaks that anonymity even without any direct password compromise.

Troy Hunt publicly questioned whether this could be AI-generated data, and security analyst IntCyberDigest raised the same concern. The risk here is not the kind of technical breach severity a CVSS score measures; it is the social harm of exposing identities associated with a platform where pseudonymity is functionally necessary for many users. Whether the data is real, recycled, or AI-generated, the phishing and doxing risk for anyone whose identity appears in the correlation set is significant.

9. Gemini 3.5 Pro: June Launch Confirmed, Vertex AI Allowlist Already Open

As of May 28, 2026, Google has confirmed that Gemini 3.5 Pro is not yet generally available -- but the Vertex AI allowlist for early access has opened. Organizations with active GCP enterprise contracts can request allowlist access through Vertex AI Model Garden. Some Google I/O attendees were granted direct access during the keynote. Google AI Studio also has a standard waitlist for non-enterprise users.

Sundar Pichai's exact line at I/O 2026 on May 19: "Give us until next month to get it to you." That puts the general availability window in June 2026, with no specific date committed. Based on prior Google model release patterns, a blog.google model card announcement with immediate general availability is the expected launch format.

The competitive context for Gemini 3.5 Pro: Gemini 3.5 Flash already beats Gemini 3.1 Pro on coding and agentic benchmarks (Terminal-Bench 2.1 at 76.2% vs 70.3%, MCP Atlas at 83.6% vs 78.2%). Flash regresses relative to 3.1 Pro on long-context reasoning and Humanity's Last Exam, and 3.5 Pro is expected to close those gaps while keeping Flash's agentic strengths.

The competitive benchmark Pro needs to clear to win the developer narrative: GPQA Diamond above 90%, SWE-bench Pro above 65%, and GDPval-AA Elo above 1800. If it clears all three, it is a genuine challenger to Claude Opus 4.8 at roughly half the price. If it misses any of them, it lands as a faster, cheaper alternative for specific use cases rather than a direct Opus replacement.
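The three-bar test in the paragraph above is effectively a decision rule, sketched below. The thresholds come from this article, and "above" is read as a strict inequality; the example scores are hypothetical placeholders, since Gemini 3.5 Pro results have not been published.

```python
# Bars named in this article; "above" is read as a strict inequality.
THRESHOLDS = {"GPQA Diamond": 90.0, "SWE-bench Pro": 65.0, "GDPval-AA Elo": 1800.0}


def verdict(scores: dict[str, float]) -> tuple[str, list[str]]:
    """Return a positioning label and the list of bars the model failed to clear."""
    missed = [name for name, bar in THRESHOLDS.items()
              if scores.get(name, float("-inf")) <= bar]
    if not missed:
        return "genuine Opus 4.8 challenger", missed
    return "faster, cheaper alternative for specific use cases", missed


# Hypothetical placeholder scores, not published Gemini 3.5 Pro results.
example = {"GPQA Diamond": 91.0, "SWE-bench Pro": 63.0, "GDPval-AA Elo": 1825.0}
print(verdict(example))
```

A missing score counts as a miss, which matches the article's framing that Gemini 3.5 Pro has to clear all three bars to be treated as a direct Opus replacement.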

For a full preview of what Gemini 3.5 Pro needs to deliver based on Flash's benchmarks and market positioning, our Google I/O 2026: Gemini 3.5 Flash and All Developer Announcements has the complete technical context.

10. May 2026 by the Numbers: The Month That Defined the AI Industry's Future

May 2026 is the most consequential single month in AI industry history since ChatGPT launched in November 2022. The summary in data:

  • $900 billion: Anthropic's valuation after closing a $30B+ funding round, surpassing OpenAI's $852B for the first time
  • $10.9 billion: Anthropic's projected Q2 2026 revenue (up 130% from Q1), with first-ever quarterly operating profit of $559M
  • $1.75 trillion: SpaceX's IPO target valuation, the largest IPO in history, with pricing June 11
  • 10,000+: High-severity software vulnerabilities found by Claude Mythos Preview in 30 days via Project Glasswing (the initial May 22 disclosure; the first-month count has since been reported at 23,019)
  • 1.4 billion: Catholics globally reached by Pope Leo XIV's Magnifica Humanitas, the first papal AI encyclical
  • 276,000: Employees at KPMG getting Claude access, joining PwC and Deloitte for 1.1M Big Four professionals total
  • 41 days: Time between Claude Opus 4.7 and Opus 4.8, Anthropic's fastest version cadence ever
  • 10,000: GitHub repos stolen by TeamPCP supply chain attack via poisoned VS Code extension in 18 minutes
  • $70 billion: ByteDance's planned 2026 AI infrastructure capex, funded from its $50B profit
  • 900 million: Gemini monthly active users as disclosed at Google I/O 2026

The month's central lesson: AI is no longer primarily about which model scores highest on a benchmark. It is about who controls compute, who owns the enterprise deployment layer, and who has built the governance frameworks that let organizations deploy AI in regulated environments. The deals signed in May 2026 may settle those questions for years.

11. What to Watch This Week: Build June 2-3, WWDC June 8, SpaceX IPO June 12

The most consequential 10-day window in AI industry history begins June 2:

Microsoft Build 2026 (June 2-3, San Francisco + online): Satya Nadella opens the conference on June 2. Windows Agent Framework, Copilot Agent Mode, Windows Agent Store, and multi-model Azure AI Foundry are on the agenda, and the MAI coding model is reported for the opening keynote. The keynote is livestreamed free at build.microsoft.com.

Apple WWDC 2026 (June 8-12, Apple Park, keynote June 8 at 10 AM PT): Siri 2.0 with Gemini integration, headed for an installed base of 2 billion Apple devices; the Extensions system for third-party AI in iOS 27; and a macOS 27 redesign. The most consequential consumer AI product launch since the original iPhone.

SpaceX IPO (roadshow June 4, pricing June 11, trading June 12 on Nasdaq under SPCX): $1.75 trillion target valuation with a raise of up to $75 billion. 30% retail allocation via Robinhood, Fidelity, and Schwab. The first public market data point for AI-era infrastructure valuation -- and the company whose AI revenue depends almost entirely on Anthropic's $1.25B/month compute contract.

If Build delivers on the MAI coding model and WWDC delivers on the Gemini-powered Siri, June 2 to 12 will be the week the AI consumer and developer markets both visibly tipped. The SpaceX IPO is the financial market's verdict on whether the compute infrastructure enabling all of this has been correctly priced.

For the full context on what triggered this entire convergence -- the AI cost reckoning, enterprise discipline phase, and the structural shifts of May 2026 -- our AI News Today May 29, 2026 and the May 22 through 30 series cover the full story.

Frequently Asked Questions

What is new in Claude Opus 4.8?

Claude Opus 4.8 launched on May 28, 2026, 41 days after Opus 4.7. Key improvements: SWE-bench Verified rises to 88.6% (from 87.6%), SWE-bench Pro rises to 69.2% (from 64.3%), USAMO 2026 math jumps to 96.7% (from 69.3%), and the model is four times less likely to let code flaws pass without flagging them. Standard pricing is unchanged at $5/$25 per million tokens. The new Fast mode runs at 2.5x speed for $10/$50 per million tokens, one third of Opus 4.7's Fast mode price of $30/$150. New features include Dynamic Workflows (1,000 parallel subagents in Claude Code), mid-task system messages on the Messages API, and an effort control slider.

What are Claude Opus 4.8 Dynamic Workflows?

Dynamic Workflows is a research preview feature in Claude Code that allows the model to fan a complex problem out across hundreds or up to 1,000 parallel subagents, verify their work, and synthesize results before reporting. It ships in Claude Code versions 2.1.154 and later. Available for Max, Team, and Enterprise plans; enabled by default for Max and Team; requires admin activation for Enterprise. Real-world example: Bun creator Jarred Sumner used Dynamic Workflows to migrate approximately 750,000 lines of Rust code in 11 days.

What is Microsoft's MAI coding model?

Reuters and The Information reported on May 28, 2026, that Microsoft will unveil a suite of homegrown AI models at its Build 2026 developer conference (June 2-3) in San Francisco. The lineup includes a coding-specialized model to strengthen GitHub Copilot, new models for transcription, reasoning, speech, and images, and a new in-house AI agent. The models are developed by Microsoft's internal AI division (MAI) led by Mustafa Suleyman. The push reduces Microsoft's dependence on OpenAI models and is widely reported as a response to Anthropic's Claude Code overtaking GitHub Copilot in enterprise developer adoption. Microsoft declined to comment on the report.

When will Claude Mythos be publicly available?

Anthropic confirmed in materials associated with the Claude Opus 4.8 launch on May 28, 2026, that Claude Mythos Preview models will be "generally available in the coming weeks." Mythos Preview is currently restricted to approximately 50 Project Glasswing partner organizations including AWS, Apple, Google, Microsoft, IBM, Cloudflare, and Mozilla. The model found 23,019 high-severity vulnerabilities in its first month through Glasswing. Public availability is expected to begin with enterprise-tier API access.

Is the OnlyFans 340 million data leak real?

A hacker listed 340 million alleged OnlyFans records for sale (0.313 BTC, approximately $76,000) on a dark web forum starting May 24, 2026. However, the seller told Hackread directly on Telegram: "We didn't breach or hack OnlyFans. We used existing breaches and leaks databases and matched with users of the OnlyFans platform." OnlyFans has not confirmed any platform breach. Security researcher Troy Hunt publicly questioned the claim. The dataset appears to be a correlation database linking old breach data from other platforms to OnlyFans usernames, not a direct OnlyFans database dump. The privacy risk -- identity correlation for anonymous users -- is real regardless of the breach claim's accuracy.

What happened with Nightmare Eclipse and Windows zero-days?

Nightmare Eclipse is a pseudonymous security researcher who released six working critical Windows exploits in six weeks as a public vendetta against Microsoft. The exploits included vulnerabilities providing full SYSTEM access and BitLocker encryption bypass on fully-patched Windows systems. GitHub terminated the account on May 23, 2026. The researcher migrated to GitLab within hours. GitLab banned the account on May 26. Both platforms wiped all published repositories. The researcher threatened further releases on July 14, 2026 and claimed to have a "Dead man's switch" that would automatically release additional exploits. Microsoft is patching the disclosed vulnerabilities in the June 9, 2026 Patch Tuesday release.

When will Gemini 3.5 Pro launch?

Google confirmed Gemini 3.5 Pro at I/O 2026 on May 19, with Sundar Pichai saying "give us until next month" -- meaning June 2026. As of May 28, Gemini 3.5 Pro is not generally available but the Vertex AI Model Garden allowlist has opened for organizations with active GCP enterprise contracts. Some I/O attendees received direct access during the keynote. A Google AI Studio waitlist is open for non-enterprise users. The expected launch format is a blog.google model card announcement with immediate general availability. No specific date within June has been committed.

Recommended Reads

  • Claude Opus 4.8 Review: Benchmarks, Dynamic Workflows, and Price -- Build Fast with AI
  • AI News Today -- May 29, 2026: The AI Cost Reckoning Has Arrived -- Build Fast with AI
  • Claude Mythos: Release Date, Access, and What Comes Next (2026) -- Build Fast with AI
  • Google I/O 2026: Gemini 3.5 Flash and All Developer Announcements -- Build Fast with AI
  • Best AI Models April 2026: Ranked by Benchmarks -- Build Fast with AI
  • What Is Claude Cowork? The 2026 Guide -- Build Fast with AI

References

  • The Decoder -- Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks (May 28, 2026)
  • LLM Stats -- Claude Opus 4.8 launch: benchmarks, dynamic workflows, pricing
  • Build Fast with AI -- Claude Opus 4.8 Review: Benchmarks, Dynamic Workflows, Price
  • Reuters / Yahoo Finance -- Microsoft to release new coding model next week, the Information reports (May 28, 2026)
  • Cybernews -- Microsoft to unveil new homegrown AI models for GitHub Copilot at Build (May 29, 2026)
  • Digg -- Microsoft planning multiple releases at Build: coding model, reasoning model, new agent (The Information, May 28)
  • Cybernews -- Microsoft Nightmare Eclipse: GitLab removes rogue security researcher after GitHub ban
  • Hackread -- Hacker Selling 340M OnlyFans User Records Built From Old Breaches
  • TechRadar -- Hackers claim to be selling 340 million stolen OnlyFans records but experts are skeptical
  • Codersera -- Gemini 3.5 Pro: The June 2026 Launch Guide (includes Vertex AI allowlist status)
  • Windows News AI -- Microsoft Build 2026: AI Agents, Copilot, Azure AI Foundry, and Windows Local AI
