GitHub's AI Agent Crisis Forces Microsoft to Tap AWS as Outages Break Enterprise SLAs

On June 22, 2024, a massive AWS outage paralyzed Slack, Signal, Zoom, and ChatGPT simultaneously. Now GitHub’s AI agents are breaking enterprise SLAs at scale, forcing Microsoft to lease capacity from rival Amazon Web Services. The infrastructure underpinning AI-powered development tools is cracking under pressure it was never designed to handle.

TL;DR: A critical vulnerability in the LangGraph platform allowed attackers to hijack AI agents and steal corporate data, while 85% of Polish firms experienced cyber incidents in the past year. Simultaneously, AWS outages have taken down Slack, Signal, Zoom, and ChatGPT, exposing how fragile centralized AI infrastructure has become for enterprises betting their workflows on these tools.

Why Are GitHub’s AI Agents Breaking Enterprise SLAs?

GitHub’s autonomous AI agents, designed to handle code reviews and pull requests independently, are failing at rates that violate enterprise service-level agreements. The core problem stems from infrastructure that cannot sustain the computational demands of thousands of concurrent agent operations. When agents time out or produce erroneous outputs, the cascading failures disrupt development pipelines for hours.
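To put "hours" of disruption in context, the arithmetic behind an uptime commitment can be sketched as follows. The uptime targets and the 30-day period are illustrative assumptions, not figures from any GitHub or Microsoft contract.

```python
def allowed_downtime_minutes(uptime_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime a service may accumulate in the period
    before it misses the given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


# Hypothetical targets, chosen only to show the scale of the budget.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime -> {minutes:.1f} min per 30 days")
```

Under these assumptions a 99.9% target leaves about 43 minutes of downtime per 30 days, so a single multi-hour pipeline stall would exhaust the budget several times over.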

The situation has deteriorated to the point where Microsoft, GitHub’s parent company, has begun leasing GPU capacity from AWS. This is a striking reversal: a company that invested billions in its own Azure AI infrastructure now relies on its biggest cloud competitor to keep developer tools functional. Meanwhile, enterprise customers paying premium rates for guaranteed uptime are receiving SLA breach credits.

The root cause is architectural. AI agents require persistent connections, real-time model inference, and state management across distributed systems. Traditional cloud scaling assumptions, built around short and stateless requests, do not apply. When a single agent operation can consume minutes of GPU time, capacity planning sized for millisecond-scale requests stops working.
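A rough way to see why the numbers stop working is Little's law: the average number of busy GPUs equals the arrival rate multiplied by the time each operation holds a GPU. The sketch below uses invented workload figures and assumes each run occupies one GPU for its full duration; it is an illustration, not a model of GitHub's actual fleet.

```python
import math


def gpus_needed(runs_per_minute: float,
                gpu_minutes_per_run: float,
                target_utilization: float = 0.7) -> int:
    """Estimate GPU count with Little's law.

    busy GPUs = arrival rate * service time; dividing by the target
    utilization leaves headroom for bursts. All inputs are hypothetical.
    """
    busy = runs_per_minute * gpu_minutes_per_run
    return math.ceil(busy / target_utilization)


# Same request rate, very different service times (both values assumed).
short_request = gpus_needed(1000, 0.05 / 60)  # 50 ms of GPU time per call
agent_run = gpus_needed(1000, 3.0)            # 3 GPU-minutes per agent run

print(short_request, agent_run)
```

With these assumed inputs the short request needs 2 GPUs while the agent workload needs 4,286 at the same arrival rate, which is the gap that conventional autoscaling rules are not tuned for.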

How Did the LangGraph Vulnerability Expose Corporate AI Agents?

Security researchers disclosed a critical vulnerability in LangGraph, the open-source framework many enterprises use to orchestrate AI agent workflows. The flaw allowed attackers to intercept agent communications, inject malicious instructions, and exfiltrate sensitive corporate data processed by compromised agents. The vulnerability affected any deployment running default configurations without additional authentication layers.

LangGraph’s architecture relies on stateful graph execution — agents pass intermediate results between nodes in a workflow graph. The vulnerability existed in how these intermediate states were serialized and transmitted between nodes. An attacker who could observe or intercept this traffic could reconstruct the full conversation context, including any proprietary data the agent had accessed.
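The source does not describe the patch itself, so the following is only a generic sketch of one defensive idea: authenticating serialized state with an HMAC so that a receiving node can detect injected or altered content. The function names are hypothetical and are not part of LangGraph's API. Note that this addresses tampering only; keeping the state confidential from an observer would additionally require encryption in transit.

```python
import hashlib
import hmac
import json


def sign_state(state: dict, key: bytes) -> dict:
    """Serialize state deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_state(envelope: dict, key: bytes) -> dict:
    """Recompute the tag and reject the state if it does not match."""
    payload = envelope["payload"].encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("state failed integrity check")
    return json.loads(payload)
```

A receiving node would call `verify_state` before acting on any intermediate result, so a modified payload raises an error instead of being executed as instructions.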

The fix required a complete overhaul of the inter-node communication protocol. Organizations using LangGraph for production workloads faced a choice: shut down agent workflows entirely or accept the risk of continued operation on vulnerable versions. Many chose the former.

What Does the AWS Outage Reveal About AI Infrastructure Fragility?

The AWS outage that took down Slack, Signal, Zoom, and ChatGPT simultaneously demonstrated a concentration risk that the industry has been ignoring. When a single cloud provider’s regional services degrade, the blast radius now extends to AI tools that enterprises have made mission-critical. ChatGPT’s unavailability alone disrupted workflows for millions of developers, researchers, and support teams.

The outage was traced to a configuration error in AWS’s US-East-1 region that propagated faster than automated rollback systems could respond. Within minutes, dependent services began failing health checks in a cascade. AI tools were disproportionately affected because they require sustained backend connections: even a brief interruption terminates active inference sessions entirely.

This brittleness is built into the architecture rather than caused by a one-off operational mistake, so recovery takes time even after the provider stabilizes.

The incident exposed how few AI platforms have implemented genuine multi-region failover. Most enterprise AI deployments run in a single region, or even a single availability zone, with optimistic assumptions about provider reliability. When those assumptions fail, there is no fallback.
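At its simplest, client-side failover means trying an ordered list of regional endpoints and moving on when one is unreachable. The sketch below illustrates that idea under stated assumptions: `call_with_failover` and `fake_inference` are hypothetical names, and the stand-in function simulates an unavailable region rather than calling any real service.

```python
from typing import Callable, Sequence


class AllRegionsFailed(Exception):
    """Raised when every configured region rejected the request."""


def call_with_failover(regions: Sequence[str],
                       call: Callable[[str], str]) -> str:
    """Try each region in order and return the first successful response."""
    errors = []
    for region in regions:
        try:
            return call(region)
        except ConnectionError as exc:
            # Record the failure and fall through to the next region.
            errors.append(f"{region}: {exc}")
    raise AllRegionsFailed("; ".join(errors))


def fake_inference(region: str) -> str:
    """Stand-in for a real inference call; simulates one region being down."""
    if region == "us-east-1":
        raise ConnectionError("regional endpoint unavailable")
    return f"response from {region}"


print(call_with_failover(["us-east-1", "eu-west-1"], fake_inference))
```

Retrying elsewhere does not restore an inference session that was already in flight; that would require session state to be replicated across regions, which is the harder part of genuine failover.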

How Widespread Is Shadow AI Usage in Enterprise Environments?

According to a joint report by ESET and DAGMA, 62% of Polish employees use AI tools in their daily work, yet only 27% of firms have established clear governance policies. More alarmingly, 35% of employees stated they would circumvent any corporate ban on AI usage — effectively making shadow AI usage an unmanageable risk vector that traditional security controls cannot address.

The report also found that one in ten employees regularly pastes confidential corporate data into public AI tools. This includes source code, financial projections, customer data, and internal strategic documents. Once submitted, this information enters training pipelines or temporary contexts that the organization cannot control or audit.

Security teams are fighting a losing battle. The productivity gains from AI tools are so immediate and tangible that employees adopt them faster than policy can adapt. Traditional DLP systems cannot distinguish between a legitimate web search and a query submitted to an AI model.
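One partial countermeasure is to inspect prompts at a sanctioned gateway before they leave the organization. The sketch below is a minimal illustration, assuming such a gateway exists; the pattern set and the `redact` helper are hypothetical. Pattern matching of this kind only catches well-formed identifiers and secrets, and it will not recognize source code, financial projections, or strategy documents.

```python
import re

# Illustrative patterns only; a real deployment would need a far broader set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which patterns fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        if pattern.search(prompt):
            findings.append(label)
            prompt = pattern.sub(f"[REDACTED:{label}]", prompt)
    return prompt, findings


clean, found = redact("Ask jan@example.com about key AKIAIOSFODNN7EXAMPLE")
print(clean)
print(found)
```

The list of findings can feed an audit log, giving security teams at least some visibility into what employees attempt to submit.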

The ESET and DAGMA report reveals that 85% of Polish companies and institutions experienced a cyber incident within the past twelve months. While not all incidents directly involved AI, the correlation between increased AI adoption and rising incident frequency is statistically significant. Half of employees cannot identify basic digital protection protocols, creating a workforce that deploys powerful tools without understanding the associated risks.

The data paints a troubling picture of organizational readiness. Cybersecurity budgets are rising, but they are not keeping pace with the expanding attack surface that AI tools introduce. Each new AI integration creates potential entry points for data exfiltration, prompt injection attacks, and model manipulation.

Polish CERT handled approximately 273,000 cyber incidents in 2025 alone. The volume continues to grow.

Local governments and small businesses represent what security experts call the “soft underbelly” of cybersecurity — organizations with minimal IT staff, legacy infrastructure, and no dedicated security personnel. These entities are increasingly targeted because they often serve as entry points into larger supply chains.

How Does Shadow AI Compound the Outage Problem?

When platforms go down, employees do not stop working; they route around the damage using unsanctioned tools. The ESET and DAGMA figures explain why: 62% of Polish employees use AI tools, 35% would bypass a company-wide ban to keep using them, and only 27% of firms have clear rules governing AI usage at all. The result is a parallel infrastructure that IT teams cannot monitor or protect.

During an outage like the AWS disruption that took down Slack, Signal, Zoom, and ChatGPT, shadow AI usage is likely to spike. Employees desperate to maintain productivity paste sensitive data into whatever tool still functions. The ESET and DAGMA report reveals that one in ten employees regularly feeds company data into AI tools without oversight. When primary systems fail, that share can be expected to climb.

Security teams lose visibility entirely. They cannot enforce data loss prevention policies on tools they do not know exist. The combination of infrastructure outages and shadow AI creates a perfect storm for data exfiltration. Attackers know this and actively target alternative platforms during peak disruption windows.

What Cybersecurity Risks Do AI Agent Platforms Like LangGraph Introduce?

Critical vulnerabilities in AI agent platforms expose entire corporate infrastructures to hijacking. A severe flaw discovered in LangGraph — a widely adopted framework for building AI agents — allowed attackers to seize control of active agents and access connected enterprise data. The vulnerability meant that a single compromised agent could serve as a gateway to databases, APIs, and internal systems the agent was authorized to reach.

The attack surface grows with every integration. AI agents connect to CRM systems, document repositories, communication platforms, and financial databases. When an attacker hijacks an agent through a platform vulnerability, they inherit all of those connections. Traditional perimeter security does not account for agents acting as authenticated intermediaries between dozens of services.
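One way to limit what a hijacked agent inherits is to give each agent an explicit allowlist of tools and check it on every call. The following is a minimal sketch of that least-privilege idea; `AgentPolicy`, `invoke_tool`, and the tool names are hypothetical and do not correspond to any specific agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class AgentPolicy:
    """Names the tools a given agent may call; everything else is denied."""
    agent_id: str
    allowed_tools: frozenset = field(default_factory=frozenset)


class ToolNotPermitted(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


def invoke_tool(policy: AgentPolicy, tool_name: str,
                tools: Dict[str, Callable], **kwargs):
    """Run a tool only if the agent's policy explicitly allows it."""
    if tool_name not in policy.allowed_tools:
        raise ToolNotPermitted(f"{policy.agent_id} may not call {tool_name}")
    return tools[tool_name](**kwargs)


# Hypothetical tool registry and agent, for illustration only.
tools = {
    "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
    "export_customers": lambda: "all customer rows",
}
reviewer = AgentPolicy("code-review-agent", frozenset({"read_ticket"}))

print(invoke_tool(reviewer, "read_ticket", tools, ticket_id=42))
```

With this policy, a compromised review agent that tries to call `export_customers` gets a `ToolNotPermitted` error instead of the customer data, so the damage is bounded by the allowlist rather than by everything the platform can reach.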

Polish data reinforces the severity: 85% of Polish firms experienced a cyber incident within a single year, yet half of all employees remain unaware of basic digital protection rules. The disconnect between threat exposure and security awareness makes agent platform vulnerabilities particularly dangerous for organizations rushing to deploy automation without adequate guardrails.

How Much Financial Damage Do AI-Dependent Outages Cause Enterprises?

Outage costs scale with dependency, and dependency on AI infrastructure has grown sharply. The AWS outage that paralyzed Slack, Signal, Zoom, and ChatGPT demonstrated how a single point of failure cascades across the entire SaaS ecosystem. Companies paying premium rates for AI-powered tools lose not just access but also the productivity multipliers those tools provide.

For organizations using AI agents to automate business processes, an outage means more than idle time. Automated workflows stall mid-execution, leaving transactions incomplete and customers unattended. Business Insider reports that AI agents now handle tasks so complex that even executive boards are reconsidering their own roles in decision-making chains. When those agents go offline, the manual fallback processes no longer exist.

Cybersecurity costs are simultaneously rising. Polish reports indicate that drastically increasing hardware prices and new legal requirements have pushed cyber protection out of IT department budgets and into executive-level strategic decisions. Organizations must now fund redundancy, backup systems, and alternative providers — a cost that did not exist when AI was optional rather than foundational.

Why Do Employees Reject AI Tools Even After Outages Are Resolved?

Employees reject tools that fail to deliver tangible value in daily work — not AI itself. According to CEO.com.pl, a significant portion of office workers in the most technologically advanced economies describe themselves as AI skeptics. The skepticism traces back to tools that overpromise and underdeliver, particularly those that break during critical moments.

Repeated outages erode trust permanently. When an employee loses work because an AI platform went down mid-task, they retreat to familiar methods. A worker who pasted a complex analysis into ChatGPT only to lose it during the AWS outage will think twice before relying on that workflow again. The tool becomes associated with risk rather than reliability.

The CEO.com.pl report frames this precisely: workers do not oppose AI. They oppose tools that interrupt their workflow without providing enough value to justify the friction. Outages amplify that friction to unacceptable levels. Organizations deploying AI must account for reliability as a core feature — not an afterthought — because every minute of downtime converts another group of users into permanent skeptics.

Frequently Asked Questions

How widespread is unauthorized AI usage in corporate environments?

The ESET and DAGMA report reveals that 62% of Polish employees actively use AI tools, while 35% would bypass company bans to continue using them. Only 27% of organizations have implemented clear AI usage policies. Additionally, one in ten employees regularly inputs confidential company data into AI platforms without any security oversight.

What made the LangGraph vulnerability particularly dangerous for enterprises?

The critical LangGraph flaw allowed attackers to hijack active AI agents and gain access to all enterprise data and systems those agents were authorized to use. Since AI agents typically connect to multiple databases, APIs, and internal platforms simultaneously, a single compromised agent could expose an organization’s entire integrated infrastructure to malicious control.

How many cyber incidents affect Polish businesses annually?

According to industry reports, 85% of Polish firms and institutions experienced at least one cyber incident within a single year. In 2025 alone, nearly 273,000 cyber incidents were handled in Poland. Despite this volume, half of all employees remain unfamiliar with basic digital protection rules.

What happened during the AWS outage that affected AI services?

A major AWS cloud service failure paralyzed multiple platforms simultaneously, including Slack, Signal, Zoom, and ChatGPT. The disruption demonstrated how dependent the modern internet ecosystem has become on a single cloud provider. Services relying on AWS infrastructure — including AI platforms — became completely inaccessible for the duration of the outage.

Summary

Shadow AI amplifies outage damage: With 62% of employees using AI tools and only 27% of firms having clear policies, infrastructure failures push workers toward unsanctioned alternatives that security teams cannot monitor.
Agent platform vulnerabilities create new attack surfaces: The LangGraph flaw demonstrated that hijacked AI agents serve as gateways to every connected enterprise system, exploiting the deep integrations that make agents powerful.
Outage costs compound with AI dependency: When AI agents automate core business processes, outages take down not just tools but entire workflows, and the manual fallback processes often no longer exist.
Reliability determines AI adoption: Employees do not reject AI itself but reject tools that fail during critical moments, making uptime and resilience core requirements rather than technical details.
Cybersecurity spending must reach executive level: With 85% of Polish firms hit by cyber incidents and hardware prices rising, organizations can no longer treat cyber protection as a department-level concern.