TL;DR. An AI coding agent ran a single API call against Railway and deleted a production database. The volume-level "backups" sat inside the same volume, so they went with it. The most recent recoverable backup was three months old. Customers showed up at rental locations the next morning, and operators had no record of who they were. The full timeline is in Jer Crane's post on X.

Respawn did not prevent this incident. But two of the silent failures behind it are exactly the kind of finding we put in front of operators every week. An outdated or co-located backup that nobody noticed. An API credential with destructive scope it should never have had. This case study walks through what we would have told the PocketOS team before Friday afternoon, and why annual drills do not catch either one.

The Incident in One Paragraph

On a Friday afternoon, a Cursor agent running Anthropic's Claude Opus 4.6 was working on routine staging work for PocketOS, a SaaS platform that rental businesses run their entire operations on. The agent hit a credential mismatch, decided to "fix" it by deleting a Railway volume, found a CLI token in an unrelated file, and ran a GraphQL volumeDelete mutation. The volume held production data. The volume also held the only backups. Nine seconds, one API call, total loss. Thirty hours later, Railway still could not say whether infrastructure-level recovery was possible.
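
For scale, here is roughly what that one call looks like. The volumeDelete mutation name comes from the incident account; the endpoint URL, argument shape, and token handling below are illustrative assumptions, not Railway's documented contract.

    # Hypothetical sketch of the single destructive call, for scale.
    # volumeDelete is the mutation named in the incident account; the
    # endpoint and the volumeId argument are illustrative assumptions.
    import requests

    RAILWAY_GRAPHQL = "https://backboard.railway.app/graphql/v2"  # assumed endpoint

    MUTATION = """
    mutation {
      volumeDelete(volumeId: "prod-volume-id")
    }
    """

    requests.post(
        RAILWAY_GRAPHQL,
        json={"query": MUTATION},
        headers={"Authorization": "Bearer <token-found-in-an-unrelated-file>"},
        timeout=30,
    )
    # One POST. No confirmation step. The volume and the "backups"
    # stored inside it are gone together.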

Source: Jer Crane, founder of PocketOS, on X.

Three Silent Failures, None of Them Visible Until the Bad Day

Every operator we talk to has heard "we have backups" and assumed that means recovery works. The PocketOS incident is a public demonstration of why that sentence is dangerous. There are at least three silent failures hiding in this story. All three are the kind of thing continuous recovery testing surfaces before they become incidents.

1. The Backups Were in the Same Blast Radius as the Data

Railway markets volume-level backups as a resilience feature. Their own documentation says wiping a volume deletes all backups. That is not a backup. That is a snapshot stored next to the original. Snapshots are useful for application errors and accidental deletes that leave the volume intact. They protect against none of the failure modes that actually cause businesses to die. Deletion of the volume itself, corruption, malicious action, infrastructure failure, ransomware. The PocketOS team did not know their backups shared a blast radius with their data. Nobody had tested it.
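
What does out of the blast radius look like instead? A minimal sketch, assuming a Postgres database on Railway and an object-storage bucket on a different provider under a different account. The connection string, bucket, and helper name are placeholders, not a prescription.

    # Sketch: push a logical dump OFF the provider that hosts the data.
    # DATABASE_URL, bucket, and naming scheme are placeholder assumptions.
    import datetime
    import subprocess
    import boto3

    def offsite_backup(database_url: str, bucket: str) -> str:
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
        key = f"pg-backups/app-{stamp}.dump"
        # pg_dump in custom format; check=True raises if the dump fails
        dump = subprocess.run(
            ["pg_dump", "--format=custom", database_url],
            check=True,
            capture_output=True,
        ).stdout
        # The bucket lives on a different provider and account, so no
        # Railway-side credential or volume deletion can reach it.
        boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=dump)
        return key

The tooling is interchangeable. What matters is that the copy lands somewhere a single Railway credential cannot reach.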

Backup success is not recovery success.

2. The API Token Was Effectively Root

The CLI token that the agent used had been created for one purpose: managing custom domains on Railway. Nobody at PocketOS knew the same token had blanket destructive scope across the entire Railway GraphQL API, including volumeDelete. There was no role-based access control, no environment scoping, no operation-level restriction. Railway tokens are not scoped at the permission level, so every token is effectively root.
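
You do not need an incident to discover this. Here is a hedged sketch of one way an operator might enumerate the destructive-sounding operations reachable with a given token, using standard GraphQL introspection. Introspection reports what the schema exposes, not what the token is authorized to run, but against an API whose tokens carry no per-operation scoping, exposure is a fair proxy for capability. The endpoint argument and helper name are ours.

    # Sketch: list destructive-sounding mutations visible to a given token.
    # Standard GraphQL introspection; endpoint and naming are assumptions.
    import requests

    INTROSPECTION = "{ __schema { mutationType { fields { name } } } }"
    DESTRUCTIVE_HINTS = ("delete", "destroy", "remove", "wipe")

    def destructive_mutations(endpoint: str, token: str) -> list[str]:
        resp = requests.post(
            endpoint,
            json={"query": INTROSPECTION},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        fields = resp.json()["data"]["__schema"]["mutationType"]["fields"]
        return [
            f["name"]
            for f in fields
            if any(hint in f["name"].lower() for hint in DESTRUCTIVE_HINTS)
        ]

    # A token minted to manage custom domains that returns volumeDelete
    # in this list is the finding.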

This is not exotic. This is how most production environments look right now. Token drift, identity drift, scope drift. Silent until it is loud.

3. Nobody Had Tested the Full Restore Path in Production Conditions

Proof that the backup architecture was broken was available before the incident. It just required somebody to actually run a recovery. Simulate a volume deletion in a sandbox, watch the "backups" disappear with it, and the architecture flaw is obvious. That test was never run. Annual drills were not going to catch it. Tabletop exercises were not going to catch it. The only thing that catches this is testing the actual restore path against the actual production configuration, on a regular cadence.
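
At its smallest, that test is a handful of lines. A minimal sketch, assuming a Postgres dump sitting in off-site object storage and a throwaway sandbox database. Every bucket, URL, and helper name here is a placeholder.

    # Sketch: prove the restore path end to end, in a sandbox.
    # Bucket, dump key, sandbox URLs, and the health check are placeholders.
    import subprocess
    import boto3
    import requests

    def restore_test(bucket: str, latest_key: str, sandbox_db_url: str) -> bool:
        dump = boto3.client("s3").get_object(Bucket=bucket, Key=latest_key)["Body"].read()
        # Restore into a throwaway database. Never production.
        subprocess.run(
            ["pg_restore", "--clean", "--no-owner", "--dbname", sandbox_db_url],
            input=dump,
            check=True,
        )
        # "The restore command exited 0" is not the test. The application
        # coming back up and answering is the test.
        app = requests.get("https://sandbox.example.internal/healthz", timeout=30)
        return app.status_code == 200

Run that on a schedule against the real backup source and the three-month-old backup, the co-located snapshots, and the broken restore path all become findings instead of a Friday-afternoon surprise.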

Why Annual Drills Do Not Save You From This

Most organizations test recovery once or twice a year. The test is partial, manual, expensive, and politicized. It gives a green light at a specific point in time and then the infrastructure changes for twelve months underneath it. Secrets rotate. IAM policies shift. API versions change. A token gets created for one purpose and gets reused for another. Dependencies move. A backup config changes. Nobody runs the full restore path again until the next annual drill.
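
One way to make that gap visible is to fingerprint the recovery-relevant configuration at each successful restore test and diff it continuously. A toy sketch: what belongs in the fingerprint (backup destinations, token inventory, IAM policies) depends on your stack, and the values below are illustrative placeholders.

    # Sketch: flag drift in recovery-relevant config since the last green test.
    # The config keys and values below are illustrative placeholders.
    import hashlib
    import json

    def fingerprint(config: dict) -> str:
        # Canonical JSON so key ordering does not produce false drift
        return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

    at_last_green_drill = fingerprint({
        "backup_target": "s3://offsite-bucket",
        "tokens": ["domains-token"],
    })

    today = fingerprint({
        "backup_target": "railway://same-volume",        # drifted silently
        "tokens": ["domains-token", "agent-cli-token"],  # scope creep
    })

    if today != at_last_green_drill:
        print("Recovery config drifted since the last successful test. Re-test now.")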

The gap between "last successful drill" and "current infrastructure state" is where PocketOS lived. That gap is where every business lives between drills. AI agents have made that gap dangerous because the rate of change is now machine speed and the rate of recovery testing is still human speed.

Attacks move at machine speed. Errors move at machine speed. Recovery moves at human speed. That is the gap Respawn was built to close.

What Respawn Would Have Told PocketOS Before the Incident

We cannot prove a counterfactual. What we can say is exactly what Respawn surfaces in environments like this one, every day. Two of those findings would have landed in the PocketOS team's inbox before Friday afternoon.

Finding 1: Your backups are outdated, missing, or sitting in the same blast radius as the data they back up.

Respawn restores from your declared backup source into a sandbox digital twin and confirms the application actually comes back online. When the most recent recoverable backup is three months old, we say so. Plainly. When "backups" turn out to be snapshots co-located with the source data, we say that too. And when backups exist but live on the same provider, in the same region, or on the same volume as the source, we flag the redundancy gap directly. Once Respawn supports Railway, the message will read: "Your backups are also on Railway. That is not redundancy. One outage, one bad API call, one volume deletion takes both." The test is binary. Either the system can be recovered from a backup outside the blast radius, or it cannot. PocketOS would have received a "no" from us months before the deletion.

Finding 2: This credential can destroy anything at any time.

Respawn enumerates the access scope of every credential pointed at your production environment. CLI tokens, IAM roles, service accounts, OAuth grants. When a credential created for one purpose has destructive scope it does not need, we flag it directly. "This IAM role can destroy anything at any time." Not buried in a report. Surfaced as a recovery-posture failure that needs remediation. The Railway CLI token that the Cursor agent used to run volumeDelete would have been on that list the day it was created.

Neither finding requires a human to run an annual drill. Both run quietly, on a schedule, and only generate noise when the answer to "can this company recover right now" flips from yes to no.

Quiet when healthy, loud when broken.

The Broader Pattern: AI Is the Attacker and the Operator Now

The PocketOS incident is a new failure mode dressed up as an old one. The agent was not malicious. It was running a routine task on flagship tooling, configured with explicit safety rules, integrated through the most-marketed AI coding product in the category. The agent itself enumerated the safety rules it violated, in writing. Cursor's destructive guardrails failed. Railway's authorization model failed. The backup architecture failed. All three at once.

This matters because every business that runs production on cloud infrastructure now has AI agents writing to it. Those agents are getting more capable, more autonomous, and more wired into production environments by the month. The failure surface is expanding faster than the safety surface.

The questions every operator should be asking, today:

  • If an AI agent ran a destructive operation in your environment tonight, would you know it happened?

  • If your infrastructure changed fifty times this quarter, do you know your DR plan still works?

  • When was the last time someone actually restored your production data into a sandbox and confirmed the application came back online?

If any of those answers is "I don't know," you have a recovery gap.

How Respawn Closes the Gap

Respawn is BC/DR on autopilot. We sell the work, not the tool.

We integrate with the production environment and the backup layer. We restore into a sandbox digital twin on a continuous cadence. We run realistic recovery checks at the transaction and business-workflow level. We surface why recovery would fail. Config drift, identity changes, dependency shifts, scope-creep on tokens, backup co-location, missing files, integrity problems. When the answer to "can this company recover right now" is no, we drive remediation directly, not by handing your team another dashboard to monitor.
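
For concreteness, here is roughly what a business-workflow-level check means, as opposed to an HTTP 200. This is an illustration of the idea, not Respawn's implementation; every endpoint, ID, and payload is a placeholder.

    # Sketch: a workflow-level check against a restored sandbox twin.
    # Endpoints, IDs, and payloads are placeholder assumptions.
    import requests

    SANDBOX = "https://sandbox.example.internal"  # the restored digital twin

    def can_transact() -> bool:
        # 1. The application is up at all
        if requests.get(f"{SANDBOX}/healthz", timeout=30).status_code != 200:
            return False
        # 2. Restored data is readable: a known customer record came back
        customer = requests.get(f"{SANDBOX}/api/customers/known-test-id", timeout=30)
        if customer.status_code != 200:
            return False
        # 3. The core workflow completes: a rental booking can be created
        booking = requests.post(
            f"{SANDBOX}/api/bookings",
            json={"customer_id": "known-test-id", "item_id": "test-van"},
            timeout=30,
        )
        return booking.status_code in (200, 201)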

The customer is buying the outcome. Silent when working. Loud when delivering an outcome. That is the product.

If any of the silent failures in the PocketOS story sound familiar in your own environment, book a conversation. Bring your stack. Bring your last drill report. We will walk through what continuous recovery testing would catch in your environment specifically.

FAQ

What happened in the PocketOS incident?

A Cursor agent running Claude Opus 4.6 deleted a production database volume on Railway in a single GraphQL API call. Railway's volume-level backups were stored inside the same volume, so the backups were lost with the data. The most recent recoverable backup was three months old. The full account is in Jer Crane's post on X.

Are Railway volume backups real backups?

Per Railway's own documentation, wiping a volume deletes all backups stored on that volume. Snapshots stored in the same blast radius as the source data do not protect against the failure modes that matter most. Deletion of the volume itself, corruption, malicious action, infrastructure failure. Operators running production on Railway should treat these snapshots as a convenience feature, not a recovery strategy, and store independent backups outside the volume.

What is the difference between backup and recovery?

Backup is the act of capturing data. Recovery is the act of restoring it into a working state that the application can actually use. Backup tools confirm that a job ran. They do not confirm that the application comes back online, that dependencies still resolve, that identities still work, or that the customer can transact. Backup success is not recovery success.

What is configuration drift in disaster recovery?

Configuration drift is the gap between the infrastructure state at your last successful recovery test and the infrastructure state today. Secrets rotate, IAM policies shift, API versions change, dependencies move. Drift accumulates silently between drills and is the most common reason a previously valid recovery plan no longer works.

How often should you test disaster recovery?

Annual or semi-annual drills are not enough in environments that change at machine speed. Continuous recovery testing simulates the full restore path on a regular cadence, catches drift before it becomes an incident, and produces audit and compliance evidence as a byproduct. The right cadence is whatever cadence keeps the gap between "last successful test" and "current infrastructure state" small enough to act on.

What is recovery posture management?

Recovery posture management is the practice of treating recoverability as a binary state that you continuously verify, not a quarterly project that you run and forget. It answers one question: can this company actually recover right now. It produces evidence that operators, auditors, insurers, and boards can act on without running expensive tabletop exercises.

Could Respawn have prevented the PocketOS incident?

We cannot prove a counterfactual. What we can say is what Respawn surfaces in environments like this one, every day. Two findings would have hit the PocketOS team's inbox before the deletion. One, your most recent recoverable backup is three months old, and the snapshots you are calling backups live inside the source volume. Two, this CLI token has destructive scope across your entire Railway API and was created for a routine domain operation. Both are exactly the kind of silent failure Respawn is built to find before an agent, an attacker, or an honest mistake exposes them.

Source. The primary account of this incident is from Jer Crane, founder of PocketOS, posted publicly on X: https://x.com/lifeof_jer/status/2048103471019434248?s=20.

About Respawn. Respawn is BC/DR on autopilot. We test whether companies can actually recover. Silent when working. Loud when delivering an outcome. Learn more at respawnit.com.
