Incident Response
"An incident is the moment your runbook stops being theoretical."
Overview
This document is a set of operational playbooks for three common compromise scenarios: a compromised peer machine, a leaked join token, and a rogue AI agent. Each scenario is presented as a sequence of commands to run and decisions to make, with the reasoning behind each step. The playbooks assume you have admin capability on the Mothership and are operating in a non-panicked state; the first discipline of incident response is to slow down enough to use the tools correctly.
Aura's role in incident response is to provide evidence, containment, and recovery. Evidence comes from the audit trail; containment comes from revocation and zone management; recovery comes from aura rewind and, where needed, key rotation. Aura does not replace your organization's incident-response process. It plugs into one, providing the technical controls your IR coordinator will invoke.
Threat Scope
The three playbooks below address the scenarios we most often see in practice. Each has variants and recurring operator errors; both are called out inline.
Scenarios not covered here — Mothership compromise, signing-key compromise, full-chain supply-chain incidents — are more serious and require vendor coordination. Contact Naridon's security disclosure channel directly for those.
Mechanism
Playbook 1: Compromised peer machine
A developer reports a lost laptop, an EDR alert fires on a developer workstation, or a forensic investigation concludes that malware had code-execution on a peer. The peer's identity key must be assumed compromised.
Step 1 — Contain. Revoke the peer's identity at the Mothership immediately. This refuses any future connection from that key.
aura mothership revoke-peer peer-5f3a --reason "laptop theft 2026-04-21"
Revocation propagates to every other peer on next sync (typically within seconds). The compromised peer, if still online, is dropped on its next sync attempt with an E_REVOKED error.
Step 2 — Enumerate exposure. Identify every intent the compromised identity produced, and every zone it claimed.
aura trace --actor peer-5f3a --since 30d > exposure.jsonl
aura audit export --signer peer-5f3a --since 30d > exposure-audit.jsonl
The --since window should be at least the period since the last known-good state; extend it if the compromise window is uncertain.
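Once the exports exist, a quick statistical pass helps scope the investigation before reading individual intents. The sketch below is illustrative only: it assumes each jsonl record carries a ts (ISO-8601) field and a files list, which may not match your export's actual schema; adjust the field names to what aura audit export actually emits.

```python
import json
from collections import Counter

def summarize_exposure(path):
    """Summarize an exposure export: intents per day and most-touched files.

    ASSUMPTION: each jsonl record has 'ts' (ISO-8601 string) and
    'files' (list of paths). Adjust to your export's real schema.
    """
    per_day = Counter()
    per_file = Counter()
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            per_day[rec["ts"][:10]] += 1       # bucket by calendar day
            per_file.update(rec.get("files", []))
    return per_day, per_file
```

A day with an unusual spike, or a security-critical file the user had no reason to touch, tells you where to read first.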
Step 3 — Triage changes. Run the security-focused review against every commit the identity produced:
aura pr-review --actor peer-5f3a --since 30d --security
Output flags suspicious patterns: newly introduced network calls, secret-like values, deleted security-critical functions, unusually large diffs. Every flag should be investigated manually — not every flag is a compromise artifact, but each is a candidate.
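If you want a second, independent pass over the same diffs, a hand-rolled scan for secret-like strings is cheap insurance. This is a minimal sketch with a few illustrative patterns, not a replacement for the built-in review; the patterns are assumptions you should tune for your environment.

```python
import re

# Rough, illustrative patterns for secret-like strings; tune for your environment.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access-key-id shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def flag_secretlike(diff_text):
    """Return the patterns that match a diff hunk (empty list if clean)."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(diff_text)]
```

As with the built-in flags, every hit is a candidate for manual review, not a verdict.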
Step 4 — Rewind where needed. For each confirmed malicious change, use surgical rewind:
aura rewind --function src/auth/verify.py:verify_token --to <last-good-intent-id>
Rewind operates on AST nodes, not text lines, so it does not produce merge conflicts with legitimate work on neighboring functions. See aura rewind.
Step 5 — Rotate. Issue a new identity key for the affected user on a fresh machine. Never reuse the old key, even if the theft turns out to be a false alarm — revocation is cheap and rotation is the correct posture once suspicion exists.
aura mothership issue-peer alice --role contributor
Step 6 — Record. Commit an incident intent to the log so the event is documented in-band:
aura log-intent "INCIDENT: peer-5f3a revoked due to laptop theft. New identity peer-c7e2 issued for alice@example.com. See exposure-audit.jsonl in secops store."
Common errors. Operators sometimes revoke the peer without first capturing the exposure export — the export is still possible after revocation, but confusion about whether revocation deleted data is a recurring concern. It does not: revocation invalidates a key; the log persists. Always export, then revoke, as a habit. Another common error is forgetting that the compromised peer may have pushed changes already synced to other peers. Rewind on those peers must propagate; confirm with aura sync status that every peer has received the rewind intents.
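Confirming rewind propagation is simple set arithmetic once you have, per peer, the set of intent IDs that peer knows about (however you collect them, for example from per-peer audit exports). A sketch, assuming exactly that input shape:

```python
def unpropagated(rewind_intents, peer_intent_ids):
    """Map each peer to the rewind intents it has not yet received.

    ASSUMPTION: 'peer_intent_ids' maps peer id -> set of intent ids
    known to that peer, collected out of band.
    """
    needed = set(rewind_intents)
    return {peer: sorted(needed - have)
            for peer, have in peer_intent_ids.items()
            if needed - have}
```

An empty result means every peer has the corrective intents; any non-empty entry names a peer that still holds the malicious state.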
Playbook 2: Leaked join token
A developer posts a join token to a public Slack channel. A CI log with a token gets archived to a third-party store. A backup is pulled from an unexpected jurisdiction. The token must be assumed adversary-available.
Step 1 — Revoke. Join tokens carry a unique jti (token ID) issued by the Mothership.
aura mothership list-tokens --active
aura mothership revoke-token tk_01J2ABCDEFGHJKMNPQ --reason "posted to #general"
The revocation list updates and propagates to peers. Any future use of the token receives E_REVOKED.
Step 2 — Audit the window. Determine whether the token was used before revocation.
aura mothership audit-token tk_01J2ABCDEFGHJKMNPQ
Output enumerates every join attempt made with the token, successful or refused, with timestamps and source IPs. If the token was never successfully used by an unknown party, the exposure is limited to the leaked secret itself; no further action is strictly required beyond the revocation.
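The judgment call in this step is "was any successful use from a party we don't recognize?" A sketch of that triage, assuming the audit output can be reduced to records with an outcome and a source_ip field (an assumed shape, not a documented one) and that you can name your trusted network ranges:

```python
import ipaddress

def suspicious_uses(attempts, trusted_nets):
    """Return successful join attempts originating outside trusted networks.

    ASSUMPTION: 'attempts' is an iterable of dicts with 'outcome'
    ('ok' or 'refused') and 'source_ip' fields.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_nets]
    out = []
    for a in attempts:
        if a["outcome"] != "ok":
            continue
        ip = ipaddress.ip_address(a["source_ip"])
        if not any(ip in n for n in nets):
            out.append(a)
    return out
```

Refused attempts from unknown IPs are worth noting but need no containment; a successful join from an unknown IP sends you straight to Step 3.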
Step 3 — If the token was used by an unknown peer. You now have a compromised peer — fall through to Playbook 1 for that peer, using the peer_id the token was bound to.
Step 4 — Tighten token policy. If leaks are recurring, shorten the default TTL. A token valid for 10 minutes is considerably harder to misuse than one valid for an hour.
[crypto]
join_token_ttl = 600 # 10 minutes
Common errors. Teams sometimes invalidate the token by rotating the Mothership signing key — this works but is heavy-handed, invalidating every other token as well. Revocation by jti is targeted and preferred. Another error is assuming a token in a private Slack channel is "safe" — Slack archives are discoverable in legal processes, searchable by Slack admins, and occasionally breached. Treat any token outside of the provisioning pipeline as leaked.
Playbook 3: Rogue AI agent
An AI coding agent — prompt-injected by a malicious snippet it read, misbehaving due to a model regression, or simply producing destructive diffs at speed — must be stopped and its actions reviewed.
Step 1 — Revoke the agent's token.
aura agent list
aura agent revoke claude-7f3a --reason "generated unvetted deletions across auth module"
Revocation takes effect within seconds on the next tool call.
Step 2 — Pause adjacent agents. If the rogue behavior appears to originate from a prompt injection or a shared context, other agents using the same upstream model or running in the same repository may be affected. Consider broader revocation:
aura agent list --filter "kind=agent"
# broad option: revoke every agent token at once
aura agent revoke-all --role agent --reason "investigating prompt injection"
This is a hard pause — no AI agents can act until new tokens are issued. Human committers are unaffected.
Step 3 — Enumerate agent actions. Every MCP call the agent made is in the audit log.
aura trace --actor claude-7f3a --since 6h
aura audit export --actor claude-7f3a --since 6h --format jsonl > rogue-agent.jsonl
For each intent the agent produced, read the intent text, the AST delta, and the resulting function hashes. Cross-reference against the user's reports of what the agent was supposed to be doing.
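One mechanical cross-reference that scales better than reading everything: flag intents that touched files outside the area the agent was supposed to be working in. A heuristic sketch, assuming intents reduce to records with id and files fields and that you can state the expected scope as path prefixes:

```python
def scope_mismatches(intents, allowed_prefixes):
    """Flag intents that touched files outside the expected scope.

    ASSUMPTION: each intent is a dict with 'id' and 'files';
    'allowed_prefixes' is where the agent was supposed to work.
    """
    flagged = []
    for it in intents:
        outside = [f for f in it["files"]
                   if not any(f.startswith(p) for p in allowed_prefixes)]
        if outside:
            flagged.append((it["id"], outside))
    return flagged
```

An agent tasked with documentation edits that touched src/auth is exactly the kind of mismatch this surfaces.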
Step 4 — Rewind malicious changes. Per function, per file:
aura snapshot-list --author claude-7f3a
aura rewind --function <path:name> --to <good-intent-id>
Where the agent produced many small commits, consider rewinding to the snapshot immediately before the agent's session began rather than going commit-by-commit. Snapshots are the states aura_snapshot captured automatically before each edit.
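Picking "the snapshot immediately before the session began" is a one-liner once you have the snapshot list as (id, timestamp) pairs, which is an assumed shape for the snapshot-list output:

```python
def snapshot_before(snapshots, session_start):
    """Return the id of the latest snapshot taken strictly before
    session_start, or None if there is none.

    ASSUMPTION: 'snapshots' is a list of (snapshot_id, iso_ts) pairs;
    uniformly formatted ISO-8601 strings compare correctly as strings.
    """
    prior = [(ts, sid) for sid, ts in snapshots if ts < session_start]
    return max(prior)[1] if prior else None
```

Feed the result to aura rewind as the known-good target.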
Step 5 — Investigate the cause. Prompt injection, model regression, and operator error (granting too broad a capability set) all manifest similarly. Review:
- The agent's input: what documents, messages, or code did it read in the session?
- The agent's capability set: was push granted when commit would have sufficed?
- The agent's quotas: did max_commits_per_hour contain the blast radius, or was the quota set too high?
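The quota question above can be answered directly from the commit timestamps in the exposure export: compute the busiest sliding one-hour window and compare it to the configured quota. A sketch, assuming you have extracted the timestamps yourself:

```python
from datetime import datetime, timedelta

def max_commits_per_hour(timestamps):
    """Given ISO-8601 commit timestamps (no timezone suffix), return the
    largest number of commits in any sliding one-hour window -- compare
    the result against the max_commits_per_hour quota."""
    ts = sorted(datetime.fromisoformat(t) for t in timestamps)
    best, lo = 0, 0
    for hi in range(len(ts)):
        # shrink the window from the left until it spans under one hour
        while ts[hi] - ts[lo] >= timedelta(hours=1):
            lo += 1
        best = max(best, hi - lo + 1)
    return best
```

If the observed peak sat well below the quota, the quota was not the control that failed; if it rode at the quota, the quota was the only thing containing the agent.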
Tighten the configuration as needed. See agent permissions for capability guidance.
Step 6 — Restore service cautiously. Issue new tokens to agents one at a time, with explicit observer supervision for a period. Increase audit verbosity temporarily:
[agents.enforcement]
log_all_calls = true
Common errors. Operators sometimes delete the agent's work by filesystem restore, bypassing rewind. This loses the audit value: the commits remain in the intent log as orphaned actions with no record of correction. Always use rewind so the correction is itself an intent in the log. Another common error is revoking the agent but leaving its sentinel-inbox conversations in place; if the agent was communicating with another agent, the recipient may still be acting on instructions. Review aura sentinel inbox for cross-agent coordination traces.
Configuration
A hardened incident-response posture includes the following defaults:
[mothership]
# Short token TTL limits leaked-token exposure
join_token_ttl = 600
[audit.export]
# Continuous SIEM export so evidence survives even if the Mothership is later lost
enabled = true
format = "otlp"
endpoint = "https://collector.internal:4318"
on_failure = "buffer"
[agents.defaults]
role = "reader"
max_ttl = 7200
require_explicit_role = true
deny_capability_escalation = true
[hooks]
# Alert on unusually large agent diffs
large_diff_threshold_files = 20
large_diff_threshold_lines = 2000
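For reference, the alerting condition these two thresholds encode can be stated in a few lines. This mirrors the configuration above under one assumption: that either threshold being met or exceeded triggers the alert.

```python
def large_diff_alert(files_changed, lines_changed,
                     threshold_files=20, threshold_lines=2000):
    """Mirror of the [hooks] large-diff check.

    ASSUMPTION: meeting or exceeding either threshold alerts.
    Defaults match the configuration block above."""
    return files_changed >= threshold_files or lines_changed >= threshold_lines
```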
Useful one-liners to keep in runbooks:
# Who has active access?
aura agent list
aura mothership list-peers
aura mothership list-tokens --active
# What happened in the last N hours?
aura trace --since 6h
aura audit export --since 6h --format jsonl > recent.jsonl
# Emergency admin lockdown — requires physical/admin presence
aura mothership lockdown --reason "ongoing incident investigation"
# While locked, no new tokens issue; existing peers continue read-only
aura mothership unlock
aura mothership lockdown is the break-glass control when the scope of an incident is unclear. It refuses new tokens, new pushes, and new zone claims while leaving pull-only operations intact so investigators can continue to read. Use it sparingly — it is disruptive — but know it exists.
Limitations
- Revocation requires Mothership reachability. An air-gapped peer that cannot reach the Mothership does not learn of a revocation until it syncs. In practice this is rarely a problem (attackers cannot exploit an offline peer without online connectivity), but a partitioned peer with the attacker on the same partition is a threat model Aura does not fully address.
- Rewind is semantic but not perfect. A change that touched many functions in subtle, correlated ways may be hard to rewind cleanly; treat incidents with that kind of wide blast surface as full-repo restoration events. Take a signed backup of the intent log before major rewinds.
- Attribution depends on key custody. If two developers share a peer identity key (which you should never do, but it happens), the log cannot distinguish them. Single-peer single-identity is a hard rule.
- Prompt-injection recovery is bounded by the quality of the snapshots and the promptness of detection. Aura automates snapshots before edits, but if prompt injection persisted across many sessions before detection, recovery may require reverting to an older known-good tag.
- The playbooks here are starting points, not substitutes for organizational IR. Integrate Aura's commands into whatever incident process you already run; do not invent a new process centered on Aura alone.
See Also
- Threat model — the adversary set these playbooks address
- Audit trail — the evidence you rely on
- Agent permissions — capability revocation details
- Aura rewind — surgical recovery
- Aura trace — reading the log for investigation
- Security disclosure — when to escalate to vendor
- Mothership overview — operator controls