Offline Mode

What happens when the network goes away. Your work doesn't stop.

Overview

Mothership is designed to tolerate disconnection. A peer that loses contact with its Mothership — because the Wi-Fi died, the laptop lid closed on a train, the VPN dropped, or the Mothership host rebooted — does not stop being useful. Local editing, snapshots, intent logging, and branch operations all keep working. Changes are buffered in a local write-ahead log (WAL) and replayed on reconnect.

This matters in two directions. Practically, developers don't lose minutes to a flaky network. Architecturally, it means a brief Mothership outage is not an outage for the team — just a temporary pause on cross-peer sync.

What Works Offline

Everything that does not fundamentally require consensus with the team:

  • Editing files.
  • Running aura snapshot, aura log_intent, aura rewind, aura pr_review.
  • Creating branches, committing locally, running the semantic index.
  • Reading the local semantic graph.
  • Sending messages (they queue and send on reconnect).
  • Consuming previously-synced function bodies and intents.

What Does Not Work Offline

Things that require the Mothership as a coordinator:

  • Issuing or revoking join tokens.
  • Claiming new zones (existing zone state is cached but new claims can't be acknowledged).
  • Pulling new changes from teammates.
  • Receiving new cross-branch impact alerts.
  • Joining the team for the first time from a new machine.

In short: your own work continues; team-wide coordination pauses.

The Write-Ahead Log

Every state change on a peer is appended to a local WAL before anything else happens. The WAL lives at ~/.aura/wal/ as a series of append-only segment files. Structure:

~/.aura/wal/
  000000001.log   (10MB, sealed)
  000000002.log   (10MB, sealed)
  000000003.log   (active, writable)
  index.db        (offset index for fast scan)

Each record is a self-describing event: a patch, an intent, a message, a zone action. Records carry a local sequence number and a vector clock entry, so their order can be reconstructed on replay.
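A minimal sketch of such a record, assuming a JSON-lines encoding; the field names and `WalRecord` type are illustrative, not the real on-disk format:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class WalRecord:
    # Self-describing event appended to the local WAL (illustrative fields).
    kind: str        # "patch" | "intent" | "message" | "zone"
    payload: dict    # event body
    seq: int         # local sequence number, monotonic per peer
    vclock: dict     # peer_id -> last sequence seen from that peer
    ts: float = field(default_factory=time.time)

    def encode(self) -> bytes:
        # One JSON object per line keeps segments append-only and scannable.
        return (json.dumps(asdict(self), sort_keys=True) + "\n").encode()

rec = WalRecord(kind="intent", payload={"text": "refactor auth"}, seq=42,
                vclock={"peer_a": 42, "peer_b": 17})
decoded = json.loads(rec.encode())
```

Because each line is self-describing, a replay can reconstruct order from `seq` and `vclock` alone, without consulting any external index.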

When the peer is online, the WAL is drained continuously — events flow to the Mothership as they are produced, and the Mothership acknowledges them. Acknowledged events can be compacted out of the WAL once they are persisted to the peer's permanent semantic store.
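The compaction rule above can be stated in a few lines. This is a sketch over plain dicts, assuming an event is safe to drop only when both conditions hold; the helper names are hypothetical:

```python
def compactable(event, acked_seqs, persisted_seqs):
    # An event can leave the WAL only once the Mothership has acknowledged it
    # AND it has been written to the peer's permanent semantic store.
    return event["seq"] in acked_seqs and event["seq"] in persisted_seqs

def compact(wal, acked_seqs, persisted_seqs):
    # Keep everything that is not yet safe to drop.
    return [e for e in wal if not compactable(e, acked_seqs, persisted_seqs)]

wal = [{"seq": 1}, {"seq": 2}, {"seq": 3}]
# seq 3 is acked but not yet persisted locally, so it must stay in the WAL.
remaining = compact(wal, acked_seqs={1, 2, 3}, persisted_seqs={1, 2})
```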

When the peer is offline, the WAL just grows. There is no hard size limit by default, but you can set one:

[wal]
max_size = "2GB"
max_age  = "30d"

If either limit is hit, the oldest events are evicted. A peer that has been offline for a month beyond the configured max_age will start losing the oldest of its offline changes. For most workflows this is irrelevant; if it matters, raise the limits.
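The eviction policy is oldest-first until both limits are satisfied. A sketch, treating the WAL as an oldest-first list of (timestamp, size) pairs; the function is illustrative, not the real implementation:

```python
import time

def evict(events, max_size_bytes, max_age_seconds, now=None):
    """Drop oldest events until both WAL limits are respected.

    `events` is a list of (timestamp, size_bytes) tuples ordered oldest-first,
    mirroring append-only segment order."""
    now = time.time() if now is None else now
    total = sum(size for _, size in events)
    kept = list(events)
    while kept and (total > max_size_bytes
                    or now - kept[0][0] > max_age_seconds):
        _, size = kept.pop(0)   # oldest-first eviction
        total -= size
    return kept

day = 86400
events = [(0, 500), (10 * day, 500), (40 * day, 500)]
# With max_age = 30 days, measured at t = 45 days, the first two events expire.
survivors = evict(events, max_size_bytes=2_000, max_age_seconds=30 * day,
                  now=45 * day)
```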

Detecting Offline

The peer's sync loop considers itself offline when three consecutive heartbeats to the Mothership fail. Default heartbeat interval is 30 seconds, so the transition takes about 90 seconds.

While offline:

  • aura status shows mothership: offline (last seen 5m ago).
  • Commands that would normally sync report queued for later sync.
  • Incoming messages are not delivered (they remain on the Mothership, to be fetched on reconnect).

The peer retries connecting every 15 seconds initially, backing off exponentially to a maximum of 5 minutes between attempts. On network change (Wi-Fi switched, VPN reconnected), the backoff is reset — see persistent daemon for how network change detection works.
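The detection and retry numbers above (three failed heartbeats, 15-second initial retry, 5-minute cap, reset on network change) can be sketched as a small state machine. The class and method names are illustrative:

```python
class SyncLoop:
    """Offline detection and reconnect backoff, using the documented numbers."""
    HEARTBEAT_FAILURES_TO_OFFLINE = 3
    INITIAL_RETRY = 15.0    # seconds
    MAX_RETRY = 300.0       # 5 minutes

    def __init__(self):
        self.failures = 0
        self.offline = False
        self.retry_delay = self.INITIAL_RETRY

    def heartbeat_failed(self):
        self.failures += 1
        if self.failures >= self.HEARTBEAT_FAILURES_TO_OFFLINE:
            self.offline = True

    def heartbeat_ok(self):
        self.failures = 0
        self.offline = False
        self.retry_delay = self.INITIAL_RETRY

    def next_retry_delay(self):
        # Exponential backoff, capped at MAX_RETRY.
        delay = self.retry_delay
        self.retry_delay = min(self.retry_delay * 2, self.MAX_RETRY)
        return delay

    def network_changed(self):
        # Wi-Fi switch / VPN reconnect resets the backoff immediately.
        self.retry_delay = self.INITIAL_RETRY

loop = SyncLoop()
for _ in range(3):
    loop.heartbeat_failed()               # third failure flips to offline
delays = [loop.next_retry_delay() for _ in range(6)]
# delays: 15, 30, 60, 120, 240, then capped at 300
```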

Reconnection and Replay

When the Mothership becomes reachable again, reconciliation runs in this order:

  1. Handshake. Peer presents its peer certificate. Mothership verifies and accepts.
  2. Clock sync check. If peer and Mothership clocks disagree by more than the allowed skew, peer warns but proceeds.
  3. Pull phase. Peer requests all WAL events from the Mothership since its last known position. These are applied to the local semantic store.
  4. Push phase. Peer streams its buffered WAL events to the Mothership. The Mothership appends them to its own WAL and fans them out to other peers.
  5. Impact recompute. With both sides now caught up, any new cross-branch impacts are computed and reported.
  6. Message delivery. Queued messages in both directions are delivered.
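The pull-before-push ordering in steps 3 and 4 is what guarantees a peer's buffered events land on top of everything it missed. A sketch over plain event lists, standing in for the real WAL streams:

```python
def reconcile(peer_log, mship_log, last_pulled):
    """Pull-then-push replay (illustrative, over plain lists).

    `mship_log` is the Mothership's ordered event list; `last_pulled` is how
    many of those events the peer had applied before going offline;
    `peer_log` is the peer's buffered WAL."""
    pulled = mship_log[last_pulled:]        # 3. pull everything we missed
    pushed = list(peer_log)                 # 4. push our buffered events
    new_mship_log = mship_log + pushed      # Mothership appends and fans out
    peer_view = mship_log[:last_pulled] + pulled + pushed
    return peer_view, new_mship_log, len(pulled), len(pushed)

peer_view, mship, n_pull, n_push = reconcile(
    peer_log=["my_edit_1", "my_edit_2"],
    mship_log=["a", "b", "c", "d"],
    last_pulled=2,
)
# The peer pulls "c" and "d", pushes its two edits, and both sides converge.
```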

This whole sequence usually takes seconds. For a peer that was offline for a day on an active team, expect a minute or two; for a peer offline for a week on a large team, expect maybe five minutes. Progress is shown:

reconnecting to mship_7fcd21a9...
  handshake:        ok
  pulling events:   847/847 [==========] 2.3s
  pushing events:   62/62 [==========] 0.4s
  recomputing impacts: 2 new alerts
  delivering queued messages: 3
reconnected.

Conflict Handling

The key question: what happens when two peers, both offline, edit the same function?

Mothership's unit of sync is the AST node. A conflict requires both peers to have edited the same function body during their overlapping offline window. Editing different functions in the same file is not a conflict — they produce independent node updates.

When a real conflict occurs:

  1. Both peers come online. Whichever pushes first wins: its change replicates normally and is applied team-wide.
  2. The second peer's push for the same function is rejected with conflict: function X was modified by Y at T.
  3. The second peer's local edit is preserved, and the conflict is surfaced as a cross-branch impact alert.
  4. The second peer resolves by rebasing their change on top of the first peer's, or by using aura rewind to drop their edit.
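The first-push-wins check in steps 1 and 2 amounts to a version guard on the coordinator. A minimal sketch, assuming each function carries a version counter; the field names and `try_apply` helper are hypothetical:

```python
def try_apply(store, event):
    """First-push-wins conflict check on the coordinator (sketch).

    `store` maps function id -> (version, last_editor); an incoming edit must
    be based on the version it last saw, otherwise it is rejected."""
    fn, base_version, editor = event["fn"], event["base"], event["peer"]
    current_version, last_editor = store.get(fn, (0, None))
    if base_version != current_version:
        # The local edit is preserved on the peer; only the push is rejected.
        return False, f"conflict: function {fn} was modified by {last_editor}"
    store[fn] = (current_version + 1, editor)
    return True, "applied"

store = {}
ok1, _ = try_apply(store, {"fn": "parse_header", "base": 0, "peer": "alice"})
ok2, msg = try_apply(store, {"fn": "parse_header", "base": 0, "peer": "bob"})
# Alice's push lands; Bob's push against the same base version is rejected.
```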

Because edits are semantic, conflict resolution tends to be easier than text-diff merge. You see "Alice changed function X to do A; you changed it to do B" rather than "text hunk at lines 42-58 conflicts."

Gotcha. Branch names are not coordinated while offline. Creating branches still works, but if two peers create branches with the same name during their offline windows, the second peer to reconnect will be asked to rename.

Vector Clocks and Ordering

Each event carries a vector clock: a map of peer_id -> local_sequence_number capturing which events from which peers this event causally follows. On reconnect, vector clocks are used to:

  • Deduplicate events the Mothership has already seen.
  • Establish a partial order consistent with causality.
  • Detect concurrent edits for conflict reporting.
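All three uses above reduce to one comparison: given two clocks, is one causally before the other, or are they concurrent? A standard vector-clock comparison, as a sketch:

```python
def compare(a, b):
    """Compare two vector clocks (maps of peer_id -> sequence number).

    Returns "before", "after", "equal", or "concurrent"."""
    peers = set(a) | set(b)
    a_le_b = all(a.get(p, 0) <= b.get(p, 0) for p in peers)
    b_le_a = all(b.get(p, 0) <= a.get(p, 0) for p in peers)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a causally precedes b: safe to apply in order
    if b_le_a:
        return "after"
    return "concurrent"       # neither saw the other: potential conflict

# Two peers that edited without seeing each other's events are concurrent:
verdict = compare({"alice": 3, "bob": 1}, {"alice": 2, "bob": 2})
```

"Concurrent" is exactly the case the conflict handling section deals with; "before"/"after" events replay without any merge at all.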

You do not interact with vector clocks directly. They are an implementation detail, mentioned here because they explain why Mothership's reconnect behavior is principled rather than best-effort.

Strategies for Long Offline Periods

If you know you'll be offline for a while — a two-week field assignment, a boat trip — there are a few things worth doing.

Before going offline:

aura mothership prefetch

Pulls the latest intent log and every reachable function body into your local store. It is slower than normal sync (minutes on large repos), but it means your offline experience has the same context as your online one.

Raise your WAL limits:

[wal]
max_size = "10GB"
max_age  = "90d"

On return, expect a long reconcile. Do it on a good network. Nothing is lost, but the initial push can move gigabytes.

Mothership Outage vs. Peer Offline

Same mechanism from both ends. If the Mothership itself is down:

  • Peers enter offline mode simultaneously.
  • Peers with direct peer-to-peer channels already established can still exchange patches. See p2p architecture.
  • New peer joins are impossible.
  • Zone claims and revocations freeze until the Mothership returns.

When the Mothership comes back, every peer reconciles against it using the same flow as an individual reconnect.

Security callout. Revocations issued before the Mothership went down are persisted and replicated to peers. They remain enforced during the outage. Revocations issued during the outage are not possible until the Mothership is back.

Watching the WAL

For debugging:

aura wal status

Output:

WAL at ~/.aura/wal
  segments:         3 (2 sealed, 1 active)
  total size:       24.3 MB
  oldest event:     2026-04-14 09:12:04
  newest event:     2026-04-21 11:38:22
  unacked events:   0
  last ack from:    mship_7fcd21a9 at 11:38:22

unacked events: 0 when online means you are fully in sync. Non-zero while online suggests a flaky connection or a slow Mothership.

aura wal replay --dry-run

Shows what would be pushed on next reconnect, without pushing. Useful for diagnosing "why is my sync so slow" scenarios (the answer is usually "because you have 10,000 unacked events").

Designing Your Workflow Around Offline

A few habits that make offline mode work for you rather than against you:

  • Log intent frequently. Intents are tiny and cheap. They make offline WALs easier to review.
  • Don't hold huge branches locally offline. The WAL can carry them, but reconcile is slow.
  • Prefer semantic operations. aura rewind applies cleanly offline; manually reverting text and hoping Mothership figures it out is fragile.
  • If you know you'll be offline, prefetch. A few minutes before you disconnect saves hours of "why doesn't this function exist" later.

Typical Scenarios

A few concrete cases to illustrate behavior.

Commuter on a subway. Laptop goes through tunnels, Wi-Fi drops, WAL accumulates a handful of events over twenty minutes. On emerging, network returns, Aura reconnects within seconds, flushes the WAL in one round-trip. The developer may not notice anything happened. This is the common case.

Weekend hackathon at an offsite. Team is on a venue WiFi with the Mothership unreachable. Five developers edit for eight hours. Each accumulates thousands of events. On return to the office on Monday, all five reconcile in parallel. Conflicts are possible but usually surface as clear "alice changed function X; you also changed function X" alerts, not as text merge nightmares. Resolution takes minutes, not hours.

Long field trip, two weeks offline. Developer prefetches before leaving, raises their WAL limits, works offline for two weeks. On return, reconcile pulls two weeks of team changes and pushes two weeks of local changes. Total reconcile time on a broadband connection: under five minutes for a typical repo. Impact alerts are generated for every function the developer touched that someone else also touched; they work through the alerts at their own pace.

Mothership host crashes. All peers transition to offline simultaneously. Direct peer-to-peer channels already established stay up and keep exchanging patches. When the Mothership host is restored from backup, peers reconcile against it. Any events that flowed peer-to-peer during the outage are pushed up to the Mothership as part of reconcile. Nothing is lost.

Why This Design

There are version control systems that treat offline as a failure mode and version control systems that treat offline as normal. Git is in the second camp. Mothership inherits that stance. A developer's local history is always authoritative locally; the Mothership is a convenience for team coordination, not a referee that can refuse your work.

This is a sovereignty property as much as a resilience one. If your internet is out, your work is still yours. If your employer's Mothership is temporarily unreachable, your branch is still your branch. If the Mothership operator revokes your access while you are offline, your local history is untouched — you cannot sync new work upstream, but nothing reaches into your disk to delete what you already have.

Gotcha. "My work is still mine" cuts both ways. Employees leaving a company walk away with their local Aura history just as they walk away with their local Git history. Revoke before their last day and back up the Mothership nightly.

Next Steps