Sync Troubleshooting

Stuck queues, stale state, ghost conflicts — the ten things that can go wrong and the one command that fixes each.

This page is a field manual. When Live Sync misbehaves — and in a real-time distributed system, it occasionally will — the symptoms fall into a small number of buckets. Each bucket has a diagnosis command, a likely cause, and a remediation. If your problem is not covered here, aura doctor live will dump a structured health report you can attach to an issue. Start with that. The rest of this page walks the buckets in order of frequency.

Overview

Live Sync has three moving parts: the scanner, the network link to the Mothership, and the WAL. Nearly every problem is one of: the scanner is not seeing changes, the network link is degraded, or the WAL has something pending that is not draining. The standard diagnostic sequence is:

aura live sync status    # one-line health
aura doctor live         # structured health dump
aura live wal status     # WAL-side view
aura live presence       # peer-side view

If those four commands agree on what is wrong, the fix is usually one command. If they disagree, you have found a bug — file it.

How It Works: the health model

aura doctor live prints a categorized report:

  live sync doctor
  ────────────────
  scanner:        OK    (warm scan 12ms, dirty=3)
  mothership:     OK    (ws online, rtt 38ms, peers 4)
  wal:            OK    (412 records, pending out 0, pending in 0)
  conflicts:      OK    (0 unresolved)
  impacts:        WARN  (2 unresolved cross-branch)
  presence:       OK    (heartbeat 22s ago)
  integrity:      OK    (segment checksums valid)

  overall: WARN — address impacts

Each line is a sub-check. OK, WARN, FAIL. If anything says FAIL, start there. If everything says OK but you still see a symptom, it is almost always either a config mismatch (selective sync) or a clock-skew issue with the Mothership.
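If you want to script against the report, here is a minimal sketch that scrapes the text layout shown above for non-OK sub-checks. The layout is assumed from the sample; for anything serious, the machine-readable output of aura doctor live --json (see Getting help) is the better source.

```python
import re

def non_ok_checks(report: str) -> dict:
    # Pull out every "name:  STATE" sub-check line and keep the ones
    # that are WARN or FAIL. Layout assumed from the sample report.
    results = {}
    for line in report.splitlines():
        m = re.match(r"\s*(\w+):\s+(OK|WARN|FAIL)\b", line)
        if m and m.group(2) != "OK":
            results[m.group(1)] = m.group(2)
    return results

sample = """\
scanner:        OK    (warm scan 12ms, dirty=3)
impacts:        WARN  (2 unresolved cross-branch)
integrity:      OK    (segment checksums valid)
"""
print(non_ok_checks(sample))  # → {'impacts': 'WARN'}
```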

Bucket 1: "I saved but nothing pushed"

Symptom: You save a function. aura live sync status shows no recent push. Teammates do not see your change.

Check, in order:

  1. Is Live Sync enabled? aura live sync status. First line should say enabled. If not, aura live sync enable.
  2. Is the file included? aura live selective why path/to/file.rs. If filtered by an exclude glob, fix the glob in .aura/live.toml. See selective sync.
  3. Is the function marked local-only? Same why command shows per-function opt-outs.
  4. Did the AST actually change? A whitespace/comment-only save produces no aura_id change and no push — this is expected and correct. Check with aura diff --semantic path/to/file.rs.
  5. Is the outbound queue stuck? aura live wal status — if pending out > 0 for more than a minute, see bucket 3.

If 1–4 are fine, the scanner is healthy but the push is blocked. Go to bucket 3.
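The check order above can be sketched as a small triage function, first failure wins. The status fields here are hypothetical stand-ins for what the commands in steps 1 to 5 report, not a real API:

```python
def triage_no_push(s: dict) -> str:
    # Steps 1-5 from the checklist above, evaluated in order.
    # Field names are illustrative, not Aura's actual schema.
    if not s["enabled"]:
        return "run: aura live sync enable"
    if s["excluded_by_glob"]:
        return "fix the exclude glob in .aura/live.toml"
    if s["local_only"]:
        return "function is opted out (local-only)"
    if not s["ast_changed"]:
        return "whitespace/comment-only save: no push is correct"
    if s["pending_out"] > 0:
        return "outbound queue stuck: see bucket 3"
    return "scanner healthy, push blocked: see bucket 3"

print(triage_no_push({
    "enabled": True, "excluded_by_glob": False, "local_only": False,
    "ast_changed": True, "pending_out": 12,
}))  # → outbound queue stuck: see bucket 3
```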

Bucket 2: "Mothership offline"

Symptom: Status shows mothership: offline or degraded.

aura live sync status --peer mothership
  mothership:  offline
  last seen:   3m 12s ago
  endpoint:    ws://mothership.local:7070
  retries:     4 (next in 30s, backoff exponential)
  queued:      12 outbound pushes waiting

Cases:

  • Mothership process down. SSH in and run aura mothership status. Restart with aura mothership start. See Mothership operations.
  • Network path blocked. Corporate VPN, firewall, or laptop came off Wi-Fi. Reconnect; Aura retries automatically.
  • Wrong endpoint. Someone moved the Mothership and the client config still points to the old address. Fix [live].mothership in .aura/live.toml.
  • Auth token expired. aura live auth refresh.

During an outage, your work is safe — edits flow to the WAL and push when the link returns. You can keep coding.
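The retry schedule in the status output above is exponential backoff. A sketch of the usual shape; the base delay and cap here are illustrative values, not Aura's actual constants:

```python
def retry_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    # attempt 1 -> base, attempt 2 -> 2*base, doubling each time,
    # capped so a long outage never pushes the next probe too far out.
    return min(base * (2 ** (attempt - 1)), cap)

print([retry_delay(n) for n in range(1, 6)])  # → [2.0, 4.0, 8.0, 16.0, 30.0]
```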

Bucket 3: "Queue won't drain"

Symptom: aura live wal status shows pending out > 0 and not decreasing. Mothership is online. No apparent error.

This is usually a specific push being rejected on the server side. Check:

aura live wal tail --n 20

Look for records marked OUTBOUND_FAILED. The reason field tells you why. Common reasons:

  • payload_too_large — a function body exceeds the Mothership's size cap. Increase the cap in Mothership config, or split the function.
  • signature_rejected — your client signature is invalid. Re-auth with aura live auth refresh.
  • repo_not_registered — the Mothership does not know about this repo. Run aura mothership register on a peer with admin rights.

Force-flush the queue after fixing:

aura live sync push --flush
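When the tail is long, grouping failures by reason shows the dominant cause at a glance. This sketch assumes each tail line carries the record state and a reason=... token, loosely modeled on the reasons listed above rather than the tool's documented line format:

```python
from collections import Counter

def failure_reasons(tail_lines):
    # Count OUTBOUND_FAILED records per reason. The "reason=..."
    # token is an assumed line shape, not Aura's real one.
    reasons = Counter()
    for line in tail_lines:
        if "OUTBOUND_FAILED" in line:
            for tok in line.split():
                if tok.startswith("reason="):
                    reasons[tok.split("=", 1)[1]] += 1
    return reasons

sample = [
    "0412 OUTBOUND_FAILED reason=payload_too_large fn=billing::report",
    "0413 OUTBOUND_FAILED reason=payload_too_large fn=billing::export",
    "0414 PUSHED fn=auth::login",
]
print(failure_reasons(sample))  # → Counter({'payload_too_large': 2})
```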

Bucket 4: "I pulled but the function didn't update"

Symptom: Status shows a recent inbound pull, but your file still has the old body.

  1. Is there a pending conflict? aura live conflicts list. A conflict staged but not resolved means the apply is waiting on you. See sync conflicts.
  2. Is the file writable? Permissions, read-only mount, or an editor holding an exclusive lock can block the atomic rename. Aura logs this as APPLY_FAILED in the WAL tail.
  3. Is your working copy ahead? If you edited the same function locally before the pull arrived and your local version is newer, Aura keeps your version and files the incoming as a conflict (which you may have dismissed). Check aura live wal tail --filter APPLY_SKIPPED.

Bucket 5: "Ghost conflicts"

Symptom: A conflict keeps appearing for a function that looks identical on both sides.

The base-hash pointer has drifted: each side is diffing against a different recorded base, so identical bodies still register as divergent. Force a reconciliation:

aura live sync reconcile --function billing::compute_tax

This re-derives the base from the Mothership's canonical ledger and re-evaluates. In the rare case reconciliation cannot pick a base, it raises a proper conflict with both sides for you to resolve interactively.
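What reconciliation is detecting, in miniature: both sides hold the same body but point at different bases. The record fields below are hypothetical, for illustration only:

```python
def is_ghost_conflict(local: dict, remote: dict) -> bool:
    # Same content hash, different base pointer: the content agrees
    # but the ancestry drifted, so naive three-way comparison keeps
    # flagging a conflict that is not really there.
    return (local["body_hash"] == remote["body_hash"]
            and local["base_hash"] != remote["base_hash"])

print(is_ghost_conflict(
    {"body_hash": "c9f2", "base_hash": "a001"},
    {"body_hash": "c9f2", "base_hash": "b7e4"},
))  # → True
```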

Bucket 6: "Stale presence"

Symptom: A peer shows online in aura live presence but they assure you they are offline.

Presence uses 90-second windows. Up to 90 seconds of staleness is expected. If it persists longer, the peer's final heartbeat recorded a "still editing" state and the Mothership is holding it. aura live presence --refresh pings all peers; anyone who does not respond in 10 seconds is demoted to offline.
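The window logic is simple enough to state directly. A sketch using the 90-second window from the text above; the held "still editing" edge case is exactly what this naive version ignores:

```python
def presence_state(seconds_since_heartbeat: float,
                   window: float = 90.0) -> str:
    # Within the window a peer is reported online even if it has
    # already quit; that staleness is expected, not a bug.
    return "online" if seconds_since_heartbeat <= window else "offline"

print(presence_state(22.0))   # → online
print(presence_state(180.0))  # → offline
```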

Bucket 7: "WAL says corrupted"

Symptom: aura live wal verify reports CHECKSUM_MISMATCH on a segment.

Do not panic. Aura never rewrites WAL segments in place, so corruption almost always means something outside Aura wrote to the file — a backup tool, a sync daemon (Dropbox, iCloud), or a disk hardware error.

aura live wal recover

Recovery salvages the intact prefix, truncates at the first bad record, and resumes. You lose at most the corrupted segment's unacked pushes — which will be re-detected on the next scan tick.
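Prefix salvage, in miniature. The checksum scheme (crc32) and record shape here are illustrative; Aura's real segment format is internal:

```python
import zlib

def salvage_prefix(records):
    # Walk records in order; keep everything before the first payload
    # whose stored checksum does not match, and drop the rest.
    intact = []
    for payload, checksum in records:
        if zlib.crc32(payload) != checksum:
            break
        intact.append(payload)
    return intact

recs = [
    (b"push-1", zlib.crc32(b"push-1")),
    (b"push-2", zlib.crc32(b"push-2")),
    (b"push-3", zlib.crc32(b"push-3") ^ 1),  # corrupted by an outside writer
]
print(salvage_prefix(recs))  # → [b'push-1', b'push-2']
```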

After recovery, move .aura/live/wal/ out of any cloud-sync directory. Dropbox-ing the WAL is a well-known foot-gun.

Bucket 8: "Scan is slow"

Symptom: Ticks take seconds, not milliseconds. CPU spikes.

Three likely causes:

  1. Too broad include. You are scanning a 200k-function monorepo every tick. Narrow with selective sync.
  2. Tree-sitter parser cache cold. After a repo rebuild (e.g. deleting target/), the parser cache may be re-warming. One slow tick, then back to normal.
  3. Disk slow. mtime checks on a networked filesystem (NFS, SMB) are slow. Move your working copy to local disk.
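The cheap check a scanner tick runs before any parsing is an mtime comparison; on NFS or SMB every stat is a network round trip, which is why point 3 recommends local disk. A sketch of the idea (real scanners typically also watch inode and size):

```python
import os
import tempfile

def dirty_paths(paths, last_mtimes):
    # Report files whose mtime moved since the previous tick,
    # recording the new values as we go.
    dirty = []
    for p in paths:
        mt = os.stat(p).st_mtime_ns
        if last_mtimes.get(p) != mt:
            dirty.append(p)
            last_mtimes[p] = mt
    return dirty

tmp = tempfile.mkdtemp()
f = os.path.join(tmp, "lib.rs")
with open(f, "w") as fh:
    fh.write("fn f() {}")
seen = {}
print(dirty_paths([f], seen))  # dirty on the first tick
print(dirty_paths([f], seen))  # → [] until the file changes again
```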

Bucket 9: "Peer sees my push but applies old version"

Symptom: You pushed the new version; teammate's aura live wal tail shows receipt of the new hash, but their file still has the old body.

Their apply failed. Ask them to run:

aura live wal tail --filter APPLY_FAILED --n 10

This will show the reason — usually file permission, editor lock, or a merge conflict staged on their side they have not noticed.

Bucket 10: "Nothing works and I just want to reset"

Sometimes the right answer is to clear the WAL and start fresh. You lose pending local-only pushes, but if you have committed everything to Git first, this is safe:

git status                                   # confirm clean or committed
aura live sync disable
aura live wal reset --confirm
aura live sync enable
aura live sync pull --full                   # re-hydrate from Mothership

Gotcha: aura live wal reset is destructive. It does not touch your working copy, but it discards WAL history. Do not run it if you have uncommitted functions that have not yet been acked by the Mothership.
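A pre-flight guard for the reset, as a sketch. It encodes the two conditions from the gotcha: a clean Git working tree (an empty git status --porcelain output) and no pushes still awaiting a Mothership ack. The function and its inputs are illustrative, not part of the aura CLI:

```python
def safe_to_reset(git_porcelain: str, unacked_pushes: int) -> bool:
    # Both must hold before `aura live wal reset --confirm` is safe:
    # nothing uncommitted in Git, nothing unacked by the Mothership.
    return git_porcelain.strip() == "" and unacked_pushes == 0

print(safe_to_reset("", 0))                # → True
print(safe_to_reset(" M src/tax.rs", 0))   # → False
```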

Log inspection

Detailed logs live in .aura/live/logs/:

aura live logs tail --n 200
aura live logs grep "APPLY_FAILED"
aura live logs since 15m

Bundle for an issue:

aura live logs bundle ./live-debug.tar.gz

The bundle is scrubbed of function bodies and secrets — safe to share.

Config

[live.debug]
verbose_logs = false
log_level = "info"            # trace | debug | info | warn | error
retain_log_days = 7

Getting help

If aura doctor live cannot auto-fix and the buckets above do not match, open an issue with:

  • aura doctor live --json output
  • aura live logs bundle
  • Rough description of the symptom and the last thing you did

See Also