# P2P Architecture

_How peers find each other, sync function bodies, and survive the loss of any one node._

## Overview

Calling Mothership "P2P" can be misleading if you expect BitTorrent-style decentralization with no coordinator. It is not that. Mothership is a _hybrid_ P2P system: there is a coordinator role (the Mothership), but every peer is a first-class replica, peers can talk directly to each other, and the coordinator is not a single point of truth.

This page explains the architecture: why we chose this shape, how peers discover each other, how sync actually flows, and what happens at the edges — NAT, firewalls, asymmetric networks.

## Why P2P, Specifically

The alternative architectures we considered and rejected:

**Pure client-server (like Gitea).** One server, many clients, all reads and writes through the server. Simple to reason about. But it concentrates three risks: availability (server down = team down), integrity (a compromised server can rewrite history for everyone), and bandwidth (every file transfer hits the server). For a team on one LAN with a dead Mothership, pure client-server means nobody can sync even though everyone's machines are sitting feet apart.

**Pure mesh (like Scuttlebutt).** No coordinator. Every peer gossips to every other peer. Elegant, but it converges slowly on large teams and has no natural place to hang team-wide policy (join tokens, zones, messaging). We built a version of this and killed it — the UX was worse.

**Mothership's model — coordinator + direct peer channels.** The Mothership holds team-wide state (peer list, revocations, zones, messages) and acts as a relay for peers that can't reach each other directly. But two peers who _can_ reach each other open a direct TLS tunnel and sync without routing through the Mothership. Best of both worlds: simple team policy, good LAN performance, and resilience to coordinator failure.

## The Three Kinds of Traffic

Every connection in a Mothership network is one of:
1. **Peer-to-Mothership.** Long-lived TLS connection. Carries control messages, WAL subscription, team messages, and zone claims.
2. **Peer-to-peer direct.** Established opportunistically when two peers are on the same network or can reach each other through NAT. Carries bulk data — function body patches and snapshot sync.
3. **Mothership-to-Mothership.** In [federated topologies](/team-topology), two Motherships peer as if each were a special kind of client of the other.

All three run over the same wire protocol with the same TLS setup. The only thing that differs is who initiates and what state they carry.

## Discovery Protocol

How does peer A know peer B exists?

### Step 1: Mothership authoritative peer list

When A joins, the Mothership sends A the current peer list:

```text
peer_id        display_name       network_hints
peer_0a1b2c3d  alice@acme         [10.0.1.42, 100.64.0.5]
peer_1b2c3d4e  bob@acme           [100.64.0.12]
peer_2c3d4e5f  carol@contractor   [100.64.0.88]
```

Every `network_hint` is an address at which that peer believes it can be reached. The peer announces these addresses to the Mothership on connect. The Mothership does not validate them; it just redistributes them.

### Step 2: Gossip updates

When a new peer joins, the Mothership broadcasts a `peer_added` event. Existing peers receive it and know a new peer exists. When a peer disconnects, a `peer_gone` event follows.

### Step 3: Direct connection attempt

When peer A wants to send a function body to peer B, A tries to open a direct TLS connection to each of B's announced addresses in parallel. The first one that succeeds wins. If all fail, A falls back to relaying through the Mothership.

```text
A -> B:10.0.1.42     (LAN, usually wins in the same office)
A -> B:100.64.0.12   (Tailscale, wins for remote peers)
A -> Mothership -> B (relay fallback, always works)
```

The TLS handshake uses the peer certificates issued by the Mothership, not the Mothership's own cert.
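The parallel dial in Step 3 can be sketched as a small race. This is an illustrative sketch only, not Mothership's client code; `open_channel`, `connect`, and `relay` are hypothetical stand-ins for the real transport layer.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def open_channel(hints, connect, relay):
    """Dial every announced address in parallel; first success wins.

    `hints` is the peer's announced address list, `connect` attempts a
    direct TLS connection (raising OSError on failure), and `relay`
    opens a channel through the Mothership. All are hypothetical
    stand-ins for the real transport layer.
    """
    with ThreadPoolExecutor(max_workers=max(len(hints), 1)) as pool:
        futures = [pool.submit(connect, addr) for addr in hints]
        for fut in as_completed(futures):
            try:
                return fut.result()   # first direct connection to succeed
            except OSError:
                continue              # this hint was unreachable; keep racing
    return relay()                    # every hint failed: fall back to relay
```

On a shared LAN the `10.x` hint typically completes first; a remote peer falls through to its mesh-VPN address; only when every hint fails does traffic route through the Mothership.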
Two peers verify each other by confirming that their certs were signed by the same Mothership CA. A stranger with a random cert cannot pretend to be a peer.

## Peer List Synchronization

Every peer maintains a local copy of the peer list. It is kept fresh by:

1. **Initial snapshot** on join.
2. **Event stream** (`peer_added`, `peer_gone`, `peer_network_update`) from the Mothership over the control connection.
3. **Gossip fill-in** from other peers. If A sees B mention a peer C that A doesn't know about (because A reconnected after C joined and missed the event), A requests the current entry from the Mothership.

The list is bounded by team size. At 500 peers (our tested upper bound; see [scaling](/scaling-mothership)), the full peer list is around 50 KB. It is replicated in full to every peer; there is no partitioning.

## How a Function Change Propagates

A concrete example: Alice edits `calculate_risk_score` and logs an intent.

```text
              Mothership
                  |
                  | (3) WAL fanout
                  v
+----- peer_bob ----- peer_carol -----+
|         |               |           |
|         | (4) direct    | (4)       |
|         v               v           |
+----> peer_dan        peer_eve <-----+

(1) Alice logs intent locally
(2) Alice's Aura pushes AST patch to Mothership
(3) Mothership appends to WAL, streams to all connected peers
(4) Peers apply patch to their local semantic index
(5) Any peer with an impact dependency raises an alert
```

Note that the patch is not a text diff. It is an AST-level operation: "replace the body of the function with identity `calculate_risk_score@v7` with this new AST subtree." Peers apply it by updating their local semantic graph.

If a peer's working copy also has uncommitted edits to the same function, the peer raises a conflict — but conflicts are rare because they require two people editing the same function simultaneously, not just the same file.

## NAT Traversal

Direct peer-to-peer connections only work if peers can reach each other. The realities of modern networks:

| Scenario | Direct works? | Notes |
|---|---|---|
| Same LAN | Yes | Announced as `10.x.x.x` or `192.168.x.x` |
| Both on a Tailscale / WireGuard mesh | Yes | Announced as `100.x.x.x`. Our recommended setup. |
| Both on the same corporate VPN | Usually | Depends on VPN policy |
| One on home WiFi, one on office LAN | No | Falls back to Mothership relay |
| Both behind symmetric NAT, no VPN | No | Relay only |

Mothership does not ship its own STUN/TURN infrastructure. We considered building NAT hole-punching and decided against it: it is complex, it breaks in opinionated corporate networks, and the problem is already solved well by Tailscale, Headscale, Nebula, and plain WireGuard. If your team is distributed, run a mesh VPN. The Mothership and its peers will use it automatically.

Relay fallback is always there. Nothing stops working if direct connections fail; it just uses more Mothership bandwidth.

## Wire Protocol

The wire protocol is binary framing over TLS. Every frame has a type, a stream ID, a length, and a payload. Streams are multiplexed — sync, messages, zones, and control all share the connection.

Frame types include:

- `HELLO` — protocol version negotiation
- `AUTH` — JWT or peer cert presentation
- `WAL_SUBSCRIBE` / `WAL_EVENT` — log replication
- `PATCH_PUSH` / `PATCH_ACK` — AST patch delivery
- `MSG_SEND` / `MSG_EVENT` — team messages
- `ZONE_CLAIM` / `ZONE_RELEASE` — file ownership
- `PEER_EVENT` — peer list updates
- `HEARTBEAT` / `HEARTBEAT_ACK` — liveness

We deliberately do not publish the exact byte layout here because it is versioned and subject to change. Protocol negotiation in `HELLO` handles backward compatibility within a major version. You do not need to know the bytes to use Mothership; you only need to know that one TCP connection carries all of this.

## Heartbeats and Liveness

Every peer sends a heartbeat every 30 seconds over its control connection. The Mothership responds with an ack containing any queued events.
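The three-missed-heartbeats rule described below can be sketched as a small counter. This is illustrative only; `LivenessTracker` and its method names are hypothetical, not Mothership's API.

```python
HEARTBEAT_INTERVAL_S = 30   # one heartbeat every 30 seconds
MISSED_LIMIT = 3            # three unanswered heartbeats => offline mode

class LivenessTracker:
    """Counts consecutive unanswered heartbeats on the control connection."""

    def __init__(self, missed_limit: int = MISSED_LIMIT):
        self.missed_limit = missed_limit
        self.missed = 0

    def on_ack(self) -> None:
        self.missed = 0            # any ack proves the Mothership is alive

    def on_timeout(self) -> None:
        self.missed += 1           # one heartbeat's ack deadline passed

    @property
    def mothership_unreachable(self) -> bool:
        return self.missed >= self.missed_limit
```

A peer would run one of these per control connection; when `mothership_unreachable` flips to true it enters offline mode and keeps buffering changes in the WAL. The Mothership runs the mirror-image logic with its own 5-minute window.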
The heartbeat doubles as a keep-alive for NAT/firewall state tables.

If three heartbeats in a row go unanswered, the peer considers the Mothership unreachable and transitions to [offline mode](/offline-mode). Local work continues; changes are buffered in the WAL.

The Mothership uses the same mechanism in reverse: a peer missing for longer than the configured window (default 5 minutes) is marked `offline` in the peer list, but its entry is not removed. Other peers know not to attempt direct connections to it.

## The Mothership as Bandwidth Multiplexer

A subtle benefit of the coordinator role: the Mothership can fan out a single push to N peers more efficiently than the pusher making N direct uploads. When Alice pushes a 10 KB patch to the Mothership, the Mothership holds it once and streams it to each subscribed peer as that peer becomes ready. Alice does not have to open N connections from her laptop. This matters when Alice is on hotel WiFi and has 10 teammates to reach.

When direct peer-to-peer is available, the protocol prefers it for bulk data — but only between pairs. For one-to-many, the Mothership wins.

## Resilience to Coordinator Loss

If the Mothership disappears:

- Existing direct peer-to-peer channels keep working.
- Local work keeps working, backed by the WAL.
- New peer joins cannot happen (no one to sign tokens).
- Team-wide state (zone claims, revocations) stops updating.

Peers reconnect automatically when the Mothership returns. The WAL is replayed, any missed events are caught up, and the team is back in sync.

For teams that cannot tolerate even brief coordinator outages, run a second Mothership in a [federated topology](/team-topology#mesh). Either can sign tokens; both carry the same replicated state.

## Next Steps

- [Choose a topology for your team](/team-topology)
- [Understand offline behavior in detail](/offline-mode)
- [Scale to 500 peers](/scaling-mothership)
- [Set up TLS properly](/tls-and-jwt)