Migration Reality Check

What actually happens when you point Aura at a ten-year-old repository.

"Tell me what the first two weeks look like. Not the demo. The real thing."

The Concern

Adoption cost is where most "better than Git" stories fall apart. The tool demos well on a fresh repo with three files. You run it on your actual monorepo — ten years of commits, submodules, LFS-managed binaries, three languages, a directory nobody has touched since 2018 — and the tool chokes, or produces nonsense output, or takes six hours to initialize and you never run it again.

The question an engineer should ask before adoption: what does the first parse of my repo look like? How long does it take? What breaks? What do I have to clean up? What does the two-week pilot actually involve?

How Aura Handles It

This page walks through the phases of a realistic adoption, with honest timings and honest failure modes.

Phase 0: Pre-flight (30 minutes)

Before running aura init on your main repo, do three things:

  1. Clone a fresh copy of the repo to a separate directory. Do not run the initial Aura parse against your working copy. This matters because the parse may take hours, and you want to keep working during that time.
  2. Check your repo size and language distribution.
    tokei    # or cloc; shows LOC per language
    git count-objects -v    # repo metadata size
    du -sh .git    # git's on-disk size
    
    If your repo is >500K LOC or uses languages beyond our well-supported list, expect the parse to take longer and to produce partial results for some files.
  3. Read when not to use Aura. If your situation is in that list, stop here.
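If neither tokei nor cloc is installed, the size check in step 2 can be approximated with a short script. The sketch below is not part of Aura; it counts non-blank lines per file extension and compares the total against the 500K LOC figure mentioned above. The function names and the threshold constant are illustrative assumptions.

```python
from collections import Counter
from pathlib import Path

# Threshold taken from the guidance above; the constant name is hypothetical.
LARGE_REPO_LOC = 500_000


def loc_by_extension(root: str) -> Counter:
    """Count non-blank lines per file extension, skipping the .git directory."""
    base = Path(root)
    counts: Counter = Counter()
    for path in base.rglob("*"):
        if ".git" in path.relative_to(base).parts or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: not counted as source
        non_blank = sum(1 for line in text.splitlines() if line.strip())
        counts[path.suffix or "(none)"] += non_blank
    return counts


def expect_slow_parse(counts: Counter) -> bool:
    """True when the total exceeds the 500K LOC mark."""
    return sum(counts.values()) > LARGE_REPO_LOC
```

Calling loc_by_extension on the fresh clone and printing counts.most_common(10) gives a rough language distribution. Extension is only a proxy for language, so treat the result as an estimate.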

Phase 1: Initial parse (2–6 hours for 500K LOC)

cd /path/to/fresh-clone
aura init

On a 4-core developer laptop, parsing a 500K LOC codebase typically takes 2–4 hours. On a 16-core server, 45–90 minutes. The parse is CPU-bound (tree-sitter per-file) and embarrassingly parallel; Aura uses all available cores by default.
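The two reference points above imply that going from 4 to 16 cores yields roughly a 2.7x speedup rather than 4x. A rough estimator can interpolate from them. This is a back-of-envelope sketch, not an Aura feature: it assumes parse time grows linearly with LOC and follows a power law in core count, and both assumptions may not hold for your repo.

```python
import math

# Reference timings quoted above for a 500K LOC repo, in hours (low, high).
REF_LOC = 500_000
REF_4_CORE = (2.0, 4.0)
REF_16_CORE = (0.75, 1.5)

# Exponent implied by the two points, assuming time is proportional to cores ** -ALPHA.
ALPHA = math.log(REF_4_CORE[0] / REF_16_CORE[0]) / math.log(16 / 4)


def estimate_parse_hours(loc: int, cores: int) -> tuple[float, float]:
    """Return a (low, high) estimate in hours, scaled from the 4-core figures."""
    scale = (loc / REF_LOC) * (cores / 4) ** -ALPHA
    return (REF_4_CORE[0] * scale, REF_4_CORE[1] * scale)
```

For example, estimate_parse_hours(1_000_000, 8) extrapolates to a million-line repo on an 8-core machine. Treat the result as an order-of-magnitude guide for scheduling the parse, not a prediction.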

What the parser is doing:

  • Walking the git history to identify files ever committed.
  • Parsing each file's current state into a tree-sitter syntax tree.
  • Identifying function-level units and assigning stable identities.
  • Walking the most recent N commits (default 1000, configurable) to establish rename chains and historical identity.
  • Building the semantic graph of call/import/type-use edges.
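To make "stable identities" and "rename chains" concrete, here is a toy illustration. It is not Aura's algorithm: it uses Python's ast module as a stand-in for tree-sitter, fingerprints each function by its arguments and body while ignoring its name, and links a removed function to an added one when the fingerprints match. All function names are hypothetical.

```python
import ast
import hashlib


def function_fingerprints(source: str) -> dict[str, str]:
    """Map each function name to a hash of its arguments and body syntax."""
    fingerprints = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump omits line numbers by default, and the parser drops
            # comments, so formatting-only edits leave the hash unchanged.
            shape = ast.dump(node.args) + "".join(ast.dump(s) for s in node.body)
            fingerprints[node.name] = hashlib.sha256(shape.encode()).hexdigest()[:12]
    return fingerprints


def detect_renames(old_source: str, new_source: str) -> dict[str, str]:
    """Return {old_name: new_name} for functions whose body survived a rename."""
    old = function_fingerprints(old_source)
    new = function_fingerprints(new_source)
    gone = {h: name for name, h in old.items() if name not in new}
    return {gone[h]: name for name, h in new.items()
            if name not in old and h in gone}
```

An exact-match fingerprint like this finds pure renames only. A function that is renamed and rewritten in the same commit produces no match, which is the kind of case a real identity algorithm has to resolve heuristically.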

What you will see:

  • CPU fans engaging.
  • Progress output: parsed 12034 / 48291 files, 83127 functions identified.
  • Occasional warnings for files the parser could not handle (typically generated code, exotic DSLs, or corrupted files).

What you will not see:

  • Your working copy being modified. aura init writes to .aura/ and nothing else.
  • Your git history being changed.
  • Any network activity (unless you explicitly configure a Mothership).

Phase 2: Reviewing the parse (1–2 hours)

After the parse, run:

aura doctor

This produces a health report:

SEMANTIC GRAPH HEALTH
  Files parsed:          48,291
  Files unparseable:      1,203 (2.5%)   — see `aura doctor --unparseable`
  Functions identified:  83,127
  Functions with ambiguous identity:  412 (0.5%)
  Rename chains established:          2,847
  Unresolved rename chains:             118
  Cross-file edges:     412,003
  Cross-file edges unresolved:           87

The interesting numbers:

  • Unparseable files are files tree-sitter could not handle. Usually: vendored third-party code, generated protobuf, minified JavaScript, exotic shell scripts. Decide whether to exclude them (.aurignore, same syntax as .gitignore) or leave them as "pass through to git, no semantic layer." Both are valid.
  • Ambiguous function identity deserves the closest look. These are functions where Aura's identity algorithm could not confidently pick a predecessor in git history, usually because the function was renamed and substantially rewritten at the same time, or because the file was moved and restructured. You get a list and can optionally resolve them:
    aura identity review
    
    Interactive walker through ambiguous cases. You can: accept Aura's best guess, manually link a function to its predecessor, or mark the function as "new" (no predecessor). Expect 0.3%–1% of functions to need review on a heavily-refactored 10-year repo. For 500K LOC with ~80K functions, that is 250–800 decisions. Most can be skipped (Aura's default guess is usually right). Budget 1–2 hours if you want to review them all. You do not have to.
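The review budget can be checked with simple arithmetic. The sketch below uses the 83,127 functions from the sample report and the 0.3%–1% range quoted above; the function name and the return shape are illustrative.

```python
def review_budget(functions: int,
                  low_rate: float = 0.003,
                  high_rate: float = 0.01,
                  max_hours: float = 2.0) -> tuple[int, int, float]:
    """Return (fewest decisions, most decisions, seconds per decision at the top end)."""
    low = round(functions * low_rate)
    high = round(functions * high_rate)
    seconds_each = max_hours * 3600 / high
    return low, high, seconds_each


low, high, seconds_each = review_budget(83_127)
print(low, high, round(seconds_each, 1))
```

At the upper end, fitting every decision into two hours leaves under ten seconds per case. That is only realistic if most cases are a quick confirmation of Aura's default guess, which is why skipping the bulk of them is a reasonable choice.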

Phase 3: Observer mode (weeks 1–2)

We strongly recommend running Aura in observer mode for your first two weeks. In this mode:

  • Aura watches commits as they happen.
  • aura save and aura log_intent are available but optional.
  • The pre-commit hook is installed but in warn-only mode.
  • No team member is required to change their workflow.

Turn it on:

aura config set mode observer

What you get out of two weeks of observer mode:

  • Metrics on how often commits match stated intent (baseline before you enforce).
  • Impact alerts surfaced but not blocking.
  • A feel for false-positive rates in your specific codebase.
  • Data for the conversation with your team about whether to flip on strict mode.
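One way to turn "a feel for false-positive rates" into a number is to triage each warn-only alert as real or not and tally the results per alert type. The sketch below assumes you keep such a triage list yourself; the Alert record and its field names are hypothetical and do not correspond to any Aura output format.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    kind: str       # hypothetical label, e.g. "intent_mismatch" or "impact"
    was_real: bool  # the team's verdict after triage


def false_positive_rates(alerts: list[Alert]) -> dict[str, float]:
    """Return the fraction of alerts judged not real, per alert kind."""
    totals: dict[str, int] = {}
    false_hits: dict[str, int] = {}
    for alert in alerts:
        totals[alert.kind] = totals.get(alert.kind, 0) + 1
        if not alert.was_real:
            false_hits[alert.kind] = false_hits.get(alert.kind, 0) + 1
    return {kind: false_hits.get(kind, 0) / count for kind, count in totals.items()}
```

A per-kind breakdown is more useful than a single rate, since one noisy alert type can be tuned or disabled without giving up the others.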

At the end of two weeks:

aura report --period 2w

Produces a summary of what happened during observer mode: intent mismatches caught, conflicts detected, agent activity. Share with the team. Discuss whether the value justifies the continued cost.

If the answer is no, uninstall. aura uninstall --keep-git removes .aura/ and the hooks. Your git repo is untouched. No lock-in.

If yes, proceed.

Phase 4: Enabling hooks and strict mode (day 1 after observer)

aura config set mode strict
aura config set strict_mode_locked false    # set true once you trust it

The pre-commit hook now blocks commits that:

  • Delete functions without explicit intent.
  • Have stated intent that does not match the actual diff.
  • Touch files in zones claimed by other developers.
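The first rule can be pictured as a comparison of function sets before and after a change. The following is a simplified illustration, not Aura's hook: it uses Python's ast module instead of tree-sitter, and it treats "explicit intent" as the deleted function's name appearing in the intent message, which is an assumption made for the example.

```python
import ast


def function_names(source: str) -> set[str]:
    """Collect the names of all functions defined in a Python source string."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def unexplained_deletions(before: str, after: str, intent: str) -> list[str]:
    """Return deleted function names that the intent message does not mention."""
    deleted = function_names(before) - function_names(after)
    return sorted(name for name in deleted if name not in intent)
```

A hook built on this idea would block the commit when the returned list is non-empty and let it through otherwise. Substring matching on the intent text is crude; it is used here only to keep the example short.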

Expect a few false positives in the first week. Tune thresholds:

aura config set intent.strictness medium    # low | medium | high

Teach the team the escape hatches. Plain git's --no-verify skips the hook entirely; Aura's override requires explicit confirmation and is logged:

git commit -m "emergency hotfix" --no-verify    # standard git escape
aura commit --override-strict "emergency"       # logged escape, audit-trailed

Phase 5: Team-wide sync (week 3+)

Once your team is comfortable with local Aura use, introduce Mothership:

aura mothership init    # on a server
aura mothership invite alice@team
aura join <token>       # on each dev machine

Live sync, impact alerts, and team messages become available. This is the part of Aura that requires all-team adoption to deliver full value. Individual use is still valuable; team use multiplies that value.

Common Traps

The specific failure modes we have seen in real migrations:

Submodules. Aura does not recursively parse submodules by default. If your monorepo uses submodules for shared libraries, Aura tracks the submodule pointer in the parent repo but does not track the submodule's internal semantic graph unless you run aura init inside the submodule separately. This is usually what you want. Surprises happen when a submodule's contents matter for the parent's semantic behavior.

Git LFS. Aura skips LFS-tracked files. They are passed through to git transparently. This is the correct behavior for binary assets. It is a problem if you have committed source code via LFS (rare but happens in game development). Check your .gitattributes.

Monorepo with mixed languages in nested projects. Aura handles mixed-language repos well at the top level. It struggles a bit when a single directory contains, e.g., Python and TypeScript that call each other via a bridge. The semantic graph becomes fragmented at the language boundary. Functional but not as rich as a single-language repo.

Pre-existing messy git history. Repos with merge commits generated by bad automation, force-pushed branches, or histories that were rewritten years ago produce lower-quality semantic backfill. Aura does its best with what is there. If the git history is pathological, expect more ambiguous identities and more unresolved rename chains.

Generated code committed to the repo. Protobuf, OpenAPI clients, ORM models, compiled assets. Aura parses these as ordinary source, which pollutes the semantic graph with churn from regeneration. Add them to .aurignore:

# .aurignore
**/generated/**
**/*.pb.go
**/schema.graphql
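To sanity-check patterns like these before a long parse, a small matcher can show which paths they would exclude. The sketch below handles only the subset of gitignore syntax used above (`**/`, a trailing `**`, and `*` within a path segment). It is not Aura's matcher and ignores negation, anchoring, and directory-only rules.

```python
import re


def compile_ignore_pattern(pattern: str) -> re.Pattern:
    """Translate a simple gitignore-style glob into a compiled regex."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def load_patterns(text: str) -> list[re.Pattern]:
    """Compile every non-blank, non-comment line of an ignore file."""
    lines = (line.strip() for line in text.splitlines())
    return [compile_ignore_pattern(l) for l in lines if l and not l.startswith("#")]


def is_ignored(path: str, patterns: list[re.Pattern]) -> bool:
    """True when the repo-relative path matches any pattern in full."""
    return any(p.fullmatch(path) for p in patterns)
```

Feeding the output of git ls-files through is_ignored gives a quick preview of what the three patterns above would remove from the semantic graph.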

Heavy rebasers. If your team rebases aggressively and rewrites commits frequently, Aura's history tracking handles it but produces more ambiguous identity chains. Not a blocker, but expect more noise in aura identity review.

Binary-heavy repos. If most of your repo is images, videos, or data files, Aura is not doing much. Consider whether the overhead is worth it. See when not to use Aura item 6.

Very old commits. Aura's rename detection works best on recent history. Commits from 2015 with renamed and rewritten functions may not link cleanly to their modern descendants. The semantic history is thinner further back. Use git log for deep historical queries; use aura trace for recent semantic history.

CI environments. Running aura init in CI on every build is wasteful. Cache the .aura/ directory in your CI artifacts:

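# GitHub Actions example
- uses: actions/cache@v3
  with:
    path: .aura
    # Key on the commit so each run saves a fresh cache entry;
    # restore-keys falls back to the most recent earlier entry.
    key: aura-${{ github.sha }}
    restore-keys: |
      aura-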

Incremental updates after the first parse are fast.

Partial Adoption

You do not have to roll Aura out everywhere at once. Common partial-adoption patterns:

  • One team, one repo. Start on a single team's main repo. If it works, expand.
  • Individual developers. Developers can install Aura and use it locally without the team installing it. They get intent tracking and aura prove; they do not get live sync or impact alerts (which require teammates). Useful for evaluation.
  • One language in a polyglot repo. If your repo is 90% Go and 10% shell, Aura's semantic value is in the Go part. You can run Aura everywhere; the value is concentrated where the parser is strong.
  • New code only. aura config set backfill.history_depth 0 skips backfilling semantic history. Aura starts tracking semantics from the first aura save going forward. Loses historical trace, gains fast init. Useful if the repo's past is messy and you only care about future state.

What Aura Does Not Solve

A team that does not want to adopt. If your team is skeptical or overloaded, a tool will not fix that. Do not force adoption from above. Pilot with interested developers and let value spread.

Deep historical semantic queries on pre-Aura history. You get what git history supports, plus Aura's best-effort backfill. You do not get rich semantic analysis on code from before you installed Aura. The backfill is useful but imperfect.

Rewriting of bad commit history. Aura does not fix your history. If your git log is messy, it stays messy. Aura adds a layer on top; it does not go back and clean the substrate.

Instant gratification. The real value of Aura compounds over weeks of team use. A one-day evaluation will show you the CLI works; it will not show you the impact on coordination and drift. Budget two weeks.

The Honest Tradeoff

Adopting Aura is a 2–6 hour initial parse plus a 2-week observer period plus a week of strict-mode tuning. Call it a developer-week of total cost spread over a month. The returns are structural — fewer semantic merge surprises, better agent coordination, better intent tracking — and accrue gradually.

If the expected return is smaller than a developer-week over a year, do not adopt. If you are using AI agents heavily, the returns typically exceed the cost within the first month. If you are not using agents and your team is happy on plain git, the returns are thinner and the adoption may not pay back.
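The break-even rule can be written down as a small calculation. This sketch assumes a developer-week is 40 hours and that you can estimate hours saved per week across the team; both are assumptions to replace with your own numbers.

```python
# Assumption: one developer-week of adoption cost equals 40 working hours.
HOURS_PER_DEV_WEEK = 40.0


def payback_weeks(hours_saved_per_week: float,
                  adoption_cost_hours: float = HOURS_PER_DEV_WEEK) -> float:
    """Weeks until cumulative savings cover the adoption cost."""
    if hours_saved_per_week <= 0:
        return float("inf")
    return adoption_cost_hours / hours_saved_per_week


def worth_adopting(hours_saved_per_week: float, horizon_weeks: float = 52.0) -> bool:
    """Apply the rule above: adopt only if the cost is repaid within a year."""
    return payback_weeks(hours_saved_per_week) <= horizon_weeks
```

Under these assumptions, the one-year threshold sits a little below one hour saved per week across the team, and paying back within the first month requires about ten hours saved per week.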

We would rather you skip adoption than adopt, find it unhelpful for your situation, and churn. The goal of this page is to make the decision calibrated.

See Also