Migration from Git

Importing a long-lived Git repository into Aura while preserving full history, backfilling function identities, and landing cleanly on the other side.

Overview

Aura does not replace Git. It runs alongside it, adding a semantic layer to commits your team is already producing. That design choice is what makes migration tractable: there is no "port everything over one weekend" event. You point Aura at an existing repository, let it index history, and begin operating with a semantic layer on the next commit.

This page documents the procedure for onboarding a long-lived Git repository into Aura, specifically the kind we see most often in enterprise engagements: ten to twenty years of history, hundreds of contributors, millions of lines of code, languages that have rotated in and out, and a handful of rewrite events along the way. We walk through pre-flight checks, the import itself, function-identity backfill, shadow branches, verification, cutover, the most common pitfalls, and rollback.

What "migration" means here

Migration has three distinct concerns, and it is worth separating them before we touch a command:

  1. History import — Aura reads git log and attaches a semantic index to every commit it can parse. The Git repository is not modified.
  2. Function identity backfill — Aura assigns stable identities to every function across history, so that a rename six years ago is correctly recognized as the same function.
  3. Shadow branches — Aura creates an internal refs/aura/shadow/* namespace that mirrors the AST-level state of selected branches. This is what enables AST merge for branches that predate the migration.

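None of these steps modify the Git history visible to your developers: no commits are rewritten and no existing branches or tags are touched. The import is additive. What Aura adds is the refs/aura/* namespace and the Aura-managed .aura/ directory, and both can be removed again with no impact on the underlying Git history (see Rollback).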

Pre-flight checks

Before importing, run the built-in preflight:

aura migrate preflight /path/to/repo

The output is a readiness report that covers:

  • Repository size and commit count. Above ten million commits or 200 GB, the import plan changes; we recommend engaging Naridon professional services.
  • Language detection. Which tree-sitter grammars will be used. Any file types that will be indexed at byte-level only because no grammar is available.
  • Filter-branch detection. History rewrites are recorded and handled, but they are worth flagging to stakeholders so that pre-rewrite behavior is understood.
  • Submodule map. Submodules are indexed individually, not inlined.
  • LFS pointers. LFS content is ignored by Aura; only the pointer files are indexed, which is almost always what you want.
  • Binary blob ratio. Repositories whose history is dominated by binary assets import without error, but binary files have no AST to index, so the semantic layer adds little for them.

A typical preflight report ends with an estimate of import time and disk footprint. For a ten-year Java monorepo of about 40 GB with 600,000 commits, expect roughly forty to ninety minutes of wall-clock import time on a four-core machine, and an Aura index size around 12% of repository size.
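As a rough sanity check on a preflight report, the figures above can be turned into back-of-the-envelope numbers. The sketch below assumes that index size and import time scale roughly linearly with repository size and commit count. That is an assumption made for illustration, not a documented guarantee, and the function names are hypothetical.

```python
def index_size_gb(repo_size_gb: float, index_ratio: float = 0.12) -> float:
    """Estimated Aura index size, using the ~12% figure quoted above."""
    return repo_size_gb * index_ratio


def throughput_range(commits: int, min_minutes: float, max_minutes: float) -> tuple[float, float]:
    """Commits per second implied by a wall-clock range, as (slowest, fastest)."""
    return commits / (max_minutes * 60), commits / (min_minutes * 60)


# Figures from the example above: 40 GB, 600,000 commits, 40 to 90 minutes.
size = index_size_gb(40)
slow, fast = throughput_range(600_000, 40, 90)
print(f"index ~{size:.1f} GB, throughput ~{slow:.0f} to {fast:.0f} commits/s")
```

For the example repository this works out to an index of about 4.8 GB and roughly 111 to 250 commits per second on a four-core machine.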

Running the import

Imports are idempotent and resumable. Interrupt at any point and restart with the same command; Aura will pick up where it left off.

aura migrate import /path/to/repo \
  --mothership https://aura.internal.example.com \
  --repo-id monorepo \
  --parallelism 4 \
  --shadow-branches 'main,release/*'

Behind the scenes:

  1. Aura walks git log --all and enumerates commits in topological order, parents before children, so each commit is processed after its ancestors.
  2. For each commit, it extracts the tree, parses every supported file with the corresponding tree-sitter grammar, and records AST-level metadata.
  3. Function identities are assigned incrementally, comparing each commit's functions against the previous identity map to detect renames, moves, and refactors.
  4. Shadow branches are created for the refs you requested.
  5. The whole index is streamed to the Mothership as the import proceeds, so that the index is queryable before the full walk completes.
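Assigning identities incrementally (step 3) only works if each commit is visited after its parents. A minimal sketch of such a parents-first ordering, using Kahn's algorithm over a commit graph, is shown below. It assumes the history is available as a mapping from commit id to parent ids, with every parent also present as a key. It illustrates the ordering only and is not Aura's implementation.

```python
from collections import deque


def parents_first(parents: dict[str, list[str]]) -> list[str]:
    """Order commits so that every commit appears after all of its parents.

    Every parent id must also be a key of the mapping.
    """
    children: dict[str, list[str]] = {commit: [] for commit in parents}
    pending = {commit: len(ps) for commit, ps in parents.items()}
    for commit, ps in parents.items():
        for parent in ps:
            children[parent].append(commit)

    # Root commits have no parents and can be processed immediately.
    ready = deque(sorted(c for c, n in pending.items() if n == 0))
    order = []
    while ready:
        commit = ready.popleft()
        order.append(commit)
        for child in children[commit]:
            pending[child] -= 1
            if pending[child] == 0:  # all parents of this child are done
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError("history contains a cycle")
    return order


# Hypothetical history: a is the root, b and c branch from a, d merges b and c.
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(parents_first(history))
```

The merge commit d is emitted only after both b and c, which is the property an incremental identity map depends on.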

Progress is printed to stderr and is also available as Prometheus metrics from aura migrate status --metrics:

commits walked:         412,003 / 613,847
commits parsed:         411,880
commits byte-only:          123 (files with no grammar)
functions tracked:    1,842,991
identities resolved:  1,830,112
renames detected:         3,244
moves detected:           1,097
estimated time remaining: 00:41:17

Function identity backfill

This is the step that gives Aura its value and the step most worth understanding.

Naive tools compute function identity from file path and function name. That breaks the first time someone renames a file or a function. Aura uses a content-aware identity: a combination of normalized signature, body structure, and sibling context, with a decaying lookback window across recent history. The practical effect is that a function renamed from process_order to process_incoming_order in 2019 is still the same identity today, and every commit touching it — including ones from before the rename — is linked to that identity.
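To make the idea of a name-independent identity concrete, here is a minimal sketch. It hashes a function's AST with the function's own name blanked out, so a pure rename leaves the fingerprint unchanged. It uses Python's standard ast module rather than tree-sitter, and it models only the body-structure ingredient: sibling context and the lookback window are left out. The helper names are hypothetical and this is not Aura's algorithm.

```python
import ast
import hashlib


def fingerprint(source: str) -> str:
    """Hash one function definition, ignoring the function's own name."""
    func = ast.parse(source).body[0]
    if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("expected a single function definition")
    func.name = "_"  # exclude the name so renames do not change the hash
    return hashlib.sha256(ast.dump(func).encode()).hexdigest()[:12]


def carry_forward(previous: dict[str, str], current: dict[str, str]):
    """Match current functions to earlier identities by fingerprint.

    previous maps fingerprint -> name at the parent commit;
    current maps name -> source at this commit.
    Returns (fingerprint -> name map for this commit, detected renames).
    """
    identities, renames = {}, []
    for name, source in current.items():
        fp = fingerprint(source)
        identities[fp] = name
        old_name = previous.get(fp)
        if old_name is not None and old_name != name:
            renames.append((old_name, name))
    return identities, renames


v1 = {"process_order": "def process_order(order):\n    return order.total * 2\n"}
v2 = {"process_incoming_order": "def process_incoming_order(order):\n    return order.total * 2\n"}

prev, _ = carry_forward({}, v1)
_, renames = carry_forward(prev, v2)
print(renames)  # [('process_order', 'process_incoming_order')]
```

An exact hash survives only pure renames. Any edit to the body changes it, which is why a practical system needs similarity matching and context on top of a fingerprint like this one.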

Backfill runs in step with the import walk, so by the time the walk finishes, every historical commit has a resolved identity map. The map is queryable:

aura history function "fn:1a2b3c…"
fn:1a2b3c…  process_incoming_order (current name)
  2014-06-12  created as validate_order  (commit a1b2c3)
  2016-11-04  renamed to process_order   (commit d4e5f6)
  2019-03-21  renamed + moved            (commit g7h8i9)
  2022-08-15  signature extended         (commit j0k1l2)
  2025-02-07  body refactored            (commit m3n4o5)

The result is a single timeline for each function, independent of whatever names it carried at any given moment.

Shadow branches

A shadow branch is an AST-level mirror of a Git branch, stored under refs/aura/shadow/*. Creating shadow branches for your main release lines enables two capabilities on historical branches:

  • AST merge on old branches. A long-lived feature branch that predates the migration can still be merged into main using Aura's semantic merge.
  • Semantic diff across any two refs. Comparing a 2018 release tag against today's main produces a function-identity-aware diff rather than a textual one.
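What an identity-aware diff reports, as opposed to a textual one, can be sketched as follows. The snapshot shape (an identity id mapped to a name and a body hash) is an assumption made for illustration. Aura's actual data model and output format are not described on this page.

```python
from typing import NamedTuple


class FunctionState(NamedTuple):
    name: str
    body_hash: str


def semantic_diff(old: dict[str, FunctionState], new: dict[str, FunctionState]) -> dict[str, list[str]]:
    """Classify changes between two snapshots keyed by function identity."""
    result = {"added": [], "removed": [], "renamed": [], "modified": []}
    for ident in sorted(old.keys() | new.keys()):
        if ident not in old:
            result["added"].append(ident)
        elif ident not in new:
            result["removed"].append(ident)
        else:
            # A function can be both renamed and modified between two refs.
            if old[ident].name != new[ident].name:
                result["renamed"].append(ident)
            if old[ident].body_hash != new[ident].body_hash:
                result["modified"].append(ident)
    return result


# Hypothetical snapshots of a 2018 release tag and today's main.
tag_2018 = {"fn:1": FunctionState("process_order", "aa"), "fn:2": FunctionState("audit", "bb")}
main = {"fn:1": FunctionState("process_incoming_order", "aa"), "fn:3": FunctionState("notify", "cc")}
print(semantic_diff(tag_2018, main))
```

A textual diff would show process_order as deleted and process_incoming_order as added. Keyed by identity, the same change is reported as one rename with an unchanged body.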

Create shadow branches during import with --shadow-branches, or later:

aura shadow create release/2023.3 release/2024.1

Shadow branches are maintained automatically once created. When a developer pushes to main, the corresponding shadow updates.

Verifying the import

After import completes, run the verifier:

aura migrate verify --repo-id monorepo

The verifier performs three checks:

  1. Commit coverage. Every reachable commit in the Git repository has an Aura index record.
  2. Identity completeness. Every function in the current HEAD state has a resolved identity back to its creation commit, or a documented reason it cannot be resolved (typically: the function was introduced in a pre-rewrite history segment).
  3. Shadow parity. Each shadow branch's AST state matches the AST computed live from the corresponding Git ref.
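The first check amounts to a set difference. A minimal sketch follows, assuming the reachable commit ids (for example, the output of git rev-list --all) and the ids Aura has indexed are available as two plain sets. How the indexed ids would be exported from Aura is not covered on this page, so that input is hypothetical.

```python
def missing_commits(reachable: set[str], indexed: set[str]) -> list[str]:
    """Commits reachable in Git that have no index record."""
    return sorted(reachable - indexed)


def coverage(reachable: set[str], indexed: set[str]) -> float:
    """Fraction of reachable commits that are indexed (1.0 for an empty repo)."""
    if not reachable:
        return 1.0
    return len(reachable & indexed) / len(reachable)


# Hypothetical commit ids.
reachable = {"a1b2c3", "d4e5f6", "0a1b2c"}
indexed = {"a1b2c3", "d4e5f6"}
print(missing_commits(reachable, indexed), f"{coverage(reachable, indexed):.0%}")
```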

A successful verification exits zero and prints a summary. A failed verification identifies the specific commits or identities at fault and suggests remediation.

Cutover strategy

There is no "big bang" cutover. The recommended sequence:

  1. Week 1 — import. Run the import against a clone of the production repository. Verify. This is read-only against Git; no team impact.
  2. Week 2 — pilot team. Roll out the aura CLI and MCP integration to a single pilot team. They continue pushing to the same Git remote; their commits now also flow through Aura.
  3. Weeks 3–4 — observe. Compare Aura's intent log, PR review output, and zone suggestions against the team's actual workflow. Tune permissions.toml zones to match real ownership boundaries.
  4. Week 5+ — broaden. Enable additional teams. Each team's onboarding is a config change, not a data migration.
  5. Week 8+ — enforce. Once coverage is broad, enable strict mode and the pre-commit hook organization-wide. See RBAC & Permissions for the strict-mode lock procedure.

No step in this sequence requires a code freeze. Developers keep working on the Git repository they know; Aura layers on top.

Pitfalls and how to avoid them

We have imported several hundred repositories. A small number of issues recur often enough to be worth calling out.

Vendored third-party code

Large monorepos frequently contain vendored copies of third-party libraries. Aura will happily index them, but the function-identity graph ends up dominated by third-party functions that your team does not actually own.

Fix: exclude vendored paths at import time with --exclude 'vendor/**,third_party/**' (quoted so that the shell does not expand the globs). The excluded paths remain available in Git; Aura simply does not track function identities inside them.
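To preview which paths a set of exclusion globs would cover before running the import, a rough check can be done with Python's fnmatch module. Aura's own glob semantics are not specified on this page. Note in particular that fnmatch lets * match across / and matches the pattern against the whole path from its start, which may differ from Aura's matcher.

```python
from fnmatch import fnmatchcase


def split_paths(paths: list[str], patterns: list[str]) -> tuple[list[str], list[str]]:
    """Partition paths into (kept, excluded) using shell-style globs."""
    kept, excluded = [], []
    for path in paths:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            excluded.append(path)
        else:
            kept.append(path)
    return kept, excluded


# Hypothetical repository paths, e.g. taken from `git ls-files`.
paths = [
    "src/orders/process.py",
    "vendor/github.com/lib/pq/conn.go",
    "third_party/zlib/inflate.c",
]
kept, excluded = split_paths(paths, ["vendor/**", "third_party/**"])
print(kept)      # ['src/orders/process.py']
print(excluded)
```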

Generated code

Protobuf outputs, GraphQL schema generators, and similar tooling produce files that churn constantly without meaningful semantic change. These create noise in the intent log.

Fix: add generated-file globs to the per-repo ignore file (.aura/ignore). Generated files will still be included in commits normally; they just will not generate semantic events.

History rewrites

Repositories that survived a git filter-branch or git filter-repo pass have a discontinuity at the rewrite boundary. Aura handles this by treating the rewrite as a synthetic identity event: functions present before and after the rewrite with matching content are linked; functions that changed substantially during the rewrite appear as new identities.

Fix: run aura migrate verify --explain-rewrites after import. The output documents every rewrite boundary Aura detected and any identities that did not link across it. Human review of the output takes an afternoon for a typical repository.

Unsupported languages

A grammar may not exist for a legacy language in the repository. Aura falls back to byte-level diffing for files in unsupported languages. Byte-level diffs carry less information than AST-level ones, but they are never incorrect.

Fix: if an important share of your codebase is in a niche language, talk to Naridon. Commercial customers can commission a grammar.

Very large single files

Tree-sitter handles large files, but files above 50 MB tax the parser and can dominate import time. These are usually data files rather than code.

Fix: add them to the ignore list. If they really are code, increase --max-file-size and accept the longer import.
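Candidates for the ignore list can be found before the import with a short scan of the working tree. The sketch below uses the 50 MB threshold mentioned above (taken here as 50 × 1024 × 1024 bytes, which is an assumption) and skips the .git directory. It looks only at the current checkout, not at large files that exist solely in history.

```python
import os


def large_files(root: str, limit_bytes: int = 50 * 1024 * 1024) -> list[tuple[str, int]]:
    """Return (relative path, size) for files larger than limit_bytes, largest first."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if os.path.islink(full):
                continue  # skip symlinks, which may be dangling
            size = os.path.getsize(full)
            if size > limit_bytes:
                found.append((os.path.relpath(full, root), size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling large_files("/path/to/repo") returns the oversized files in descending size order, which makes it easy to decide which ones are data files and which ones are genuinely code.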

Rollback

If, at any point, you decide Aura is not the right fit, rollback is:

aura migrate uninstall --repo-id monorepo

This removes refs/aura/* from your Git repository, removes the .aura/ directory, and unregisters the repository from the Mothership. Your commits, branches, and tags are untouched: no history is rewritten and no further cleanup is needed.

See Also