Limitations and Edge Cases

When Aura falls back, where it is honest, and how to tune it for pathological inputs.

Overview

No merge engine is perfect. A tool that claims perfection is a tool that lies to you. Aura is engineered so that its failure modes are visible, localized, and recoverable — not silent corruption, not whole-repo panics, not mysteriously wrong output. This page documents the places where Aura falls short of AST-level merging, the heuristics it uses to decide when to degrade, and the knobs available to tune behavior for your codebase.

Read this before running Aura on a repository for the first time. Knowing the limits is knowing the tool.

A tool is trustworthy when its failures are predictable. Predictability beats cleverness at scale.

How It Works

The fallback hierarchy

Aura's mergers are arranged in a hierarchy. When a higher-precision merger cannot proceed, it degrades to the next one down, always emitting a diagnostic:

 1. AST merge (tree-sitter + adapter)
 2. Structural merge (JSON, YAML, .env, Markdown block-level)
 3. Line merge (standard three-way text merge)
 4. Defer (write merge markers, let the user resolve externally)

At every level, Aura preserves the three-way model — ancestor, ours, theirs — so resolutions downstream of a fallback still have all three versions available.
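The hierarchy can be modeled as an ordered chain of mergers. Each merger either produces a result or declines with a reason, and that reason becomes a diagnostic. The sketch below is an illustration under assumptions. Merger, Attempt, and mergeWithFallback are hypothetical names, not Aura's API, and the defer step just writes diff3-style conflict markers.

```typescript
// Hypothetical model of a fallback chain; names are illustrative, not Aura's API.
type Level = "ast" | "structural" | "line" | "defer";

// All three versions travel together, so every level sees the full three-way input.
interface ThreeWay {
  ancestor: string;
  ours: string;
  theirs: string;
}

// A merger either produces merged text or declines with a reason.
type Attempt = { ok: true; text: string } | { ok: false; reason: string };

interface Merger {
  level: Level;
  tryMerge(input: ThreeWay): Attempt;
}

interface Outcome {
  level: Level;
  text: string;
  diagnostics: string[];
}

function mergeWithFallback(input: ThreeWay, chain: Merger[]): Outcome {
  const diagnostics: string[] = [];
  for (const merger of chain) {
    const attempt = merger.tryMerge(input);
    if (attempt.ok) {
      return { level: merger.level, text: attempt.text, diagnostics };
    }
    // Record every decline before trying the next, less precise level.
    diagnostics.push(`${merger.level}: ${attempt.reason}`);
  }
  // Nothing succeeded: defer with diff3-style markers for external resolution.
  const text = [
    "<<<<<<< ours",
    input.ours,
    "||||||| ancestor",
    input.ancestor,
    "=======",
    input.theirs,
    ">>>>>>> theirs",
  ].join("\n");
  return { level: "defer", text, diagnostics };
}
```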

When AST merge falls back to text

Aura picks text merge over AST merge in the following cases:

No grammar for the file's language. Aura supports a large language set (see cross-language AST), but proprietary DSLs and obscure languages fall through.
Parse failure on a modified subtree. If at least one side edited a region tree-sitter cannot parse, text merge is safer than guessing.
File too large. Above the configured cap (default 32MB for source, 128MB for structured data), full parsing is disabled.
Binary file. Detected by the null-byte heuristic and extension checks. Aura treats binaries as opaque; see below.
Minified / generated file. Heuristics on line length, density, and file-name patterns (*.min.js, dist/**, build/**).
Explicit opt-out. Paths listed in .aura/merge.json's text_only array.
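The checks above behave like a short-circuiting predicate. This sketch covers only the checks that can be decided before parsing, because parse failures surface later. FileInfo, SkipPolicy, and astSkipReason are hypothetical. The minifiedAvgLineLength threshold is a parameter with no documented default. Aura's real heuristics also weigh density, glob patterns, and extensions.

```typescript
// Hypothetical sketch of the pre-parse AST-skip checks; names and thresholds are assumptions.
interface FileInfo {
  path: string;
  bytes: Uint8Array;
  hasGrammar: boolean; // whether a grammar exists for the file's language
}

interface SkipPolicy {
  astMaxBytes: number;           // mirrors limits.ast_merge_max_bytes
  minifiedAvgLineLength: number; // assumed threshold, not a documented default
  textOnly: string[];            // exact paths here; real text_only entries may be patterns
}

function containsNullByte(bytes: Uint8Array): boolean {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}

function averageLineLength(bytes: Uint8Array): number {
  let newlines = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0a) newlines++;
  }
  return bytes.length / (newlines + 1);
}

// Simplified stand-ins for *.min.js, dist/**, build/**.
function looksGenerated(path: string): boolean {
  return path.endsWith(".min.js") || path.startsWith("dist/") || path.startsWith("build/");
}

// Returns why AST merge is skipped for this file, or null if it may proceed.
function astSkipReason(file: FileInfo, policy: SkipPolicy): string | null {
  if (policy.textOnly.indexOf(file.path) !== -1) return "text_only rule";
  if (containsNullByte(file.bytes)) return "binary (null byte)";
  if (file.bytes.length > policy.astMaxBytes) return "ast_merge_max_bytes exceeded";
  if (!file.hasGrammar) return "no grammar";
  if (looksGenerated(file.path)) return "generated path pattern";
  const avg = averageLineLength(file.bytes);
  if (avg > policy.minifiedAvgLineLength) {
    return `minified heuristic (avg line length ${Math.round(avg)})`;
  }
  return null;
}
```

The order of the checks only affects which reason is reported. The first match becomes the diagnostic.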

Each fallback is recorded in the merge summary:

$ aura merge feature/payments --summary
merged: 47 files
  AST:         41
  structural:  4  (json, yaml)
  text:        2  (src/bundle.min.js, CHANGELOG.md)
  deferred:    0
diagnostics:
  src/bundle.min.js: minified heuristic triggered (avg line length 2,144)
  CHANGELOG.md:      grammar healthy, but `text_only` rule in .aura/merge.json

Partial parses

Tree-sitter's error tolerance means Aura often gets a usable tree even when the input has syntax errors. Aura exploits this: it identifies clean subtrees (regions with no ERROR or MISSING nodes) and performs AST merge on those, falling back to text merge only on the affected regions.

Concretely, if a file has 20 top-level declarations and one of them has a syntax error, Aura does AST merge on the other 19 and text merge on the broken one. You get the benefit of semantic merge for most of the file.
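The per-declaration split can be sketched over a simplified node shape. SyntaxNode is a hypothetical stand-in for a tree-sitter node. It models only the two error markers the text mentions: nodes of type ERROR, and nodes flagged as MISSING. A declaration counts as clean only if neither marker appears anywhere beneath it.

```typescript
// Hypothetical, simplified stand-in for a tree-sitter node.
interface SyntaxNode {
  type: string;       // e.g. "function_declaration", or "ERROR" for unparseable input
  isMissing: boolean; // true for nodes the parser inserted to recover
  children: SyntaxNode[];
}

// A subtree is clean if it contains no ERROR node and no MISSING node.
function isClean(node: SyntaxNode): boolean {
  if (node.type === "ERROR" || node.isMissing) return false;
  return node.children.every(isClean);
}

// Clean declarations go to AST merge; broken ones fall back to text merge.
function partitionDeclarations(topLevel: SyntaxNode[]): { ast: SyntaxNode[]; text: SyntaxNode[] } {
  const ast: SyntaxNode[] = [];
  const text: SyntaxNode[] = [];
  for (const decl of topLevel) {
    (isClean(decl) ? ast : text).push(decl);
  }
  return { ast, text };
}
```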

Binary files

Binaries (images, compiled artifacts, sqlite databases, anything detected by the null-byte heuristic) cannot be meaningfully three-way merged. Aura's policy:

If ours and theirs are byte-identical, keep either.
If one side is unchanged from ancestor, take the other.
Otherwise, raise a type/type conflict and record both alternatives in the merge session. Resolution verbs for binaries are keep-ours, take-theirs, or keep-both (writes both with disambiguating suffixes).

LFS-tracked binaries follow the same rules; Aura respects the LFS pointer format and only materializes blobs on explicit reveal.
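The three binary rules map directly onto byte comparisons. mergeBinary and BinaryOutcome are hypothetical names. A real implementation would more likely compare content hashes, such as the blob IDs Git already stores, than raw bytes.

```typescript
// Hypothetical result type for the binary policy.
type BinaryOutcome =
  | { kind: "take"; side: "ours" | "theirs"; bytes: Uint8Array }
  | { kind: "conflict"; ours: Uint8Array; theirs: Uint8Array };

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

function mergeBinary(ancestor: Uint8Array, ours: Uint8Array, theirs: Uint8Array): BinaryOutcome {
  // Rule 1: both sides agree, so keep either.
  if (sameBytes(ours, theirs)) return { kind: "take", side: "ours", bytes: ours };
  // Rule 2: one side is unchanged from the ancestor, so take the other.
  if (sameBytes(ours, ancestor)) return { kind: "take", side: "theirs", bytes: theirs };
  if (sameBytes(theirs, ancestor)) return { kind: "take", side: "ours", bytes: ours };
  // Rule 3: both changed differently, so record both alternatives as a conflict.
  return { kind: "conflict", ours, theirs };
}
```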

Huge files

Large files are a mixed bag:

Source code over 32MB. Almost always generated. Text merge.
JSON over 128MB. Almost always a data dump. Text merge with a warning suggesting Git LFS.
Lockfiles. Can exceed 10MB; Aura's JSON merger has a fast path for these (prefer-theirs or prefer-ours per .aura/merge.json). A key-by-key three-way merge of lockfiles is technically possible but rarely what you want; regenerating the lockfile is usually the right move.
CSV / TSV / data files. Aura does not ship a CSV merger by default. Text merge.

Caps are configurable:

{
  "limits": {
    "ast_merge_max_bytes":        33554432,
    "structural_merge_max_bytes": 134217728,
    "parse_timeout_ms":           5000,
    "node_count_cap":             250000
  }
}

When a cap trips, Aura emits a diagnostic naming the specific limit so you can tune intentionally.
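The sketch below checks limits and reports each tripped cap by its config key, so the diagnostic tells you which number to tune. The Measurement shape and trippedLimits are hypothetical. In practice a parse timeout would abort the parse rather than be compared afterwards. The defaults mirror the config above.

```typescript
interface Limits {
  ast_merge_max_bytes: number;
  structural_merge_max_bytes: number;
  parse_timeout_ms: number;
  node_count_cap: number;
}

// Defaults copied from the config example above.
const DEFAULT_LIMITS: Limits = {
  ast_merge_max_bytes: 33554432,
  structural_merge_max_bytes: 134217728,
  parse_timeout_ms: 5000,
  node_count_cap: 250000,
};

// Hypothetical per-file measurements.
interface Measurement {
  kind: "source" | "structured";
  bytes: number;
  parseMs: number;
  nodeCount: number;
}

// Returns one "key: observed > limit" entry per tripped cap.
function trippedLimits(m: Measurement, limits: Limits = DEFAULT_LIMITS): string[] {
  const tripped: string[] = [];
  const sizeKey = m.kind === "source" ? "ast_merge_max_bytes" : "structural_merge_max_bytes";
  if (m.bytes > limits[sizeKey]) {
    tripped.push(`${sizeKey}: ${m.bytes} > ${limits[sizeKey]}`);
  }
  if (m.parseMs > limits.parse_timeout_ms) {
    tripped.push(`parse_timeout_ms: ${m.parseMs} > ${limits.parse_timeout_ms}`);
  }
  if (m.nodeCount > limits.node_count_cap) {
    tripped.push(`node_count_cap: ${m.nodeCount} > ${limits.node_count_cap}`);
  }
  return tripped;
}
```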

Tree imbalance

Extremely deep or wide trees stress the diff algorithm. Aura caps recursion at 10k levels and sibling-width at 100k nodes per parent; beyond that, the affected subtree degrades to text merge. In practice, the only files that hit these are adversarial inputs or tools that emit pathological formatting.
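One way to enforce depth and width caps is an iterative walk with an explicit stack, which avoids a stack overflow on adversarial input. findOversizedSubtrees and the Caps shape are hypothetical. The sketch flags the first node found beyond either cap and does not descend into it, so that whole subtree would degrade to text merge.

```typescript
interface TreeNode {
  children: TreeNode[];
}

interface Caps {
  maxDepth: number; // e.g. 10k levels
  maxWidth: number; // e.g. 100k children per parent
}

// Iterative DFS: an explicit stack, so deep inputs cannot overflow the call stack.
function findOversizedSubtrees(root: TreeNode, caps: Caps): TreeNode[] {
  const flagged: TreeNode[] = [];
  const stack: Array<{ node: TreeNode; depth: number }> = [{ node: root, depth: 0 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    if (depth > caps.maxDepth || node.children.length > caps.maxWidth) {
      flagged.push(node); // this subtree degrades to text merge
      continue;           // do not descend into it
    }
    for (const child of node.children) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return flagged;
}
```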

Rename detection

Aura's cross-branch rename detection is fuzzy: it uses AST similarity to match a deleted declaration on one side to an inserted declaration on the other. The threshold (default 0.75 Jaccard similarity on subtrees) is configurable. A pair scoring below the threshold is treated as an unrelated delete + insert, which can surface extra conflicts. A pair scoring above it is matched, so two genuinely different but structurally similar functions might be fused. Lowering the threshold catches more real renames at the cost of more false matches. The default is tuned for realistic codebases; teams with unusual styles can adjust:

{
  "rename_detection": { "threshold": 0.7 }
}
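The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below assumes each declaration has already been summarized as a set of subtree fingerprints, which are plain strings here. It greedily pairs each deleted declaration with its most similar unclaimed inserted one, provided the score is at or above the threshold. The fingerprinting and the greedy matching are assumptions. Aura's actual features and matching strategy may differ.

```typescript
// Jaccard similarity |A ∩ B| / |A ∪ B| of two feature sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty sets: treat as identical
  let shared = 0;
  a.forEach((x) => {
    if (b.has(x)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Hypothetical declaration summary: a name plus subtree fingerprints.
interface Decl {
  name: string;
  features: Set<string>;
}

// Greedy matching: each deleted decl takes the best unclaimed inserted decl
// whose similarity is at or above the threshold.
function detectRenames(deleted: Decl[], inserted: Decl[], threshold = 0.75): Array<[string, string]> {
  const claimed = new Set<Decl>();
  const renames: Array<[string, string]> = [];
  for (const d of deleted) {
    let best: Decl | null = null;
    let bestScore = threshold;
    for (const candidate of inserted) {
      if (claimed.has(candidate)) continue;
      const score = jaccard(d.features, candidate.features);
      if (score >= bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    if (best !== null) {
      claimed.add(best);
      renames.push([d.name, best.name]);
    }
  }
  return renames;
}
```

Greedy matching is order-dependent. An optimal assignment, such as the Hungarian algorithm, avoids this problem but costs more.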

Formatter disagreements

Aura's pretty-printer observes the file's existing style: indent, quote preference, trailing commas, line width. When one side has run a new formatter (a Prettier major-version bump, rustfmt with a new config) and the other has not, Aura can interpret the formatting-only hunks as real changes. The mitigation is to run the formatter on the ancestor before merging; aura merge --normalize-ancestor does this automatically using the project's detected formatter.
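A toy model shows why normalizing the ancestor removes formatting-only conflicts. The sketch assumes all three versions have the same number of lines, so lines can be merged by index. Real line merges align hunks with a diff first. mergeAligned, mergeWithNormalizedAncestor, and the Formatter type are hypothetical.

```typescript
// A formatter maps source text to formatted source text.
type Formatter = (source: string) => string;

// Toy three-way merge by line index; assumes equal line counts (no inserts or deletes).
function mergeAligned(base: string[], ours: string[], theirs: string[]): { lines: string[]; conflicts: number } {
  const lines: string[] = [];
  let conflicts = 0;
  for (let i = 0; i < base.length; i++) {
    if (ours[i] === theirs[i]) lines.push(ours[i]);
    else if (ours[i] === base[i]) lines.push(theirs[i]); // only theirs changed
    else if (theirs[i] === base[i]) lines.push(ours[i]); // only ours changed
    else {
      conflicts++;         // both changed differently
      lines.push(ours[i]); // placeholder choice; a real merge would emit markers
    }
  }
  return { lines, conflicts };
}

// Format the ancestor first, so ours' formatting-only edits match it exactly.
function mergeWithNormalizedAncestor(base: string, ours: string, theirs: string, format: Formatter) {
  return mergeAligned(format(base).split("\n"), ours.split("\n"), theirs.split("\n"));
}
```

Suppose a quote-swapping function stands in for the project formatter. The ancestor and theirs use double quotes, and ours was reformatted to single quotes. That file produces one conflict when merged raw and none after the ancestor is normalized. The merged lines then follow theirs' style, so re-running the formatter on the result is still worthwhile.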

Timestamps and non-deterministic builds

Some generated files embed timestamps or nondeterministic values (build IDs, generated comments with dates). Every commit changes these, which defeats merging. Aura recognizes common patterns (ISO-8601 timestamps in header comments, UUIDs in // Generated by ... lines) and strips them from the comparison while preserving them in the output. Unusual generators need an ignore_patterns entry in .aura/merge.json:

{
  "ignore_patterns": {
    "src/generated/*.ts": ["^// Generated at .+$"]
  }
}
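The sketch below compares two versions while ignoring volatile lines, assuming patterns are matched line by line. maskIgnored and resolveIgnoringVolatile are hypothetical. Masking the matched lines rather than deleting them keeps line numbers aligned. When only masked lines differ, the sketch keeps ours verbatim; which side's timestamp survives is an assumption.

```typescript
// Placeholder substituted for volatile lines before comparing.
const IGNORED_LINE = "\u0000<ignored>";

// Patterns would come from .aura/merge.json, e.g.
// ["^// Generated at .+$"].map((p) => new RegExp(p))
function maskIgnored(text: string, patterns: RegExp[]): string {
  return text
    .split("\n")
    .map((line) => (patterns.some((p) => p.test(line)) ? IGNORED_LINE : line))
    .join("\n");
}

// If the versions differ only in ignored lines, keep ours as-is (volatile values
// preserved in the output); otherwise return null so a normal merge runs.
function resolveIgnoringVolatile(ours: string, theirs: string, patterns: RegExp[]): string | null {
  return maskIgnored(ours, patterns) === maskIgnored(theirs, patterns) ? ours : null;
}
```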

Examples

A: A partial parse with recoverable subtree

The ancestor parses cleanly. Ours edits login; theirs has a typo in signup:

export function signup(
  email: string,
  password: string
): Promise<User { // <- missing >
  // ...
}

Aura's diagnostic:

src/auth.ts: recoverable parse on `theirs` at line 34 (`signup`)
  AST merge applied to 11 other declarations
  `signup` resolved via text merge
  resolution recorded: text-level merge completed cleanly

The human reader learns exactly what happened and where to look if they are suspicious.

B: Binary file conflict

$ aura merge feature/design
1 conflict raised, 22 auto-merged
  c_6a18  assets/logo.png  type/type
    ancestor: 14,280 bytes (sha abc...)
    ours:     14,312 bytes (sha def...)
    theirs:   16,504 bytes (sha fed...)
    binary file; resolve with keep-ours / take-theirs / keep-both

keep-both writes assets/logo.png (ours) and assets/logo.theirs.png, leaving you to consolidate manually — often the right answer for asset conflicts.

C: Lockfile pragmatism

// .aura/merge.json
{
  "strategies": {
    "package-lock.json": "prefer-theirs",
    "yarn.lock":         "prefer-theirs",
    "Cargo.lock":        "prefer-theirs",
    "go.sum":            "prefer-theirs"
  },
  "post_merge": {
    "package-lock.json": "npm install --package-lock-only"
  }
}

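Lockfiles take theirs, and then package-lock.json is regenerated by its post_merge hook. npm install --package-lock-only rewrites package-lock.json from the merged package.json without installing anything into node_modules. The other lockfiles in this config have no hook, so they keep theirs as-is until you regenerate them.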
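The sketch below shows one way a per-file strategy table like the one above might be consulted. MergeConfig mirrors the JSON. resolveLockfile and the basename matching are hypothetical assumptions, and the sketch only returns the post_merge command instead of running it.

```typescript
type Strategy = "prefer-ours" | "prefer-theirs";

// Mirrors the strategies / post_merge sections of .aura/merge.json.
interface MergeConfig {
  strategies: Record<string, Strategy>;
  post_merge: Record<string, string>;
}

interface LockfileResult {
  content: string;
  postMerge: string | null; // command to run afterwards, if any
}

function basename(path: string): string {
  const i = path.lastIndexOf("/");
  return i === -1 ? path : path.slice(i + 1);
}

// Returns null when no strategy applies, so the normal merge path is used.
function resolveLockfile(path: string, ours: string, theirs: string, config: MergeConfig): LockfileResult | null {
  const key = basename(path); // assumption: config keys are bare filenames
  const strategy: Strategy | undefined = config.strategies[key];
  if (strategy === undefined) return null;
  const content = strategy === "prefer-theirs" ? theirs : ours;
  const hook: string | undefined = config.post_merge[key];
  return { content, postMerge: hook === undefined ? null : hook };
}
```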

D: Tuning parse timeouts

A team with a giant auto-generated Rust file sees occasional parse_timeout diagnostics. They raise the cap:

{
  "limits": { "parse_timeout_ms": 30000 }
}

Aura logs the increase in the merge audit trail so later investigators know why parsing took so long.

E: Normalizing ancestor

$ aura merge feature/prettier-bump --normalize-ancestor
applying project formatter (prettier 3.1.0) to ancestor revisions of 31 files
merge proceeding with normalized ancestors
3 conflicts raised (down from 19 without normalization)

Running the formatter on ancestor before diffing collapses formatting-only changes into no-ops, drastically reducing false conflicts.

Edge Cases

Submodules. Aura does not recurse into submodules for merge; Git's submodule pointer is treated as a scalar in the parent repo. If both sides move the pointer to different SHAs, the result is a normal modify/modify scalar conflict.

Symlinks. Treated as text files whose content is the link target. Trivially mergeable but almost always a manual decision when they differ.

Git LFS. Pointer files are text; the blobs they point to are binary. Aura merges the pointers and defers blob reconciliation to Git LFS's own tools.

Filesystem case collisions. On case-insensitive filesystems, Auth.ts and auth.ts are the same file. Aura refuses to merge a branch that introduces a case collision and reports it as a filesystem constraint rather than a merge conflict.
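Detecting a case collision amounts to grouping paths by a case-folded key. caseCollisions is hypothetical. Its use of toLowerCase() only approximates filesystem behavior, because real case-insensitive filesystems apply their own folding rules.

```typescript
// Groups paths that would map to the same file on a case-insensitive filesystem.
// toLowerCase() is an approximation of filesystem case folding.
function caseCollisions(paths: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const p of paths) {
    const key = p.toLowerCase();
    const group = groups.get(key);
    if (group) group.push(p);
    else groups.set(key, [p]);
  }
  const collisions: string[][] = [];
  groups.forEach((group) => {
    if (group.length > 1) collisions.push(group);
  });
  return collisions;
}
```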

Unicode normalization. Filenames are compared in NFC; content is compared byte-for-byte. A file renamed from NFC to NFD on one side and edited on the other is matched via identity, not path.
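You can compare filenames in NFC with the standard String.prototype.normalize method. sameFilename is a hypothetical helper. It applies only to paths, because content is compared byte-for-byte.

```typescript
// Two filenames refer to the same path if they are equal after NFC normalization.
function sameFilename(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}
```

For example, "caf\u00e9.ts" (precomposed é) and "cafe\u0301.ts" (e plus a combining acute accent) are different strings but the same filename.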

Line endings. Aura honors .gitattributes and the repo's core.autocrlf setting. Mixed CRLF/LF within a file is preserved, not normalized.

Copy detection. Aura does not detect a copy (same content at two paths). This is a deliberate choice — copies are usually refactors that should be explicit.

Very large monorepos. On repos with >100k files, initial session start-up (scanning for tracked files, computing per-file health) can take several seconds. Subsequent merges are cached. A background service mode (aura daemon) keeps caches warm for CI workers.

Concurrent merges on one working tree. Unsupported. Aura locks .aura/merge/ for the duration of a session and refuses to start a second concurrent session on the same tree. Running two merges at once is almost always user error.

Non-UTF-8 source. Tree-sitter parses UTF-8 (or UTF-16) input, not legacy encodings. Files in Shift-JIS, GB18030, or other encodings are transcoded on read, and output is always written as UTF-8. Repos that need non-UTF-8 files to round-trip in their original encoding must exclude those files from AST merge.