# Limitations and Edge Cases

*When Aura falls back, where it is honest, and how to tune it for pathological inputs.*

## Overview

No merge engine is perfect. A tool that claims perfection is a tool that lies to you. Aura is engineered so that its failure modes are **visible, localized, and recoverable** — not silent corruption, not whole-repo panics, not mysteriously wrong output. This page documents the places where Aura falls short of AST-level merging, the heuristics it uses to decide when to degrade, and the knobs available to tune behavior for your codebase.

Read this before running Aura on a repository for the first time. Knowing the limits is knowing the tool.

> A tool is trustworthy when its failures are predictable. Predictability beats cleverness at scale.

## How It Works

### The fallback hierarchy

Aura's mergers are arranged in a hierarchy. When a higher-precision merger cannot proceed, it degrades to the next one down, always emitting a diagnostic:

```text
 1. AST merge (tree-sitter + adapter)
 2. Structural merge (JSON, YAML, .env, Markdown block-level)
 3. Line merge (standard three-way text merge)
 4. Defer (write merge markers, let the user resolve externally)
```

At every level, Aura preserves the three-way model — ancestor, ours, theirs — so resolutions downstream of a fallback still have all three versions available.
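The hierarchy can be pictured as a chain of mergers tried in order. The sketch below is illustrative only: `Level`, `ThreeWay`, `Attempt`, `Merger`, `Outcome`, and `mergeWithFallback` are hypothetical names, not Aura's actual API, and the marker layout in the final step is an assumption.

```typescript
// Hypothetical types: none of these names come from Aura's real API.
type Level = "ast" | "structural" | "line" | "defer";

interface ThreeWay {
  ancestor: string;
  ours: string;
  theirs: string;
}

interface Attempt {
  ok: boolean;
  merged?: string;
  reason?: string; // why this level could not proceed
}

type Merger = (input: ThreeWay) => Attempt;

interface Outcome {
  level: Level;
  merged: string;
  diagnostics: string[];
}

// Try each level in order; every degradation leaves a diagnostic behind.
function mergeWithFallback(
  input: ThreeWay,
  chain: Array<{ level: Level; run: Merger }>
): Outcome {
  const diagnostics: string[] = [];
  for (const step of chain) {
    const attempt = step.run(input);
    if (attempt.ok && attempt.merged !== undefined) {
      return { level: step.level, merged: attempt.merged, diagnostics };
    }
    diagnostics.push(`${step.level}: ${attempt.reason ?? "could not proceed"}`);
  }
  // Last resort: write markers so all three versions stay available.
  const merged = [
    "<<<<<<< ours", input.ours,
    "||||||| ancestor", input.ancestor,
    "=======", input.theirs,
    ">>>>>>> theirs",
  ].join("\n");
  return { level: "defer", merged, diagnostics };
}
```

Because every step receives the same `ThreeWay` input, a lower level never has to reconstruct what a higher level saw.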

### When AST merge falls back to text

Aura picks text merge over AST merge in the following cases:

1. **No grammar for the file's language.** Aura supports a large language set (see [cross-language AST](/cross-language-ast)), but proprietary DSLs and obscure languages fall through.
2. **Parse failure on a modified subtree.** If at least one side edited a region tree-sitter cannot parse, text merge is safer than guessing. Only the affected region falls back; see Partial parses below.
3. **File too large.** Above the configured cap (default 32MB for source, 128MB for structured data), full parsing is disabled.
4. **Binary file.** Detected by the null-byte heuristic and extension checks. Aura treats binaries as opaque; see below.
5. **Minified / generated file.** Heuristics on line length, density, and file-name patterns (`*.min.js`, `dist/**`, `build/**`).
6. **Explicit opt-out.** Paths listed in `.aura/merge.json`'s `text_only` array.

Each fallback is recorded in the merge summary:

```bash
$ aura merge feature/payments --summary
merged: 47 files
  AST:         41
  structural:  4  (json, yaml)
  text:        2  (src/bundle.min.js, CHANGELOG.md)
  deferred:    0
diagnostics:
  src/bundle.min.js: minified heuristic triggered (avg line length 2,144)
  CHANGELOG.md:      grammar healthy, but `text_only` rule in .aura/merge.json
```
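The fallback triggers listed above could be approximated with a small classifier. This is a sketch under stated assumptions: `FileFacts`, `looksBinary`, `averageLineLength`, and `astFallbackReason` are hypothetical names, the 8000-byte scan window and the 500-character line threshold are invented for illustration, and the order of the checks is a guess rather than Aura's documented behavior.

```typescript
// All names and thresholds here are illustrative assumptions, not Aura's.
interface FileFacts {
  path: string;
  bytes: Uint8Array;
  hasGrammar: boolean;
}

const AST_MAX_BYTES = 32 * 1024 * 1024; // matches the documented default cap
const MINIFIED_AVG_LINE_LENGTH = 500;   // assumed threshold

// Null-byte heuristic: look for a 0x00 byte in the first few KB.
function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, 8000);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}

// Average line length, counting newline bytes (0x0a) as separators.
function averageLineLength(bytes: Uint8Array): number {
  let lines = 1;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0a) lines++;
  }
  return bytes.length / lines;
}

// Returns the reason AST merge is skipped, or null if it can proceed.
function astFallbackReason(file: FileFacts, textOnly: string[]): string | null {
  if (textOnly.indexOf(file.path) !== -1) return "text_only rule";
  if (looksBinary(file.bytes)) return "binary file";
  if (file.bytes.length > AST_MAX_BYTES) return "file too large";
  if (!file.hasGrammar) return "no grammar";
  if (/\.min\.js$/.test(file.path)) return "minified file name";
  if (averageLineLength(file.bytes) > MINIFIED_AVG_LINE_LENGTH) {
    return "minified heuristic triggered";
  }
  return null;
}
```

Returning a reason string rather than a boolean keeps the diagnostic and the decision in one place, which is what makes a summary like the one above cheap to produce.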

### Partial parses

Tree-sitter's error tolerance means Aura often gets a usable tree even when the input has syntax errors. Aura exploits this: it identifies **clean subtrees** (regions with no `ERROR` or `MISSING` nodes) and performs AST merge on those, falling back to text merge only on the affected regions.

Concretely, if a file has 20 top-level declarations and one of them has a syntax error, Aura does AST merge on the other 19 and text merge on the broken one. You get the benefit of semantic merge for most of the file.
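A minimal sketch of that partitioning follows. `AstNode` is a simplified stand-in for a parsed node, not the real tree-sitter binding type, and `isClean` and `partitionDeclarations` are hypothetical helper names.

```typescript
// Minimal stand-in for a parsed node; real tree-sitter bindings expose more.
interface AstNode {
  type: string;        // "ERROR" marks a region the parser could not match
  isMissing: boolean;  // true for tokens the parser inserted to recover
  children: AstNode[];
}

// A subtree is clean when no node in it is an ERROR or MISSING node.
function isClean(root: AstNode): boolean {
  const stack: AstNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as AstNode;
    if (node.type === "ERROR" || node.isMissing) return false;
    for (const child of node.children) stack.push(child);
  }
  return true;
}

// Split top-level declarations into AST-mergeable and text-merge groups.
function partitionDeclarations(file: AstNode): { ast: AstNode[]; text: AstNode[] } {
  const ast: AstNode[] = [];
  const text: AstNode[] = [];
  for (const decl of file.children) {
    (isClean(decl) ? ast : text).push(decl);
  }
  return { ast, text };
}
```

For the 20-declaration file described above, `partitionDeclarations` would place 19 declarations in `ast` and the broken one in `text`.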

### Binary files

Binaries (images, compiled artifacts, SQLite databases, anything detected by the null-byte heuristic) cannot be meaningfully three-way merged. Aura's policy:

1. If ours and theirs are byte-identical, keep either.
2. If one side is unchanged from ancestor, take the other.
3. Otherwise, raise a `type/type` conflict and record both alternatives in the merge session. Resolution verbs for binaries are `keep-ours`, `take-theirs`, or `keep-both` (writes both with disambiguating suffixes).

LFS-tracked binaries follow the same rules; Aura respects the LFS pointer format and only materializes blobs on explicit reveal.
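The three-rule policy for binaries can be sketched as a pure function over the three byte buffers. `BinaryDecision`, `sameBytes`, and `decideBinary` are hypothetical names; a real implementation would more likely compare content hashes than raw bytes.

```typescript
// Hypothetical result type; only the verbs in the text come from the doc.
type BinaryDecision =
  | { kind: "resolved"; take: "ours" | "theirs" }
  | { kind: "conflict"; verbs: string[] };

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

function decideBinary(
  ancestor: Uint8Array,
  ours: Uint8Array,
  theirs: Uint8Array
): BinaryDecision {
  // Rule 1: both sides ended up identical, so either copy will do.
  if (sameBytes(ours, theirs)) return { kind: "resolved", take: "ours" };
  // Rule 2: only one side changed, so take the side that did.
  if (sameBytes(ours, ancestor)) return { kind: "resolved", take: "theirs" };
  if (sameBytes(theirs, ancestor)) return { kind: "resolved", take: "ours" };
  // Rule 3: both changed differently; a human has to choose.
  return { kind: "conflict", verbs: ["keep-ours", "take-theirs", "keep-both"] };
}
```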

### Huge files

Large files are a mixed bag:

- **Source code over 32MB.** Almost always generated. Text merge.
- **JSON over 128MB.** Almost always a data dump. Text merge with a warning suggesting Git LFS.
- **Lockfiles.** Can exceed 10MB; Aura's JSON merger has a fast path for these (`prefer-theirs` or `prefer-ours` per `.aura/merge.json`). A key-by-key three-way merge of lockfiles is technically possible but rarely what you want; regenerating the lockfile is usually the right move.
- **CSV / TSV / data files.** Aura does not ship a CSV merger by default. Text merge.

Caps are configurable:

```json
{
  "limits": {
    "ast_merge_max_bytes":        33554432,
    "structural_merge_max_bytes": 134217728,
    "parse_timeout_ms":           5000,
    "node_count_cap":             250000
  }
}
```

When a cap trips, Aura emits a diagnostic naming the specific limit so you can tune intentionally.
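The byte values in the configuration are binary multiples (32 × 1024 × 1024 and 128 × 1024 × 1024). The sketch below derives them and shows one way a check could report which limit tripped. The field names mirror the `limits` block; `ParseStats`, `trippedLimit`, and the order of the checks are assumptions.

```typescript
// Field names mirror the `limits` block above; the check order is assumed.
interface MergeLimits {
  ast_merge_max_bytes: number;
  structural_merge_max_bytes: number;
  parse_timeout_ms: number;
  node_count_cap: number;
}

const MIB = 1024 * 1024;

const DEFAULT_LIMITS: MergeLimits = {
  ast_merge_max_bytes: 32 * MIB,         // 33554432
  structural_merge_max_bytes: 128 * MIB, // 134217728
  parse_timeout_ms: 5000,
  node_count_cap: 250000,
};

// Hypothetical measurements collected while parsing one file.
interface ParseStats {
  bytes: number;
  parseMs: number;
  nodeCount: number;
}

// Return the name of the first limit exceeded for an AST merge, or null.
function trippedLimit(stats: ParseStats, limits: MergeLimits): keyof MergeLimits | null {
  if (stats.bytes > limits.ast_merge_max_bytes) return "ast_merge_max_bytes";
  if (stats.parseMs > limits.parse_timeout_ms) return "parse_timeout_ms";
  if (stats.nodeCount > limits.node_count_cap) return "node_count_cap";
  return null;
}
```

Returning the key itself means the diagnostic can quote the exact setting a user would edit.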

### Tree imbalance

Extremely deep or wide trees stress the diff algorithm. Aura caps recursion at 10k levels and sibling-width at 100k nodes per parent; beyond that, the affected subtree degrades to text merge. In practice, the only files that hit these are adversarial inputs or tools that emit pathological formatting.
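A shape check of this kind might look like the sketch below. `TreeNode` and `exceedsShapeCaps` are hypothetical, and counting the root as depth 1 is an assumption; the two constants are the caps quoted above.

```typescript
// Stand-in node shape; the caps are the ones quoted in the text.
interface TreeNode {
  children: TreeNode[];
}

const MAX_DEPTH = 10000;
const MAX_SIBLINGS = 100000;

// Walk iteratively so a very deep tree cannot overflow the call stack.
function exceedsShapeCaps(root: TreeNode): boolean {
  const stack: Array<{ node: TreeNode; depth: number }> = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop() as { node: TreeNode; depth: number };
    if (depth > MAX_DEPTH) return true;
    if (node.children.length > MAX_SIBLINGS) return true;
    for (const child of node.children) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return false;
}
```

The explicit stack matters here: a recursive walk over an adversarially deep tree would fail before the cap could be reported.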

### Rename detection

Aura's cross-branch rename detection is fuzzy: it uses AST similarity to match a deleted declaration on one side to an inserted declaration on the other. The threshold (default 0.75 Jaccard similarity on subtrees) is configurable. A pair scoring below the threshold is treated as an unrelated delete + insert, which can surface extra conflicts; a pair scoring above it is treated as a rename, which can fuse two genuinely different functions that happen to look alike. Raising the threshold trades missed renames for fewer false matches, and lowering it does the reverse. The default is tuned for realistic codebases; teams with unusual styles can adjust:

```json
{
  "rename_detection": { "threshold": 0.7 }
}
```
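To make the threshold concrete, here is a sketch of Jaccard similarity over two declarations. It assumes each declaration is summarized as a set of node "shapes" such as `call:hash`; that feature representation, the function names, and the choice to treat a score exactly at the threshold as a match are all assumptions, not Aura's documented behavior.

```typescript
// Assumption: each declaration is summarized as a set of node "shapes"
// (for example "call:hash" or "param:email"); Aura's real features may differ.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 1 : shared / union;
}

// Treat a deleted and an inserted declaration as a rename at or above threshold.
function isRename(deleted: string[], inserted: string[], threshold = 0.75): boolean {
  return jaccard(deleted, inserted) >= threshold;
}

const oldFn = ["param:email", "param:password", "call:hash", "call:save", "call:log", "return:User"];
const newFn = ["param:email", "param:password", "call:hash", "call:save", "call:log", "return:Account"];
// 5 shared shapes out of 7 distinct ones: similarity is 5/7, about 0.714.
const similarity = jaccard(oldFn, newFn);
```

With these inputs the pair falls just under the 0.75 default and is seen as delete + insert, while a threshold of 0.7 matches it as a rename.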

### Formatter disagreements

Aura's pretty-printer observes the file's existing style: indent, quote preference, trailing commas, line width. When one side has run a new formatter (`prettier` major version bump, `rustfmt` with a new config) and the other has not, Aura can interpret the formatting-only hunks as real changes. The mitigation: run the formatter on ancestor before merging. `aura merge --normalize-ancestor` does this automatically using the project's detected formatter.

### Timestamps and non-deterministic builds

Some generated files embed timestamps or other non-deterministic values (build IDs, generated comments with dates). Every commit changes these, so both sides always look modified even when nothing meaningful changed. Aura recognizes common patterns (ISO-8601 timestamps in header comments, UUIDs in `// Generated by ...` lines) and excludes them from the comparison while preserving them in the output. Unusual generators need a `.aura/merge.json` exclusion:

```json
{
  "ignore_patterns": {
    "src/generated/*.ts": ["^// Generated at .+$"]
  }
}
```
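One way to picture how such a pattern is applied: lines that match are blanked before comparison, while the original text is kept for output. The sketch below is hypothetical (`maskIgnoredLines` and `equalIgnoringPatterns` are invented names) and skips the step that matches the glob key against file paths.

```typescript
// Sketch only: the glob-to-file matching step is omitted, and the function
// names are hypothetical.
function maskIgnoredLines(source: string, patterns: string[]): string {
  const regexes = patterns.map((p) => new RegExp(p));
  return source
    .split("\n")
    .map((line) => (regexes.some((re) => re.test(line)) ? "" : line))
    .join("\n");
}

// Two revisions are equal for merge purposes if they match after masking.
function equalIgnoringPatterns(a: string, b: string, patterns: string[]): boolean {
  return maskIgnoredLines(a, patterns) === maskIgnoredLines(b, patterns);
}
```

Blanking a matched line instead of deleting it keeps line numbers aligned between the masked and original text, so the untouched original line can be written back to the output.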

## Examples

### A: A partial parse with recoverable subtree

Ancestor parses clean. Ours edits `login`. Theirs has a typo in `signup`:

```typescript
export function signup(
  email: string,
  password: string
): Promise<User { // <- missing >
  // ...
}
```

Aura's diagnostic:

```text
src/auth.ts: recoverable parse on `theirs` at line 34 (`signup`)
  AST merge applied to 11 other declarations
  `signup` resolved via text merge
  resolution recorded: text-level merge completed cleanly
```

The human reader learns exactly what happened and where to look if they are suspicious.

### B: Binary file conflict

```bash
$ aura merge feature/design
1 conflict raised, 22 auto-merged
  c_6a18  assets/logo.png  type/type
    ancestor:  14,280 bytes (sha abc...)
    ours:      14,312 bytes (sha def...)
    theirs:    16,504 bytes (sha fed...)
    binary file; resolve with keep-ours / take-theirs / keep-both
```

`keep-both` writes `assets/logo.png` (ours) and `assets/logo.theirs.png`, leaving you to consolidate manually — often the right answer for asset conflicts.

### C: Lockfile pragmatism

In `.aura/merge.json`:

```json
{
  "strategies": {
    "package-lock.json": "prefer-theirs",
    "yarn.lock":         "prefer-theirs",
    "Cargo.lock":        "prefer-theirs",
    "go.sum":            "prefer-theirs"
  },
  "post_merge": {
    "package-lock.json": "npm install --package-lock-only"
  }
}
```

Lockfiles take theirs, then regenerate. In this configuration only `package-lock.json` defines a `post_merge` hook; Aura runs the hook's regeneration command and verifies the result is a clean install.

### D: Tuning parse timeouts

A team with a giant auto-generated Rust file sees occasional `parse_timeout` diagnostics. They raise the cap:

```json
{
  "limits": { "parse_timeout_ms": 30000 }
}
```

Aura logs the increase in the merge audit trail so later investigators know why parsing took long.

### E: Normalizing ancestor

```bash
$ aura merge feature/prettier-bump --normalize-ancestor
applying project formatter (prettier 3.1.0) to ancestor revisions of 31 files
merge proceeding with normalized ancestors
3 conflicts raised (down from 19 without normalization)
```

Running the formatter on ancestor before diffing collapses formatting-only changes into no-ops, drastically reducing false conflicts.
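The effect can be reproduced with a toy model. In the sketch below, `toyFormat` is a deliberately trivial stand-in for a real formatter (it only forces double quotes and collapses whitespace), and `classify` is a hypothetical helper that reports which sides differ from the ancestor.

```typescript
// Toy stand-in for a real formatter: forces double quotes and collapses
// whitespace. The real flag uses the project's detected formatter.
function toyFormat(source: string): string {
  return source.replace(/'/g, '"').replace(/\s+/g, " ").trim();
}

type Side = "unchanged" | "changed";

// Classify each side relative to the (optionally normalized) ancestor.
function classify(ancestor: string, ours: string, theirs: string, normalize: boolean) {
  const base = normalize ? toyFormat(ancestor) : ancestor;
  const oursState: Side = ours === base ? "unchanged" : "changed";
  const theirsState: Side = theirs === base ? "unchanged" : "changed";
  return {
    oursState,
    theirsState,
    bothChanged: oursState === "changed" && theirsState === "changed",
  };
}

const ancestorSrc = "const a = 'x';";
const oursSrc = toyFormat(ancestorSrc); // formatting-only change
const theirsSrc = "const a = 'y';";     // real edit, old formatting
const raw = classify(ancestorSrc, oursSrc, theirsSrc, false);
const normalized = classify(ancestorSrc, oursSrc, theirsSrc, true);
```

Without normalization both sides count as changed, which is the precondition for a conflict; with a normalized ancestor, `ours` becomes a no-op and only the real edit on `theirs` remains.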

## Edge Cases

**Submodules.** Aura does not recurse into submodules for merge; Git's submodule pointer is treated as a scalar in the parent repo. A submodule SHA change on both sides is a normal `modify/modify` scalar conflict.

**Symlinks.** Treated as text files whose content is the link target. The merge itself is trivial, but when the two sides point at different targets the choice is almost always a manual decision.

**Git LFS.** Pointer files are text; the blobs they point to are binary. Aura merges the pointers and defers blob reconciliation to Git LFS's own tools.

**Filesystem case collisions.** On case-insensitive filesystems, `Auth.ts` and `auth.ts` are the same file. Aura refuses to merge a branch that introduces a case collision and reports it as a filesystem constraint rather than a merge conflict.

**Unicode normalization.** Filenames are compared in NFC; content is compared byte-for-byte. A file renamed from NFC to NFD on one side and edited on the other is matched via identity, not path.

**Line endings.** Aura honors `.gitattributes` and the repo's `core.autocrlf` setting. Mixed CRLF/LF within a file is preserved, not normalized.

**Copy detection.** Aura does not detect a copy (same content at two paths). This is a deliberate choice — copies are usually refactors that should be explicit.

**Very large monorepos.** On repos with >100k files, initial session start-up (scanning for tracked files, computing per-file health) can take several seconds. Subsequent merges are cached. A background service mode (`aura daemon`) keeps caches warm for CI workers.

**Concurrent merges on one working tree.** Unsupported. Aura locks `.aura/merge/` for the duration of a session and refuses to start a second concurrent session on the same tree. Running two merges at once is almost always user error.

**Non-UTF-8 source.** AST merge operates on UTF-8 text. Files in Shift-JIS, GB18030, or other encodings are transcoded on read, and output is always written as UTF-8. Repos that need a non-UTF-8 round-trip must exclude those files from AST merge.
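Two of the filename rules above, NFC comparison and case-collision detection, are easy to sketch. The helper names are hypothetical, and plain lowercasing is only an approximation of how real case-insensitive filesystems fold names.

```typescript
// Compare two paths the way the text describes: NFC for filenames.
function samePathNfc(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Find groups of paths that would collide on a case-insensitive filesystem.
// Simple lowercasing is an approximation of real filesystem case folding.
function caseCollisions(paths: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const p of paths) {
    const key = p.normalize("NFC").toLowerCase();
    const group = groups.get(key);
    if (group) {
      group.push(p);
    } else {
      groups.set(key, [p]);
    }
  }
  const collisions: string[][] = [];
  groups.forEach((group) => {
    if (group.length > 1) collisions.push(group);
  });
  return collisions;
}
```

For example, `caseCollisions(["src/Auth.ts", "src/auth.ts", "src/user.ts"])` reports a single group containing the first two paths.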

## See Also

- [How AST merge works](/how-ast-merge-works)
- [Tree-sitter integration](/tree-sitter-integration)
- [Cross-language AST](/cross-language-ast)
- [JSON deep merge](/json-deep-merge)
- [YAML merge](/yaml-merge)
- [.env merge](/env-merge)
- [Merge strategies](/merge-strategies)
- [Conflict resolution](/conflict-resolution)
- [Interactive conflict picker](/interactive-conflict-picker)