# Tree-sitter Integration

*Why Aura embeds tree-sitter, how grammars are loaded, and how we handle parse failures gracefully.*

## Overview

Tree-sitter is a parser generator that produces incremental, error-tolerant parsers, and maintained grammars exist for every language Aura supports. Those grammars are open-source packages, battle-tested in editors such as Neovim, Helix, and Atom and in GitHub's code navigation, and the resulting parsers are fast enough to re-parse on every keystroke. When Aura needs to turn a source file into a syntax tree, tree-sitter is the default answer.

This page is for engineers who want to know what "backed by tree-sitter" actually means in Aura's codebase: how grammars are vendored and loaded, how parsing failures are handled, what the accuracy guarantees are, and where tree-sitter ends and Aura's own logic begins.

> A parser you can trust is a parser that fails loudly. Tree-sitter's error-tolerant design lets Aura degrade instead of lying.

## How It Works

### Why tree-sitter and not hand-rolled parsers

Three concrete reasons:

1. **Breadth.** Every language Aura supports today has a maintained tree-sitter grammar. Building fifteen hand-rolled parsers would occupy a full engineering team for a year.
2. **Error tolerance.** Tree-sitter parsers produce a best-effort tree even on syntactically broken input. The tree flags `ERROR` and `MISSING` nodes but keeps going. This is exactly what merging needs: the input is often mid-edit, and a parser that aborts on the first syntax error is useless.
3. **Incremental parsing.** Re-parsing a 10k-line file after a one-line edit takes under a millisecond. Aura leans on this during live sync to push merge deltas as a user types.

The tradeoffs are real but manageable: tree-sitter grammars encode concrete syntax, not abstract semantics. They do not do name resolution, type inference, or macro expansion. Aura's adapters (see [cross-language AST](/cross-language-ast)) layer those on where needed.
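
To make the error-tolerance point concrete, here is a minimal, stdlib-only Rust sketch of what a consumer does with an error-tolerant tree: locate the `ERROR` and `MISSING` nodes and keep working with everything else. `SyntaxNode`, `example_tree`, and the tree shape are hypothetical, hand-built to resemble what a parser might produce for `fn broken( {}`; the real `tree_sitter::Node` offers comparable `is_error()`, `is_missing()`, and `has_error()` methods, but this code does not call the tree-sitter crate, and real parser output for this input may differ.

```rust
// Hypothetical, stdlib-only model of an error-tolerant syntax tree.
// `SyntaxNode` stands in for tree_sitter::Node; it is NOT the tree-sitter API.
use std::ops::Range;

struct SyntaxNode {
    kind: &'static str,
    byte_range: Range<usize>,
    is_error: bool,   // bytes the parser could not fit into the grammar
    is_missing: bool, // zero-width token the parser inserted to recover
    children: Vec<SyntaxNode>,
}

impl SyntaxNode {
    fn node(kind: &'static str, byte_range: Range<usize>, children: Vec<SyntaxNode>) -> Self {
        SyntaxNode { kind, byte_range, is_error: false, is_missing: false, children }
    }

    fn missing(kind: &'static str, at: usize) -> Self {
        SyntaxNode { kind, byte_range: at..at, is_error: false, is_missing: true, children: Vec::new() }
    }

    /// True if this node or any descendant is an ERROR or MISSING node.
    /// Recomputed on every call to keep the sketch short.
    fn has_error(&self) -> bool {
        self.is_error || self.is_missing || self.children.iter().any(|c| c.has_error())
    }

    /// Push the byte range of every ERROR / MISSING node, skipping
    /// subtrees that contain no errors at all.
    fn error_ranges(&self, out: &mut Vec<Range<usize>>) {
        if !self.has_error() {
            return;
        }
        if self.is_error || self.is_missing {
            out.push(self.byte_range.clone());
        }
        for child in &self.children {
            child.error_ranges(out);
        }
    }
}

/// Hand-built tree for "fn ok() {}\nfn broken( {}" in which the parser is
/// assumed to have inserted a MISSING ")" at byte 21.
fn example_tree() -> SyntaxNode {
    SyntaxNode::node("source_file", 0..24, vec![
        SyntaxNode::node("function_item", 0..10, vec![
            SyntaxNode::node("identifier", 3..5, vec![]),
            SyntaxNode::node("block", 8..10, vec![]),
        ]),
        SyntaxNode::node("function_item", 11..24, vec![
            SyntaxNode::node("identifier", 14..20, vec![]),
            SyntaxNode::missing(")", 21),
            SyntaxNode::node("block", 22..24, vec![]),
        ]),
    ])
}

fn main() {
    let tree = example_tree();

    let mut errors = Vec::new();
    tree.error_ranges(&mut errors);
    println!("error/missing ranges: {:?}", errors); // prints [21..21]

    // The broken function does not poison the rest of the file: clean
    // top-level items can still be handled structurally.
    for item in tree.children.iter().filter(|c| !c.has_error()) {
        println!("clean {} at {:?}", item.kind, item.byte_range);
    }
}
```

In tree-sitter itself, `Node::has_error()` is answered from bookkeeping the parser records while building the tree rather than from a fresh traversal, which is what makes skipping clean subtrees cheap; the sketch recomputes it on each call for brevity.
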
### Grammar loading

Grammars ship with Aura as static libraries, not runtime-loaded dynamic libraries. Each grammar is vendored at a pinned version, built during `aura build`, and linked into the binary. The manifest lives in `crates/aura-parsers/Cargo.toml` and looks roughly like this:

```toml
[dependencies]
tree-sitter = "0.22"
tree-sitter-rust = "0.21.2"
tree-sitter-typescript = "0.20.4"
tree-sitter-python = "0.20.4"
tree-sitter-go = "0.20.0"
tree-sitter-java = "0.20.2"
# ...
```

At runtime, a `LanguageRegistry` holds one `Language` handle per supported grammar, keyed by file extension and by first-line patterns (shebangs, doctype declarations). Resolving the grammar for a file path is a `HashMap` lookup. There are no network calls, no download-on-demand, and no shell-out to the tree-sitter CLI: grammars are part of the binary. This matters for reproducibility (the same Aura version produces the same parses on every machine) and for air-gapped deployments (common in enterprise installs).

### The parse pipeline

For a given file:

```text
source_bytes
  -> tree_sitter::Parser::parse(source, None)
  -> tree_sitter::Tree
  -> AuraAdapter::walk(tree)
  -> aura::AstNode[]
  -> AuraAdapter::identify(nodes)
  -> (id, node)[]
  -> ready for diff
```

Incremental re-parse for a subsequent edit (the previous tree is first told which byte range changed, via `Tree::edit`):

```text
prev_tree.edit(&input_edit)
  -> tree_sitter::Parser::parse(new_source, Some(&prev_tree))
  -> tree_sitter::Tree
  -> only changed subtrees are re-walked
```

Aura caches parse trees per file for the duration of a merge session. Ancestor, ours, and theirs are each parsed once and reused across every diff, resolution preview, and validation check.

### Handling parse errors

A tree-sitter tree can include `ERROR` and `MISSING` nodes. Aura classifies a parse result as:

| Status        | Criteria                                                 | Action |
|---------------|----------------------------------------------------------|--------|
| `clean`       | Zero error / missing nodes                               | Full AST merge. |
| `recoverable` | Errors localized to subtrees that neither side edited    | Full AST merge on unaffected regions; text merge on the subtrees that contain errors. |
| `failed`      | Errors in subtrees that at least one side edited         | Fall back to text merge for the file; emit a diagnostic. |

This grading is per-file and per-side. A file that parses cleanly on ancestor and ours but fails on theirs is handled as a text merge with a `parse/parse` conflict reason (see [conflict resolution](/conflict-resolution)).

### Accuracy guarantees

Tree-sitter grammars are expected to be correct on well-formed input. Aura backstops this with:

- **Round-trip tests.** For every supported language, a corpus of real open-source files is parsed, pretty-printed, re-parsed, and compared. Structural equality is required; whitespace differences are allowed.
- **Golden merge tests.** A directory of before/after merge scenarios covers common patterns per language. Any grammar update must pass the full golden suite before it is merged.
- **Runtime self-check.** The first time Aura touches a file in a session, it pretty-prints the parsed tree, re-parses that output, and compares the two structures; if they diverge, it falls back to text merge for that file, for safety.

Coverage is not 100%. Very new syntax (typically features at the edge of stable language versions) occasionally parses into broader node kinds than the adapter expects, which produces a parse failure and a text-merge fallback. Adapter health (`aura doctor --adapters`) surfaces the current failure rate.

### Injections: languages inside languages

Tree-sitter supports **injections**: parsing one language inside a region of another. Aura uses this for:

- JSX and TSX (JavaScript/TypeScript with embedded JSX).
- Vue single-file components (`