# Cross-Language AST

*One engine, every language. How Aura stays language-agnostic without losing precision.*

## Overview

A merge engine that handles only one language is a toy. Real repositories are polyglot: a TypeScript frontend calls a Rust backend which deploys via YAML to a Kubernetes cluster configured by JSON, with Python scripts for data migration and Go services for edge workers. A conflict in any of those files is equally painful, and a tool that helps with one but fails on the others is not a tool; it is an inconvenience.

Aura's merge engine is built as a small language-agnostic core surrounded by per-language adapters. The core knows about nodes, identities, deltas, and conflicts. The adapters know how to parse a specific language, assign stable identities to its declarations, and pretty-print the result. Adding a language is mostly adapter work; the merge algorithm itself never changes.

This page documents the current language matrix, how adapters are structured, how health is reported, and what it takes to add a new language.

> The merge algorithm is the algorithm. Languages are just a policy for what counts as a node and what counts as a name. Keep the core narrow.

## How It Works

### The adapter contract

Every language adapter implements a small interface:

```rust
trait LanguageAdapter {
    fn parse(&self, source: &str) -> ParseResult;
    fn identity(&self, node: &Node) -> NodeId;
    fn is_trivia(&self, node: &Node) -> bool;
    fn pretty_print(&self, tree: &Tree, style: &StyleProfile) -> String;
    fn health(&self) -> AdapterHealth;
}
```

- **`parse`** turns source text into a tree. Backed by tree-sitter; see [tree-sitter integration](/tree-sitter-integration) for the mechanics.
- **`identity`** produces a stable identifier for a node. This is where each language's rules about "what makes a function *that* function" live: name, receiver type, module path, parameter arity.
- **`is_trivia`** marks nodes that should be ignored for identity but preserved in output (comments, whitespace).
- **`pretty_print`** serializes the merged tree back to idiomatic source using the project's observed style.
- **`health`** reports parser version, grammar revision, known limitations, and recent parse-failure rates from the local repo.

### Supported languages today

| Language   | Grammar source           | Identity strategy                                            | Notes                                           |
|------------|--------------------------|--------------------------------------------------------------|-------------------------------------------------|
| Rust       | `tree-sitter-rust`       | module path + item name + (for methods) receiver type        | impl blocks, generic params tracked             |
| TypeScript | `tree-sitter-typescript` | module path + declared name + (for methods) class + overload | TSX supported; decorators preserved             |
| JavaScript | `tree-sitter-javascript` | module path + declared name                                  | JSX supported                                   |
| Python     | `tree-sitter-python`     | module path + qualified name + decorator set                 | async def distinguished from def                |
| Go         | `tree-sitter-go`         | package + (for methods) receiver type + name                 | build tags preserved per-file                   |
| Java       | `tree-sitter-java`       | package + class path + method name + parameter types         | overloads distinguished by signature            |
| C          | `tree-sitter-c`          | file + function name                                         | `static` functions file-scoped                  |
| C++        | `tree-sitter-cpp`        | namespace + qualified name + parameter types                 | templates tracked textually                     |
| Ruby       | `tree-sitter-ruby`       | module path + method name + visibility                       | monkey-patched methods treated as modifications |
| PHP        | `tree-sitter-php`        | namespace + declared name                                    |                                                 |
| C#         | `tree-sitter-c-sharp`    | namespace + class + method + parameter types                 | partial classes merged by combined identity     |
| Swift      | `tree-sitter-swift`      | module + type + member + overload                            | property wrappers preserved                     |
| Kotlin     | `tree-sitter-kotlin`     | package + class + member + overload                          |                                                 |
| Bash       | `tree-sitter-bash`       | file + function name                                         |                                                 |

A longer tail (Elixir, Scala,
Haskell, Zig, Lua, SQL, GraphQL) is in various states of integration; adapter health reports the current state per language.

### Identity is the subtle part

Tree-sitter gives every adapter a parse tree for free. The hard problem is deciding what constitutes identity *within* that tree. Bad identity choices produce spurious conflicts and miss real ones.

Consider method identity in Rust:

```rust
impl Service {
    fn handle(&self, req: Req) -> Res { /* ... */ }
}

impl Admin {
    fn handle(&self, req: Req) -> Res { /* ... */ }
}
```

A name-only identity would fuse these. A name + receiver type identity (`Service::handle` vs `Admin::handle`) keeps them separate. Aura's Rust adapter uses the latter.

Contrast Python:

```python
class Service:
    def handle(self, req): ...

class Admin:
    def handle(self, req): ...
```

The problem is the same; the adapter uses `module.Class.handle` as identity. In a file that also has a top-level `def handle`, the qualified name disambiguates.

JavaScript requires more care because assignment can create methods:

```javascript
Service.prototype.handle = function(req) { /* ... */ };
```

The TypeScript/JavaScript adapter recognizes this pattern and attributes identity to `Service.prototype.handle`, giving it the same identity as a method declared in a class body. This means that refactoring from prototype assignment to class syntax on one side, and editing the body on the other, merges cleanly.

### Overload disambiguation

Languages with overloading (Java, C#, C++, Swift, Kotlin, TypeScript with overload signatures) distinguish methods by parameter types, not just name. Aura's identity includes the parameter type list in canonical form (sorted generic params, normalized nullability) so that reformatting does not shake identity loose.

### File-level identity

Files themselves have identity: their canonical path, corrected for case on case-insensitive filesystems, plus an optional stable id stored in `.aura/files.json` for cross-rename tracking.
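
As a rough illustration, file identity along these lines could be modeled as below. This is a sketch, not Aura's implementation: `FileIdentity`, `canonical_path`, `file_identity`, and `same_file` are hypothetical names, the in-memory map stands in for whatever `.aura/files.json` actually contains, and lowercasing is only one assumed way of correcting for case.

```rust
use std::collections::HashMap;

/// Hypothetical file identity: an optional stable id plus a canonical path.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileIdentity {
    pub stable_id: Option<String>,
    pub canonical_path: String,
}

/// Unify path separators and, on a case-insensitive filesystem, fold case
/// (an assumption; the real correction rule may differ).
pub fn canonical_path(path: &str, case_insensitive_fs: bool) -> String {
    let unified = path.replace('\\', "/");
    if case_insensitive_fs {
        unified.to_lowercase()
    } else {
        unified
    }
}

/// Build an identity, looking up a stable id by canonical path.
/// `stable_ids` is an assumed stand-in for the contents of `.aura/files.json`.
pub fn file_identity(
    path: &str,
    stable_ids: &HashMap<String, String>,
    case_insensitive_fs: bool,
) -> FileIdentity {
    let canonical = canonical_path(path, case_insensitive_fs);
    FileIdentity {
        stable_id: stable_ids.get(&canonical).cloned(),
        canonical_path: canonical,
    }
}

/// Two identities name the same file when both carry the same stable id;
/// if either side lacks one, fall back to comparing canonical paths.
pub fn same_file(a: &FileIdentity, b: &FileIdentity) -> bool {
    match (&a.stable_id, &b.stable_id) {
        (Some(x), Some(y)) => x == y,
        _ => a.canonical_path == b.canonical_path,
    }
}
```

In this sketch, two paths that differ only in case compare equal on a case-insensitive filesystem, and a stable id recorded for both the old and the new path ties the two together even though the paths differ.
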
This lets Aura recognize `src/auth.ts` → `src/auth/index.ts` as the same file when a rename is accompanied by content edits.

### Health reporting

Every adapter reports health, surfaced via `aura doctor --adapters`:

```bash
$ aura doctor --adapters
Language     Grammar   Parse OK   Parse Fail   Avg ms/file
rust         0.21.2    1,284      2            3.1
typescript   0.20.4    2,107      7            2.4
python       0.20.1    416        0            1.9
go           0.20.0    308        0            2.0
java         0.20.2    91         0            3.8
yaml         0.5.0     54         1            0.9
json         0.20.0    198        0            0.3
markdown     0.3.0     73         0            1.2
```

A parse-failure rate above a threshold (default 1%) triggers a warning. Frequent failures are usually a sign that the grammar is outdated or the file uses an experimental syntax.

## Examples

### A: Adding Elixir

Adding a language is a contained task, typically a day of work for a language the authors are familiar with.

1. **Vendor the grammar.** Add `tree-sitter-elixir` to `crates/aura-parsers/grammars/`. Build it with the tree-sitter CLI to produce a static library.
2. **Implement the adapter.** About 200 lines: parse-tree walking for identity, a trivia classifier, and a pretty-printer that defers to `mix format` if available, otherwise to a built-in formatter.
3. **Define identity.** Elixir modules and functions: identity is `Module.function/arity`, matching the language's own conventions.
4. **Register the adapter.** Add it to `LanguageRegistry::defaults()`, keyed by extension (`.ex`, `.exs`) and first-line shebang patterns.
5. **Golden tests.** A directory of before/after merge scenarios becomes the regression suite.

The merge algorithm (delta computation, conflict detection, resolution) requires zero changes.

### B: Cross-language move

Aura can track a declaration moving across language boundaries in rare cases where the declaration keeps its semantic shape. More commonly, cross-language renames (for example, moving a Rust trait into a Python binding) are surfaced as separate delete/insert operations with a hint linking them.

### C: Identity after refactor

Python adapter example.
Ours renames a top-level function:

```python
# ancestor
def charge(amount):
    ...

# ours
def charge_card(amount):
    ...
```

Theirs edits the original:

```python
# theirs
def charge(amount):
    if amount <= 0:
        raise ValueError("invalid")
    ...
```

The adapter sees the ours-side change as delete(`charge`) + insert(`charge_card`). The theirs-side change is modify(`charge`). Aura's core detects the rename-plus-edit pattern, transposes the theirs edit onto `charge_card`, and produces:

```python
def charge_card(amount):
    if amount <= 0:
        raise ValueError("invalid")
    ...
```

The rename heuristic is language-agnostic (it lives in the core, not the adapter), but it requires the adapter to give the function body a stable identity (AST structure + comments) so that the fuzzy match has something to compare.

### D: Overload handling in Java

Ancestor:

```java
public String format(int n) { return Integer.toString(n); }
```

Ours adds a double overload:

```java
public String format(int n) { return Integer.toString(n); }
public String format(double d) { return Double.toString(d); }
```

Theirs adds a string overload:

```java
public String format(int n) { return Integer.toString(n); }
public String format(String s) { return s; }
```

The Java adapter keys identity by `format(int)`, `format(double)`, `format(String)`. All three survive the merge because they have distinct identities. Aura assembles:

```java
public String format(int n) { return Integer.toString(n); }
public String format(double d) { return Double.toString(d); }
public String format(String s) { return s; }
```

No conflict, no human, no overload lost.

## Edge Cases

**Macros and code generation.** For Rust macros, Python decorators with metaclasses, and C preprocessor directives, the parser sees the pre-expansion text, which is correct for merge purposes. Aura does not expand macros to diff their output; it merges the source.

**Embedded languages.** JSX, TSX, Vue SFCs, Svelte components, and Rails ERB all embed one language inside another.
Aura uses tree-sitter's injection support to parse each region with the correct grammar and track identities across language boundaries within a file.

**Generated files.** Protobuf-generated code, GraphQL-generated types, and ORM schema outputs are marked as generated by common header comments or `.aurargnore-merge` globs. For these files, Aura defers to [`prefer-theirs`](/merge-strategies) and always takes the upstream version.

**Case-sensitive identifiers on case-insensitive filesystems.** A rename from `auth.ts` to `Auth.ts` requires care. Aura tracks both the filesystem path and the declared module name; on macOS with a case-insensitive filesystem, identity is the declared path, not the OS path.

**Grammar divergence.** New syntax (Python 3.10 structural pattern matching, TypeScript 5.x `using` declarations, Rust 2024 edition features) occasionally lands before tree-sitter grammars catch up. Adapter health reports the parse failures, and Aura falls back to text merge for the affected files while the grammar is updated.

**Non-source files pretending to be source.** A file ending in `.ts` that actually holds JSON fragments will parse fine but produce odd identity choices. Aura's adapters cross-check the first few tokens against the declared language and, on mismatch, switch to the correct merger or emit a warning.

## See Also

- [Tree-sitter integration](/tree-sitter-integration)
- [How AST merge works](/how-ast-merge-works)
- [JSON deep merge](/json-deep-merge)
- [YAML merge](/yaml-merge)
- [Merge strategies](/merge-strategies)
- [Limitations and edge cases](/limitations-and-edge-cases)