Cross-Language AST
One engine, every language. How Aura stays language-agnostic without losing precision.
Overview
A merge engine that handles only one language is a toy. Real repositories are polyglot: a TypeScript frontend calls a Rust backend, which deploys via YAML to a Kubernetes cluster configured by JSON, with Python scripts for data migration and Go services for edge workers. A conflict in any of those files is equally painful, and a tool that helps with one but fails on the others is not a tool; it is an inconvenience.
Aura's merge engine is built as a small language-agnostic core surrounded by per-language adapters. The core knows about nodes, identities, deltas, and conflicts. The adapters know how to parse a specific language, assign stable identities to its declarations, and pretty-print the result. Adding a language is mostly adapter work; the merge algorithm itself never changes.
This page documents the current language matrix, how adapters are structured, how health is reported, and what it takes to add a new language.
The merge algorithm is the algorithm. Languages are just a policy for what counts as a node and what counts as a name. Keep the core narrow.
How It Works
The adapter contract
Every language adapter implements a small interface:
trait LanguageAdapter {
    fn parse(&self, source: &str) -> ParseResult;
    fn identity(&self, node: &Node) -> NodeId;
    fn is_trivia(&self, node: &Node) -> bool;
    fn pretty_print(&self, tree: &Tree, style: &StyleProfile) -> String;
    fn health(&self) -> AdapterHealth;
}
- parse turns source text into a tree. Backed by tree-sitter; see tree-sitter integration for the mechanics.
- identity produces a stable identifier for a node. This is where each language's rules about "what makes a function that function" live: name, receiver type, module path, parameter arity.
- is_trivia marks nodes that should be ignored for identity but preserved in output (comments, whitespace).
- pretty_print serializes the merged tree back to idiomatic source using the project's observed style.
- health reports parser version, grammar revision, known limitations, and recent parse-failure rates from the local repo.
Supported languages today
| Language | Grammar source | Identity strategy | Notes |
|-------------|--------------------------|-------------------------------------------------------------|-------|
| Rust | tree-sitter-rust | module path + item name + (for methods) receiver type | impl blocks, generic params tracked |
| TypeScript | tree-sitter-typescript | module path + declared name + (for methods) class + overload | TSX supported; decorators preserved |
| JavaScript | tree-sitter-javascript | module path + declared name | JSX supported |
| Python | tree-sitter-python | module path + qualified name + decorator set | async def distinguished from def |
| Go | tree-sitter-go | package + (for methods) receiver type + name | build tags preserved per-file |
| Java | tree-sitter-java | package + class path + method name + parameter types | overloads distinguished by signature |
| C | tree-sitter-c | file + function name | static functions file-scoped |
| C++ | tree-sitter-cpp | namespace + qualified name + parameter types | templates tracked textually |
| Ruby | tree-sitter-ruby | module path + method name + visibility | monkey-patched methods treated as modifications |
| PHP | tree-sitter-php | namespace + declared name | |
| C# | tree-sitter-c-sharp | namespace + class + method + parameter types | partial classes merged by combined identity |
| Swift | tree-sitter-swift | module + type + member + overload | property wrappers preserved |
| Kotlin | tree-sitter-kotlin | package + class + member + overload | |
| Bash | tree-sitter-bash | file + function name | |
A longer tail (Elixir, Scala, Haskell, Zig, Lua, SQL, GraphQL) is in various states of integration; adapter health reports the current state per language.
Identity is the subtle part
Tree-sitter gives every adapter a parse tree for free. The hard problem is deciding what constitutes identity within that tree. Bad identity choices produce spurious conflicts and miss real ones.
Consider method identity in Rust:
impl Service {
    fn handle(&self, req: Req) -> Res { ... }
}
impl Admin {
    fn handle(&self, req: Req) -> Res { ... }
}
A name-only identity would fuse these. A name + receiver type identity (Service::handle vs Admin::handle) keeps them separate. Aura's Rust adapter uses the latter.
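The difference between the two identity choices can be made concrete with a minimal sketch. The ItemId type, its constructors, and the crate::api module path below are hypothetical names for illustration, not Aura's actual types:

```rust
use std::collections::HashSet;

/// Hypothetical identity key: module path + optional receiver type + item name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ItemId {
    pub module: String,
    pub receiver: Option<String>,
    pub name: String,
}

impl ItemId {
    /// Identity for a method defined in an `impl` block.
    pub fn method(module: &str, receiver: &str, name: &str) -> Self {
        ItemId {
            module: module.to_string(),
            receiver: Some(receiver.to_string()),
            name: name.to_string(),
        }
    }

    /// Identity for a free function, which has no receiver.
    pub fn free_fn(module: &str, name: &str) -> Self {
        ItemId {
            module: module.to_string(),
            receiver: None,
            name: name.to_string(),
        }
    }
}

/// Counts how many distinct identities appear in `ids`.
pub fn distinct(ids: &[ItemId]) -> usize {
    ids.iter().collect::<HashSet<_>>().len()
}
```

With this key, ItemId::method("crate::api", "Service", "handle") and ItemId::method("crate::api", "Admin", "handle") share a name but compare unequal, so edits to one are never attributed to the other.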
Contrast Python:
class Service:
    def handle(self, req): ...
class Admin:
    def handle(self, req): ...
The problem is the same; the Python adapter uses module.Class.handle as the identity. If the file also has a top-level def handle, the qualified name disambiguates the two.
JavaScript requires more care because assignment can create methods:
Service.prototype.handle = function(req) { ... };
The TypeScript/JavaScript adapter recognizes this pattern and attributes identity to Service.prototype.handle, giving it the same identity as a method declared in a class body. This means refactoring from prototype assignment to class syntax on one side, and editing the body on the other, merges cleanly.
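A rough sketch of how the two spellings could be mapped onto one identity follows. It is written in Rust, as adapter-side code would be. JsDecl and member_identity are hypothetical names, and the sketch only handles the simple three-part Class.prototype.method form:

```rust
/// Hypothetical, simplified view of the two ways a method can be declared.
pub enum JsDecl<'a> {
    /// `class Service { handle(req) { ... } }`
    ClassMethod { class: &'a str, name: &'a str },
    /// `Service.prototype.handle = function (req) { ... };`
    Assignment { target: &'a str },
}

/// Returns the identity string for a method-like declaration, or `None`
/// when an assignment target does not look like `Class.prototype.method`.
pub fn member_identity(decl: &JsDecl) -> Option<String> {
    match decl {
        JsDecl::ClassMethod { class, name } => Some(format!("{class}.prototype.{name}")),
        JsDecl::Assignment { target } => {
            let parts: Vec<&str> = target.split('.').collect();
            match parts.as_slice() {
                [class, "prototype", name] => Some(format!("{class}.prototype.{name}")),
                _ => None,
            }
        }
    }
}
```

Because both forms yield the same string, a side that converts the assignment to class syntax and a side that edits the body are seen as touching the same node.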
Overload disambiguation
Languages with overloading (Java, C#, C++, Swift, Kotlin, TypeScript with overload signatures) distinguish methods by parameter types, not just name. Aura's identity includes the parameter type list in canonical form (sorted generic params, normalized nullability) so that reformatting does not shake identity loose.
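As an illustration of why a canonical form matters, the sketch below builds an overload key that is insensitive to whitespace inside parameter types. It covers only whitespace normalization; sorting generic parameters and normalizing nullability are left out. The function names are hypothetical:

```rust
/// Punctuation around which spaces carry no meaning in a type expression.
fn is_type_punct(c: char) -> bool {
    matches!(c, '<' | '>' | ',' | '[' | ']')
}

/// Collapses whitespace runs and drops spaces next to type punctuation,
/// so `Map< String , Integer >` becomes `Map<String,Integer>`.
fn canonical_type(ty: &str) -> String {
    let collapsed = ty.split_whitespace().collect::<Vec<_>>().join(" ");
    let chars: Vec<char> = collapsed.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if c == ' ' {
            let prev = if i > 0 { chars[i - 1] } else { ' ' };
            let next = chars.get(i + 1).copied().unwrap_or(' ');
            if is_type_punct(prev) || is_type_punct(next) {
                continue; // formatting-only space
            }
        }
        out.push(c);
    }
    out
}

/// Builds a key such as `format(int)` from a name and its parameter types.
pub fn overload_key(name: &str, param_types: &[&str]) -> String {
    let params: Vec<String> = param_types.iter().map(|t| canonical_type(t)).collect();
    format!("{}({})", name, params.join(","))
}
```

Two differently formatted spellings of the same signature produce the same key, while format(int) and format(double) remain distinct.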
File-level identity
Files themselves have identity: their canonical path, corrected for case on case-insensitive filesystems, plus an optional stable id stored in .aura/files.json for cross-rename tracking. This lets Aura recognize src/auth.ts → src/auth/index.ts as the same file when a rename is accompanied by content edits.
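A minimal sketch of that lookup is shown below. The schema of .aura/files.json is not documented here, so StableIds is a stand-in in-memory map from path to stable id, and folding case to lowercase is one assumed way to "correct for case":

```rust
use std::collections::HashMap;

/// Assumed canonicalization: unify separators and, on case-insensitive
/// filesystems, fold case so `Auth.ts` and `auth.ts` compare equal.
pub fn canonical_path(path: &str, case_insensitive_fs: bool) -> String {
    let unified = path.replace('\\', "/");
    if case_insensitive_fs {
        unified.to_lowercase()
    } else {
        unified
    }
}

/// Canonical path -> stable id; a hypothetical stand-in for the stored ids.
pub type StableIds = HashMap<String, String>;

/// Two paths name the same file if their stable ids match; when either id
/// is missing, fall back to comparing canonical paths.
pub fn same_file(
    before: &StableIds,
    after: &StableIds,
    old_path: &str,
    new_path: &str,
    case_insensitive_fs: bool,
) -> bool {
    let old_key = canonical_path(old_path, case_insensitive_fs);
    let new_key = canonical_path(new_path, case_insensitive_fs);
    match (before.get(&old_key), after.get(&new_key)) {
        (Some(a), Some(b)) => a == b,
        _ => old_key == new_key,
    }
}
```

If both snapshots record the same id for src/auth.ts and src/auth/index.ts, the rename is recognized even though the paths differ.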
Health reporting
Every adapter reports health, surfaced via aura doctor --adapters:
$ aura doctor --adapters
Language     Grammar   Parse OK   Parse Fail   Avg ms/file
rust         0.21.2    1,284      2            3.1
typescript   0.20.4    2,107      7            2.4
python       0.20.1    416        0            1.9
go           0.20.0    308        0            2.0
java         0.20.2    91         0            3.8
yaml         0.5.0     54         1            0.9
json         0.20.0    198        0            0.3
markdown     0.3.0     73         0            1.2
Parse failures above a threshold (default 1%) trigger a warning. Frequent failures are usually a sign the grammar is outdated or the file uses an experimental syntax.
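The threshold check itself is simple arithmetic. The sketch below assumes the rate is failures divided by all parse attempts (OK plus failed); the ParseStats type is hypothetical:

```rust
/// Hypothetical per-language parse counters, as shown in the health table.
pub struct ParseStats {
    pub ok: u64,
    pub fail: u64,
}

impl ParseStats {
    /// Fraction of parse attempts that failed (0.0 when nothing was parsed).
    pub fn failure_rate(&self) -> f64 {
        let total = self.ok + self.fail;
        if total == 0 {
            0.0
        } else {
            self.fail as f64 / total as f64
        }
    }

    /// True when the failure rate is strictly above `threshold` (0.01 = 1%).
    pub fn exceeds(&self, threshold: f64) -> bool {
        self.failure_rate() > threshold
    }
}
```

Under that assumption, the typescript row above (7 failures in 2,114 attempts, about 0.33%) stays below the default threshold, while the yaml row (1 failure in 55 attempts, about 1.8%) would trigger a warning.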
Examples
A: Adding Elixir
Adding a language is a contained task — typically a day of work for a language the authors are familiar with.
- Vendor the grammar. Add tree-sitter-elixir to crates/aura-parsers/grammars/. Build with the tree-sitter CLI, produce a static library.
- Implement the adapter. About 200 lines: parse-tree walking for identity, a trivia classifier, a pretty-printer that defers to mix format if available else a built-in formatter.
- Define identity. Elixir modules and functions: identity is Module.function/arity, matching the language's own conventions.
- Register the adapter. Add to LanguageRegistry::defaults() keyed by extension (.ex, .exs) and first-line shebang patterns.
- Golden tests. A directory of before/after merge scenarios becomes the regression suite.
The merge algorithm — delta computation, conflict detection, resolution — requires zero changes.
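The registration step can be pictured with a small sketch. SketchRegistry is a hypothetical stand-in, not the real LanguageRegistry, and matching a shebang by substring is an assumption made for brevity:

```rust
use std::collections::HashMap;

/// Hypothetical registry: maps extensions and shebang fragments to a language name.
pub struct SketchRegistry {
    by_extension: HashMap<&'static str, &'static str>,
    by_shebang: Vec<(&'static str, &'static str)>,
}

impl SketchRegistry {
    /// A registry that knows only about Elixir.
    pub fn with_elixir() -> Self {
        let mut by_extension = HashMap::new();
        by_extension.insert("ex", "elixir");
        by_extension.insert("exs", "elixir");
        SketchRegistry {
            by_extension,
            by_shebang: vec![("elixir", "elixir")],
        }
    }

    /// Picks a language by file extension first, then by the shebang line.
    pub fn detect(&self, file_name: &str, first_line: &str) -> Option<&'static str> {
        if let Some((_, ext)) = file_name.rsplit_once('.') {
            if let Some(lang) = self.by_extension.get(ext).copied() {
                return Some(lang);
            }
        }
        if first_line.starts_with("#!") {
            for &(needle, lang) in &self.by_shebang {
                if first_line.contains(needle) {
                    return Some(lang);
                }
            }
        }
        None
    }
}
```

A file named lib/app.ex resolves by extension, and an extensionless script starting with #!/usr/bin/env elixir resolves by its first line. Nothing in this lookup touches merge logic, which is the point of keeping registration in the adapter layer.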
B: Cross-language move
Aura can track a declaration moving across language boundaries in rare cases where the declaration keeps its semantic shape. More commonly, cross-language renames (moving a Rust trait into a Python binding) are surfaced as separate delete/insert operations with a hint linking them.
C: Identity after refactor
Python adapter example. Ours renames a top-level function:
# ancestor
def charge(amount): ...
# ours
def charge_card(amount): ...
Theirs edits the original:
# theirs
def charge(amount):
    if amount <= 0:
        raise ValueError("invalid")
    ...
The adapter sees the ours-side change as delete(charge) + insert(charge_card). The theirs-side change is modify(charge). Aura's core detects the rename-plus-edit pattern, transposes the theirs edit onto charge_card, and produces:
def charge_card(amount):
    if amount <= 0:
        raise ValueError("invalid")
    ...
The rename heuristic is language-agnostic: it lives in the core, not the adapter. It does, however, require the adapter to give stable identity to the function body (AST structure + comments), so the fuzzy match has something to compare.
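One way such a heuristic could work is sketched below. This is not Aura's actual algorithm: the token-set Jaccard similarity, the Decl type, and the threshold parameter are all assumptions. The sketch also assumes ours left the body untouched; if ours had edited it too, a body-level three-way merge would be needed instead of a plain transposition.

```rust
use std::collections::HashSet;

/// Hypothetical declaration: a name plus its body text.
#[derive(Debug, Clone, PartialEq)]
pub struct Decl {
    pub name: String,
    pub body: String,
}

/// Jaccard similarity over whitespace-separated tokens (1.0 = identical sets).
fn similarity(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    let inter = ta.intersection(&tb).count() as f64;
    let union = ta.union(&tb).count() as f64;
    inter / union
}

/// Finds the inserted declaration whose body best matches the deleted one,
/// provided the match scores at least `threshold`.
pub fn detect_rename<'a>(deleted: &Decl, inserted: &'a [Decl], threshold: f64) -> Option<&'a Decl> {
    inserted
        .iter()
        .map(|d| (similarity(&deleted.body, &d.body), d))
        .filter(|(score, _)| *score >= threshold)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, d)| d)
}

/// Carries the other side's edited body over to the renamed declaration.
pub fn transpose(renamed: &Decl, theirs_body: &str) -> Decl {
    Decl {
        name: renamed.name.clone(),
        body: theirs_body.to_string(),
    }
}
```

In the scenario above, detect_rename would pair the deleted charge with the inserted charge_card because their bodies match, and transpose would then place the theirs-side validation under the new name.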
D: Overload handling in Java
Ancestor:
public String format(int n) { return Integer.toString(n); }
Ours adds a double overload:
public String format(int n) { return Integer.toString(n); }
public String format(double d) { return Double.toString(d); }
Theirs adds a string overload:
public String format(int n) { return Integer.toString(n); }
public String format(String s) { return s; }
The Java adapter keys identity by format(int), format(double), format(String). All three survive the merge because they have distinct identities. Aura assembles:
public String format(int n) { return Integer.toString(n); }
public String format(double d) { return Double.toString(d); }
public String format(String s) { return s; }
No conflict, no human, no overload lost.
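The reason this merges cleanly can be shown with a generic three-way merge keyed by identity. The sketch below is a simplification and not Aura's core: declarations are plain strings, Decls and merge_by_identity are hypothetical names, and output ordering is not modeled (a BTreeMap sorts by key).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Identity (e.g. `format(int)`) -> declaration source text.
pub type Decls = BTreeMap<String, String>;

/// Three-way merge per identity. Returns the merged declarations, or the
/// identities that both sides changed in different ways.
pub fn merge_by_identity(
    ancestor: &Decls,
    ours: &Decls,
    theirs: &Decls,
) -> Result<Decls, Vec<String>> {
    let keys: BTreeSet<&String> = ancestor
        .keys()
        .chain(ours.keys())
        .chain(theirs.keys())
        .collect();
    let mut merged = Decls::new();
    let mut conflicts = Vec::new();
    for key in keys {
        let (a, o, t) = (ancestor.get(key), ours.get(key), theirs.get(key));
        let pick = if o == t {
            o // both sides agree (including "both deleted")
        } else if o == a {
            t // only theirs changed this identity
        } else if t == a {
            o // only ours changed this identity
        } else {
            conflicts.push(key.clone());
            continue;
        };
        if let Some(text) = pick {
            merged.insert(key.clone(), text.clone());
        }
    }
    if conflicts.is_empty() {
        Ok(merged)
    } else {
        Err(conflicts)
    }
}
```

Since format(double) exists only in ours and format(String) only in theirs, each is a one-sided insertion and both are kept. A conflict arises only when both sides change the same identity differently.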
Edge Cases
Macros and code generation. Rust macros, Python decorators with metaclasses, C preprocessor directives — the parser sees the pre-expansion text, which is correct for merge purposes. Aura does not expand macros to diff their output; it merges the source.
Embedded languages. JSX, TSX, Vue SFCs, Svelte components, Rails ERB — these embed one language inside another. Aura uses tree-sitter's injection support to parse each region with the correct grammar and track identities across language boundaries within a file.
Generated files. Protobuf-generated code, GraphQL-generated types, ORM schema outputs: when a file is marked as generated, either by a common header comment or by a .aurargnore-merge glob, Aura uses prefer-theirs so that the upstream version always wins.
Case-sensitive identifiers on case-insensitive filesystems. A rename from auth.ts to Auth.ts requires care. Aura tracks both the filesystem path and the declared module name; on macOS with a case-insensitive filesystem, identity follows the declared module name, not the path the OS reports.
Grammar divergence. Newer language versions (Python 3.10 structural pattern matching, TypeScript 5.x using declarations, Rust 2024 edition features) occasionally land before tree-sitter grammars catch up. Adapter health reports parse failures, and Aura falls back to text merge for the affected files while the grammar is updated.
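The fallback decision can be sketched in a few lines. The assumption here is that a file counts as "affected" when any of its three versions (ancestor, ours, theirs) parses with errors; ParseOutcome and MergeStrategy are hypothetical names:

```rust
/// Which merger handles a given file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeStrategy {
    Structural,
    Text,
}

/// Hypothetical summary of parsing one version of a file.
pub struct ParseOutcome {
    pub has_errors: bool,
}

/// Uses the structural merger only if every version parsed cleanly.
pub fn choose_strategy(versions: &[ParseOutcome]) -> MergeStrategy {
    if versions.iter().any(|v| v.has_errors) {
        MergeStrategy::Text
    } else {
        MergeStrategy::Structural
    }
}
```

The decision is made per file, so one file using unsupported syntax does not downgrade the rest of the merge.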
Non-source files pretending to be source. A file ending in .ts that actually holds JSON fragments will parse fine but produce odd identity choices. Aura's adapters cross-check the first few tokens against the declared language and, on mismatch, switch to the correct merger or emit a warning.