Cross-Language AST

One engine, every language. How Aura stays language-agnostic without losing precision.

Overview

A merge engine that handles only one language is a toy. Real repositories are polyglot: a TypeScript frontend calls a Rust backend which deploys via YAML to a Kubernetes cluster configured by JSON, with Python scripts for data migration and Go services for edge workers. A conflict in any of those files is equally painful, and a tool that helps with one but fails on the others is less a tool than an inconvenience.

Aura's merge engine is built as a small language-agnostic core surrounded by per-language adapters. The core knows about nodes, identities, deltas, and conflicts. The adapters know how to parse a specific language, assign stable identities to its declarations, and pretty-print the result. Adding a language is mostly adapter work; the merge algorithm itself never changes.

This page documents the current language matrix, how adapters are structured, how health is reported, and what it takes to add a new language.

The merge algorithm is the algorithm. Languages are just a policy for what counts as a node and what counts as a name. Keep the core narrow.

How It Works

The adapter contract

Every language adapter implements a small interface:

trait LanguageAdapter {
    fn parse(&self, source: &str) -> ParseResult;
    fn identity(&self, node: &Node) -> NodeId;
    fn is_trivia(&self, node: &Node) -> bool;
    fn pretty_print(&self, tree: &Tree, style: &StyleProfile) -> String;
    fn health(&self) -> AdapterHealth;
}

parse turns source text into a tree. Backed by tree-sitter; see tree-sitter integration for the mechanics.
identity produces a stable identifier for a node. This is where each language's rules about "what makes a function that function" live — name, receiver type, module path, parameter arity.
is_trivia marks nodes that should be ignored for identity but preserved in output (comments, whitespace).
pretty_print serializes the merged tree back to idiomatic source using the project's observed style.
health reports parser version, grammar revision, known limitations, and recent parse-failure rates from the local repo.
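To make the split between core and adapter concrete, here is a minimal sketch of core-side code that talks only to the contract. The Node and NodeId shapes, the two-method IdentityPolicy trait, and ToyAdapter are simplified, hypothetical stand-ins, not Aura's actual types.

```rust
// Hypothetical, simplified stand-ins for the real `Node` and `NodeId` types.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub kind: String,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NodeId(pub String);

// A two-method slice of the adapter contract, enough for this sketch.
pub trait IdentityPolicy {
    fn identity(&self, node: &Node) -> NodeId;
    fn is_trivia(&self, node: &Node) -> bool;
}

// Toy adapter: comments are trivia, everything else is keyed by kind and name.
pub struct ToyAdapter;

impl IdentityPolicy for ToyAdapter {
    fn identity(&self, node: &Node) -> NodeId {
        NodeId(format!("{}:{}", node.kind, node.name))
    }

    fn is_trivia(&self, node: &Node) -> bool {
        node.kind == "comment"
    }
}

// Core-side helper: it is language-agnostic because it only calls the trait.
pub fn identities<A: IdentityPolicy>(adapter: &A, nodes: &[Node]) -> Vec<NodeId> {
    nodes
        .iter()
        .filter(|n| !adapter.is_trivia(n))
        .map(|n| adapter.identity(n))
        .collect()
}
```

Swapping ToyAdapter for another implementation changes which nodes count and how they are named, while identities stays untouched.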

Supported languages today

Language	Grammar source	Identity strategy	Notes
Rust	`tree-sitter-rust`	module path + item name + (for methods) receiver type	impl blocks, generic params tracked
TypeScript	`tree-sitter-typescript`	module path + declared name + (for methods) class + overload	TSX supported; decorators preserved
JavaScript	`tree-sitter-javascript`	module path + declared name	JSX supported
Python	`tree-sitter-python`	module path + qualified name + decorator set	async def distinguished from def
Go	`tree-sitter-go`	package + (for methods) receiver type + name	build tags preserved per-file
Java	`tree-sitter-java`	package + class path + method name + parameter types	overloads distinguished by signature
C	`tree-sitter-c`	file + function name	`static` functions file-scoped
C++	`tree-sitter-cpp`	namespace + qualified name + parameter types	templates tracked textually
Ruby	`tree-sitter-ruby`	module path + method name + visibility	monkey-patched methods treated as modifications
PHP	`tree-sitter-php`	namespace + declared name
C#	`tree-sitter-c-sharp`	namespace + class + method + parameter types	partial classes merged by combined identity
Swift	`tree-sitter-swift`	module + type + member + overload	property wrappers preserved
Kotlin	`tree-sitter-kotlin`	package + class + member + overload
Bash	`tree-sitter-bash`	file + function name

A longer tail (Elixir, Scala, Haskell, Zig, Lua, SQL, GraphQL) is in various states of integration; adapter health reports the current state per language.

Identity is the subtle part

Tree-sitter gives every adapter a parse tree for free. The hard problem is deciding what constitutes identity within that tree. Bad identity choices produce spurious conflicts and miss real ones.

Consider method identity in Rust:

impl Service {
    fn handle(&self, req: Req) -> Res { ... }
}
impl Admin {
    fn handle(&self, req: Req) -> Res { ... }
}

A name-only identity would fuse these. A name + receiver type identity (Service::handle vs Admin::handle) keeps them separate. Aura's Rust adapter uses the latter.

Contrast Python:

class Service:
    def handle(self, req): ...

class Admin:
    def handle(self, req): ...

The same problem arises here; the Python adapter uses module.Class.handle as the identity. In a file that also has a top-level def handle, the qualified name keeps the two apart.
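The difference between the two identity choices can be sketched in a few lines. The Decl record and both functions are hypothetical illustrations; the real adapters derive this information from tree-sitter nodes, and the dotted key format is an assumption.

```rust
// Hypothetical declaration record; the real adapters work on parse-tree nodes.
pub struct Decl<'a> {
    pub module: &'a str,
    pub receiver: Option<&'a str>, // enclosing impl type or class, if any
    pub name: &'a str,
}

// Name-only identity: fuses every `handle` in the module into one node.
pub fn name_only(decl: &Decl<'_>) -> String {
    decl.name.to_string()
}

// Qualified identity: module, then receiver (when present), then name.
pub fn qualified(decl: &Decl<'_>) -> String {
    match decl.receiver {
        Some(receiver) => format!("{}.{}.{}", decl.module, receiver, decl.name),
        None => format!("{}.{}", decl.module, decl.name),
    }
}
```

With these definitions, Service.handle and Admin.handle collide under name_only but stay distinct under qualified, and a top-level handle gets its own key.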

JavaScript requires more care because assignment can create methods:

Service.prototype.handle = function(req) { ... };

The TypeScript/JavaScript adapter recognizes this pattern and attributes identity to Service.prototype.handle, giving it the same identity as a method declared in a class body. This means refactoring from prototype assignment to class syntax on one side, and editing the body on the other, merges cleanly.
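One way to picture the normalization is as a mapping from the assignment target to the same key a class-body method would get. This is a sketch under assumptions: the function names and the dotted key format are hypothetical, and the real adapter matches on parse-tree nodes, not on strings.

```rust
// Splits a hypothetical assignment target such as "Service.prototype.handle"
// into (class, method). Returns None unless the target is exactly
// `<Class>.prototype.<method>`.
pub fn prototype_target(target: &str) -> Option<(&str, &str)> {
    let mut parts = target.split('.');
    let class = parts.next()?;
    let proto = parts.next()?;
    let method = parts.next()?;
    let well_formed = proto == "prototype"
        && parts.next().is_none()
        && !class.is_empty()
        && !method.is_empty();
    if well_formed {
        Some((class, method))
    } else {
        None
    }
}

// Identity key for a method declared inside a class body (assumed format).
pub fn class_method_id(module: &str, class: &str, method: &str) -> String {
    format!("{}.{}.{}", module, class, method)
}

// Identity key for a prototype assignment: the same key as the class-body form.
pub fn assignment_id(module: &str, target: &str) -> Option<String> {
    let (class, method) = prototype_target(target)?;
    Some(class_method_id(module, class, method))
}
```

Because both syntaxes land on one key, a side that rewrites the declaration form and a side that edits the body are both seen as changes to the same node.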

Overload disambiguation

Languages with overloading (Java, C#, C++, Swift, Kotlin, TypeScript with overload signatures) distinguish methods by parameter types, not just name. Aura's identity includes the parameter type list in canonical form (sorted generic params, normalized nullability) so that reformatting does not shake identity loose.
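A minimal sketch of canonicalizing a parameter list follows. It covers only the whitespace part of the canonical form; sorting generic parameters and normalizing nullability are left out, and the function names and the name(type,type) key format are assumptions.

```rust
fn is_punct(c: char) -> bool {
    matches!(c, '<' | '>' | ',' | '[' | ']')
}

// Collapses runs of whitespace and drops spaces next to type punctuation,
// so "Map< String , Integer >" and "Map<String,Integer>" compare equal.
pub fn canonical_type(ty: &str) -> String {
    let mut out = String::new();
    let mut pending_space = false;
    for ch in ty.trim().chars() {
        if ch.is_whitespace() {
            pending_space = true;
            continue;
        }
        let prev_is_punct = out.chars().last().map_or(true, is_punct);
        // Keep a single space only between two non-punctuation characters,
        // e.g. in a multi-word type such as "unsigned int".
        if pending_space && !is_punct(ch) && !prev_is_punct {
            out.push(' ');
        }
        pending_space = false;
        out.push(ch);
    }
    out
}

// Builds an overload-aware key; parameter order is preserved on purpose.
pub fn signature_id(name: &str, param_types: &[&str]) -> String {
    let params: Vec<String> = param_types.iter().map(|t| canonical_type(t)).collect();
    format!("{}({})", name, params.join(","))
}
```

Two differently formatted spellings of the same signature produce the same key, while format(int) and format(double) remain distinct.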

File-level identity

Files themselves have identity: their canonical path, corrected for case on case-insensitive filesystems, plus an optional stable id stored in .aura/files.json for cross-rename tracking. This lets Aura recognize src/auth.ts → src/auth/index.ts as the same file when a rename is accompanied by content edits.

Health reporting

Every adapter reports health, surfaced via aura doctor --adapters:

$ aura doctor --adapters
Language      Grammar     Parse OK    Parse Fail    Avg ms/file
rust          0.21.2      1,284       2             3.1
typescript    0.20.4      2,107       7             2.4
python        0.20.1      416         0             1.9
go            0.20.0      308         0             2.0
java          0.20.2      91          0             3.8
yaml          0.5.0       54          1             0.9
json          0.20.0      198         0             0.3
markdown      0.3.0       73          0             1.2

Parse failures above a threshold (default 1%) trigger a warning. Frequent failures are usually a sign the grammar is outdated or the file uses an experimental syntax.
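The threshold check can be sketched as follows, assuming the failure rate is computed as failures divided by total parse attempts. The ParseStats struct and function names are hypothetical; only the 1% default comes from the text above.

```rust
// Hypothetical per-language counters, mirroring the Parse OK / Parse Fail columns.
pub struct ParseStats {
    pub ok: u64,
    pub fail: u64,
}

// Fraction of parse attempts that failed; 0.0 when nothing was parsed.
pub fn failure_rate(stats: &ParseStats) -> f64 {
    let total = stats.ok + stats.fail;
    if total == 0 {
        0.0
    } else {
        stats.fail as f64 / total as f64
    }
}

// True when the rate is strictly above the threshold (0.01 for the 1% default).
pub fn needs_warning(stats: &ParseStats, threshold: f64) -> bool {
    failure_rate(stats) > threshold
}
```

Applied to the sample output, yaml fails 1 of 55 files (about 1.8%) and would warn, while rust fails 2 of 1,286 (about 0.16%) and would not.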

Examples

A: Adding Elixir

Adding a language is a contained task — typically a day of work for a language the authors are familiar with.

Vendor the grammar. Add tree-sitter-elixir to crates/aura-parsers/grammars/. Build with the tree-sitter CLI, produce a static library.
Implement the adapter. About 200 lines: parse-tree walking for identity, a trivia classifier, and a pretty-printer that defers to mix format when it is available and falls back to a built-in formatter otherwise.
Define identity. Elixir modules and functions: identity is Module.function/arity, matching the language's own conventions.
Register the adapter. Add to LanguageRegistry::defaults() keyed by extension (.ex, .exs) and first-line shebang patterns.
Golden tests. A directory of before/after merge scenarios becomes the regression suite.

The merge algorithm — delta computation, conflict detection, resolution — requires zero changes.
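The registration step can be pictured as a lookup table keyed by file extension. The Registry type below is a hypothetical stand-in for LanguageRegistry, whose real API is not shown on this page; shebang matching is omitted and the case-insensitive extension match is an assumption.

```rust
use std::collections::HashMap;
use std::path::Path;

// Hypothetical stand-in for the real registry: extension -> adapter name.
pub struct Registry {
    by_extension: HashMap<String, String>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { by_extension: HashMap::new() }
    }

    // Extensions are given without the leading dot, e.g. "ex" and "exs".
    pub fn register(&mut self, adapter: &str, extensions: &[&str]) {
        for ext in extensions {
            self.by_extension
                .insert(ext.to_ascii_lowercase(), adapter.to_string());
        }
    }

    // Returns the adapter registered for the path's extension, if any.
    pub fn adapter_for(&self, path: &str) -> Option<&str> {
        let ext = Path::new(path).extension()?.to_str()?;
        self.by_extension
            .get(&ext.to_ascii_lowercase())
            .map(|name| name.as_str())
    }
}
```

Registering "elixir" for "ex" and "exs" is then enough for lib/app/router.ex to resolve to the new adapter, while files with unknown extensions resolve to nothing.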

B: Cross-language move

Aura can track a declaration moving across language boundaries in the rare cases where the declaration keeps its semantic shape. More commonly, cross-language moves (for example, moving a Rust trait into a Python binding) are surfaced as separate delete/insert operations with a hint linking them.

C: Identity after refactor

Python adapter example. Ours renames a top-level function:

# ancestor
def charge(amount): ...
# ours
def charge_card(amount): ...

Theirs edits the original:

# theirs
def charge(amount):
    if amount <= 0:
        raise ValueError("invalid")
    ...

The adapter sees the ours-side change as delete(charge) + insert(charge_card). The theirs-side change is modify(charge). Aura's core detects the rename-plus-edit pattern, transposes the theirs edit onto charge_card, and produces:

def charge_card(amount):
    if amount <= 0:
        raise ValueError("invalid")
    ...

The rename heuristic is language-agnostic: it lives in the core, not the adapter. It does require the adapter to give the function body a stable identity (AST structure plus comments), so that the fuzzy match has something to compare.
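A fuzzy match of this kind can be sketched with a simple set-similarity score. This is an illustration, not Aura's actual heuristic: it compares whitespace-separated tokens with Jaccard similarity as a stand-in for comparing AST structure, and the function names and threshold are hypothetical.

```rust
use std::collections::HashSet;

fn tokens(body: &str) -> HashSet<&str> {
    body.split_whitespace().collect()
}

// Jaccard similarity over whitespace-separated tokens, in the range 0.0..=1.0.
pub fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // two empty bodies are treated as identical
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

// Given the body of a deleted declaration and the (name, body) pairs inserted
// on the same side, returns the best-matching inserted name if its score
// reaches the threshold.
pub fn rename_target<'a>(
    deleted_body: &str,
    inserted: &[(&'a str, &str)],
    threshold: f64,
) -> Option<&'a str> {
    let mut best: Option<(&'a str, f64)> = None;
    for (name, body) in inserted {
        let score = similarity(deleted_body, body);
        if score >= threshold && best.map_or(true, |(_, s)| score > s) {
            best = Some((*name, score));
        }
    }
    best.map(|(name, _)| name)
}
```

When delete(charge) and insert(charge_card) carry near-identical bodies, rename_target pairs them, which is the signal the core needs before transposing the other side's edit.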

D: Overload handling in Java

Ancestor:

public String format(int n) { return Integer.toString(n); }

Ours adds a double overload:

public String format(int n)    { return Integer.toString(n); }
public String format(double d) { return Double.toString(d); }

Theirs adds a string overload:

public String format(int n)    { return Integer.toString(n); }
public String format(String s) { return s; }

The Java adapter keys identity by format(int), format(double), format(String). All three survive the merge because they have distinct identities. Aura assembles:

public String format(int n)    { return Integer.toString(n); }
public String format(double d) { return Double.toString(d); }
public String format(String s) { return s; }

No conflict, no human, no overload lost.
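The reason this merges cleanly can be shown with a minimal three-way merge over identity-keyed declarations. This is a sketch of the general technique, not Aura's core: the Decls map, the string bodies, and the function names are assumptions, and output ordering is not modeled.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical model: identity key -> declaration body.
pub type Decls = BTreeMap<String, String>;

// Convenience constructor for examples.
pub fn decls(pairs: &[(&str, &str)]) -> Decls {
    pairs
        .iter()
        .map(|(id, body)| (id.to_string(), body.to_string()))
        .collect()
}

// Three-way merge per identity. Returns the merged declarations, or the list
// of identities that both sides changed in different ways.
pub fn merge(base: &Decls, ours: &Decls, theirs: &Decls) -> Result<Decls, Vec<String>> {
    let ids: BTreeSet<&String> = base
        .keys()
        .chain(ours.keys())
        .chain(theirs.keys())
        .collect();
    let mut merged = Decls::new();
    let mut conflicts = Vec::new();
    for id in ids {
        let (b, o, t) = (base.get(id), ours.get(id), theirs.get(id));
        let winner = if o == t {
            o // both sides agree (including both deleting)
        } else if o == b {
            t // only theirs changed
        } else if t == b {
            o // only ours changed
        } else {
            conflicts.push(id.clone());
            continue;
        };
        if let Some(body) = winner {
            merged.insert(id.clone(), body.clone());
        }
    }
    if conflicts.is_empty() {
        Ok(merged)
    } else {
        Err(conflicts)
    }
}
```

Keyed as format(int), format(double), and format(String), the two additions touch different identities, so each is taken from the side that introduced it. Keyed by name alone, both sides would appear to modify the same format node.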

Edge Cases

Macros and code generation. Rust macros, Python decorators with metaclasses, C preprocessor directives — the parser sees the pre-expansion text, which is correct for merge purposes. Aura does not expand macros to diff their output; it merges the source.

Embedded languages. JSX, TSX, Vue SFCs, Svelte components, Rails ERB — these embed one language inside another. Aura uses tree-sitter's injection support to parse each region with the correct grammar and track identities across language boundaries within a file.

Generated files. Protobuf-generated code, GraphQL-generated types, ORM schema outputs: when a file is marked as generated, either by a common header comment or by a .aurargnore-merge glob, Aura defaults to prefer-theirs and always takes the upstream version.
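Header-based detection might look like the sketch below. The markers checked (the "Code generated ... DO NOT EDIT" convention used by Go tooling and the "@generated" tag) are common in practice, but which markers Aura actually recognizes, and how many lines it scans, are assumptions here; glob matching is not shown.

```rust
// Returns true when one of the first ten lines carries a well-known
// "generated file" marker. The marker set and the ten-line window are
// assumptions made for this sketch.
pub fn looks_generated(source: &str) -> bool {
    source.lines().take(10).any(|line| {
        let lower = line.to_ascii_lowercase();
        lower.contains("@generated")
            || (lower.contains("code generated") && lower.contains("do not edit"))
    })
}
```

A file that begins with "// Code generated by protoc-gen-go. DO NOT EDIT." would be classified as generated, while ordinary source without such a header would not.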

Case-sensitive identifiers on case-insensitive filesystems. A rename from auth.ts to Auth.ts requires care. Aura tracks both the filesystem path and the declared module name; on macOS with a case-insensitive filesystem, identity is the declared path, not the OS path.

Grammar divergence. Newer language versions (Python 3.12 type parameter syntax, TypeScript 5.x using declarations, Rust 2024 edition features) occasionally land before tree-sitter grammars catch up. Adapter health reports parse failures, and Aura falls back to text merge for the affected files while the grammar is updated.

Non-source files pretending to be source. A file ending in .ts that actually holds JSON fragments will parse fine but produce odd identity choices. Aura's adapters cross-check the first few tokens against the declared language and, on mismatch, switch to the correct merger or emit a warning.
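A cross-check of this kind can be approximated by looking at the first two non-whitespace characters. This is a deliberately crude, hypothetical heuristic (real TypeScript can also start with a bracket), and the function names and language labels are assumptions.

```rust
// Heuristic: JSON documents typically open with `{"`, `{}`, `[{`, `["`, or `[]`.
pub fn looks_like_json(source: &str) -> bool {
    let mut chars = source.chars().filter(|c| !c.is_whitespace());
    match (chars.next(), chars.next()) {
        (Some('{'), Some('"')) | (Some('{'), Some('}')) => true,
        (Some('['), Some('{')) | (Some('['), Some('"')) | (Some('['), Some(']')) => true,
        _ => false,
    }
}

// Picks the merger to use: the declared language, unless the content
// looks like JSON under a different label.
pub fn effective_language<'a>(declared: &'a str, source: &str) -> &'a str {
    if declared != "json" && looks_like_json(source) {
        "json"
    } else {
        declared
    }
}
```

A .ts file whose content starts with {"a": 1} would be routed to the JSON merger, while a file starting with an import statement keeps its declared language.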