aura prove
Statically verify that a behavioral goal is implemented by the current code.
Synopsis
aura prove "<goal>" [--explain] [--strict] [--save] [--format=text|json] [--at <ref>]
Description
aura prove takes a plain-English goal and answers the question "is this actually implemented?" It does so by resolving the goal into a set of required logic nodes and required edges between them, then checking the current semantic graph to see whether those nodes exist and whether they are wired together. The output is either PROVED, PARTIAL (with a list of missing or unwired pieces), or UNPROVED.
Unlike a test, prove does not execute anything. It reasons over the static graph: which functions exist, which call which, what types flow where, which error paths are handled. This is valuable because it tells you about the shape of the implementation — whether the pieces are present and connected — without requiring the code to be runnable, without test fixtures, and without side effects. A passing aura prove does not guarantee correctness; it guarantees the code's skeleton is consistent with the goal.
Goals are written in a casual imperative style: "User can authenticate via OAuth", "Webhook retries use exponential backoff", "Refunds cannot be issued after 180 days". Aura uses the description, the logic graph, and optionally an LLM resolver to map each goal into verifiable predicates.
How a goal is resolved to logic nodes
Resolution happens in three stages.
First, the goal is parsed into claims. "User can authenticate via OAuth" becomes claims like there exists an entry point accepting OAuth credentials, there exists a token verification path, the success path creates a session, the failure path returns an error.
Second, each claim is grounded to the logic graph. Aura searches for nodes whose names, signatures, type flows, and call relationships satisfy the claim. A claim about "OAuth credentials" is satisfied by a function whose parameter type mentions an OAuth token type and whose body calls into an OAuth verification module, or whose intent history mentions OAuth. If multiple candidates match, the best-scoring one is picked and shown.
Third, the wiring is checked. Each grounded node is verified to be reachable from an entry point (HTTP handler, CLI command, scheduled job, etc.), and the required edges between claims are verified to exist in the call graph. Missing edges are reported as gaps.
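The grounding and wiring stages can be sketched as follows. This is an illustration of the idea, not Aura's internals: the `Node` shape, the keyword-overlap scoring, and the function names are all hypothetical stand-ins for the real semantic graph and scoring model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                   # e.g. "auth::oauth::verify_token"
    calls: list = field(default_factory=list)   # outgoing call-graph edges
    entry_point: bool = False                   # HTTP handler, CLI command, ...

def ground(claim_keywords, graph):
    """Stage two, naively: pick the best-scoring node for a claim.
    Here the score is simple keyword overlap with the node name."""
    def score(node):
        return sum(kw in node.name for kw in claim_keywords)
    best = max(graph, key=score)
    return best if score(best) > 0 else None

def reachable_from_entry(node, graph):
    """Stage three: verify the grounded node is wired to an entry point."""
    stack = [n for n in graph if n.entry_point]
    seen = set()
    while stack:
        n = stack.pop()
        if n.name in seen:
            continue
        seen.add(n.name)
        if n is node:
            return True
        stack.extend(n.calls)
    return False
```

With a two-node toy graph, `ground(["oauth", "verify"], graph)` would pick the verification function over the entry handler, and `reachable_from_entry` would confirm the edge between them exists.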
Goals can also be registered as named tests in aura.toml so they run as part of aura pr-review; this turns a one-off prove invocation into a permanent acceptance criterion.
Flags
| Flag | Description |
| --- | --- |
| "<goal>" | Natural-language description. Quote it. |
| --explain | Show the intermediate claims, chosen groundings, and edges checked. |
| --strict | Exit non-zero on PARTIAL as well as UNPROVED. |
| --format=text | Default. Human-readable with colored markers. |
| --format=json | Structured result, suitable for CI. |
| --at <ref> | Prove the goal against a specific commit, save id, or branch tip instead of the working tree. |
| --save | Persist the goal into aura.toml so it is re-checked on every PR. |
| --all | Re-run every saved proof registered in aura.toml instead of taking a single goal. |
Examples
1. A passing proof
aura prove "User can authenticate via OAuth"
PROVED — "User can authenticate via OAuth"

    Claims satisfied (4/4):
      ok  entry point accepts OAuth token
          -> fn api::auth::oauth_login (src/api/auth.rs)
      ok  token is verified against provider
          -> fn auth::oauth::verify_token
      ok  success creates a session
          -> fn session::SessionStore::create
      ok  failure returns a typed error
          -> enum auth::Error::InvalidToken
Each claim is shown with the node that satisfied it. You can follow the call chain with aura trace or aura diff.
2. A partial proof
aura prove "Webhook retries use exponential backoff"
PARTIAL — "Webhook retries use exponential backoff"

    Claims satisfied (2/3):
      ok       there is a retry loop on webhook delivery
               -> fn webhooks::dispatcher::send_with_retry
      ok       the loop increases the delay between attempts
               -> fn webhooks::dispatcher::compute_next_delay
      missing  the backoff factor is ≥ 2 (or configurable)
               heuristic: compute_next_delay multiplies by 1.5; goal implies ≥ 2
               suggestion: raise factor, or parameterize the factor and expose in config
The missing claim is explained, not just flagged. In strict mode this would exit 2.
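A change that would satisfy the missing claim might look like the following. The function name mirrors the one in the report, but the signature and jitter choice are hypothetical; it is sketched in Python for illustration, not taken from the webhook dispatcher itself.

```python
import random

def compute_next_delay(attempt, base=1.0, factor=2.0, cap=300.0):
    """Exponential backoff with a factor >= 2, as the goal implies.
    Delay grows as base * factor**attempt, capped, with jitter so
    retrying clients do not synchronize their attempts."""
    delay = min(base * factor ** attempt, cap)
    return delay * random.uniform(0.5, 1.0)  # partial jitter; full jitter uses (0, 1)
```

Exposing `factor` as a parameter (here with a default of 2.0) also satisfies the "or configurable" half of the claim.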
3. Explain the reasoning
aura prove "Refunds cannot be issued after 180 days" --explain
Prints the claim tree, the candidate groundings that were considered and rejected (with scores), the chosen groundings, and the edges in the call graph that were traversed. Useful when prove gives a surprising verdict.
4. CI-friendly JSON
aura prove "Account deletion is soft-delete" --format=json --strict
{"goal":"Account deletion is soft-delete","verdict":"PROVED","claims":[{"text":"deletion sets a deleted_at timestamp","status":"ok","node":"fn Account::soft_delete"},{"text":"queries exclude soft-deleted rows by default","status":"ok","node":"fn AccountRepo::active_query"}],"elapsed_ms":312}
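A CI step can consume this JSON directly instead of parsing colored text. A minimal sketch of the verdict check, mirroring the exit-code rules (the `check_verdict` helper is illustrative, not part of Aura):

```python
import json

def check_verdict(raw, strict=True):
    """Parse aura prove --format=json output and decide pass/fail.
    PARTIAL passes only when strict mode is off, matching --strict."""
    result = json.loads(raw)
    if result["verdict"] == "PROVED":
        return True
    if result["verdict"] == "PARTIAL" and not strict:
        return True
    return False  # UNPROVED, or PARTIAL under strict
```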
5. Prove against an old commit
aura prove "Refunds cannot be issued after 180 days" --at v0.11.0
Re-runs the proof against the v0.11.0 tag's semantic graph. Useful for bisecting when a behavioral invariant was lost.
Good goals vs bad goals
- Good: "Every HTTP handler rate-limits per user." Concrete, checkable against the call graph.
- Good: "Money values never flow from user input directly into a DB write without passing through validation." Checkable via type-flow.
- Poor: "The code is fast." Not a static-graph property.
- Poor: "Users are happy." Not a code property at all.
Prefer goals that name a behavior an auditor or new engineer would want to confirm by reading code. Avoid goals that require running the code to answer.
Exit Codes
| Code | Meaning |
| --- | --- |
| 0 | PROVED. Also PARTIAL when --strict is off. |
| 2 | UNPROVED, or PARTIAL when --strict is on. |
| 3 | Goal could not be parsed or grounded at all. |
| 4 | --at <ref> could not be resolved. |
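A CI wrapper might map the table above to outcomes like this (a sketch; `classify_exit` is a hypothetical helper, and how you obtain the exit code depends on your CI runner):

```python
def classify_exit(code):
    """Map aura prove exit codes to a CI outcome, per the table above."""
    return {
        0: "pass",        # PROVED (or PARTIAL without --strict)
        2: "fail",        # UNPROVED (or PARTIAL with --strict)
        3: "goal-error",  # goal could not be parsed or grounded at all
        4: "ref-error",   # --at <ref> could not be resolved
    }.get(code, "unknown")
```

Distinguishing `goal-error` from `fail` matters in CI: a code 3 usually means the goal text needs rewording, not that the code regressed.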
Writing goals that ground well
A goal grounds well when it names observable behavior in the system's own vocabulary. Use the terms the codebase already uses — if the code calls the thing a Session, write "a Session is created on login", not "a user's auth context is stored". The grounding engine scores candidates by name similarity, so matching vocabulary matters.
A goal grounds poorly when it is written entirely in abstractions the code has no names for. "The system maintains referential integrity" is abstract; "every Order has a non-null customer_id that points to an existing Customer" is concrete. Prefer the concrete phrasing.
When a goal does not ground, --explain will show you what it was looking for and what it found. Edit the goal to match the vocabulary the explanation uses, or add the missing nodes to the code.
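To see why vocabulary matters, consider a naive similarity score. This is one plausible way to compare a goal phrase against a code identifier, assuming nothing about Aura's actual metric; it only shows that matching the code's own terms scores far higher than a paraphrase:

```python
from difflib import SequenceMatcher

def name_similarity(goal_phrase, node_name):
    """Score how well a goal phrase matches a code identifier.
    Normalizing '::' and '_' to spaces lets 'session store create'
    line up with session::SessionStore::create."""
    def norm(s):
        return s.lower().replace("::", " ").replace("_", " ")
    return SequenceMatcher(None, norm(goal_phrase), norm(node_name)).ratio()
```

A goal that says "session" when the code says `Session` grounds; a goal that says "auth context" against the same node mostly does not.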
Caching and reproducibility
Each proof run memoizes its grounding decisions under .aura/proofs/cache/. Subsequent runs on the same graph reuse the grounding and only re-check the wiring, which is fast. When the LLM resolver is enabled, its outputs are also cached; a cache hit is deterministic, a cache miss consults the resolver. Cache entries are invalidated when the affected nodes change.
This means aura prove in CI is fast after the first run and deterministic. Two CI runs of the same commit produce identical verdicts.
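One way such a cache could be keyed, sketched under the assumption that each grounded node carries a content hash (this is not the on-disk format under .aura/proofs/cache/, just an illustration of why edits to affected nodes invalidate entries):

```python
import hashlib
import json

def cache_key(goal, grounded_nodes):
    """Derive a deterministic cache key for a proof run.
    grounded_nodes maps node name -> content hash, so the same goal
    over unchanged nodes yields the same key, and any node edit
    changes the key, invalidating the cached grounding."""
    payload = json.dumps(
        {"goal": goal, "nodes": sorted(grounded_nodes.items())},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```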
Disabling the LLM resolver
If your environment prohibits LLM calls, set proofs.resolver = "structural" in aura.toml. The structural resolver uses only name matching and call-graph heuristics, which is strictly weaker but fully local. Goals that would need the LLM to ground will return UNRESOLVABLE rather than making a network call.
Saved proofs as acceptance criteria
Running aura prove "<goal>" --save registers the goal in aura.toml under [[proofs]]:
[[proofs]]
id = "auth_oauth"
goal = "User can authenticate via OAuth"
added_by = "ashiq@naridon.io"
added_at = "2026-04-10T11:00:00Z"
strict = true
Saved proofs are re-run on every aura pr-review and on every CI invocation of aura prove --all. A previously-proved goal that becomes UNPROVED is a regression; the PR that caused it is flagged. This turns aura prove into a running contract on the codebase: once a behavioral property is proved, you cannot silently break it.
Limits of static proof
aura prove is a static analysis. It does not execute the code and therefore does not know about runtime-only properties. It cannot tell you whether a regex is correct, whether a SQL query returns what you expect, whether a distributed protocol is live, or whether a cryptographic construction is secure. It can tell you whether the call graph matches the shape a correct implementation would have.
In practice the most valuable goals are structural invariants — every HTTP handler passes through the auth middleware, no code path writes to the billing DB without acquiring a lock, all external HTTP calls go through the rate-limited client. These are properties that reviewers care about and that are easy to silently violate during refactors. Static proof catches them.
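The "every handler passes through the auth middleware" check reduces to call-graph reachability. A minimal sketch, with a hypothetical adjacency-dict graph rather than Aura's node representation:

```python
def calls_transitively(graph, start, target):
    """graph: adjacency dict mapping a function name to its callees.
    True if start reaches target anywhere in its call graph."""
    stack, seen = [start], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if n == target:
            return True
        stack.extend(graph.get(n, []))
    return False

def violates_invariant(graph, handlers, middleware):
    """Return the handlers that never reach the auth middleware."""
    return [h for h in handlers if not calls_transitively(graph, h, middleware)]
```

A refactor that drops the middleware call from one handler shows up as a non-empty violation list, which is exactly the kind of silent break a saved proof catches.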
Integration with LLM agents
When an AI agent is making changes, the agent is expected to run aura prove before calling aura save. A PARTIAL or UNPROVED result on a previously-proved goal is a clear signal to the agent that the change is incomplete. This is one of the primary reasons proofs exist: agents are good at plausible-looking patches and bad at closing the loop on behavioral invariants.
See Also
- aura diff — inspect the nodes a proof depends on
- aura trace — see how a proof's grounded nodes evolved
- aura pr-review — runs saved proofs automatically
- aura save — associates intent text that helps future proofs ground faster