Execution-Aware Validation

Aura does not run your code. It verifies structure, not behavior. The runtime layer is your test suite's job, and we do not pretend otherwise.

"Your tool says the code is correct. My tool says the code is correct. The code crashes in production. Whose tool lied?"

The Concern

The sharpest form of the skeptic's argument: real correctness — the kind that matters in production — depends on what the program does when it runs, not what it looks like on disk. Any tool that claims to verify code without executing it is, at best, a useful approximation; at worst, a source of false confidence.

This is correct. Type systems are static; they catch a class of bugs and miss others. Linters are static; same. Code reviews are static; same. Aura's aura prove is static. None of these replace running the program.

The risk with a tool like Aura is that a developer (or an AI agent) sees aura prove pass, concludes "the goal is achieved," and ships. If aura prove is checking structural properties only, "the goal is achieved" is overstated. It should read "the structural preconditions of the goal are in place." Those are different claims.

We want to be specific about exactly what aura prove does, what it does not do, and where execution-aware validation fits in the pipeline.

How Aura Handles It

What aura prove actually does

aura prove "<goal>" verifies structural properties of the code against a goal description. Mechanically:

  1. Aura parses the goal into a logical form (this is done via a small model; for well-scoped goals, a heuristic grammar).
  2. The goal is decomposed into required "logic nodes" — named functions, classes, handlers, routes, or relationships the goal implies.
  3. Aura checks the semantic graph: does each required node exist? Are the required connections wired (does the auth middleware actually call the auth service)? Are there obvious missing links?
  4. Aura returns a structured result: nodes present, nodes missing, connections wired, connections broken.

Example:

aura prove "User can authenticate via OAuth"

Result:

Goal: User can authenticate via OAuth
  [PRESENT]   handler: auth::oauth_callback
  [PRESENT]   handler: auth::oauth_initiate
  [PRESENT]   connection: oauth_callback -> session::create_session
  [MISSING]   logic: OAuth state parameter validation
  [WIRED]     route /auth/oauth/callback -> auth::oauth_callback

This tells you the handlers exist and are routed, but no code validates the OAuth state parameter; skipping that validation is a known CSRF vulnerability. That is a real bug, caught statically.
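The four mechanical steps can be sketched as a toy model. This is illustrative only: Aura's real goal decomposition and semantic graph are internal, and the dict-based graph, node names, and `prove` function below are invented for clarity.

```python
# Illustrative stand-in for a structural goal check. The graph maps each
# node name to the set of node names it calls or routes to.

def prove(goal, required_nodes, required_edges, graph):
    """Check a goal's required nodes and connections against a semantic graph."""
    report = {"goal": goal, "present": [], "missing": []}
    for node in required_nodes:
        status = "present" if node in graph else "missing"
        report[status].append(("logic", node))
    for src, dst in required_edges:
        wired = dst in graph.get(src, set())
        status = "present" if wired else "missing"
        report[status].append(("connection", f"{src} -> {dst}"))
    return report

# Toy graph mirroring the OAuth example above.
graph = {
    "auth::oauth_initiate": set(),
    "auth::oauth_callback": {"session::create_session"},
    "session::create_session": set(),
    "/auth/oauth/callback": {"auth::oauth_callback"},
}

report = prove(
    "User can authenticate via OAuth",
    required_nodes=["auth::oauth_callback", "auth::oauth_initiate",
                    "oauth_state_validation"],
    required_edges=[("auth::oauth_callback", "session::create_session")],
    graph=graph,
)
# oauth_state_validation appears nowhere in the graph, so it is reported
# as missing, while the callback-to-session connection is present.
```

Note that nothing here executes any handler: the check is a pure lookup over the graph, which is why it runs in seconds and why it cannot speak to runtime behavior.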

What aura prove does not do

aura prove does not:

  • Run the program.
  • Run your test suite.
  • Send an HTTP request to /auth/oauth/callback and check the response.
  • Verify that your OAuth configuration is correct (client ID, secret, redirect URL).
  • Verify that the code compiles. (Aura notices non-parseable files but does not invoke your compiler.)
  • Verify that session::create_session actually persists a session — it verifies the function exists and is called, not that it functions correctly.
  • Catch runtime bugs in the body of a function. If create_session throws on every input because of a logic bug, aura prove still says [PRESENT].
  • Catch bugs that depend on external state (database schema, API contracts, environment variables).

If aura prove passes, you have evidence that the structure of your code aligns with the stated goal. You do not have evidence that the code works. Those are different.
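The gap between [PRESENT] and working can be made concrete with a toy example. The function names are invented, and the `co_names` inspection is a crude stand-in for a real reference check, not how Aura works internally.

```python
# A function can be [PRESENT] and wired, and still fail on every input.
# Structure and behavior are separate claims.

def create_session(user_id):
    # Logic bug: always raises, regardless of input.
    raise RuntimeError("session store misconfigured")

def oauth_callback(user_id):
    return create_session(user_id)   # the connection exists and is wired

# A structural check sees the reference and would report [PRESENT] / wired:
assert "create_session" in oauth_callback.__code__.co_names

# Only execution exposes the bug:
try:
    oauth_callback("alice")
    crashed = False
except RuntimeError:
    crashed = True
assert crashed
```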

Where execution-aware validation lives

Execution-aware validation is the job of:

  • Your compiler. Catches type errors, syntax errors, some undefined-behavior classes.
  • Your linter. Catches stylistic bugs and some correctness patterns.
  • Your unit tests. Verify behavior of individual functions.
  • Your integration tests. Verify behavior of components together.
  • Your end-to-end tests. Verify behavior of the deployed system.
  • Your canary deployments and observability. Verify behavior in production.

Aura has opinions on none of these. We do not ship a test runner, a compiler, or a staging environment. We compose with whatever you already use.

The intended pipeline:

  1. Developer writes code
  2. aura save (logs intent, runs pre-commit hook)
  3. aura prove (structural check of stated goal)
  4. aura push (pushes to git remote)
  5. CI: your test suite runs
  6. CI: integration tests run
  7. Deploy to staging
  8. Canary / smoke tests
  9. Production

aura prove sits early in that pipeline. It catches a class of bugs that unit tests miss (e.g., "the function you meant to wire up is not wired up"). It does not replace anything downstream.

Why this composition matters

Consider a common failure mode of AI-generated code: the agent writes a function that does the right thing in isolation, writes a route that does the right thing in isolation, but does not wire the route to the function. Your unit tests pass (each component works). Your integration tests might catch it, if you happened to write a test for that specific route; if not, the missing wiring ships.

Static tools like aura prove catch this. Running tests does not, unless you have a test for every connection. Nobody has a test for every connection.

Conversely, aura prove does not catch a handler that compiles, is wired, and returns the correct shape of response but returns the wrong value. Tests catch that. Neither tool subsumes the other. Together, they are better than either alone.
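This failure mode fits in a few lines. The handler, middleware, and route names below are invented; the middleware list is a toy stand-in for real wiring.

```python
# Each component is correct in isolation, but the route table never
# composes the rate limiter in front of the login handler.

def rate_limit(req):                      # correct in isolation
    return {**req, "limited": True}

def login(req):                           # correct in isolation
    return {"status": 200}

# Intended: middleware chain [rate_limit] before login. Actual: empty.
routes = {"/login": {"middleware": [], "handler": login}}

# Unit tests for each component pass.
assert rate_limit({})["limited"] is True
assert login({})["status"] == 200

# A structural check of "rate limiter applies to login endpoint" asks
# whether the connection exists, without starting a server:
wired = rate_limit in routes["/login"]["middleware"]
assert not wired   # missing connection, caught statically
```

No per-component test can catch this, because every component is fine; only a check over the wiring itself, or an integration test for this exact route, sees the gap.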

aura prove in CI

The recommended pattern is to invoke aura prove in CI alongside your test suite, not in place of it. A typical .github/workflows/aura.yml:

- name: Run tests
  run: cargo test --all

- name: Verify semantic goals
  run: |
    aura prove "User can authenticate via OAuth"
    aura prove "Admin can revoke a session"
    aura prove "Rate limiter applies to login endpoint"

- name: Aura PR review
  run: aura pr-review --base main --fail-on violations

All three steps must pass for the PR to merge. See CI/CD integration.

The intent-versus-diff check

Aura adds one more layer that is useful to name in this context: the pre-commit hook that compares your stated intent against your actual diff. See intent tracking.

If you log intent "refactor retry logic to use exponential backoff" and your diff deletes three unrelated functions and adds a new database migration, Aura flags the commit. This is not "execution-aware validation" — the code may be perfectly correct — but it catches a specific failure mode where the author (human or agent) said one thing and did another.

This is cheap and catches a surprising number of real issues, particularly with AI agents that sometimes "wander" from the stated task. It is not a substitute for tests, or for anything else; it checks a property no other tool checks.
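A minimal sketch of such a check, assuming a keyword-overlap heuristic: Aura's actual hook logic is internal, and the function, symbol names, and diff format below are invented for illustration.

```python
# Flag diff entries that touch symbols the stated intent never mentions.

def intent_drift(intent, diff_summary):
    """Return (change_kind, symbol) pairs with no word overlap with the intent.

    diff_summary is a list of (change_kind, symbol) pairs extracted from
    the diff, e.g. ("modified", "retry_logic").
    """
    intent_words = set(intent.lower().replace("-", " ").split())
    flagged = []
    for kind, symbol in diff_summary:
        symbol_words = set(symbol.lower().replace("_", " ").split())
        if not (symbol_words & intent_words):
            flagged.append((kind, symbol))
    return flagged

diff = [
    ("modified", "retry_logic"),
    ("deleted", "parse_config"),
    ("added", "users_table_migration"),
]
flagged = intent_drift("refactor retry logic to use exponential backoff", diff)
# retry_logic shares words with the intent and passes; parse_config and
# users_table_migration share none, so both are flagged for review.
```

A real implementation would need smarter matching than word overlap (synonyms, renames, transitive edits), but even this crude version distinguishes "did what was stated" from "did something else entirely".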

What Aura Does Not Solve

Runtime correctness. Aura's static analysis is structural. A function can be wired correctly, called correctly, and still be wrong. Nothing in Aura catches that. Your tests must.

Behavioral regression. If a function used to return 5 and now returns 4, and both are valid return types, Aura does not notice. A test containing assert_eq(f(), 5) notices.

Concurrency bugs. Aura does not model concurrency. A race condition in your code is invisible to Aura. Runtime race detectors (ThreadSanitizer, loom) catch these. Aura does not.

Performance regressions. Aura does not run your code. A change that makes a function 10x slower is invisible to Aura. Benchmark suites catch these.

Environmental bugs. Configuration drift, missing environment variables, misconfigured secrets — all invisible to Aura. Deployment validation and runtime health checks catch these.

Contract drift with external services. If an upstream API changes its response shape, Aura does not know. Contract tests (Pact, OpenAPI validators) catch these.

The Honest Tradeoff

The temptation with a tool like Aura is to overstate what it proves. We fight this temptation in the naming (aura prove is provocative; we considered aura verify and aura check, and kept prove because the alternatives were less memorable, compensating by being careful about what "prove" claims in docs like this one). We fight it in the output format, which is explicit about structural checks versus runtime behavior. We fight it in this page.

The tradeoff of a structural analysis tool is that it produces confident-sounding output ("GOAL PROVEN") that can be over-interpreted. We believe the value of catching structural gaps outweighs the risk of misinterpretation, as long as the docs are honest about the scope. That is what this page is for.

Used correctly, aura prove is a cheap check that runs in seconds and catches a class of bugs that are hard to test for. Used as a replacement for tests, it is dangerous. The correct mental model: aura prove is a linter for intent. Your test suite is the arbiter of behavior.

See Also