LLM Autonomous Review Mode

'llm' is one of the four reviewer modes. In this mode, an LLM helps evaluate risk patterns that simple static rules struggle to catch, such as the natural-language reason given for a call, the context of its arguments, and chains of calls across tools. Its evaluation is only a recommendation: the host makes the actual decision by combining it with the user's grant and the tool's RiskLevel.
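A minimal sketch of how a host might combine these three inputs while treating the LLM result as advisory. The `Grant`, `Recommendation`, and `Decision` types, the `decide` function, and the policy itself are hypothetical illustrations, not the actual logic in risk-classifier.ts. The property it shows is that an LLM "allow" never runs anything the user has not granted.

```typescript
// Hypothetical types; the real names in the codebase may differ.
type RiskLevel = "low" | "medium" | "high";
type Grant = "allow" | "ask" | "deny";          // what the user has granted
type Recommendation = "allow" | "ask" | "deny"; // advisory LLM output
type Decision = "run" | "ask-user" | "reject";

// The host, not the LLM, produces the Decision.
function decide(grant: Grant, risk: RiskLevel, rec: Recommendation | null): Decision {
  // A user denial is authoritative; no recommendation can override it.
  if (grant === "deny") return "reject";
  // Without a standing grant, the user is always asked.
  if (grant === "ask") return "ask-user";
  // Granted low-risk calls run without consulting the LLM.
  if (risk === "low") return "run";
  // Granted medium/high calls auto-run only when the LLM also recommends it;
  // a missing, "ask", or "deny" recommendation escalates to the user.
  return rec === "allow" ? "run" : "ask-user";
}

console.log(decide("allow", "high", "allow")); // → run
console.log(decide("ask", "high", "allow"));   // → ask-user
```

In this sketch the recommendation can only narrow what the grant already permits; it never widens it.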

src/permissions/reviewer/risk-classifier.ts

modes: disabled · rule · llm · strict

Permission card for LLM autonomous review mode

disabled

LLM review is off. Only the static rules (RiskLevel × Category × grant) apply.

rule

Issues recommendations from static rules alone. No LLM call is made, so it is fast.

llm

The LLM examines the arguments, the stated reason, and the surrounding context, then issues a recommendation. It applies to medium- and high-risk tool calls.

strict

Forces a user dialog for every medium- or high-risk action, keeping automation to a minimum.
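The four modes above can be read as a routing decision per tool call. The sketch below is a hedged illustration: only the four mode strings come from this page; the `reviewerAction` function and `ReviewerAction` values are hypothetical, and how `llm` and `strict` treat low-risk calls is an assumption (the page does not say), labeled in the comments.

```typescript
// Hypothetical names; only the four mode strings come from this page.
type ReviewerMode = "disabled" | "rule" | "llm" | "strict";
type RiskLevel = "low" | "medium" | "high";
type ReviewerAction =
  | "none"                // no reviewer recommendation; static rules only
  | "rule-recommendation" // recommendation from static rules, no LLM call
  | "llm-recommendation"  // LLM reviews arguments, reason, and context
  | "user-dialog";        // always ask the user

function reviewerAction(mode: ReviewerMode, risk: RiskLevel): ReviewerAction {
  const elevated = risk === "medium" || risk === "high";
  switch (mode) {
    case "disabled":
      return "none";
    case "rule":
      return "rule-recommendation";
    case "llm":
      // Assumption: low-risk calls fall back to the rule-based recommendation.
      return elevated ? "llm-recommendation" : "rule-recommendation";
    case "strict":
      // Assumption: low-risk calls are handled as in rule mode.
      return elevated ? "user-dialog" : "rule-recommendation";
    default: {
      // Exhaustiveness check: adding a fifth mode becomes a compile error here.
      const unknownMode: never = mode;
      throw new Error(`unknown reviewer mode: ${String(unknownMode)}`);
    }
  }
}
```

The `never` check in the `default` branch is a common TypeScript idiom for keeping a mode switch in sync with its union type.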

When does LLM review fire?

At tool-call time, when the reviewer classifies RiskLevel as medium or higher.
In a cross-plugin callTool chain, to check that the requested permission scope matches the manifest's pluginAccess.
For risky cross-plugin actions that call hostApi.agentApproval.request, where the LLM reviews the stated reason and scope.
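The three triggers above can be expressed as a single check that collects every reason to run LLM review. This is a sketch under assumptions: the `ToolCallContext` shape, its field names, and `llmReviewTriggers` are hypothetical; only `callTool`, `pluginAccess`, and `hostApi.agentApproval.request` are named on this page.

```typescript
// Hypothetical shape of the context available to the reviewer.
type RiskLevel = "low" | "medium" | "high";

interface ToolCallContext {
  risk: RiskLevel;                 // as classified by the reviewer
  viaCrossPluginCallTool: boolean; // call arrived through another plugin's callTool
  approvalRequested: boolean;      // hostApi.agentApproval.request was called
}

type LlmReviewTrigger = "risk-level" | "plugin-access-scope" | "approval-request";

// Returns every reason the LLM reviewer should look at this call;
// an empty array means no LLM review is triggered.
function llmReviewTriggers(ctx: ToolCallContext): LlmReviewTrigger[] {
  const triggers: LlmReviewTrigger[] = [];
  if (ctx.risk !== "low") {
    // Medium or higher RiskLevel at tool-call time.
    triggers.push("risk-level");
  }
  if (ctx.viaCrossPluginCallTool) {
    // The LLM checks the requested scope against the manifest's pluginAccess.
    triggers.push("plugin-access-scope");
    if (ctx.approvalRequested) {
      // The LLM reviews the reason and scope attached to the approval request.
      triggers.push("approval-request");
    }
  }
  return triggers;
}
```

Returning all triggers rather than a boolean lets a reviewer pass the specific concerns to the LLM prompt; whether the real reviewer does this is not stated here.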

What the LLM cannot change directly

A tool's RiskLevel: fixed in its metadata. An LLM result cannot downgrade it.
A tool's Category (read | write | shell | network | meta): fixed by the manifest's toolSchemas.<tool>.category.
User grants — only the user can change these.
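One way to make these limits structural rather than a matter of discipline is to accept only a narrow, whitelisted shape back from the LLM and to freeze tool metadata once loaded. The sketch below assumes hypothetical names (`ToolMetadata`, `LlmReview`, `parseLlmReview`, `loadToolMetadata`); only the category values and the `toolSchemas.<tool>.category` source come from this page.

```typescript
// Hypothetical types; only the category names come from this page.
type RiskLevel = "low" | "medium" | "high";
type Category = "read" | "write" | "shell" | "network" | "meta";
type Recommendation = "allow" | "ask" | "deny";

interface ToolMetadata {
  readonly name: string;
  readonly risk: RiskLevel;    // fixed metadata
  readonly category: Category; // from toolSchemas.<tool>.category
}

// The only things the host accepts back from the LLM: no risk, category,
// or grant fields exist on this type.
interface LlmReview {
  readonly recommendation: Recommendation;
  readonly rationale: string;
}

function isRecommendation(v: unknown): v is Recommendation {
  return v === "allow" || v === "ask" || v === "deny";
}

// Parses raw LLM output into an LlmReview. Any other keys, such as an
// attempted "risk": "low" or "category": "read", are dropped, and an
// unrecognized recommendation degrades to "ask".
function parseLlmReview(raw: unknown): LlmReview {
  const obj: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
  const rec = obj["recommendation"];
  const rationale = obj["rationale"];
  return {
    recommendation: isRecommendation(rec) ? rec : "ask",
    rationale: typeof rationale === "string" ? rationale : "",
  };
}

// Tool metadata is frozen once loaded, so a stray runtime write cannot
// change it (the write is ignored, or throws a TypeError in strict mode).
function loadToolMetadata(meta: ToolMetadata): Readonly<ToolMetadata> {
  return Object.freeze({ ...meta });
}
```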

The no-fallback rule

Even if the LLM recommends auto-run, a static rule that blocks the action always takes priority. We never write bypass or fallback code that lets a risky action run anyway. Instead, the correct fix is to revise the risk metadata itself, split the tool into separate read and write tools, or route the action explicitly through the agentApproval flow.
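The precedence in this rule can be sketched as a resolver in which a static block returns before the LLM result is even considered. `StaticVerdict`, `resolve`, and the decision values are hypothetical names for illustration, not the reviewer's real API.

```typescript
// Hypothetical names; a sketch of the precedence, not the real reviewer.
type StaticVerdict = "block" | "review";
type Recommendation = "allow" | "ask" | "deny";
type Decision = "run" | "ask-user" | "reject";

function resolve(staticVerdict: StaticVerdict, llm: Recommendation | undefined): Decision {
  // A static block is final. There is deliberately no override flag, no retry
  // with a laxer reviewer, and no fallback branch that runs the action anyway.
  if (staticVerdict === "block") {
    return "reject";
  }
  // Only calls the static rules leave open can auto-run on an LLM "allow";
  // anything else goes to the user.
  return llm === "allow" ? "run" : "ask-user";
}
```

In this shape, the remedies listed above (revising the risk metadata, splitting the tool, or an explicit agentApproval request) change the inputs to the resolver rather than adding a branch that skips the block.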
