LLM Autonomous Review Mode
The 'llm' mode is one of the four reviewer modes. In this mode the LLM helps evaluate risk patterns that simple static rules tend to miss: natural-language reasons, argument context, and cross-tool chains. Its evaluation is only a recommendation; the host makes the actual decision by combining it with the user's grant and the tool's RiskLevel.
src/permissions/reviewer/risk-classifier.ts
modes: disabled · rule · llm · strict
disabled
LLM review off. Only static rules (RiskLevel × Category × grant) apply.
rule
Recommendations based on static rules only. No LLM call is made, so this mode is fast.
llm
The LLM examines arguments + reason + context and issues a recommendation. Active on medium/high-risk tool calls.
strict
Forces a user dialog for every medium/high action. Minimizes automation.
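As a rough illustration of how the four modes differ, the sketch below maps a mode and a RiskLevel to the review step that would run. Only the four mode names and the medium/high threshold come from the text above. The `low` level, the `ReviewStep` names, the `reviewStepFor` function, and the behaviour of `llm` and `strict` on low-risk calls are assumptions made for the example, not the contents of risk-classifier.ts.

```typescript
// Mode names are from the docs; all other names here are hypothetical.
type ReviewerMode = "disabled" | "rule" | "llm" | "strict";
// Assumption: a "low" level exists below medium/high.
type RiskLevel = "low" | "medium" | "high";

type ReviewStep =
  | "static-rules-only"     // disabled: RiskLevel x Category x grant only
  | "rule-recommendation"   // rule: static recommendation, no LLM call
  | "llm-recommendation"    // llm: LLM inspects arguments + reason + context
  | "user-dialog";          // strict: always ask the user

function reviewStepFor(mode: ReviewerMode, risk: RiskLevel): ReviewStep {
  const elevated = risk === "medium" || risk === "high";
  switch (mode) {
    case "disabled":
      return "static-rules-only";
    case "rule":
      return "rule-recommendation";
    case "llm":
      // Assumption: low-risk calls skip the LLM and use rule recommendations.
      return elevated ? "llm-recommendation" : "rule-recommendation";
    case "strict":
      // Assumption: low-risk calls are not forced into a dialog.
      return elevated ? "user-dialog" : "rule-recommendation";
  }
}
```

For instance, `reviewStepFor("llm", "high")` yields `"llm-recommendation"`, while the same call under `strict` yields `"user-dialog"`.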
When does LLM review fire?
- At tool-call time, when the reviewer classifies RiskLevel as medium or higher.
- In a cross-plugin callTool chain, to check that permission scope matches the manifest's pluginAccess.
- For cross-plugin risky actions where hostApi.agentApproval.request was called: the LLM reviews the reason + scope.
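The three triggers listed above can be sketched as a single predicate-style function. This is a minimal sketch under stated assumptions: the `ToolCallContext` shape, its field names, the `ReviewTrigger` labels, and the `low` RiskLevel are all hypothetical, and the real classifier may gather this information quite differently.

```typescript
// Assumption: a "low" level exists below medium/high.
type RiskLevel = "low" | "medium" | "high";

// Hypothetical context object; field names are invented for this sketch.
interface ToolCallContext {
  risk: RiskLevel;
  inCrossPluginChain: boolean;      // call is part of a cross-plugin callTool chain
  agentApprovalRequested: boolean;  // hostApi.agentApproval.request was called
}

type ReviewTrigger =
  | "risk-medium-or-higher"
  | "cross-plugin-scope-check"
  | "agent-approval-request";

// Returns every reason LLM review would fire; an empty array means no review.
function llmReviewTriggers(ctx: ToolCallContext): ReviewTrigger[] {
  const triggers: ReviewTrigger[] = [];
  if (ctx.risk !== "low") {
    triggers.push("risk-medium-or-higher");
  }
  if (ctx.inCrossPluginChain) {
    // The LLM checks permission scope against the manifest's pluginAccess.
    triggers.push("cross-plugin-scope-check");
  }
  if (ctx.inCrossPluginChain && ctx.agentApprovalRequested) {
    // The LLM reviews the stated reason + scope.
    triggers.push("agent-approval-request");
  }
  return triggers;
}
```

Returning the list of reasons rather than a bare boolean keeps it visible why a review ran, which may help when auditing decisions later.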
What the LLM cannot change directly
- A tool's RiskLevel: fixed as metadata. Cannot be downgraded by an LLM result.
- A tool's Category (read | write | shell | network | meta): fixed by the manifest's toolSchemas.<tool>.category.
- User grants: only the user can change these.
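One way to picture these limits is to keep tool metadata read-only and to strip the LLM's output down to a recommendation before the host sees it. The sketch below is illustrative only: the Category values come from the list above, while `ToolMetadata`, `LlmRecommendation`, the `Advice` values, `sanitizeLlmOutput`, and the choice to default unrecognised advice to asking the user are assumptions.

```typescript
// Assumption: a "low" level exists below medium/high.
type RiskLevel = "low" | "medium" | "high";
// Category values as listed in the docs.
type Category = "read" | "write" | "shell" | "network" | "meta";

// Hypothetical shape: readonly fields mirror "fixed as metadata".
interface ToolMetadata {
  readonly name: string;
  readonly riskLevel: RiskLevel;
  readonly category: Category;
}

// Hypothetical advice vocabulary for the LLM's recommendation.
type Advice = "allow" | "ask-user" | "deny";

interface LlmRecommendation {
  readonly advice: Advice;
  readonly rationale: string;
}

function isAdvice(value: unknown): value is Advice {
  return value === "allow" || value === "ask-user" || value === "deny";
}

// Copies only the recommendation fields. Anything else the LLM emits
// (e.g. a "riskLevel" or "category" override) is dropped, so it can never
// reach the tool metadata or the user's grants.
function sanitizeLlmOutput(raw: Record<string, unknown>): LlmRecommendation {
  const advice = raw["advice"];
  const rationale = raw["rationale"];
  return {
    // Assumption: unrecognised advice is treated as the cautious choice.
    advice: isAdvice(advice) ? advice : "ask-user",
    rationale: typeof rationale === "string" ? rationale : "",
  };
}
```

With this shape, an LLM response such as `{ advice: "allow", riskLevel: "low" }` still produces a recommendation of `"allow"`, but the attempted `riskLevel` override never leaves the function.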
The no-fallback rule
Even if the LLM recommends allowing auto-run, a static rule that blocks the action always takes priority. We never write bypass or fallback code that lets a risky action run anyway. Instead, the correct fix is to revise the risk metadata itself, split the tool into read and write variants, or route the action explicitly through the agentApproval flow.
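The precedence described here can be sketched as a small decision function in which the static verdict is checked first and the LLM's advice can only make the outcome more cautious. The type names, verdict values, and the `decide` function are hypothetical, and how the host actually treats a negative LLM recommendation is not stated in the text; mapping it to a user prompt is an assumption.

```typescript
// Hypothetical verdict produced by static rules (RiskLevel x Category x grant).
type StaticVerdict = "allow" | "ask-user" | "block";
// Hypothetical advice vocabulary for the LLM's recommendation.
type LlmAdvice = "allow" | "ask-user" | "deny";
type Decision = "auto-run" | "ask-user" | "blocked";

function decide(staticVerdict: StaticVerdict, llmAdvice: LlmAdvice): Decision {
  // A static block always wins. There is deliberately no branch that lets
  // the LLM's "allow" override it.
  if (staticVerdict === "block") {
    return "blocked";
  }
  // If static rules already require a dialog, the LLM cannot remove it.
  if (staticVerdict === "ask-user") {
    return "ask-user";
  }
  // Static rules allow the call: the LLM may confirm auto-run or add caution.
  // Assumption: any non-"allow" advice results in asking the user.
  return llmAdvice === "allow" ? "auto-run" : "ask-user";
}
```

Note that the function has no code path from a blocking static verdict to `"auto-run"`; changing that outcome requires changing the metadata or grants that produced the verdict, in line with the fixes named above.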