13 Boundaries Every AI Agent Needs (And Most Don't Have)

Most production AI agents operate with implicit boundaries, which in practice means no real boundaries. The agent is given a task and broad latitude, and the team trusts that the underlying model has enough judgment to stay within reasonable limits. That trust is often misplaced. An agent can cross any limit that isn't explicitly enforced, sometimes catastrophically.

The thirteen boundaries below are the most important ones to enforce explicitly. Each one defines a limit on what the agent can do, when, and under what conditions. Without these boundaries, the agent's behavior is whatever the underlying model produces — which varies, drifts, and occasionally surprises. With them, the agent's behavior is bounded in ways that are auditable and recoverable.

1. Authorized action scope

A specific enumeration of actions the agent can take. Not "the agent helps with customer support" but "the agent can read tickets, draft responses, and propose status changes, but cannot send messages, modify tickets, or take account actions." Each authorized action is explicit; everything not on the list is prohibited by default. This is the foundational boundary; the others build on it.
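A minimal sketch of default-deny action checking, assuming the agent's tool calls pass through a Python dispatcher. The action names, `ALLOWED_ACTIONS`, and `authorize` are hypothetical and simply mirror the support example above.

```python
from enum import Enum


class Action(Enum):
    READ_TICKET = "read_ticket"
    DRAFT_RESPONSE = "draft_response"
    PROPOSE_STATUS_CHANGE = "propose_status_change"
    SEND_MESSAGE = "send_message"
    MODIFY_TICKET = "modify_ticket"


# Hypothetical allowlist for the support agent; anything absent is denied.
ALLOWED_ACTIONS = frozenset({
    Action.READ_TICKET,
    Action.DRAFT_RESPONSE,
    Action.PROPOSE_STATUS_CHANGE,
})


class ActionNotAuthorized(Exception):
    """Raised for any action that is unknown or not on the allowlist."""


def authorize(action_name: str) -> Action:
    """Return the Action if it is allowlisted; raise for everything else."""
    try:
        action = Action(action_name)
    except ValueError:
        raise ActionNotAuthorized(f"unknown action: {action_name!r}") from None
    if action not in ALLOWED_ACTIONS:
        raise ActionNotAuthorized(f"action not on allowlist: {action_name!r}")
    return action
```

The useful property is that a brand-new or misspelled action fails closed: nobody has to remember to add it to a deny list.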

2. Data access scope

What data the agent can read and what it cannot. Minimum necessary access, enforced at the infrastructure level (not the prompt level). The agent shouldn't have access to data it doesn't need for its authorized actions, because access creates exposure even when not used. This is especially important for sensitive data — personal information, financial records, internal communications — where the cost of inappropriate access is high.

3. Cost ceiling per interaction

The maximum cost the agent can incur in a single interaction. Without this, a misbehaving agent can spend thousands of dollars on a single user request by looping through expensive operations. The ceiling needs to be enforced at the infrastructure level: when spending nears the ceiling, the agent halts and escalates rather than continuing.
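One way to sketch a per-interaction ceiling is a budget object that every expensive call must reserve against before it runs. `InteractionBudget` and `BudgetExceeded` are hypothetical names, and costs are tracked in integer cents to avoid floating-point drift.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when an operation would push the interaction past its ceiling."""


@dataclass
class InteractionBudget:
    ceiling_cents: int
    spent_cents: int = 0

    def reserve(self, estimated_cents: int) -> None:
        """Reserve cost before an operation runs; refuse if it would cross the ceiling."""
        if estimated_cents < 0:
            raise ValueError("estimated cost must be non-negative")
        projected = self.spent_cents + estimated_cents
        if projected > self.ceiling_cents:
            raise BudgetExceeded(
                f"would spend {projected} of {self.ceiling_cents} cents"
            )
        self.spent_cents = projected
```

In this sketch the caller would catch `BudgetExceeded`, stop the loop, and hand the interaction to a human. Checking before the call, rather than after, is what keeps a runaway loop from overshooting.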

4. Time bounds for autonomous operation

How long the agent can run without human check-in. For long-running agents (research agents, monitoring agents), this matters: the agent should not be operating for days without periodic human verification. Time bounds prevent the failure mode where the agent drifts over time without anyone noticing.
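A rough sketch of a check-in window, assuming the agent loop calls a guard before each step. `AutonomyWindow` and `CheckInRequired` are hypothetical; the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable


class CheckInRequired(Exception):
    """Raised when the agent has run too long without human verification."""


class AutonomyWindow:
    def __init__(self, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_seconds = max_seconds
        self._clock = clock
        self._last_check_in = clock()

    def check_in(self) -> None:
        """Record that a human has reviewed the agent's recent work."""
        self._last_check_in = self._clock()

    def ensure_within_window(self) -> None:
        """Call before each agent step; raises once the window has elapsed."""
        elapsed = self._clock() - self._last_check_in
        if elapsed > self._max_seconds:
            raise CheckInRequired(
                f"{elapsed:.0f}s since last check-in, limit {self._max_seconds:.0f}s"
            )
```

`time.monotonic` is used as the default because it is unaffected by system clock adjustments; a multi-day agent would more likely persist the last check-in timestamp somewhere durable.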

5. Decision-class boundaries

Specific categories of decisions the agent cannot make autonomously. Examples: decisions that involve money over a threshold, decisions that affect more than N users, decisions that bind future commitments, decisions that involve regulatory or legal questions. These need human review regardless of how confident the agent is.
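The categories above can be sketched as a classifier that returns the reasons a decision needs review. The thresholds, the `Decision` fields, and `review_reasons` are illustrative assumptions. Note that model confidence is deliberately not an input.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values come from the deployment's risk policy.
MAX_AUTONOMOUS_AMOUNT_USD = 100
MAX_AUTONOMOUS_AFFECTED_USERS = 10


@dataclass(frozen=True)
class Decision:
    amount_usd: int = 0
    affected_users: int = 1
    binds_future_commitment: bool = False
    legal_or_regulatory: bool = False


def review_reasons(decision: Decision) -> list[str]:
    """Return why a human must review this decision; an empty list means no rule applies."""
    reasons = []
    if decision.amount_usd > MAX_AUTONOMOUS_AMOUNT_USD:
        reasons.append("amount over threshold")
    if decision.affected_users > MAX_AUTONOMOUS_AFFECTED_USERS:
        reasons.append("affects too many users")
    if decision.binds_future_commitment:
        reasons.append("binds a future commitment")
    if decision.legal_or_regulatory:
        reasons.append("legal or regulatory question")
    return reasons
```

Returning the reasons rather than a bare boolean gives the human reviewer context and leaves an audit trail.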

6. Topic scope

What topics the agent is authorized to engage with. A customer support agent for a software product should not be answering questions about politics, medical advice, or general philosophy. Topic scope prevents the agent from operating in domains where it's likely to be wrong and where the company has no business providing answers.

7. Tone and content boundaries

How the agent communicates. The voice, the appropriate level of formality, prohibited content categories (no humor about sensitive topics, no opinions on contested issues, no committal language on uncertain matters). These boundaries protect both the user and the brand from agent outputs that misrepresent the company's positions.

8. Identity disclosure requirements

The agent must disclose that it's an AI when asked, in specific situations, or proactively in certain contexts. Without this boundary, the agent will sometimes claim or imply it's human, which produces the worst kind of public incident when discovered.

9. Escalation triggers

Specific conditions under which the agent must escalate to a human rather than continuing. Examples: user expresses distress, user requests human, agent detects ambiguity it can't resolve, situation matches a defined edge case. The triggers are explicit and tested; without them, the agent will sometimes continue when it shouldn't.
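To make triggers explicit and testable, one option is a named list of predicates evaluated on every turn. Everything here is an assumption for illustration: the `TurnState` fields, the phrase list (a real system would probably use a classifier), and the trigger names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder phrases; keyword matching is crude and only for illustration.
HUMAN_REQUEST_PHRASES = ("talk to a human", "speak to a person", "real person")


@dataclass(frozen=True)
class TurnState:
    user_message: str
    distress_detected: bool = False     # assumed output of an upstream classifier
    unresolved_ambiguity: bool = False  # assumed to be set by the agent loop


def _asks_for_human(state: TurnState) -> bool:
    text = state.user_message.lower()
    return any(phrase in text for phrase in HUMAN_REQUEST_PHRASES)


# Ordered, named triggers so each one can be unit-tested on its own.
TRIGGERS: list[tuple[str, Callable[[TurnState], bool]]] = [
    ("user_distress", lambda s: s.distress_detected),
    ("human_requested", _asks_for_human),
    ("unresolved_ambiguity", lambda s: s.unresolved_ambiguity),
]


def first_trigger(state: TurnState) -> Optional[str]:
    """Return the name of the first trigger that fires, or None to continue."""
    for name, predicate in TRIGGERS:
        if predicate(state):
            return name
    return None
```

Because the triggers live in data rather than in prompt text, adding one is a code change that can be reviewed and covered by a test.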

10. Memory boundaries

What the agent remembers across sessions and what it forgets. Many agents retain too much for too long (privacy implications) or too little, too briefly (continuity implications). The right answer depends on the use case, but it needs to be an explicit choice rather than whatever the vendor's default happens to be.
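An explicit retention policy might look something like the sketch below. The categories and durations are invented for illustration, and `MemoryRecord` and `prune` are hypothetical names; the point is only that categories without a policy are dropped rather than kept by default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention policy: category -> how long a record may be kept.
RETENTION = {
    "conversation_summary": timedelta(days=7),
    "user_preference": timedelta(days=90),
}


@dataclass(frozen=True)
class MemoryRecord:
    category: str
    created_at: datetime
    content: str


def prune(records: list[MemoryRecord], now: datetime) -> list[MemoryRecord]:
    """Keep only records whose category has a policy and whose age is within it."""
    kept = []
    for record in records:
        ttl = RETENTION.get(record.category)
        if ttl is not None and now - record.created_at <= ttl:
            kept.append(record)
    return kept
```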

11. Update and version boundaries

Which version of the underlying model the agent uses and when it gets updated. Agents whose models silently update can produce surprising behavior changes for users. The update cadence should be controlled, communicated, and gated by validation that the new model behaves consistently with the deployment requirements.
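As a sketch of a gated update, assume the deployment pins an exact model identifier and only moves the pin when a candidate clears a validation suite. `ModelPin`, `promote`, the pass-rate threshold, and the model identifiers in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPin:
    model_id: str  # an exact, dated identifier rather than a floating alias


def promote(current: ModelPin, candidate_id: str, eval_pass_rate: float,
            required_pass_rate: float = 0.98) -> ModelPin:
    """Return a pin for the candidate only if it cleared validation; otherwise keep the current pin."""
    if not 0.0 <= eval_pass_rate <= 1.0:
        raise ValueError("eval_pass_rate must be between 0 and 1")
    if eval_pass_rate >= required_pass_rate:
        return ModelPin(candidate_id)
    return current
```

For example, `promote(ModelPin("example-model-2024-01"), "example-model-2024-06", 0.95)` returns the original pin unchanged, so a candidate that regresses never reaches users silently.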

12. Integration boundaries

Which other systems the agent can call. An agent that can call other agents or external APIs has a much larger blast radius than one that operates in isolation. The integration scope should be explicit, justified, and reviewed periodically. Unauthorized integrations are a common source of agent scope creep.
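For outbound calls, an explicit integration scope can be as simple as a hostname allowlist checked at the egress layer. The hosts below are made up, and `is_allowed_endpoint` is a hypothetical helper; in practice this check would more likely live in a proxy or gateway than in the agent process itself.

```python
from urllib.parse import urlsplit

# Hypothetical hosts; the real list would be justified and reviewed periodically.
ALLOWED_HOSTS = frozenset({"tickets.example.internal", "billing.example.internal"})


def is_allowed_endpoint(url: str) -> bool:
    """Allow only HTTPS calls to exactly matching, allowlisted hostnames."""
    try:
        parts = urlsplit(url)
        hostname = parts.hostname
    except ValueError:
        return False  # malformed URLs fail closed
    return parts.scheme == "https" and hostname in ALLOWED_HOSTS
```

Matching on the parsed hostname, rather than on a substring of the URL, avoids lookalikes such as an allowlisted name used as a subdomain prefix or placed in the userinfo part.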

13. Outcome boundaries

Specific outcomes the agent is not authorized to produce, even if the path to them looks valid. Examples: outcomes that affect protected classes differentially, outcomes that contradict company policy, outcomes that create irreversible state changes. The boundary is at the result level, not the path level, because the agent might reach a prohibited outcome through paths the developers didn't anticipate.
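The difference between path-level and result-level checks is easier to see with numbers. In this sketch, both limits and all function names are hypothetical: every individual refund passes the per-step rule, yet the plan as a whole lands on a prohibited outcome.

```python
PER_STEP_REFUND_LIMIT_USD = 50     # hypothetical path-level rule
PER_TICKET_REFUND_LIMIT_USD = 120  # hypothetical outcome-level rule


def step_allowed(refund_usd: int) -> bool:
    """Path-level check: is this single refund within the per-step limit?"""
    return 0 <= refund_usd <= PER_STEP_REFUND_LIMIT_USD


def outcome_allowed(refunds_usd: list[int]) -> bool:
    """Outcome-level check: is the total refunded on the ticket within policy?"""
    return sum(refunds_usd) <= PER_TICKET_REFUND_LIMIT_USD


def plan_allowed(refunds_usd: list[int]) -> bool:
    """A plan must pass every step check and the check on its end result."""
    return all(step_allowed(r) for r in refunds_usd) and outcome_allowed(refunds_usd)
```

With three refunds of 50 each, every step passes `step_allowed` but the total of 150 fails `outcome_allowed`, so `plan_allowed([50, 50, 50])` is false. A guard that only inspected individual steps would have let it through.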

The cumulative effect

Most agents in production today have one or two of these boundaries explicitly enforced and the rest implicit. Implicit boundaries work most of the time and fail unpredictably, which is the worst combination. Agents with all thirteen boundaries explicitly enforced behave far more predictably. The bounded behavior may be less impressive than the unconstrained behavior, but it's reliable, and reliability is what produces deployments that work over months and years.

Building the boundaries is not exotic engineering. It's discipline: enumerate, document, enforce, test. The companies that do this consistently have the deployments that work; the companies that skip it have the deployments that produce the headlines about AI agents behaving badly.

Frequently asked questions

Do all thirteen boundaries apply to every agent?

The full list is the baseline for production deployments in real environments. Lower-stakes deployments (internal tools with low blast radius) can simplify. Higher-stakes deployments (customer-facing, financial, healthcare) may need additional boundaries on top of these. The thirteen are a starting framework; the specific list for a deployment depends on its risk profile.

How do you enforce boundaries that the model could theoretically circumvent?

Enforce them in infrastructure, not in prompts. Boundaries enforced only by prompt are bypassable; boundaries enforced by the surrounding system (API gateways, permission systems, cost circuit breakers, output filters) hold no matter what the model outputs. The model's role is to behave within the boundaries; the infrastructure's role is to make sure violations don't reach production even when the model produces them.

Doesn't this much structure make the agent less useful?

Initially, somewhat. A well-bounded agent does less than an unbounded one. But the unbounded agent's "more" includes the failure modes that produce incidents, lawsuits, and shutdowns. Bounded agents are sustainable; unbounded agents are exciting briefly and then unusable. The most valuable agents over time are the carefully bounded ones, not the ones with the broadest theoretical capabilities.