Each task has a different shape, pace, and expectation of how AI can, and should, help.
Agentic AI promises systems that decide and act. In enterprise environments, where decisions scale quickly, require accountability, and are rarely reversible, those promises come with risk. Without deliberate choices about when AI should act or step back, autonomy becomes a liability: when it shows up in the wrong place, at the wrong time, or without the visibility people need, it fails through overrides and disengagement, even when outputs are technically correct.
Successfully designing AI assistants starts with a simple yet consequential idea: They succeed only when trust is built into how and when autonomy appears. When we began imagining the next generation of AI assistants for Experience Cloud, we created a modality-driven framework that makes those choices explicit by matching agentic experiences to the rhythms of work while keeping autonomy clear, inspectable, and even reversible as systems scale.
The Modality Framework: Matching AI to the shape of the task
Designing agentic AI isn’t about choosing a single “right” interface; it’s about matching autonomy to task complexity. As scope and risk increase, so does the need for transparency, checkpoints, and reversibility, and for making those choices obvious as users move between tasks.
Modalities—or different ways of interacting with AI—are distinct experience needs expressed through interaction patterns that shape how visible, assistive, and proactive an AI system is. Each one encodes design decisions about leadership, control, and trust.
The Modality Framework maps AI interaction across a spectrum: Lightweight contextual surfaces (prompt bars) support quick, bounded requests. In the middle, structured canvases enable multistep, inspectable work. More complex immersive environments support open-ended strategy and sustained collaboration. To make these decisions repeatable, each modality maps to three dimensions that define what users need in that interaction:
- Workflow complexity. A subject line rewrite and a multi-stage campaign plan are different kinds of work. Lightweight prompt bars handle bounded, single-step tasks, and canvas environments handle high-precision, non-linear orchestration.
- AI reasoning. Is the assistant doing a lightweight transformation or genuine synthesis? As reasoning deepens, users need greater visibility into how outputs are generated, especially in enterprise contexts where they’re accountable for the result.
- AI autonomy. Is the agent suggesting, drafting, or acting independently? As autonomy increases, so does the need for transparency, checkpoints, and reversibility.
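One way to picture how the three dimensions could drive a modality choice is the minimal sketch below. It is an illustration under stated assumptions, not the framework's actual logic: the three-level scale, the names `Level`, `TaskProfile`, and `choose_modality`, and the rule that the most demanding dimension decides the surface are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale applied to each dimension."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class TaskProfile:
    """A task scored on the three dimensions named in the framework."""
    workflow_complexity: Level
    ai_reasoning: Level
    ai_autonomy: Level


def choose_modality(task: TaskProfile) -> str:
    """Pick a surface from the most demanding dimension (an assumed rule)."""
    demand = max(task.workflow_complexity, task.ai_reasoning, task.ai_autonomy)
    if demand == Level.LOW:
        return "prompt_bar"   # quick, bounded requests
    if demand == Level.MEDIUM:
        return "canvas"       # multistep, inspectable work
    return "immersive"        # open-ended strategy and sustained collaboration


# A subject line rewrite scores low on every dimension.
subject_line = TaskProfile(Level.LOW, Level.LOW, Level.LOW)
# A multi-stage campaign plan is complex and needs deeper synthesis.
campaign_plan = TaskProfile(Level.HIGH, Level.HIGH, Level.MEDIUM)

print(choose_modality(subject_line))
print(choose_modality(campaign_plan))
```

Taking the maximum is a deliberately conservative choice for this sketch: a task that is simple to execute but high in autonomy still lands on a surface with more room for visibility.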
How the framework reflects human behavior
Designing agentic AI is less about demonstrating intelligence and more about negotiating responsibility. Technology succeeds when it earns a place alongside human judgment. Researchers found that users are highly sensitive to when an agent leads, how visible its reasoning is when it does, and how easily its autonomy can be reversed. Researchers also identified several clear patterns:
Task complexity dominates everything. Users instinctively map interface structure to their work. When the UI matches the task, interaction feels natural. When it doesn’t, friction appears immediately.
Conversation supports thinking, but not always execution. Participants value conversational interaction for exploration and ambiguity, but expect structured, inspectable interfaces when committing to decisions. Treating them as interchangeable creates resistance instead of fluency.
Risk changes tolerance for autonomy. In low-risk contexts, users are comfortable with AI suggesting or acting on bounded steps. As compliance, financial, or brand risk increases, expectations shift toward transparency, previews, and the ability to override. When autonomy outpaces visibility, trust breaks—even with correct outputs.
Switching between modalities requires clear, predictable transitions. Each modality signals an interface change and a shift in responsibility. When transitions are implicit or unpredictable, users can feel disoriented and unsure about whether the system is assisting, leading, or acting independently. People respond better to changes in scope and autonomy when conversational language is supported by visible structural cues.
Sometimes agents should lead and sometimes they should step back. Getting that wrong can feel intrusive or passive. Getting it right depends on what the AI does and how its role is signaled in the moment.
The Modality Framework defines three moments of agency: quick conversational responses that support thinking, structured workspace actions that enable execution, and transparent AI decisioning that builds trust when accountability matters most.
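The pattern that risk changes tolerance for autonomy can be expressed as a small rule table. The sketch below is illustrative only: the `Risk` and `Autonomy` scales, the safeguard names, and the thresholds are assumptions made for the example, not findings from the research.

```python
from enum import IntEnum
from typing import Set


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Autonomy(IntEnum):
    SUGGEST = 1   # the agent proposes, the user decides
    DRAFT = 2     # the agent produces work for review
    ACT = 3       # the agent acts independently


def required_safeguards(risk: Risk, autonomy: Autonomy) -> Set[str]:
    """Return the safeguards an interaction should expose (assumed thresholds)."""
    safeguards: Set[str] = set()
    # Visibility grows as soon as the agent drafts or the stakes rise.
    if autonomy >= Autonomy.DRAFT or risk >= Risk.MEDIUM:
        safeguards.add("show_reasoning")
    # Compliance, financial, or brand risk calls for a preview before commit.
    if risk >= Risk.MEDIUM:
        safeguards.add("preview")
    # Anything the agent does on its own must be reversible.
    if autonomy == Autonomy.ACT:
        safeguards.add("undo")
    # High risk combined with independent action adds an explicit checkpoint.
    if risk == Risk.HIGH and autonomy == Autonomy.ACT:
        safeguards.add("approval_checkpoint")
    return safeguards


print(sorted(required_safeguards(Risk.LOW, Autonomy.SUGGEST)))
print(sorted(required_safeguards(Risk.HIGH, Autonomy.ACT)))
```

The point of the sketch is that safeguards are derived from risk and autonomy together, so autonomy never increases without visibility increasing alongside it.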
Bringing the framework to life
The Modality Framework defines how agentic AI should behave, but a framework alone can’t sustain an experience. Without systems to enact it, intent breaks down quickly.
- Consistency suffers when agentic responses, personalities, and the forms of generated outputs vary across surfaces, and trust erodes as a result.
- Shared language breaks down when there is no coherent model for how multiple agents relate to one another, or for what users should understand about the assistant acting on their behalf.
Addressing these gaps requires four mechanisms that translate the framework into repeatable design and product decisions.
1. Agent taxonomy sets up a shared mental model
As agentic capabilities expand across products, a fundamental design challenge emerges: How do you implement multiple intelligent systems without fragmenting the user experience?
Rather than providing a distinct interface for each agent, a taxonomy establishes a shared organizational model for roles, relationships, and responsibilities across the system. A single AI assistant is the primary interaction layer, while specialized agents operate behind the scenes, interpreting context, coordinating actions, and invoking modular capabilities.
This structure allows the system to grow in capability without increasing cognitive load. Users engage with one coherent surface while multiple agents collaborate invisibly to support outcomes. Clarifying these roles creates the foundation for a scalable agent ecosystem, where complexity expands behind the scenes, not in the experience.
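The taxonomy described above, with one assistant in front and specialized agents behind it, might be sketched as follows. The class names, the skill-based routing, and the example agent are all hypothetical. Intent interpretation is reduced to passing a skill name so the example stays short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SpecializedAgent:
    """A behind-the-scenes agent exposing modular capabilities (hypothetical)."""
    name: str
    skills: List[str]
    handle: Callable[[str], str]


@dataclass
class Assistant:
    """The single user-facing layer; users never address agents directly."""
    agents: Dict[str, SpecializedAgent] = field(default_factory=dict)

    def register(self, agent: SpecializedAgent) -> None:
        # Index each agent by the skills it offers.
        for skill in agent.skills:
            self.agents[skill] = agent

    def respond(self, skill: str, request: str) -> Dict[str, Optional[str]]:
        agent = self.agents.get(skill)
        if agent is None:
            return {"speaker": "assistant",
                    "text": "I can't help with that yet.",
                    "handled_by": None}
        # The reply always comes from the assistant; the agent name is kept
        # only as metadata so the work stays inspectable.
        return {"speaker": "assistant",
                "text": agent.handle(request),
                "handled_by": agent.name}


assistant = Assistant()
assistant.register(SpecializedAgent(
    name="content_agent",
    skills=["rewrite_subject_line"],
    handle=lambda text: f"Draft: {text.title()}",
))

reply = assistant.respond("rewrite_subject_line", "spring sale starts today")
print(reply["speaker"], "|", reply["text"])
```

Adding a new agent in this sketch means one more `register` call. The surface the user talks to does not change, which is the property the taxonomy is meant to protect.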
2. Conversational guidelines create consistency
Consistency across agentic experiences doesn’t happen by accident. It requires explicit decisions about how an AI assistant should speak, structure responses, and handle edge cases across teams and surfaces.
The goal is to define experience contracts for each modality: What users expect from AI behavior in each context and how those expectations are met. In lightweight surfaces, that means direct, confident responses without unnecessary explanation. In canvas-based workflows, it means increased transparency that shows what the system did, why it did it, and where users can intervene. The guidelines focused on three interconnected aspects of interaction:
- Response content: what information belongs up front, how much context to include, and when to suggest next actions or step back.
- Conversation flow: how the assistant manages multiturn exchanges, ambiguity, errors, and handoffs between agents.
- Response presentation: structural conventions like formatting, hierarchy, typography, and tone that signal consistency regardless of surface.
Across research and testing, one pattern was clear: Trust isn’t built in isolated responses. It’s built in sequence. When AI behavior shifts unpredictably across turns or surfaces, the experience can feel unreliable.
3. Dynamic Cards inform visual output
The visual structure of AI output matters as much as what it says. A performance summary that works for a lightweight request will fail mid‑workflow if it doesn’t perceptibly expose reasoning, provenance, and next steps.
Dynamic Cards treat AI output as flexible units (recommendations, plans, summaries, or actions) that adapt across modalities without losing coherence. Rather than fixed layouts, cards follow a shared system that adjusts hierarchy, emphasis, and level of detail based on context, so that AI involvement is evident, inspectability is preserved, and visual complexity scales with the task. Designing these cards exposed an alignment problem:
- Design teams assumed adaptability meant preserving meaning across contexts.
- Engineering teams interpreted it as responsiveness to screen size.
The distinction mattered because ambiguity about how AI output should change across contexts could directly undermine trust. Resolving it required shared definitions and annotated prototypes that aligned teams around a single output model.
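The design reading of adaptability, preserving meaning across contexts rather than responding to screen size, can be illustrated with a single output model rendered at different levels of detail. This is a rough sketch, not the Dynamic Cards implementation: the `Card` fields, the `render` function, and the modality names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Card:
    """One unit of AI output; the same content backs every rendering."""
    title: str
    summary: str
    reasoning: str
    sources: List[str]
    next_steps: List[str]


def render(card: Card, modality: str) -> List[str]:
    """Adjust level of detail by modality, never the underlying meaning."""
    # AI involvement is marked in every rendering.
    lines = [f"[AI] {card.title}", card.summary]
    if modality == "prompt_bar":
        # Compact view: detail is collapsed, but still reachable.
        lines.append("(expand for reasoning and sources)")
        return lines
    # Fuller surfaces expose reasoning, provenance, and next steps inline.
    lines.append(f"Why: {card.reasoning}")
    lines.append("Sources: " + ", ".join(card.sources))
    lines.extend(f"Next: {step}" for step in card.next_steps)
    return lines


card = Card(
    title="Campaign performance",
    summary="Open rate rose week over week.",
    reasoning="Compared the last two sends for the same segment.",
    sources=["Email report"],
    next_steps=["Reuse the subject line pattern"],
)

compact = render(card, "prompt_bar")
full = render(card, "canvas")
print(len(compact), len(full))
```

Both renderings share the same title and summary. What changes is how much reasoning and provenance is shown inline, which depends on the modality rather than on the viewport.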
4. A Figma patterns library carries design logic
Together, conversational guidelines and Dynamic Cards ensure that AI behavior and output evolve in sync, preserving trust as users move between moments, surfaces, and levels of autonomy. But neither travels across teams on its own. Without a shared, accessible design system, decisions can be made in isolation, eroding consistency.
The Figma patterns library is that system. It’s a centralized, living collection of AI interaction patterns that teams can pull directly into their design files. These building blocks of agentic AI interaction—things like prompt bar designs, Dynamic Card variants, checkpoints, modality transitions—live where design decisions are made.
Each component is annotated with its underlying rationale: the framework dimension it addresses, the trust contract it supports, and the behaviors it enables. These annotations are what make the library a design system rather than a component collection. They carry the framework logic into everyday design work, without requiring every designer to internalize the full model.
Making autonomy deliberate
A Modality Framework prevents autonomy from outpacing understanding. It aligns AI behavior with task complexity, risk, and intent, and ensures users can see, question, and override decisions when it matters most. Without this kind of thinking, trust in agentic AI erodes slowly through inconsistent behavior, hidden reasoning, and systems people learn to work around, or worse, ignore.
As AI systems grow more capable, it’s tempting to measure success by autonomy. But in enterprise contexts, autonomy creates value only when people trust it enough to let it act. Designing trust is a practical requirement for making autonomy durable at scale. Confidence is earned one interaction at a time—until assistance feels less like automation and more like collaboration.
We want to acknowledge the contributions of Shruthi Andru, Phoebe Atkins, Catherine Chiodo, Gina Ranalli, Eden Wen, and Claudia Yu, whose collaboration and partnership helped shape and advance this work.