AWS: Three architectural patterns for scalable voice agents with Amazon Nova Sonic

AWS has published a detailed guide to building scalable voice agents with Amazon Nova Sonic and AgentCore Gateway. Its three patterns (direct tools, sub-agents, and session segmentation) trade off latency against complexity in different ways.

This article was generated using artificial intelligence from primary sources.

Amazon Nova Sonic is AWS’s foundation model for voice AI — it conducts real-time conversations, understands tone and context, and can invoke external tools directly during a voice session. Unlike traditional pipeline solutions, Nova Sonic processes speech end-to-end without an intermediate transcription step.

What is AgentCore Gateway and why does it matter?

AgentCore Gateway is serverless infrastructure that exposes business logic as MCP (Model Context Protocol) tools: managed endpoints that Nova Sonic can invoke without intermediate inference layers. A voice agent, in this context, is an AI system that receives spoken input, understands intent, executes actions, and responds by voice in real time.

The key advantage of the Gateway is eliminating intermediary latency: when a user asks a question, the model directly calls the tool and returns the response in the same session, without handing off control to a separate agent orchestrator.
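
To make the direct call concrete, here is a minimal sketch of the message MCP defines for invoking a tool: a JSON-RPC 2.0 request with the method `tools/call`. The tool name `get_account_balance` and its arguments are hypothetical, and transport and authentication to the Gateway are left out.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = build_tool_call("get_account_balance", {"account_id": "12345"})
print(json.dumps(request, indent=2))
```

In the direct-tools setup the model itself decides when to emit such a call, so no separate orchestrator sits between the user's question and the tool.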

Which of the three patterns has the lowest latency?

AWS describes three architectural patterns, each with clear tradeoffs:

  • Direct tools: Nova Sonic calls MCP tools directly through AgentCore Gateway. This is the simplest architecture and has low latency, but it scales poorly for multi-step workflows.
  • Sub-agents: Complex tasks are delegated to specialized agents with their own models and tools. AWS recommends using smaller models (Nova Lite or Micro) to reduce the additional latency from sub-agent inference.
  • Session segmentation: The conversation is split into phases (e.g. authentication → account management → credit inquiries), each run as a separate Nova Sonic session with a focused prompt and a minimal toolset. Context from the previous phase is passed along as conversation history.
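
The session segmentation pattern can be sketched as a small state holder that builds the settings for each new session. This is only an illustration under stated assumptions: the `Phase` and `Conversation` classes, the prompts, and the tool names are all hypothetical, and the actual call that opens a Nova Sonic session is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    """One conversation phase: a focused prompt plus a minimal toolset."""
    name: str
    system_prompt: str
    tools: list[str]


@dataclass
class Conversation:
    phases: list[Phase]
    index: int = 0
    history: list[dict] = field(default_factory=list)

    @property
    def current(self) -> Phase:
        return self.phases[self.index]

    def record(self, role: str, text: str) -> None:
        """Append one turn to the history shared across phases."""
        self.history.append({"role": role, "text": text})

    def session_config(self) -> dict:
        """Settings a new voice session for the current phase would start with."""
        return {
            "system_prompt": self.current.system_prompt,
            "tools": list(self.current.tools),  # only this phase's tools
            "history": list(self.history),      # context from earlier phases
        }

    def advance(self) -> dict:
        """Move to the next phase and return the config for its fresh session."""
        if self.index + 1 >= len(self.phases):
            raise IndexError("already in the final phase")
        self.index += 1
        return self.session_config()


# Hypothetical phases and tool names.
conv = Conversation([
    Phase("authentication", "Verify the caller's identity.", ["verify_identity"]),
    Phase("account_management", "Help with account changes.",
          ["get_balance", "update_address"]),
])
conv.record("user", "My customer number is 12345.")
config = conv.advance()
print(config["tools"])
```

Each phase only ever sees its own short tool list, while the accumulated history keeps the caller from having to repeat themselves after a transition.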

Session segmentation achieves the lowest operational latency because each session carries only the tools relevant to that phase — a smaller toolset means faster selection and less overhead in the system prompt. The only cost is a brief pause when transitioning between phases.

Practical latency optimizations

AWS describes several concrete techniques: parallel invocation of independent tools (natively supported in the Strands framework), prefetching user data immediately after authentication, and using filler phrases like “Let me check…” to mask the time cost of tool calls. All three techniques apply regardless of which architectural pattern is chosen.
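
Two of these techniques, the filler phrase and parallel tool invocation, can be illustrated with plain `asyncio`. This sketch does not use the Strands API; the tool functions, their return values, and the `say` callback are hypothetical stand-ins.

```python
import asyncio


async def get_balance(account_id: str) -> dict:
    await asyncio.sleep(0.2)  # stand-in for a network call to a tool
    return {"account_id": account_id, "balance": 1250.00}


async def get_recent_transactions(account_id: str) -> list:
    await asyncio.sleep(0.2)  # stand-in for a second, independent tool
    return [{"amount": -42.50, "merchant": "Grocery"}]


async def answer_account_question(account_id: str, say) -> dict:
    """Speak a filler phrase, then run independent tool calls concurrently."""
    say("Let me check that for you.")  # masks the wait for the tools
    balance, transactions = await asyncio.gather(
        get_balance(account_id),
        get_recent_transactions(account_id),
    )
    return {"balance": balance, "transactions": transactions}


spoken = []  # collects what the agent would say aloud
result = asyncio.run(answer_account_question("12345", spoken.append))
print(spoken, result)
```

Because the two calls do not depend on each other, running them with `asyncio.gather` makes the wait roughly as long as the slower call rather than the sum of both.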


Source: AWS Machine Learning Blog
