🟢 🤝 Agents Tuesday, April 28, 2026 · 4 min read

AWS publishes guide for building Strands Agents with SageMaker AI models and MLflow observability: SageMakerAIModel provider, autolog tracing, and A/B variant testing

Stylized depiction of an AI agent architecture in which SageMaker endpoints and MLflow tracing connect to the Strands SDK within a cloud services architecture.

Why it matters

AWS has published a detailed guide on building agents using the Strands open-source SDK, SageMaker AI endpoints for model hosting, and SageMaker AI Serverless MLflow for observability. The approach offers infrastructure control, support for custom models, and automated execution trace logging through mlflow.strands.autolog().

The AWS Machine Learning Blog has published a comprehensive technical guide on building AI agents using three components: the open-source Strands Agents SDK, Amazon SageMaker AI endpoints for model hosting, and SageMaker AI Serverless MLflow for observability. The guide is relevant for organizations that want to put agents into production but cannot rely exclusively on managed services like Amazon Bedrock for operational or regulatory reasons.

What Is Strands Agents

Strands Agents is an “open source SDK that takes a model-driven approach to building and running AI agents in only a few lines of code.” Architecturally, the framework combines three components:

  • a language model (the LLM that reasons),
  • a system prompt (role and behavior definition),
  • a toolset (list of functions the agent can call).

This minimalist pattern resembles those of other agent SDKs (LangChain, AutoGen, CrewAI), but the model-driven philosophy, in which the agent delegates decisions to the model rather than encoding them in rigid workflows, is what sets Strands apart.
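
In code, the three components map directly onto the SDK's Agent and tool primitives. A minimal sketch with a hypothetical word_count tool; no model is passed explicitly, so the SDK falls back to its default provider (a SageMaker-hosted model is shown in the next section):

```python
from strands import Agent, tool

# Hypothetical tool: any plain Python function can join the toolset.
@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

agent = Agent(
    system_prompt="You are a concise writing assistant.",  # role and behavior
    tools=[word_count],                                    # the agent's toolset
)

# The model decides whether and when to call the tool.
agent("How many words are in 'the quick brown fox jumps'?")
```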

Why SageMaker Instead of Bedrock

AWS explicitly lists four reasons why an organization might choose a self-managed SageMaker endpoint over the managed Bedrock service:

  • Infrastructure control — over compute instances, VPC networking, and security settings;
  • Model flexibility — custom models, fine-tuned variants, open-source models not necessarily in the Bedrock catalog;
  • Cost predictability — reserved instances and spot pricing make operational costs less variable and easier to forecast;
  • Advanced MLOps — model registry, A/B testing, version management.

The Strands SDK includes a SageMakerAIModel provider that allows agents to use endpoints running models such as Qwen3 or other open-source LLMs.
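
A minimal wiring sketch; the import path and configuration keywords follow the pattern in the Strands documentation, and the endpoint name, region, and prompts are placeholders:

```python
from strands import Agent
from strands.models.sagemaker import SageMakerAIModel

# Endpoint name and region are placeholders; point them at your deployment.
model = SageMakerAIModel(
    endpoint_config={
        "endpoint_name": "qwen3-agent-endpoint",
        "region_name": "eu-central-1",
    },
    payload_config={
        "max_tokens": 1024,
        "temperature": 0.3,
    },
)

agent = Agent(
    model=model,
    system_prompt="You answer questions about internal runbooks.",
)
agent("Summarize the rollback procedure.")
```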

MLflow as the Observability Layer

The guide's primary added value is the integration with SageMaker AI Serverless MLflow:

  • automatic capture of execution traces,
  • tool usage patterns (which tool the agent calls when),
  • decision workflows (which reasoning path led to the final action).

Activation is declarative: a single mlflow.strands.autolog() call in the code, with no custom instrumentation required. This yields two outcomes that regulatory frameworks increasingly demand (a setup sketch follows the list below):

  • behavioral monitoring of agents in production,
  • audit trails enabling retrospective analysis of decisions.
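
A minimal setup sketch, assuming a SageMaker-managed MLflow tracking server; the tracking-server ARN and experiment name are placeholders:

```python
import mlflow

# Placeholder ARN of a SageMaker AI Serverless MLflow tracking server.
mlflow.set_tracking_uri(
    "arn:aws:sagemaker:eu-central-1:111122223333:mlflow-tracking-server/agent-obs"
)
mlflow.set_experiment("strands-agent-traces")

# The single declarative call from the guide: every subsequent agent
# invocation is traced (model calls, tool calls, reasoning steps)
# with no custom instrumentation.
mlflow.strands.autolog()
```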

This is particularly relevant in the context of the EU AI Act, whose Article 12 requires automatic logging and Article 72 requires post-market monitoring.

Implementation Workflow: Five Steps

AWS describes a five-step workflow (step 1 is sketched in code after the list):

  1. Deploy foundation models through SageMaker JumpStart.
  2. Create Strands agents using SageMakerAIModel, referencing the deployed endpoint.
  3. Enable automatic logging through mlflow.strands.autolog().
  4. Deploy multiple model variants for parallel A/B testing.
  5. Evaluate agents through the MLflow GenAI evaluation framework with custom scorers.
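
Step 1, for example, comes down to a few lines with the SageMaker Python SDK; the model ID and instance type below are illustrative placeholders, not values from the guide:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Placeholder model ID; look up the exact JumpStart ID of the LLM you want.
model = JumpStartModel(model_id="huggingface-llm-example-7b")

# Creates a real-time endpoint that SageMakerAIModel can reference by name.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # placeholder; size to your model
    accept_eula=True,                # some JumpStart models require an EULA
)
print(predictor.endpoint_name)
```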

Step 4 (A/B testing) is especially valuable in production settings: instead of a risky “big bang” migration to a new model, organizations can gradually shift traffic to the new variant and measure the differences in quality and cost.
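
The mechanics of that traffic shift live in the SageMaker control plane rather than in Strands itself. A sketch using boto3, with all model, config, and endpoint names as placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# Two production variants behind one endpoint; names are placeholders.
sm.create_endpoint_config(
    EndpointConfigName="agent-llm-ab",
    ProductionVariants=[
        {
            "VariantName": "current",
            "ModelName": "agent-llm-v1",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,  # 90% of traffic
        },
        {
            "VariantName": "candidate",
            "ModelName": "agent-llm-v2",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,  # 10% of traffic
        },
    ],
)

# Later: shift traffic gradually without redeploying anything.
sm.update_endpoint_weights_and_capacities(
    EndpointName="agent-llm-endpoint",
    DesiredWeightsAndCapacities=[
        {"VariantName": "current", "DesiredWeight": 0.5},
        {"VariantName": "candidate", "DesiredWeight": 0.5},
    ],
)
```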

Highlighted Use Cases

AWS recommends this architecture for four types of workloads:

  • conversational AI with strict latency requirements — where a shared managed service does not give sufficient control over p99 latency,
  • high-volume applications requiring predictable cost — where the per-token price of managed services becomes the dominant cost driver,
  • compliance-sensitive deployments with data residency requirements — where precise control of data location and encryption is needed,
  • model experimentation — systematic A/B tests before full migration.

Practical Implications

Three things worth considering for teams already using AWS:

  • MLflow integration is low-overhead and can be introduced into existing agent implementations without major refactoring;
  • the SageMakerAIModel provider opens the door to regional or locally developed open-source models absent from the Bedrock catalog, which matters for European organizations that want to control data residency;
  • A/B testing of agent variants requires more than deployment alone: defined quality metrics (custom scorers) and a traffic-routing strategy are also needed (a scorer sketch follows below).
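
A sketch of such a custom scorer, assuming MLflow 3's mlflow.genai evaluation API; the length-budget heuristic and the sample record are hypothetical:

```python
import mlflow
from mlflow.genai.scorers import scorer

# Hypothetical heuristic scorer: does the answer stay within a length budget?
# Real deployments would also score correctness, groundedness, tool usage, etc.
@scorer
def within_length_budget(outputs) -> bool:
    return len(str(outputs)) <= 2000

# `data` holds recorded inputs/outputs of the agent under evaluation.
results = mlflow.genai.evaluate(
    data=[
        {
            "inputs": {"question": "What is the rollback procedure?"},
            "outputs": "1. Freeze deploys. 2. Revert to the last stable tag.",
        },
    ],
    scorers=[within_length_budget],
)
print(results.metrics)  # aggregated scorer results across the dataset
```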

The guide is technically dense and aimed at practitioners already experienced with SageMaker. For teams starting out with agents on AWS, it defines one standard architecture: an alternative to the managed Bedrock service that offers greater control but also greater operational responsibility.

🤖

This article was generated using artificial intelligence from primary sources.