AWS: SageMaker Gets Over 100 Detailed Inference Metrics and an Insights Dashboard on CloudWatch

Editorial illustration: monitoring dashboard with latency and token throughput graphs

Amazon SageMaker, AWS's machine learning platform, now emits more than 100 detailed metrics for generative AI inference to CloudWatch and presents them in a new Insights dashboard. The dashboard shows token-level latency (time to first token, or TTFT, and inter-token latency), KV cache tracking and cold start diagnostics, and the metrics can also be queried from Grafana and Datadog through a PromQL-compatible endpoint.

This article was generated using artificial intelligence from primary sources.

Amazon announced that SageMaker, AWS's managed machine learning platform, now emits more than 100 detailed metrics for generative AI inference directly into CloudWatch, AWS's monitoring service. The metrics come with a new SageMaker Insights dashboard organized into Performance, Capacity and Reliability tabs. The goal is to give teams a level of visibility that previously required custom instrumentation.

What the Dashboard Measures

Inference is the phase in which a trained model answers queries in real time, which makes latency the key metric. The new dashboard tracks token-level latency, namely time to first token (TTFT) and inter-token latency, along with KV cache tracking (the key-value cache that stores attention keys and values for tokens already processed, so the model does not recompute them at every generation step), Honeycomb fleet visualization and cold start diagnostics. It covers both single-model endpoints and multi-model endpoints that host models as inference components.
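The two latency metrics are easiest to understand as a client-side calculation. The sketch below assumes you have recorded when a request was sent and when each streamed token arrived; `StreamTiming` and the helper functions are hypothetical names for illustration, not part of the SageMaker or CloudWatch APIs, and SageMaker's own server-side measurement and aggregation may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StreamTiming:
    """Timestamps (in seconds) recorded by a client for one streamed response.

    Hypothetical structure for illustration; not a SageMaker or CloudWatch type.
    """
    request_sent: float
    token_arrivals: list[float]  # arrival time of each generated token, in order


def time_to_first_token(t: StreamTiming) -> float:
    """TTFT: delay between sending the request and receiving the first token."""
    if not t.token_arrivals:
        raise ValueError("no tokens received")
    return t.token_arrivals[0] - t.request_sent


def inter_token_latency(t: StreamTiming) -> float:
    """Mean gap between consecutive tokens, measured after the first token."""
    gaps = [b - a for a, b in zip(t.token_arrivals, t.token_arrivals[1:])]
    if not gaps:
        raise ValueError("need at least two tokens")
    return mean(gaps)


if __name__ == "__main__":
    timing = StreamTiming(request_sent=0.0, token_arrivals=[0.25, 0.30, 0.35, 0.40])
    print(f"TTFT: {time_to_first_token(timing):.2f} s")
    print(f"ITL:  {inter_token_latency(timing):.2f} s")
```

Separating the two matters because they fail differently: a long TTFT usually points at queuing or prompt processing, while a long inter-token latency reflects the per-step cost of generation.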

How It Differs from the Previous Approach

Until now, teams had to build their own instrumentation or rely on external tools to get this level of visibility. SageMaker now delivers the metrics in OpenTelemetry format and exposes them through a PromQL-compatible endpoint, so they can be queried from Grafana and Datadog; pre-built Grafana templates are also provided. Billing follows the standard CloudWatch OpenTelemetry price of $0.50 per gigabyte.
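Because billing is per gigabyte, a rough budget comes down to data volume times price. The sketch below is a back-of-envelope estimate only: the $0.50 figure comes from the article, while the function name, the average payload size per sample and the use of decimal gigabytes are all hypothetical assumptions, since real OpenTelemetry payloads vary with labels, attributes and batching.

```python
# Price quoted in the article for CloudWatch OpenTelemetry metrics.
PRICE_PER_GB_USD = 0.50

MINUTES_PER_DAY = 60 * 24


def estimate_monthly_cost_usd(
    endpoints: int,
    metrics_per_endpoint: int,
    samples_per_minute: float,
    bytes_per_sample: int,
    days: int = 30,
) -> float:
    """Hypothetical estimate: total samples x assumed bytes per sample x price per GB."""
    samples = endpoints * metrics_per_endpoint * samples_per_minute * MINUTES_PER_DAY * days
    gigabytes = samples * bytes_per_sample / 1e9  # assumes decimal gigabytes
    return gigabytes * PRICE_PER_GB_USD


if __name__ == "__main__":
    cost = estimate_monthly_cost_usd(
        endpoints=10, metrics_per_endpoint=100, samples_per_minute=1, bytes_per_sample=200
    )
    print(f"Estimated monthly cost: ${cost:.2f}")
```

With these made-up inputs (10 endpoints, 100 metrics each, one sample per minute, 200 bytes per sample over 30 days), the volume is about 8.64 GB and the estimate about $4.32; the real figure depends on actual payload sizes.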

What This Means in Practice

More detailed metrics help engineers diagnose bottlenecks such as slow cold starts or an overloaded KV cache without adding their own instrumentation. For organizations serving large language models in production, TTFT and inter-token latency directly shape user experience and cost, so a standardized dashboard shortens the time it takes to diagnose faults.
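A quick capacity estimate shows why a KV cache can become overloaded: its size grows linearly with the number of tokens held across all in-flight requests. The sketch uses the usual per-token size for a transformer decoder (one key and one value vector per layer and per KV head); the model dimensions, the 20 GiB budget and the function names are hypothetical, and real serving frameworks allocate and reserve cache memory in their own ways.

```python
def kv_cache_bytes_per_token(
    num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_value: int = 2
) -> int:
    """Memory needed to cache one token's keys and values across all layers.

    The leading factor of 2 covers storing both a key and a value vector
    per layer and per KV head; bytes_per_value=2 corresponds to FP16/BF16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def max_concurrent_requests(
    cache_budget_bytes: int, per_token_bytes: int, tokens_per_request: int
) -> int:
    """How many requests of a given context length fit in the cache budget."""
    max_tokens = cache_budget_bytes // per_token_bytes
    return max_tokens // tokens_per_request


if __name__ == "__main__":
    # Hypothetical model dimensions and memory budget, for illustration only.
    per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
    budget = 20 * 2**30  # 20 GiB reserved for the KV cache
    print(f"KV cache per token: {per_token // 1024} KiB")
    print(f"Requests of 4096 tokens that fit: {max_concurrent_requests(budget, per_token, 4096)}")
```

Here each token needs 128 KiB, so the budget holds 163,840 tokens, or 40 requests of 4,096 tokens; doubling the context length halves that number, which is why cache utilization is worth watching alongside latency.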

Frequently Asked Questions

How many metrics does SageMaker now emit?
SageMaker emits more than 100 detailed inference metrics to CloudWatch, including token-level latency, KV cache tracking and cold start diagnostics.
Which tools does the Insights dashboard support?
The dashboard works directly in CloudWatch, and through a PromQL-compatible endpoint the metrics can also be used in Grafana and Datadog; pre-built templates are available for Grafana.
How much does it cost?
It is billed at the standard CloudWatch OpenTelemetry price of $0.50 per gigabyte.