AWS: SageMaker Gets Over 100 Detailed Inference Metrics and an Insights Dashboard on CloudWatch
Amazon SageMaker, AWS's managed machine learning platform, now emits more than 100 detailed metrics for generative AI inference into CloudWatch, surfaced through the new SageMaker Insights dashboard. The dashboard shows token-level latency (time to first token, or TTFT, and inter-token latency), KV cache tracking and cold start diagnostics, and the metrics can be queried from Grafana and Datadog through a PromQL-compatible endpoint.
This article was generated using artificial intelligence from primary sources.
Amazon announced that SageMaker, AWS's managed machine learning platform, now emits more than 100 detailed metrics for generative AI inference directly into CloudWatch, AWS's monitoring service. The metrics arrive together with a new SageMaker Insights dashboard organized into Performance, Capacity and Reliability tabs. The goal is to give teams visibility that previously required custom instrumentation.
What the Dashboard Measures
Inference is the phase in which a trained model responds to queries in real time, which makes latency the key metric. The new dashboard tracks token-level latency, meaning time to first token (TTFT) and inter-token latency, along with KV cache tracking (the key-value memory the model uses to accelerate generation), a honeycomb fleet visualization and cold start diagnostics. It covers both single-model and multi-model (inference component) endpoints.
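To make the two token-level metrics concrete, here is a minimal sketch of how TTFT and inter-token latency can be derived from timestamps recorded on the client side. It is an illustration only and not how SageMaker computes its metrics; the function name `token_latency` and the sample timestamps are hypothetical.

```python
from statistics import mean


def token_latency(request_sent: float, token_times: list[float]) -> dict[str, float]:
    """Derive TTFT and mean inter-token latency (both in seconds).

    request_sent: time the request was sent.
    token_times:  arrival time of each generated token, in order.
    """
    if not token_times:
        raise ValueError("no tokens were generated")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent

    # Inter-token latency: gap between each pair of consecutive tokens.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_gap = mean(gaps) if gaps else 0.0

    return {"ttft": ttft, "mean_inter_token_latency": mean_gap}


# Hypothetical timestamps: request at t=0, four tokens arriving afterwards.
result = token_latency(0.0, [0.5, 0.75, 1.0, 1.25])
print(result)
```

With these sample values the first token arrives after 0.5 seconds and each later token follows 0.25 seconds after the previous one, so TTFT reflects how long the user waits before anything appears and inter-token latency reflects how smoothly the rest of the response streams.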
How It Differs from the Previous Approach
Until now, teams had to build their own instrumentation for this level of visibility or rely on external tools. SageMaker now delivers the metrics in OpenTelemetry format, and a PromQL-compatible endpoint lets Grafana and Datadog query them, with pre-built templates available for Grafana. Billing follows the standard CloudWatch OpenTelemetry price of $0.50 per gigabyte.
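As a rough illustration of what per-gigabyte billing implies, the sketch below estimates a monthly charge from an assumed ingestion volume. Only the $0.50 per gigabyte price comes from the announcement. The emission interval, the bytes per data point, the use of decimal gigabytes and the helper names are all hypothetical assumptions, and the actual billed volume depends on how CloudWatch measures ingested OpenTelemetry data.

```python
PRICE_PER_GB_USD = 0.50  # CloudWatch OpenTelemetry price quoted in the article


def estimated_gb(metric_count: int, interval_s: float,
                 bytes_per_point: float, days: int = 30) -> float:
    """Estimate ingested volume in decimal gigabytes (assumption: 1 GB = 1e9 bytes)."""
    points_per_metric = days * 24 * 3600 / interval_s
    return metric_count * points_per_metric * bytes_per_point / 1e9


def monthly_cost_usd(gb_ingested: float) -> float:
    """Apply the flat per-gigabyte price; ignores any free tier or other charges."""
    return gb_ingested * PRICE_PER_GB_USD


# Hypothetical workload: 100 metrics, one data point per minute, 200 bytes each.
volume_gb = estimated_gb(metric_count=100, interval_s=60, bytes_per_point=200)
cost = monthly_cost_usd(volume_gb)
print(f"{volume_gb:.3f} GB -> ${cost:.2f}")
```

Under these made-up inputs a single emitter produces well under one gigabyte per month, so the charge scales mainly with the number of instances and the number of metrics kept enabled.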
What This Means in Practice
More detailed metrics help engineers diagnose bottlenecks such as slow cold starts or an overloaded KV cache without manual instrumentation. For organizations serving large language models in production, TTFT and inter-token latency directly affect user experience and cost, so a standardized dashboard that measures them shortens the time needed to diagnose faults.
Frequently Asked Questions
- How many metrics does SageMaker now emit?
  SageMaker emits more than 100 detailed inference metrics to CloudWatch, including token-level latency, KV cache tracking and cold start diagnostics.
- Which tools does the Insights dashboard support?
  The dashboard works directly in CloudWatch. Through a PromQL-compatible endpoint the metrics can also be used in Grafana and Datadog, with pre-built templates available for Grafana.
- How much does it cost?
  It is billed at the standard CloudWatch OpenTelemetry price of $0.50 per gigabyte.