AWS: SageMaker AI gains OpenAI-compatible API endpoints
AWS announced on 20 May 2026 that Amazon SageMaker AI now exposes an OpenAI-compatible API on its real-time inference endpoints — a drop-in replacement for the OpenAI SDK, LangChain, and Strands Agents that requires only a URL change. Endpoints expose the /openai/v1 path with Chat Completions and streaming support, while authentication uses time-limited bearer tokens generated from AWS credentials.
This article was generated using artificial intelligence from primary sources.
AWS announced on 20 May 2026 that Amazon SageMaker AI now exposes an OpenAI-compatible API on its real-time inference endpoints. Applications written for the OpenAI Python SDK, LangChain, AWS Strands Agents, or any OpenAI-API-compatible library can switch to SageMaker without code changes — only by changing the base_url and API key.
What does the endpoint expose?
Endpoints expose the /openai/v1 path with support for Chat Completions and streaming. The URL format is:
https://runtime.sagemaker.REGION.amazonaws.com/endpoints/ENDPOINT_NAME/openai/v1
All standard OpenAI SDK functions (client.chat.completions.create, streaming via stream=True) work identically to the OpenAI public API. This means that the LangChain ChatOpenAI class, AWS Strands Agents, LlamaIndex OpenAI module, and any other library using the OpenAI client all receive a drop-in replacement on their own infrastructure.
How does authentication work?
Authentication uses time-limited bearer tokens generated from AWS credentials through the SageMaker Python SDK. Tokens last up to 12 hours and are generated as client-side SigV4 pre-signed URLs without an additional network call at the moment of authentication — practically eliminating overhead compared to a direct OpenAI API key.
What is the strategic implication?
This move consolidates the OpenAI API standard as the de-facto portable interface for LLM tooling. AWS thereby acknowledges that it is easier for the developer ecosystem to write for one API and run it on different backends than to enforce its own SDK. AWS did something similar with the S3 API more than a decade ago, which became the standard adopted by MinIO, Cloudflare R2, Wasabi, and others.
For enterprise customers, a serious path now opens: an application developed on OpenAI during a proof-of-concept phase can be migrated to its own SageMaker infrastructure with DeepSeek V3, Llama 4, Mistral Large, or other open-source models — without additional refactoring. Inference cost becomes predictable (per-hour GPU reservation instead of per-token API billing), and data stays within the organisation’s own AWS account.
What comes next?
Support for the OpenAI Embeddings and OpenAI Vision APIs was not explicitly announced in this release, but AWS documentation suggests it is on the roadmap. Multi-model inference components already work, so teams will be able to scale the same OpenAI-compatible logic across multiple models on the same endpoint.
Frequently Asked Questions
- How exactly does the switch from OpenAI to SageMaker work?
- The developer changes only the base_url and API key — code using the OpenAI SDK (Python, Node.js, JavaScript clients) remains identical. The URL format is https://runtime.sagemaker.REGION.amazonaws.com/endpoints/NAME/openai/v1, and the API key is a bearer token generated from AWS credentials.
- How does authentication change?
- Bearer tokens last up to 12 hours and are generated through the SageMaker Python SDK as client-side SigV4 pre-signed URLs — without an additional network call in the authentication system. This means practically zero overhead compared to a direct OpenAI API key.
- Which models and deployment options are supported?
- The API supports all existing SageMaker real-time inference endpoints — single-model and multi-model deployment through inference components. Any open-source model deployed on SageMaker (Llama, Mistral, Qwen, DeepSeek) automatically gets an OpenAI-compatible layer.
Related news
Google DeepMind: WeatherNext predicted Hurricane Melissa as Category 5 five days ahead
CNCF: Kubernetes debugger erases traces — a serious problem for security audits
AWS: Amazon Quick — document-level access control for S3 knowledge bases with deny-by-default and ALLOW/DENY rules