Are GPT OSS models the same as ChatGPT or GPT-4?

No. GPT OSS are OpenAI's open-weight models intended for deployment outside OpenAI infrastructure — gpt-oss-120b and gpt-oss-20b are not identical to GPT-4 or ChatGPT models.

What does zero operator-access design in GovCloud mean?

No one — not AWS, not the customer, not the model vendor — can access the prompts or responses that pass through the Bedrock inference engine in the GovCloud region.

What API standard does the bedrock-mantle endpoint support?

Bedrock-mantle implements the OpenAI Chat Completions and Responses API, meaning code written for the OpenAI Python or TypeScript SDK works without changes, only requiring a different endpoint URL.

Nemotron & GPT OSS on AWS GovCloud: FedRAMP High Certified

AWS GovCloud (US) is receiving six new models on Amazon Bedrock: OpenAI open-weight gpt-oss-120b and gpt-oss-20b, and four NVIDIA Nemotron models with a 1M token context. The infrastructure meets FedRAMP High, DoD IL 2/4/5, ITAR, and CJIS requirements with a zero operator-access design.

Amazon Web Services has expanded the model catalog on Amazon Bedrock in the AWS GovCloud (US) region with six new models covering a range from small specialized systems to large general-purpose ones — all with the infrastructure guarantees required by the US public sector.

Which Models Are Available?

The OpenAI GPT OSS series brings two models. gpt-oss-120b is a 120-billion-parameter model intended for production, general-purpose, and demanding reasoning tasks, with a context window of 128K tokens and output of up to 16K tokens. gpt-oss-20b is a more compact variant optimized for low latency and specialized applications. It is important to note: these models are not GPT-4 or ChatGPT — they are OpenAI’s open-weight models distributed for independent deployment.

The NVIDIA Nemotron series brings four models. Nemotron 3 Super 120B is a hybrid Mixture-of-Experts architecture: 120 billion parameters in total, but with only 12 billion active per token — balancing model capacity with computational efficiency. The critical characteristic is a 1-million-token context window, which opens up possibilities for agents that process entire legal corpora, technical manuals, or long-running conversational archives without truncating context. The Nano variants (9B v2, 12B v2, 30B) cover lightweight tasks and edge deployment scenarios.

What Makes GovCloud Different

AWS GovCloud (US) is not merely a physically separate region — it is isolated infrastructure that satisfies a specific set of regulatory requirements of the US public sector.

The certification set covered by this integration includes: FedRAMP High with a Provisional Authority to Operate (P-ATO), the highest level of federal cloud authorization; DoD Cloud Security Requirements Guide Impact Levels 2, 4, and 5, covering controlled secrets and critical national security systems; ITAR (International Traffic in Arms Regulations) for the defense industry; and CJIS (Criminal Justice Information Services) for law enforcement agencies.

Zero Operator Access

The central architectural requirement for such environments is guaranteed privacy of inference requests. AWS’s zero operator-access design means the inference engine leaves no accessible trace of prompts or responses — no one, including AWS engineers, the customer, or the model vendor (OpenAI or NVIDIA), can reconstruct conversation content.

This guarantee holds regardless of the selected pricing tier or API path.

Dual API

Why Are There Two Paths?

Bedrock in GovCloud offers two ways to communicate with models, each with different trade-offs.

bedrock-mantle is an OpenAI-compatible HTTPS endpoint implementing the Chat Completions and Responses API. Code already written for the OpenAI Python or TypeScript SDK works without changes — the only modification is the endpoint URL. Tool calling and streaming are supported, meaning migrating agents built on the OpenAI ecosystem requires no architectural changes.

bedrock-runtime is the native AWS SDK path that provides access to Bedrock-specific functionality: Guardrails for content filtering and enforcing safety policies, and the Converse API that normalizes the interface across all Bedrock models through a single call regardless of vendor.

Pricing Tiers and Data Residency

Models are available in three tiers: Standard (on-demand, per-token billing), Priority (for latency-sensitive traffic), and Flex (batch processing at lower cost). The Reserved tier is not currently available in GovCloud regions.

Data residency has two options. In-Region routing keeps all requests within us-gov-west-1. Geo Cross-Region routing can direct traffic between us-gov-west-1 and us-gov-east-1 for availability purposes, but exclusively within the AWS GovCloud boundary — data never leaves the GovCloud perimeter.

Context for the Public Sector

The availability of models such as Nemotron 3 Super 120B with a million-token context window in a FedRAMP High environment is particularly relevant for scenarios like legal analysis of long documents, medical records, or extended conversational memory in security-sensitive applications — use cases where the previous context limitation was not a technical problem but a regulatory barrier due to the unavailability of certified models.

NVIDIA Nemotron and OpenAI GPT OSS Models Available in AWS GovCloud with FedRAMP High Certification