OpenAI: Jalapeño — custom ASIC chip for LLM inference reducing NVIDIA dependence

OpenAI and Broadcom have jointly announced Jalapeño, a custom ASIC chip optimized for LLM inference. The move brings OpenAI into the custom silicon category, alongside Google’s TPU, Apple’s Neural Engine, and AWS Trainium, and reduces its dependence on NVIDIA GPUs.

This article was generated using artificial intelligence from primary sources.

On June 24, 2026, OpenAI and Broadcom announced Jalapeño, a custom ASIC (application-specific integrated circuit, a chip designed for one type of task) optimized for LLM inference, that is, for running language models in production rather than training them. The announcement marks a turning point: OpenAI is no longer exclusively a buyer of third-party hardware and has begun building its own silicon stack.

Why is Jalapeño a strategic shift for OpenAI?

Until now, OpenAI has based its infrastructure almost exclusively on NVIDIA GPUs: expensive, in demand worldwide, and controlled by a single supplier whose deliveries have lagged in recent years as demand outstripped capacity. Jalapeño places OpenAI alongside Google (TPU, or Tensor Processing Unit), Amazon (AWS Trainium), and Apple (Neural Engine) as companies that have taken control of their own silicon stack. Each of these chips is designed for a specific AI workload and aims for better performance per watt than general-purpose GPUs on that narrow task. For comparison, Google runs its Gemini models on TPUs, reportedly at a cost per token that NVIDIA H100 clusters struggle to match.
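To make the performance-per-watt and cost-per-token comparison concrete, here is a minimal Python sketch. Every figure in it (throughput, power draw, electricity price) is a made-up placeholder chosen for round arithmetic, not a published number for Jalapeño, any TPU, or any NVIDIA GPU, and the function names are hypothetical.

```python
# Illustrative only: every number below is a made-up placeholder,
# not a published figure for Jalapeño, TPUs, or any NVIDIA GPU.

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million joules (watt-seconds)


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Performance per watt for inference: tokens generated per joule."""
    return tokens_per_second / power_watts


def energy_cost_per_million_tokens(
    tokens_per_second: float, power_watts: float, usd_per_kwh: float
) -> float:
    """Electricity cost in USD to generate one million tokens."""
    seconds = 1_000_000 / tokens_per_second
    energy_kwh = power_watts * seconds / JOULES_PER_KWH
    return energy_kwh * usd_per_kwh


# Hypothetical accelerators with equal throughput but different power draw.
general_purpose = {"tokens_per_second": 10_000.0, "power_watts": 700.0}
specialized = {"tokens_per_second": 10_000.0, "power_watts": 350.0}
USD_PER_KWH = 0.10  # assumed electricity price

if __name__ == "__main__":
    for name, chip in [("general-purpose", general_purpose),
                       ("specialized", specialized)]:
        efficiency = tokens_per_joule(**chip)
        cost = energy_cost_per_million_tokens(**chip, usd_per_kwh=USD_PER_KWH)
        print(f"{name}: {efficiency:.1f} tokens/J, "
              f"${cost:.5f} per million tokens (energy only)")
```

With these assumed inputs, halving power draw at equal throughput halves the energy cost per token. Electricity is only one component of cost per token; hardware price, utilization, memory, and networking are deliberately left out of this sketch.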

Performance, efficiency, and Broadcom’s expertise

The project has three goals: higher performance on inference tasks, greater energy efficiency, and easier infrastructure scaling. Broadcom brings years of experience in custom silicon design, along with supply-chain capacity. The same company is involved in developing Google’s TPUs and Meta’s MTIA chips, which suggests Jalapeño can be integrated closely with existing datacenter infrastructure.

Detailed technical specifications (transistor count, memory bandwidth, and supported numerical precisions such as FP8, BF16, and INT8) were not disclosed in the initial announcement, which is common when a chip is revealed ahead of production deployment. Full architecture details and benchmark results are expected in later announcements.

The race for custom silicon — Google, Amazon, Microsoft, Tesla

The race for custom AI silicon is intensifying. Google has pursued a cost-per-token advantage for Gemini with TPU v5e and v5p. AWS Trainium 2 handles part of Anthropic’s training and inference workloads. Microsoft has built Maia 100 for Azure AI workloads. Tesla developed Dojo to train its autonomous-driving models. Jalapeño gives OpenAI an analogous lever: the ability to optimize the entire stack, from model architecture to the silicon it runs on, with less dependence on NVIDIA’s roadmap and pricing policy.

What are the implications for the industry?

If OpenAI successfully deploys Jalapeño at production scale, inference costs could fall significantly, which could be reflected in pricing for ChatGPT and the OpenAI API and in capacity for future, larger models. The move also adds to the pressure on NVIDIA: although its data center revenue exceeded $100 billion in fiscal year 2025, a growing number of major customers are developing alternatives. For now Jalapeño is only an announcement, but its strategic weight exceeds the technical details that have yet to be revealed.

Frequently Asked Questions

What is the Jalapeño chip and how does it differ from an NVIDIA GPU?
Jalapeño is a custom ASIC, an integrated circuit designed exclusively for LLM inference, whereas NVIDIA GPUs are general-purpose accelerators that handle a wide range of tasks. The specialized architecture is intended to deliver better energy efficiency, that is, higher performance per watt, on the specific workload of language models.
Why is OpenAI developing its own chip instead of continuing to buy NVIDIA hardware?
Dependence on a single supplier brings cost and supply constraints, especially when GPUs are in demand across the industry. With custom silicon, OpenAI takes control of its infrastructure, can reduce its cost per token, and can optimize hardware specifically for its own models.