
AMD: ROCm Optimization of Matrix3D for 3D Worlds Speeds Up Rendering by up to 54 Percent on Instinct GPUs


AMD has described on its ROCm blog how it optimized the Matrix3D framework, which generates explorable 3D worlds, for AMD Instinct GPUs. By replacing CUDA-specific components with Triton kernels and using the gsplat library for 3D Gaussian Splatting (3DGS), engineers cut rendering time by 54 percent on the MI250 GPU and by 50 percent on the MI300, and the rendering kernel itself is 36 percent faster than the CUDA version.

This article was generated using artificial intelligence from primary sources.

ROCm is AMD’s software stack for GPU computing and a direct competitor to NVIDIA’s CUDA platform, so porting AI workloads to ROCm matters for reducing single-vendor dependency.

What Changed

Engineers replaced CUDA-specific components with Triton kernels, which are GPU kernels written in a Python-based language that targets both NVIDIA and AMD hardware, and used the gsplat library for 3DGS (3D Gaussian Splatting), a technique for reconstructing 3D scenes from images. This removed the pipeline's dependency on NVIDIA’s closed ecosystem and allowed it to be tuned for AMD hardware.

Results by the Numbers

Rendering time on the MI250 GPU fell by 54 percent (from 2887 to 1306 seconds), and on the MI300 by 50 percent (from 972 to 482 seconds), a speedup of roughly 2x in both cases. The Triton-based rendering kernel is 36 percent faster than the CUDA version, and 3DGS fitting with gsplat is 66 percent cheaper. The work is positioned as a foundation for spatial and embodied AI applications, where demand for generated 3D environments is growing.
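
The percentages quoted above read most naturally as reductions in wall-clock time rather than as throughput gains. The short Python sketch below recomputes both measures from the quoted timings; the function names and the `TIMINGS` table are illustrative and do not come from AMD's post.

```python
# Illustrative helpers (hypothetical names, not from the ROCm blog post).

def time_reduction_percent(before_s: float, after_s: float) -> float:
    """Share of the original run time that was eliminated, in percent."""
    return (before_s - after_s) / before_s * 100.0


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the optimized run is than the original."""
    return before_s / after_s


# Rendering times in seconds, as quoted in the article: (before, after).
TIMINGS = {
    "MI250": (2887, 1306),
    "MI300": (972, 482),
}


def summarize(timings):
    """Return {gpu: (time reduction in percent, speedup factor)}, rounded."""
    return {
        gpu: (
            round(time_reduction_percent(before_s, after_s), 1),
            round(speedup_factor(before_s, after_s), 2),
        )
        for gpu, (before_s, after_s) in timings.items()
    }


if __name__ == "__main__":
    for gpu, (reduction, factor) in summarize(TIMINGS).items():
        print(f"{gpu}: {reduction}% less time, {factor}x speedup")
```

With these inputs, the MI250 timings work out to roughly 54.8 percent less time (about a 2.2x speedup) and the MI300 timings to roughly 50.4 percent (about 2x).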

Frequently Asked Questions

What did AMD optimize?
AMD optimized the Matrix3D framework for generating explorable 3D worlds on AMD Instinct GPUs, replacing CUDA-specific components with Triton kernels and using the gsplat library.
How much speedup was achieved?
Rendering time fell by 54 percent on the MI250 GPU (from 2887 to 1306 seconds) and by 50 percent on the MI300 (from 972 to 482 seconds); the rendering kernel itself is 36 percent faster than the CUDA version.
What is 3DGS fitting used for?
3DGS (3D Gaussian Splatting) reconstructs 3D scenes from images; using the gsplat library reduced its costs by 66 percent.