Why doesn't the test name in CI match the name I see in the source code?

PyTorch generates concrete tests at import time with the instantiate_device_type_tests() function. A template class TestMatmul with a method test_basic is expanded into classes such as TestMatmulCPU, TestMatmulCUDA, and TestMatmulMPS, with methods like test_basic_cuda_float32. The template in the source code therefore never runs under its own name.

What are OpInfos and what do they do?

OpInfos are metadata entries that centrally describe how each individual operator should be tested: supported dtypes, input samples, tolerances, and skips. A generic test template decorated with @ops automatically receives all those combinations and runs a test for each — without duplicating code per operator.

How do I reproduce a CI failure locally?

From the Dr. CI comment, extract the shard name, find the generated test name (e.g., TestMatmulCUDA.test_basic_cuda_float32), and run: pytest test/test_torch.py -k 'test_basic_cuda_float32' -x. Referring to the template as TestMatmul.test_basic, without the device and dtype suffixes, will find nothing.

PyTorch Test Infrastructure: How Thousands of Tests Are Generated

A PyTorch blog post by Riya Punia (Red Hat) explains why CI failures carry strange names like TestMatmulCUDA.test_basic_cuda_float32 instead of the original TestMatmul.test_basic — tests are generated at import time through combinations of devices and dtypes.

When PyTorch CI reports a failure with a name like TestMatmulCUDA.test_basic_cuda_float32, the developer who looks at the source code often cannot find that class. That is not a bug — it is intentional. Riya Punia from Red Hat published a detailed guide on the PyTorch blog on July 3, 2026, uncovering the mechanisms behind this system.

Generating Tests at Import Time

The core of PyTorch’s test infrastructure is the function instantiate_device_type_tests(), defined in torch/testing/_internal/common_device_type.py. This function takes a template class, which is not executable on its own, and creates concrete classes for each supported device when the test module is imported.

A template class TestMatmul with a single method test_basic can become three concrete classes: TestMatmulCPU, TestMatmulCUDA, and TestMatmulMPS. Each method receives suffixes for all relevant dtype combinations: test_basic_cuda_float32, test_basic_cuda_float16, test_basic_cuda_bfloat16, and so on. Supported devices include CPU, CUDA, MPS (Apple Silicon), and XPU (Intel), while the dtypes include float16, float32, float64, and bfloat16.

The naming convention is consistent: <ClassName><DEVICE>.<method>_<device>_<dtype>. Understanding this convention is essential for debugging any CI failure.
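Read in reverse, the convention maps a name from a CI log back to the template to look for in the source. The helper below is a hypothetical sketch, not part of PyTorch: split_generated_name and its suffix lists are assumptions, and it only covers plain device and dtype suffixes (names produced by other decorators may carry extra parts, such as an operator name).

```python
# Assumed suffix lists; the real sets depend on the PyTorch build and decorators.
DEVICES = ("cpu", "cuda", "mps", "xpu")


def split_generated_name(name):
    """Split 'TestMatmulCUDA.test_basic_cuda_float32' into its template parts."""
    class_name, method = name.split(".", 1)
    device = next((d for d in DEVICES if class_name.endswith(d.upper())), None)
    if device is None:
        raise ValueError(f"no known device suffix in {class_name!r}")
    # Search from the right so a method like test_cuda_sync_cuda_float32 still works.
    head, sep, tail = method.rpartition(f"_{device}")
    if not sep:
        raise ValueError(f"device {device!r} not found in method {method!r}")
    return {
        "template_class": class_name[: -len(device)],
        "template_method": head,
        "device": device,
        "dtype": tail[1:] if tail else None,  # None when the test has no dtype suffix
    }


parts = split_generated_name("TestMatmulCUDA.test_basic_cuda_float32")
print(parts)
```

For the example name, the template to search for in the source is TestMatmul.test_basic.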

A Common Pitfall: Targeting the Template Class in the pytest Filter

One of the most common mistakes Punia highlights is targeting the original template name when selecting tests with pytest. The class TestMatmul does not exist as an executable test class at runtime; only TestMatmulCPU, TestMatmulCUDA, and the other generated variants do. An exact reference such as test/test_torch.py::TestMatmul::test_basic therefore finds nothing, and because the -k option matches substrings, -k "TestMatmul" cannot single out the failing variant either. The correct target is pytest test/test_torch.py -k "test_basic_cuda_float32" -x.

OpInfos: The Metadata Layer for Operators

The second pillar of the test infrastructure is OpInfos — centralized entries that describe how each PyTorch operator should be tested. They are defined in torch/testing/_internal/opinfo/core.py, with the global operator registry in torch/testing/_internal/common_methods_invocations.py.

Each OpInfo entry contains: the operator name, calling variants, supported dtypes, sample input generators, numerical tolerances, and skips for specific combinations. A generic test template annotated with the @ops decorator automatically iterates through all registered operators and runs a test for every combination — without a single line of duplicated code.
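The idea of one template fanning out over a metadata registry can be sketched without PyTorch. Everything below is a simplified illustration under stated assumptions: ToyOpInfo, toy_op_db, expand, and run_template are hypothetical names, the fields are a small subset of what the article lists, and the generated-name layout is assumed to follow the same pattern as the device and dtype suffixes described earlier.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyOpInfo:
    """Simplified stand-in for an OpInfo entry; field names are illustrative."""
    name: str
    op: Callable
    dtypes: tuple
    sample_inputs: tuple
    skips: frozenset = frozenset()  # (device, dtype) pairs that must not run


# Toy registry playing the role of op_db.
toy_op_db = [
    ToyOpInfo("abs", abs, ("float32", "float64"), (-1.5, 0.0, 2.0)),
    ToyOpInfo("sqrt", math.sqrt, ("float32",), (0.0, 4.0),
              skips=frozenset({("cuda", "float32")})),
]


def run_template(info):
    """Generic check written once: apply the operator to every sample input."""
    return [info.op(x) for x in info.sample_inputs]


def expand(template_name, op_db, devices):
    """Yield one generated test name per (operator, device, dtype) combination."""
    for info in op_db:
        for device in devices:
            for dtype in info.dtypes:
                if (device, dtype) in info.skips:
                    continue  # honour the per-operator skip list
                yield f"{template_name}_{info.name}_{device}_{dtype}"


names = list(expand("test_forward", toy_op_db, ("cpu", "cuda")))
print(len(names), names[0])
```

Adding a third entry to toy_op_db would add new test names without touching run_template, which is the property the article attributes to OpInfos.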

The result is that a small number of template tests covers thousands of concrete test cases. Punia presents this as the fundamental mechanism that makes PyTorch’s test suite maintainable when it spans a vast number of operators and hardware targets.

Key File Structure

Punia identifies five key files that form the infrastructural foundation:

torch/testing/_internal/common_utils.py — shared test utilities, the base TestCase class, run_tests()
torch/testing/_internal/common_device_type.py — instantiate_device_type_tests(), decorators @dtypes, @ops
torch/testing/_internal/opinfo/core.py — OpInfo entry definitions and metadata
torch/testing/_internal/common_methods_invocations.py — op_db registry of all operators
test/run_test.py — CI-style runner with sharding support

How to Debug a CI Failure

PyTorch CI uses Dr. CI — a tool that automatically comments on pull requests with grouped failures and information about the shard where the failure was recorded. The debugging workflow Punia recommends: (1) extract the shard information from the Dr. CI comment, (2) find the logs for the specific CI job on hud.pytorch.org, (3) extract the generated test name, (4) reproduce locally with the exact generated name.

Environment variables are also available for controlling test execution: PYTORCH_TESTING_DEVICE_ONLY_FOR limits tests to a selected device, PYTORCH_TEST_WITH_SLOW enables slow tests, and PYTORCH_TEST_WITH_DYNAMO runs tests under torch.compile. There is also EXPECTTEST_ACCEPT, which automatically updates recorded outputs (snapshots) for tests that use them.
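The last workflow step and the environment variables can be combined in a small helper. This is a sketch under assumptions: build_repro is a hypothetical function, the value "1" for EXPECTTEST_ACCEPT is an assumed way of switching it on, and the file path is the one used in the article's example.

```python
import os
import shlex


def build_repro(test_file, generated_name, device=None, accept_expected=False):
    """Return (argv, env) for rerunning one generated test locally."""
    # Keep only the method part: 'TestMatmulCUDA.test_basic_cuda_float32'
    # becomes 'test_basic_cuda_float32'.
    method = generated_name.rsplit(".", 1)[-1]
    argv = ["pytest", test_file, "-k", method, "-x"]
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if device is not None:
        env["PYTORCH_TESTING_DEVICE_ONLY_FOR"] = device
    if accept_expected:
        env["EXPECTTEST_ACCEPT"] = "1"  # assumed value for enabling the flag
    return argv, env


argv, env = build_repro(
    "test/test_torch.py",
    "TestMatmulCUDA.test_basic_cuda_float32",
    device="cuda",
)
print(shlex.join(argv))
```

The pair can then be passed to subprocess.run(argv, env=env) from the root of a PyTorch checkout.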

Punia also warns about a second common mistake: using torch.randn() inside dtype-generic tests. Called without an explicit dtype argument, torch.randn() produces a tensor of the default dtype, normally float32, which is incorrect in a test running for bfloat16. The recommended replacement is make_tensor(), which respects the dtype argument received from the generated test.
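A minimal sketch of the difference, assuming PyTorch is installed and the default dtype has not been changed. The make_inputs function is a hypothetical helper; in a generated test, device and dtype would arrive as arguments of the test method.

```python
import torch
from torch.testing import make_tensor


def make_inputs(device, dtype):
    # Without an explicit dtype, torch.randn() uses the default dtype,
    # so the tensor ignores the dtype the generated test asked for.
    wrong = torch.randn(3, 3, device=device)
    # make_tensor() creates the tensor directly in the requested dtype.
    right = make_tensor((3, 3), device=device, dtype=dtype)
    return wrong, right


wrong, right = make_inputs("cpu", torch.bfloat16)
print(wrong.dtype, right.dtype)
```

Passing dtype=dtype to torch.randn() would also avoid the mismatch, but make_tensor() is the replacement the article recommends.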

Practical Takeaway

The guide is aimed at everyone contributing to PyTorch or debugging unusual CI failures, but its value extends further: dynamic test generation at import time is a pattern that appears in larger Python projects wherever combinatorial spaces need to be covered without a code explosion. Understanding the difference between a template class and its generated classes is the first step toward productive work with such systems. For PyTorch specifically, that insight reduces the journey from “CI failed with an unknown name” to “I have a local reproduction” to a single search on hud.pytorch.org.