Google Research Introduces TabFM: A Zero-Shot Foundation Model for Tabular Data

Google Research has published TabFM, a foundation model for tabular data that delivers zero-shot predictions in a single forward pass, without hyperparameter tuning or feature engineering. The model achieved top Elo scores on the TabArena benchmark and is available on Hugging Face and GitHub, with a planned integration into Google BigQuery.

This article was generated using artificial intelligence from primary sources.

Machine learning on tabular data has traditionally required a high degree of expertise: feature selection and engineering, hyperparameter tuning, and sometimes architecture redesign for each new dataset. On June 30, 2026, Google Research published TabFM — a foundation model that reduces that entire workflow to a single forward pass, with no per-problem modifications.

The Problem TabFM Addresses

The classic ML workflow for tabular data involves an iterative process: data exploration, feature engineering, architecture selection (gradient boosting, random forests, neural networks), and hours of hyperparameter tuning. Every new dataset requires this cycle to restart from scratch. For organizations working with dozens or hundreds of different tabular problems, that cost multiplies proportionally.

TabFM skips the entire cycle: once pretrained, the model delivers predictions for a new dataset without any modifications. It receives the table as context and outputs predictions directly from the data in that input, which frames tabular prediction as an in-context learning problem.
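The in-context framing can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name, the `predict` signature, and the nearest-neighbour vote that stands in for the pretrained network are hypothetical and are not TabFM's published API. The point is only the calling pattern, in which labelled rows and query rows are passed together and no per-dataset training step exists.

```python
import math
from collections import Counter


class ToyInContextPredictor:
    """Hypothetical stand-in for a tabular foundation model.

    A real model would run a pretrained network over the context; a
    k-nearest-neighbour vote is used here only to keep the sketch runnable.
    """

    def __init__(self, k=3):
        self.k = k

    def predict(self, context_rows, context_labels, query_rows):
        """Label each query row using only the rows supplied as context."""
        predictions = []
        for query in query_rows:
            # Rank context rows by Euclidean distance to the query row.
            ranked = sorted(
                (math.dist(query, row), index)
                for index, row in enumerate(context_rows)
            )
            # Majority vote among the k closest context rows.
            votes = Counter(context_labels[index] for _, index in ranked[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


# Labelled rows (the context) and unlabelled rows (the queries) go in together.
context_rows = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
context_labels = ["a", "a", "b", "b"]
query_rows = [[0.2, 0.1], [5.5, 5.1]]

model = ToyInContextPredictor(k=3)
predictions = model.predict(context_rows, context_labels, query_rows)
print(predictions)
```

Note that the object holds no dataset-specific state: swapping in a different table changes only the arguments to `predict`.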

How Does TabFM Work?

TabFM’s architecture combines three components operating in sequence. In the first stage, alternating row and column attention processes the raw tabular structure: the model learns relationships both between records and between features, capturing horizontal and vertical dependencies in the data.
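A rough sketch of what alternating attention over a table can look like is shown below. It assumes each cell has already been embedded as a small vector, and it uses unparameterised scaled dot-product attention with no learned projections, no multiple heads, and no residual connections. The function names, the layer count, and the order of the two passes are illustrative assumptions, not details confirmed by the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned weights."""
    dim = len(vectors[0])
    output = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output vector is a weighted average of the input vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
        )
    return output


def attend_across_features(cells):
    """Mix the cells that belong to the same row (one record at a time)."""
    return [self_attention(row) for row in cells]


def attend_across_records(cells):
    """Mix the cells that belong to the same column (one feature at a time)."""
    n_rows, n_cols = len(cells), len(cells[0])
    columns = [
        self_attention([cells[r][c] for r in range(n_rows)])
        for c in range(n_cols)
    ]
    # Transpose back so the result is indexed as cells[row][column].
    return [[columns[c][r] for c in range(n_cols)] for r in range(n_rows)]


def alternating_attention(cells, n_layers=2):
    """Alternate the two attention directions for a fixed number of layers."""
    for _ in range(n_layers):
        cells = attend_across_records(cells)
        cells = attend_across_features(cells)
    return cells


# A table with 3 rows and 2 columns; every cell is a 2-dimensional embedding.
table = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
mixed = alternating_attention(table, n_layers=2)
```

The table keeps its shape through every layer, so each cell ends up carrying information from both its row and its column.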

Row compression in the second stage converts information about each row into a dense representation vector. This step reduces sequence length and prepares the data for more efficient processing. Finally, a Transformer for in-context learning makes predictions based on the compressed vectors, applying the same principle that allows LLMs to generalize to tasks they have never explicitly seen.
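The second and third stages can be sketched in the same spirit. The example below assumes mean pooling as the row compressor and a single attention-weighted average of context targets as the in-context predictor. Both are deliberate simplifications chosen for readability, and all function names are hypothetical. It does show the sequence-length effect described above: a table with R rows and C columns shrinks from R × C cell tokens to R row tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compress_row(cell_embeddings):
    """Mean-pool one row's cell embeddings into a single dense vector."""
    count = len(cell_embeddings)
    dim = len(cell_embeddings[0])
    return [sum(cell[j] for cell in cell_embeddings) / count for j in range(dim)]


def predict_in_context(context_vectors, context_targets, query_vector):
    """Predict a target as an attention-weighted average of context targets."""
    dim = len(query_vector)
    scores = [
        sum(q * k for q, k in zip(query_vector, key)) / math.sqrt(dim)
        for key in context_vectors
    ]
    weights = softmax(scores)
    return sum(w * y for w, y in zip(weights, context_targets))


# Two labelled rows with three cells each: 6 cell tokens become 2 row tokens.
context_cells = [
    [[9.0, 0.0], [10.0, 0.0], [11.0, 0.0]],
    [[0.0, 9.0], [0.0, 10.0], [0.0, 11.0]],
]
context_targets = [1.0, 0.0]
query_cells = [[10.0, 0.0], [10.0, 0.0], [10.0, 0.0]]

context_vectors = [compress_row(row) for row in context_cells]
query_vector = compress_row(query_cells)
prediction = predict_in_context(context_vectors, context_targets, query_vector)
print(context_vectors, prediction)
```

Because the query row resembles the first context row far more than the second, the prediction lands very close to the first row's target.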

The result is a prediction in a single forward pass. No fine-tuning, no hyperparameter tuning, no feature engineering: the model receives a table and returns a prediction.

Training on Synthetic Data

Google Research faced a fundamental challenge: an insufficient number of publicly available tabular datasets to train a model of adequate capacity. The solution was structural causal models (SCMs) — mathematical frameworks that generate synthetic data with realistic distributions, nonlinear relationships, and diverse dependency structures.

TabFM was trained on hundreds of millions of synthetically generated datasets. The SCM approach enables controlled diversity: the model has seen data simulating media, financial, technical, and business domains, without relying on real, potentially proprietary datasets. It also sidesteps the ethical problem of collecting tabular data, which often contains personal or confidential information.
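To make the idea of SCM-generated data concrete, here is a small sketch of one way to sample a synthetic table from a random structural causal model. The graph density, the choice of nonlinearities, the noise scales, and the function name are all assumptions picked for illustration; the source does not describe Google's actual generator at this level of detail.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sample_scm_dataset(n_rows, n_features, seed=0):
    """Sample (features, targets) from a randomly drawn structural causal model."""
    rng = random.Random(seed)
    n_nodes = n_features + 1  # the last node serves as the prediction target

    # Random DAG: node j may only depend on earlier nodes, so no cycles can form.
    parents = [[i for i in range(j) if rng.random() < 0.5] for j in range(n_nodes)]
    if not parents[-1]:
        # Make sure the target depends on at least one feature.
        parents[-1] = [rng.randrange(n_features)]
    weights = [[rng.gauss(0.0, 1.0) for _ in node] for node in parents]
    activations = [rng.choice([math.tanh, math.sin, relu]) for _ in range(n_nodes)]

    features, targets = [], []
    for _ in range(n_rows):
        values = []
        for j in range(n_nodes):
            if parents[j]:
                # Child node: nonlinear function of its parents plus noise.
                z = sum(w * values[i] for w, i in zip(weights[j], parents[j]))
                values.append(activations[j](z) + rng.gauss(0.0, 0.1))
            else:
                # Root node: purely exogenous noise.
                values.append(rng.gauss(0.0, 1.0))
        features.append(values[:-1])
        targets.append(values[-1])
    return features, targets


X, y = sample_scm_dataset(n_rows=100, n_features=4, seed=42)
```

Each seed draws a different graph and a different set of functions, so a loop over seeds would yield a stream of structurally distinct datasets.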

Results on TabArena and Availability

Evaluation used TabArena — a benchmark covering 38 classification and 13 regression datasets, with sizes ranging from 700 to 150,000 samples per dataset. TabFM-Ensemble, the version that uses cross features, SVD decomposition, and Platt scaling for output calibration, achieved top Elo scores on TabArena, outperforming standard baseline models.
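Platt scaling, the calibration step named above, fits a sigmoid with two parameters to a model's raw scores so that the outputs behave more like probabilities. The sketch below is a simplified version that uses plain gradient descent on log loss. Platt's original method adds smoothed targets and a more robust optimizer, and the source does not say how TabFM-Ensemble applies the technique, so the function names, learning rate, and toy data are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(scores, labels, a, b):
    """Mean negative log-likelihood of labels under sigmoid(a * score + b)."""
    eps = 1e-12
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(sigmoid(a * s + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def fit_platt(scores, labels, lr=0.01, steps=2000):
    """Fit slope a and intercept b by gradient descent on log loss."""
    a, b = 1.0, 0.0  # start from the uncalibrated sigmoid
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            error = sigmoid(a * s + b) - y
            grad_a += error * s
            grad_b += error
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Overconfident raw scores: large magnitudes, yet two of the eight are wrong.
scores = [4.0, 3.0, 5.0, -4.0, -3.0, -5.0, 4.0, -4.0]
labels = [1, 1, 0, 0, 0, 1, 1, 0]

a, b = fit_platt(scores, labels)
loss_before = log_loss(scores, labels, 1.0, 0.0)
loss_after = log_loss(scores, labels, a, b)
print(a, b, loss_before, loss_after)
```

On this toy data the fitted slope comes out below 1, which shrinks the overconfident scores toward 0.5 and lowers the log loss.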

TabFM is available on Hugging Face and GitHub. Google announced integration into Google BigQuery via the SQL command AI.PREDICT, which should allow analysts to make predictions on tabular data without leaving the SQL environment or writing ML code.

The researchers behind the project are Weihao Kong and Abhimanyu Das (Google Research), with collaboration from Erez Louidor Ilan, Taman Narayana, Shuxin Nie, Rajat Sen, Yichen Zhou, Joe Toth, Deqing Fu, and Samet Oymak.

Frequently Asked Questions

What is TabFM and what is it used for?
TabFM is Google's foundation model for tabular data that delivers zero-shot predictions in a single forward pass, without hyperparameter tuning or feature engineering, relying solely on context in the input.

Where is TabFM available?
The model is available on Hugging Face and GitHub, with a planned integration into Google BigQuery via the AI.PREDICT SQL command that allows analysts to make predictions without leaving the SQL interface.

How was TabFM trained?
It was trained on hundreds of millions of synthetically generated datasets that use structural causal models to simulate diverse distributions, nonlinear relationships, and varied dependency structures between features.