
Kumo’s new foundation model replaces months of data science engineering with plain-English queries

KumoRFM-2 outperforms supervised ML on enterprise relational data, requiring zero training.
Apr 14th, 2026 12:01pm

Large language models are very good at predicting the next word in a sequence. That’s because they’ve been trained on massive amounts of unstructured text — books, web pages, code. But enterprise data looks different. It lives in relational databases: rows and columns linking customers to orders to products to transactions. LLMs can’t natively reason over those relationships.

That’s the gap relational foundation models are looking to fill. Where an LLM ingests a corpus of text and learns statistical patterns across tokens, a relational foundation model ingests structured, tabular data and learns the patterns that connect entities across tables.

On Tuesday, predictive AI specialist Kumo detailed an advancement in this space. The company announced KumoRFM-2, a foundation model designed to outperform fully supervised machine learning on enterprise relational data.

Built by a team that includes the co-founder of PyTorch Geometric (a library for graph machine learning), KumoRFM-2 claims to replace the need for months of feature engineering and dedicated model builds.

KumoRFM-2 is a single model that can be queried in plain English with zero training, and it scales to over 500 billion rows of data.

Vanja Josifovski, CEO and co-founder of Kumo, puts the technology in the most basic terms: he says that anyone in an organization can now generate what amounts to a custom-trained model for every predictive task.

“KumoRFM-2 is a purpose-built foundation model for relational data, designed from the ground up with its own architecture,” Josifovski tells The New Stack. “Engineers connect to their existing data warehouse, whether that’s Snowflake, Databricks, or any SQL database, write a predictive query, and receive results. There is no ETL requirement, no feature store, no model training.” 
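A predictive query of the kind Josifovski describes might look like the following. The syntax here is modeled loosely on Kumo's predictive-query style, but the exact grammar, table names, and column names are illustrative assumptions, not taken from Kumo's documentation:

```
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR EACH customers.customer_id
```

Read as a question, this asks, for every customer, whether they will place at least one order in the next 30 days. The query itself names the prediction target and the entity to predict for, replacing the training pipeline a data science team would otherwise build.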

Benchmark barometer

On the Stanford RelBench v1 deep learning benchmark, KumoRFM-2 outperforms its predecessor by 10% and outperforms the strongest supervised machine learning model by 5% across both classification and regression tasks. 

On the SAP SALT (Sales Autocompletion Linked Business Tables) enterprise benchmark, KumoRFM-2 beats tabular model ensembles such as AutoGluon (an open source ML framework from AWS) as well as other comparable foundation models. The team says fine-tuning could improve the model’s performance by another 13%. 

As a native relational foundation model, KumoRFM-2 is purpose-built to reason over databases and data warehouses. Jure Leskovec, chief scientist and co-founder of Kumo, says that this represents real progress. Traditional approaches, after all, force data science teams to flatten multi-table data into a single table before any model can touch it. 
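The flattening step Leskovec refers to can be sketched in a few lines. This is a minimal, invented illustration of traditional feature engineering, not Kumo's pipeline: a customers table and an orders table are hand-joined into one wide table of engineered features before any model sees the data.

```python
# Toy illustration of the "flattening" step a relational foundation model
# aims to eliminate: hand-joining customers and orders into one wide
# feature table before training. All table contents are invented.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 40.0},
    {"order_id": 11, "customer_id": 1, "amount": 60.0},
    {"order_id": 12, "customer_id": 2, "amount": 25.0},
]

def flatten(customers, orders):
    """Aggregate each customer's orders into hand-crafted features."""
    rows = []
    for c in customers:
        amounts = [o["amount"] for o in orders
                   if o["customer_id"] == c["customer_id"]]
        rows.append({
            "customer_id": c["customer_id"],
            "region": c["region"],
            "order_count": len(amounts),   # engineered feature 1
            "total_spend": sum(amounts),   # engineered feature 2
        })
    return rows

flat = flatten(customers, orders)
print(flat[0])
# {'customer_id': 1, 'region': 'EU', 'order_count': 2, 'total_spend': 100.0}
```

Every aggregate a team wants the model to see ("order count", "total spend") must be hand-written this way, and the one-to-many relationship between customers and orders is collapsed in the process.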

“KumoRFM-2 works directly on the graph of connected tables, preserving every foreign-key relationship,” Leskovec tells The New Stack. “It sees the full relational structure in a single forward pass, with no task-specific training, and surpasses supervised baselines that were trained end-to-end on the same data.”
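The alternative Leskovec describes, keeping the tables linked and traversing foreign keys directly, can be shown with a toy relational graph. The schema and rows below are invented for illustration and are not Kumo's internal representation:

```python
# Toy sketch of working on the graph of connected tables: foreign keys are
# kept as edges and traversed directly instead of being flattened away.
schema = {  # (child_table, fk_column) -> parent_table
    ("orders", "customer_id"): "customers",
    ("order_items", "order_id"): "orders",
    ("order_items", "product_id"): "products",
}

tables = {
    "customers":   [{"customer_id": 1}],
    "orders":      [{"order_id": 10, "customer_id": 1}],
    "order_items": [{"order_id": 10, "product_id": 7}],
    "products":    [{"product_id": 7, "name": "widget"}],
}

def neighbors(table, row):
    """Yield (parent_table, parent_row) pairs reachable via foreign keys."""
    for (child, fk), parent in schema.items():
        if child == table and fk in row:
            for parent_row in tables[parent]:
                if parent_row.get(fk) == row[fk]:
                    yield parent, parent_row

# Starting from an order_item row, one hop reaches both its order and product.
hops = dict(neighbors("order_items", tables["order_items"][0]))
print(sorted(hops))  # ['orders', 'products']
```

Nothing is destroyed here: the one-to-many structure survives, and a model that consumes this graph can, in principle, attend over every linked row rather than a pre-aggregated summary.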

Leskovec, who is also a professor at Stanford, further clarifies the structures at play. The team has also introduced hierarchical in-context learning, in which the model extracts task-aware features at the individual table level and across tables simultaneously. The result is a foundation model that a developer can point at a data warehouse to get predictions immediately, without building a single feature pipeline.
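In-context learning in the tabular setting means supplying labeled example rows at query time and predicting a new row's label without any gradient updates. The sketch below is a loose analogy only, with nearest-neighbor distance standing in for the model's learned attention; it is not KumoRFM-2's mechanism, and all data is invented:

```python
# Loose toy analogy for in-context learning on tabular rows: labeled
# "context" rows arrive with the query, and the prediction is made with
# no training step. 1-nearest-neighbor stands in for learned attention.
def predict_in_context(context, query):
    """context: list of (features, label) pairs; query: a feature tuple."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(context, key=lambda pair: dist(pair[0], query))
    return label

context = [
    ((2, 100.0), "will_churn"),   # 2 orders, low spend
    ((15, 900.0), "will_stay"),   # frequent, high spend
]
print(predict_in_context(context, (14, 850.0)))  # will_stay
```

The key property being illustrated is that swapping in a different context changes the predictions with zero retraining, which is what lets one model serve arbitrary predictive tasks.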

Alternative foundation model technologies

As competent as KumoRFM-2 is claimed to be, it is not the only foundation model technology on the market. Engineers working in this arena will likely be aware of SAP-RPT-1 (Relational Pretrained Transformer), a foundation model designed to work on structured, tabular business data without requiring heavy, customized machine learning training.

MotherNet is a hypernetwork foundation model (or conditional neural process) for tabular data classification that generates a small, dataset-specific neural network. In the paper announcing the model, the team notes that MotherNet is trained to generate networks that perform multiclass classification on arbitrary tabular datasets without any dataset-specific gradient descent.

Other contenders in this arena include TabICL, a fully open source academic project designed to provide a tabular foundation model that doesn’t require hyperparameter tuning and that can scale to datasets with more than a million rows via in-context learning.

AWS’s Mitra, too, is a tabular foundation model. Amazon Science describes Mitra as being “pretrained on synthetic datasets” generated by a carefully designed mixture of prior distributions (priors).

Predictive signals live in relationships

Kumo’s Josifovski underlines his company’s position in this market by reminding us that the most valuable predictive signals live in the relationships across multiple tables in a data warehouse, while traditional approaches flatten multi-table data into a single table before modeling even begins.

“KumoRFM-2 changes that: it’s the only model that actually understands the relationships across your tables instead of destroying them, it scales to hundreds of billions of rows, and it lets any team ask predictive questions in natural language,” he says.

For longer-term durability in real-world use cases, the Kumo team says that the technology has been built for robustness to noise, missing data, and structural degradation. By aggregating information across the relational graph, the model effectively “fills in” missing information from neighboring entities and tables.
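The "filling in" behavior can be illustrated with a small aggregation over linked rows. This is a hedged, invented sketch of the general idea of imputing from relational neighbors, not Kumo's actual mechanism:

```python
# Toy sketch of filling in missing information from neighboring rows in the
# relational graph: a customer with no recorded value inherits an estimate
# aggregated from the orders that reference them. Values are invented.
def impute_avg_order(customer_id, orders, default=0.0):
    """Estimate a customer's typical order size from linked order rows."""
    amounts = [o["amount"] for o in orders if o["customer_id"] == customer_id]
    return sum(amounts) / len(amounts) if amounts else default

orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 50.0},
]
print(impute_avg_order(1, orders))  # 40.0
print(impute_avg_order(2, orders))  # 0.0 (no neighbors; falls back to default)
```

Because the signal is aggregated across many neighboring rows, dropping or corrupting any single row degrades the estimate only gradually, which is the robustness property the Kumo team describes.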

With so much of the development in foundation model technologies still within research institutions, academic bodies, and vendor R&D divisions, this may be the next battleground where AI dominance is fought out. 

TNS owner Insight Partners is an investor in: Databricks.