TabSTAR: A Tabular Foundation Model for Data with Text Fields
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
TabSTAR is a tabular foundation model for tabular data that includes text fields. It ships as a package for applying pretrained models to your own datasets, and it also provides a research mode for model development and benchmarking.
Introduction
TabSTAR is a Tabular Foundation Model built for tabular data that contains text fields. It targets the problem of using unstructured text alongside structured columns in a single dataset. It supports two modes: a package mode for applying a pretrained model with minimal setup, and a research mode for pretraining, experimentation, and benchmarking.
Installation
TabSTAR offers two primary modes of operation, each with its own installation method:
Package Mode
For users who want to quickly integrate a pretrained TabSTAR model into their projects, install it via pip:
pip install tabstar
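After installing, it can help to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the Python standard library. The helper name describe_install is hypothetical, and the check assumes the import name matches the pip name (tabstar), as the inference example on this page suggests.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def describe_install(package: str) -> str:
    """Return a one-line install status for a top-level package without importing it."""
    # find_spec returns None when the top-level module cannot be located.
    if find_spec(package) is None:
        return f"{package}: not importable in this environment"
    try:
        # Reads the installed distribution metadata (what pip recorded).
        return f"{package}: version {version(package)}"
    except PackageNotFoundError:
        # Importable (for example from a local checkout) but with no distribution metadata.
        return f"{package}: importable, version unknown"


print(describe_install("tabstar"))
```

If the output says the package is not importable, the usual cause is that pip installed into a different environment than the one running the script.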
Research Mode
If you plan to engage in model development, pretraining, or benchmark evaluations, clone the repository and set up the environment:
source init.sh
This script will install all necessary dependencies and prepare your environment.
Examples
TabSTAR is designed for both practical application and research.
Package Mode Inference
Using TabSTAR for inference on your own data is straightforward. Here's a quick example for classification:
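from importlib.resources import files

import pandas as pd
from sklearn.model_selection import train_test_split

from tabstar.tabstar_model import TabSTARClassifier

# Load the example IMDB dataset bundled with the tabstar package.
csv_path = files("tabstar").joinpath("resources", "imdb.csv")
x = pd.read_csv(csv_path)

# Split off the target column; the remaining columns are the features.
y = x.pop('Genre_is_Drama')

# Hold out 10% of the rows for evaluation.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

# Fit the classifier on the training split.
tabstar = TabSTARClassifier()
tabstar.fit(x_train, y_train)

# Score on the held-out split; the value is reported as AUC.
metric = tabstar.score(X=x_test, y=y_test)
print(f"AUC: {metric:.4f}")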
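The example prints its score as AUC. Assuming this means the usual ROC AUC (the page does not say how the score is computed), the sketch below shows what that number measures: the fraction of positive/negative pairs in which the positive row receives the higher score. It is a plain-Python illustration on made-up labels and scores, not TabSTAR's own code, and roc_auc is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Made-up data for illustration only: 3 positives, 2 negatives.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]
print(f"AUC: {roc_auc(labels, scores):.4f}")  # 5 of 6 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a baseline for reading the score printed by the example above.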
Research Mode Operations
For researchers, TabSTAR provides scripts for advanced tasks:
- Benchmark Evaluation: Evaluate TabSTAR on public datasets using python tabstar_paper/do_benchmark.py --model=tabstar --dataset_id=<DATASET_ID>.
- Pretraining: Pretrain the model on a specified number of datasets with python tabstar_paper/do_pretrain.py --n_datasets=256.
- Finetuning: Finetune a pretrained model on a downstream task using python tabstar_paper/do_finetune.py --pretrain_exp=<PRETRAINED_EXP> --dataset_id=46655.
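When these scripts are driven from another Python program, for example to loop over several dataset IDs, it can be convenient to build the argument lists in code. The sketch below only assembles and prints the commands listed above; it does not run them. The helper names are hypothetical, my_pretrain_exp is a placeholder for your own pretraining experiment name, and only the script paths and flags shown on this page are used.

```python
import shlex


def benchmark_cmd(dataset_id, model="tabstar"):
    """Argument list for the benchmark script, using the flags shown above."""
    return ["python", "tabstar_paper/do_benchmark.py",
            f"--model={model}", f"--dataset_id={dataset_id}"]


def pretrain_cmd(n_datasets):
    """Argument list for the pretraining script."""
    return ["python", "tabstar_paper/do_pretrain.py", f"--n_datasets={n_datasets}"]


def finetune_cmd(pretrain_exp, dataset_id):
    """Argument list for the finetuning script."""
    return ["python", "tabstar_paper/do_finetune.py",
            f"--pretrain_exp={pretrain_exp}", f"--dataset_id={dataset_id}"]


# Print shell-ready commands; "my_pretrain_exp" is a hypothetical experiment name.
for cmd in (benchmark_cmd(46655), pretrain_cmd(256), finetune_cmd("my_pretrain_exp", 46655)):
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from inside the research-mode environment. Passing a list instead of a single shell string avoids quoting problems in argument values.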
Why Use TabSTAR?
Reasons to consider TabSTAR:
- Handles Text in Tables: It targets tabular data containing text fields, a common case in real-world datasets where traditional tabular models often struggle.
- Foundation Model Approach: TabSTAR is pretrained across multiple tabular datasets and can then be finetuned on a downstream task, so the representations it learns are meant to carry over to new datasets.
- Versatility: The package mode serves practitioners who need a quick way to apply a pretrained model, and the research mode serves researchers working on pretraining, finetuning, and benchmarking.
- Ease of Use: The package mode exposes a simple fit and score API that requires minimal setup.
- Research Backing: TabSTAR is backed by a scientific paper and is under ongoing development.