TabSTAR: A Tabular Foundation Model for Data with Text Fields

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

TabSTAR is a tabular foundation model for tabular data that includes text fields. It ships as a package for applying a pretrained model to your own datasets, and offers a research mode for model development, pretraining, and benchmarking.

Repository Information

Analyzed by OSRepos on January 2, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

TabSTAR is a Tabular Foundation Model built for tabular data that contains text fields. It targets the problem of using unstructured text alongside structured columns in a single dataset. It supports two modes: a package mode for applying a pretrained model to your own data, and a research mode for pretraining, finetuning, and benchmark experiments.

Installation

TabSTAR offers two primary modes of operation, each with its own installation method:

Package Mode

For users who want to quickly integrate a pretrained TabSTAR model into their projects, install it via pip:

pip install tabstar

Research Mode

If you plan to work on model development, pretraining, or benchmark evaluations, clone the repository, then run the setup script from inside the cloned directory:

source init.sh

This script will install all necessary dependencies and prepare your environment.

Examples

TabSTAR is designed for both practical application and research.

Package Mode Inference

Using TabSTAR for inference on your own data is straightforward. Here's a quick example for classification:

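from importlib.resources import files

import pandas as pd
from sklearn.model_selection import train_test_split

from tabstar.tabstar_model import TabSTARClassifier

# Load the IMDB sample CSV that ships inside the tabstar package.
csv_path = files("tabstar").joinpath("resources", "imdb.csv")
x = pd.read_csv(csv_path)
# Separate the target column from the feature columns.
y = x.pop('Genre_is_Drama')
# Hold out 10% of the rows for evaluation.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)
# Fit the classifier on the training split.
tabstar = TabSTARClassifier()
tabstar.fit(x_train, y_train)
# Score the held-out split; this example labels the value as AUC.
metric = tabstar.score(X=x_test, y=y_test)
print(f"AUC: {metric:.4f}")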
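
The example above prints its score under the label AUC. As a rough illustration of what that number measures, and assuming it refers to the usual ROC AUC for binary classification, the sketch below computes it in plain Python as the fraction of (positive, negative) pairs that the scores rank correctly. The function name `roc_auc` and the toy labels and scores are illustrative only and are not part of TabSTAR.

```python
def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive scored above negative: correct order
            elif p == n:
                wins += 0.5   # tie: counted as half a correct pair
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"AUC: {roc_auc(labels, scores):.4f}")  # 3 of 4 pairs ordered correctly -> AUC: 0.7500
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric is a common choice for a binary target such as the one in the example.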

Research Mode Operations

For researchers, TabSTAR provides scripts for advanced tasks:

  • Benchmark Evaluation: Evaluate TabSTAR on public datasets using python tabstar_paper/do_benchmark.py --model=tabstar --dataset_id=<DATASET_ID>.
  • Pretraining: Pretrain the model on a specified number of datasets with python tabstar_paper/do_pretrain.py --n_datasets=256.
  • Finetuning: Finetune a pretrained model on a downstream task using python tabstar_paper/do_finetune.py --pretrain_exp=<PRETRAINED_EXP> --dataset_id=46655.
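
When running these scripts over several datasets, it can help to build the command lines programmatically. The sketch below is a minimal, hypothetical helper: the script paths and flags are the ones listed above, while the function names are made up for illustration and are not part of the repository.

```python
import shlex


def benchmark_command(dataset_id, model="tabstar"):
    """Argument list for the benchmark script, using the flags quoted above."""
    return [
        "python",
        "tabstar_paper/do_benchmark.py",
        f"--model={model}",
        f"--dataset_id={dataset_id}",
    ]


def pretrain_command(n_datasets=256):
    """Argument list for the pretraining script."""
    return ["python", "tabstar_paper/do_pretrain.py", f"--n_datasets={n_datasets}"]


def finetune_command(pretrain_exp, dataset_id):
    """Argument list for the finetuning script; pretrain_exp is supplied by you."""
    return [
        "python",
        "tabstar_paper/do_finetune.py",
        f"--pretrain_exp={pretrain_exp}",
        f"--dataset_id={dataset_id}",
    ]


# 46655 is the dataset ID used in the finetuning example above.
cmd = benchmark_command(46655)
print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the cloned repository directory; keeping the arguments as a list avoids shell-quoting problems.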

Why Use TabSTAR?

TabSTAR is worth considering for several reasons:

  • Handles Text in Tables: It targets tabular data containing text fields, a common scenario in real-world datasets where traditional tabular models often struggle.
  • Foundation Model Approach: The model is pretrained across multiple tabular datasets and then finetuned on a downstream task, so representations learned during pretraining carry over to new data.
  • Versatility: It serves both practitioners who need a quick solution via its package mode and researchers working on tabular deep learning through its research mode.
  • Ease of Use: The package mode provides a simple API for fitting and scoring a model with minimal setup.
  • Research Backing: It is backed by a scientific paper and is under ongoing development.
