dataset: Easy-to-Use Data Handling for SQL in Python
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
Dataset is a Python library designed to simplify data handling for SQL data stores. It offers features like implicit table creation, bulk loading, and transaction support, making database interactions as straightforward as working with JSON files.
Introduction
dataset is a Python library that simplifies interactions with SQL databases. Its high-level API makes reading and writing data as straightforward as working with JSON files. Key features include implicit table creation, bulk loading, and transaction support.
As of version 1.0, dataset's data export features live in a separate, standalone package called datafreeze.
Installation
Installing dataset is simple using pip:
$ pip install dataset
Examples
Here's a quick example demonstrating how to connect to a database, insert data, and query it using dataset:
import dataset
# Connect to an SQLite database (or any other SQL DB)
db = dataset.connect('sqlite:///mydatabase.db')
# Get a table, implicitly created if it doesn't exist
table = db['mytable']
# Insert data
table.insert(dict(name='John Doe', age=30))
table.insert(dict(name='Jane Smith', age=25))
# Find data based on conditions
print("People younger than 30:")
for row in table.find(age={'<': 30}):
    print(f"- {row['name']}")
# Update data
table.update(dict(name='John Doe', age=31), ['name'])
print("\nUpdated John Doe's age:")
print(table.find_one(name='John Doe'))
Why Use It
Dataset simplifies common database tasks, which makes it a good fit for developers who need to work with SQL data stores without the complexity of a full ORM. Implicit table creation, bulk loading, and transaction management remove much of the boilerplate that schema setup and insert statements usually require. This suits scripting, data analysis, and small to medium-sized applications where ease of use matters more than fine-grained control over the schema.
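To give a sense of the boilerplate involved, here is a minimal sketch of implicit table creation, bulk loading, and a transaction written against Python's standard sqlite3 module. It is not dataset's implementation and does not use its API. The helper name insert_rows and the type mapping are assumptions made for illustration only.

```python
import sqlite3


def insert_rows(conn, table, rows):
    """Hypothetical helper: create the table and any missing columns, then bulk-insert.

    Table and column names are interpolated into SQL, so this sketch
    assumes they come from trusted code, not from user input.
    """
    if not rows:
        return
    # Map Python types to SQLite column types; anything else is stored as TEXT.
    sql_types = {int: "INTEGER", float: "REAL", str: "TEXT"}

    # Implicit table creation: make the table if it is not there yet.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY)')
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}

    # Implicit column creation: add a column for every key not seen before.
    columns = []
    for row in rows:
        for key, value in row.items():
            if key not in columns:
                columns.append(key)
            if key not in existing:
                col_type = sql_types.get(type(value), "TEXT")
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{key}" {col_type}')
                existing.add(key)

    # Bulk loading: one parameterized statement executed for all rows.
    names = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )


conn = sqlite3.connect(":memory:")
people = [{"name": "John Doe", "age": 30}, {"name": "Jane Smith", "age": 25}]

# "with conn" commits on success and rolls back if the block raises.
with conn:
    insert_rows(conn, "mytable", people)

try:
    with conn:
        insert_rows(conn, "mytable", [{"name": "Temp", "age": 1}])
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

count = conn.execute('SELECT COUNT(*) FROM "mytable"').fetchone()[0]
print(count)  # 2: the row from the failed block was rolled back
```

With dataset, the schema handling in this helper is what the table object takes care of when a row is inserted, which is why the earlier example needs no CREATE TABLE statement.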
Links
- GitHub Repository: pudo/dataset
- Official Documentation: Read the Docs
- Related Project (datafreeze): pudo/datafreeze