dataset: Easy-to-Use Data Handling for SQL in Python

Summary
Dataset is a Python library designed to simplify data handling for SQL data stores. It offers features like implicit table creation, bulk loading, and transaction support, making database interactions as straightforward as working with JSON files.
Introduction
dataset is a Python library that simplifies interactions with SQL databases. Its high-level API makes reading and writing data as straightforward as working with JSON files. Key features include implicit table creation, bulk loading, and transaction support, which cover the most common database operations.
As of version 1.0, dataset's data export features live in a separate, standalone package called datafreeze.
Installation
Installing dataset is simple using pip:
$ pip install dataset
Examples
Here's a quick example demonstrating how to connect to a database, insert data, and query it using dataset:
import dataset
# Connect to an SQLite database (or any other SQL DB)
db = dataset.connect('sqlite:///mydatabase.db')
# Get a table, implicitly created if it doesn't exist
table = db['mytable']
# Insert data
table.insert(dict(name='John Doe', age=30))
table.insert(dict(name='Jane Smith', age=25))
# Find data based on conditions
print("People younger than 30:")
for row in table.find(age={'<': 30}):
    print(f"- {row['name']}")
# Update data
table.update(dict(name='John Doe', age=31), ['name'])
print("\nUpdated John Doe's age:")
print(table.find_one(name='John Doe'))
Why Use It
Dataset excels at simplifying common database tasks, making it an excellent choice for developers who need to interact with SQL data stores without the complexity of full-fledged ORMs. Its features, such as implicit table creation, bulk loading, and transaction management, significantly reduce boilerplate code. This allows for rapid data manipulation and exploration, making it particularly useful for scripting, data analysis, and developing small to medium-sized applications where speed and ease of use are paramount.
Links
- GitHub Repository: pudo/dataset
- Official Documentation: Read the Docs
- Related Project (datafreeze): pudo/datafreeze