pg_textsearch: Modern BM25 Full-Text Search for PostgreSQL
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
pg_textsearch is a PostgreSQL extension that adds BM25 relevance-ranked full-text search directly to your database. It offers a simple query syntax and configurable parameters, and it works with existing PostgreSQL text search configurations. The extension is designed for performance and scalability.
Introduction
pg_textsearch is a PostgreSQL extension developed by Timescale that provides full-text search ranked with the BM25 algorithm. It lets developers build relevance-ranked search directly within their PostgreSQL databases using a well-established ranking function. It is designed for performance and scalability, which makes it suitable for a wide range of applications that need text search. The project was originally named Tapir and still keeps its mascot. It is currently described as production-ready at v1.4.0-dev and supports PostgreSQL 17 and 18.
Installation
Getting pg_textsearch up and running is straightforward. You can either use pre-built binaries or compile it from source.
Pre-built Binaries:
Download the appropriate binaries for your system (Linux or macOS, amd64 or arm64) and PostgreSQL version (17 or 18) from the official Releases page.
Build from Source:
For those who prefer to build from source, follow these steps:
cd /tmp
git clone https://github.com/timescale/pg_textsearch
cd pg_textsearch
make
make install # may need sudo
If you have multiple PostgreSQL installations, specify PG_CONFIG:
export PG_CONFIG=/path/to/your/pg_config
make clean && make && make install
On Debian/Ubuntu, you might need to install development files:
sudo apt install postgresql-server-dev-17 # for PostgreSQL 17
sudo apt install postgresql-server-dev-18 # for PostgreSQL 18
Enable the Extension:
After installation, you need to load the extension in your postgresql.conf and then enable it in your database:
- Add pg_textsearch to shared_preload_libraries in postgresql.conf and restart your server:
shared_preload_libraries = 'pg_textsearch' # add to existing list if needed
- Enable the extension in your database:
CREATE EXTENSION pg_textsearch;
Examples
Here's how to get started with pg_textsearch for basic full-text search.
Basic Search:
- Create a table with text content:
CREATE TABLE documents (id bigserial PRIMARY KEY, content text);
INSERT INTO documents (content) VALUES
('PostgreSQL is a powerful database system'),
('BM25 is an effective ranking function'),
('Full text search with custom scoring');
- Create a pg_textsearch index on the text column, specifying a text configuration (e.g., 'english'):
CREATE INDEX docs_idx ON documents USING bm25(content) WITH (text_config='english');
- Query for relevant documents using the <@> operator. Note that lower scores indicate better matches.
SELECT * FROM documents ORDER BY content <@> 'database system' LIMIT 5;
For explicit index specification, use to_bm25query:
SELECT * FROM documents ORDER BY content <@> to_bm25query('database system', 'docs_idx') LIMIT 5;
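The note that lower scores indicate better matches can look backwards at first. One plausible reading, which is an assumption here and not something this page states, is that the operator yields a negated relevance score so that a plain ascending ORDER BY returns the best rows first. The Python sketch below only illustrates that ordering idea; the document names and relevance numbers are made up.

```python
# Hypothetical positive relevance values (made-up numbers, not real output).
relevance = {"doc_a": 1.96, "doc_b": 0.0, "doc_c": 0.75}

# Assumption: the value used for ordering is the negated relevance.
negated = {doc: -score for doc, score in relevance.items()}

# An ascending sort, like ORDER BY ... ASC, then puts the best match first.
ranked = sorted(negated, key=negated.get)

# Keep the first two entries, analogous to LIMIT 2.
top_two = ranked[:2]
print(top_two)
```

Under this reading, ORDER BY content <@> '...' LIMIT 5 needs no DESC keyword: the most relevant rows already sort to the top.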
Expression Indexes:
You can also create indexes on expressions, useful for JSONB fields or multi-column searches:
-- Indexing a JSONB field
CREATE INDEX events_expr_idx ON events USING bm25 ((data->>'description'))
WITH (text_config='english');
-- Querying the JSONB field
SELECT * FROM events
ORDER BY (data->>'description') <@> to_bm25query('network error', 'events_expr_idx')
LIMIT 10;
Why Use pg_textsearch?
pg_textsearch offers several compelling reasons for integrating it into your PostgreSQL-based applications:
- BM25 Ranking: It provides the industry-standard BM25 ranking algorithm, delivering highly relevant search results based on term frequency and document length normalization.
- Simple Integration: With a straightforward SQL syntax and seamless integration with PostgreSQL's native text search configurations, it's easy to adopt.
- Performance and Scalability: Designed for efficiency, it supports fast top-k queries via Block-Max WAND optimization, parallel index builds, and an optimized memtable architecture for writes.
- Flexibility: Supports expression indexes for complex data types like JSONB, partial indexes for scoped searches, and multilingual tables, allowing for highly customized search solutions.
- PostgreSQL Native: Being a PostgreSQL extension, it benefits from PostgreSQL's robustness, transactional guarantees, and familiar administration.
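To make the BM25 Ranking point above more concrete, here is a small Python sketch of how term frequency and document length normalization combine into a score. It is not pg_textsearch's implementation: it uses a textbook BM25 variant with a Lucene-style IDF and a naive whitespace tokenizer. The function names tokenize and bm25_scores and the defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace. A real text search
    # configuration such as 'english' would also stem and drop stop words.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (higher means more relevant)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a larger weight (Lucene-style IDF, never negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "PostgreSQL is a powerful database system",
    "BM25 is an effective ranking function",
    "Full text search with custom scoring",
]
scores = bm25_scores("database system", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.4f}  {doc}")
```

Only the first document contains both query terms, so it is the only one with a non-zero score in this sketch. Raising b strengthens the penalty for long documents, and raising k1 lets repeated terms keep adding to the score for longer before it saturates.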
Links
- GitHub Repository: https://github.com/timescale/pg_textsearch
- Releases Page: https://github.com/timescale/pg_textsearch/releases
- Contributing Guidelines: https://github.com/timescale/pg_textsearch/blob/main/CONTRIBUTING.md