TextDistance: A Comprehensive Python Library for Sequence Distance Calculation

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

TextDistance is a powerful Python library designed to compute the distance and similarity between sequences using over 30 different algorithms. It offers a pure Python implementation with a common, easy-to-use interface, and can optionally leverage external libraries for maximum performance. This tool is ideal for tasks requiring robust string comparison, such as fuzzy matching and data cleaning.

Repository Information

Analyzed by OSRepos on October 31, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

TextDistance is a comprehensive and versatile Python library designed for computing the distance and similarity between sequences. It stands out by offering implementations of over 30 different algorithms, all accessible through a common and intuitive interface. Whether you need to compare strings, lists, or any other sequence, TextDistance provides a robust solution.

Key features include:

  • Extensive Algorithm Collection: Over 30 algorithms covering edit-based, token-based, sequence-based, compression-based, phonetic, and simple comparisons.
  • Pure Python Implementation: Requires no compiled dependencies, so it works in any environment that runs Python.
  • Common Interface: All algorithms share a consistent API for calculating distance, similarity, normalized distance, and normalized similarity.
  • Optional External Libraries: For maximum speed, TextDistance can leverage highly optimized external C-based libraries like jellyfish and Levenshtein if installed.
  • Multi-sequence Comparison: Supports comparing more than two sequences simultaneously.
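Multi-sequence comparison can be pictured with a small standalone sketch. It assumes a Hamming-style rule in which a position counts as a mismatch unless every sequence holds the same element there. The function `multi_hamming` is hypothetical and is not TextDistance's own code; the source does not spell out the library's exact N-way semantics.

```python
from collections.abc import Sequence
from itertools import zip_longest

# Sentinel used to pad shorter sequences; it never equals a real element.
_MISSING = object()


def multi_hamming(*sequences: Sequence) -> int:
    """Hypothetical N-way Hamming distance (illustrative sketch only).

    Counts positions at which the sequences do not all hold the same element.
    Shorter sequences are padded with a sentinel, so extra trailing elements
    in the longer sequences count as mismatches.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    mismatches = 0
    for column in zip_longest(*sequences, fillvalue=_MISSING):
        first = column[0]
        # A column matches only if every element equals the first one.
        if any(item != first for item in column[1:]):
            mismatches += 1
    return mismatches


print(multi_hamming("test", "text", "tent"))  # 1: only the third position differs
```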

Installation

Installing TextDistance is straightforward. You can choose between a pure Python version or include optional extra libraries for enhanced performance.

Stable Version (Pure Python):

pip install textdistance

Stable Version (With Extra Libraries for Maximum Speed):

pip install "textdistance[extras]"

Development Version:

You can install the development version directly from GitHub:

pip install -e git+https://github.com/life4/textdistance.git#egg=textdistance

Alternatively, clone the repository and install with benchmark extras:

git clone https://github.com/life4/textdistance.git
cd textdistance
pip install -e ".[benchmark]"
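After choosing an install variant, it can help to confirm which optional accelerators are actually importable in the current environment. The standard-library sketch below checks only the two libraries named in this profile (jellyfish and Levenshtein, by their import names). The helper `available_backends` is hypothetical and not part of TextDistance, and the extras set may include other packages as well.

```python
import importlib.util

# Import names of the optional accelerators mentioned in this profile.
# Assumption: the extras may pull in other libraries too; extend as needed.
OPTIONAL_BACKENDS = ("jellyfish", "Levenshtein")


def available_backends(names=OPTIONAL_BACKENDS) -> dict[str, bool]:
    """Report which modules can be found on the import path, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```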

Examples

TextDistance provides a simple and consistent interface across all its algorithms. Here's an example using the Hamming distance:

import textdistance

# Calculate Hamming distance
print(textdistance.hamming('test', 'text'))
# Output: 1

# Using the explicit distance method
print(textdistance.hamming.distance('test', 'text'))
# Output: 1

# Calculate similarity
print(textdistance.hamming.similarity('test', 'text'))
# Output: 3

# Calculate normalized distance (0 to 1, 0 means equal)
print(textdistance.hamming.normalized_distance('test', 'text'))
# Output: 0.25

# Calculate normalized similarity (0 to 1, 1 means equal)
print(textdistance.hamming.normalized_similarity('test', 'text'))
# Output: 0.75

# Using q-grams for comparison (e.g., qval=2 for bigrams)
print(textdistance.Hamming(qval=2).distance('test', 'text'))
# Output: 2

This consistent API makes it easy to experiment with different algorithms and find the best fit for your specific use case.
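To see how the four measures above relate to each other, here is a pure-Python sketch that reproduces the outputs shown for 'test' and 'text'. It assumes similarity is the length of the longer (q-gram) sequence minus the distance, and that both normalized values divide by that same length. `HammingSketch` is a hypothetical class written for illustration; TextDistance's internals may differ, especially in edge cases.

```python
from itertools import zip_longest


class HammingSketch:
    """Hypothetical mirror of the Hamming interface (illustration only)."""

    def __init__(self, qval: int = 1) -> None:
        self.qval = qval

    def _tokens(self, s: str) -> list[str]:
        # qval=1 compares characters; qval=n compares overlapping n-grams.
        if self.qval == 1:
            return list(s)
        return [s[i:i + self.qval] for i in range(len(s) - self.qval + 1)]

    def distance(self, a: str, b: str) -> int:
        # zip_longest pads the shorter side with None, so extra tokens count as mismatches.
        return sum(x != y for x, y in zip_longest(self._tokens(a), self._tokens(b)))

    # Calling the object directly is the same as calling .distance().
    __call__ = distance

    def maximum(self, a: str, b: str) -> int:
        # Assumed upper bound: length of the longer token sequence.
        return max(len(self._tokens(a)), len(self._tokens(b)))

    def similarity(self, a: str, b: str) -> int:
        return self.maximum(a, b) - self.distance(a, b)

    def normalized_distance(self, a: str, b: str) -> float:
        maximum = self.maximum(a, b)
        return self.distance(a, b) / maximum if maximum else 0.0

    def normalized_similarity(self, a: str, b: str) -> float:
        return 1.0 - self.normalized_distance(a, b)


hamming = HammingSketch()
print(hamming("test", "text"), hamming.similarity("test", "text"))  # 1 3
print(HammingSketch(qval=2).distance("test", "text"))  # 2: bigrams 'es'/'ex' and 'st'/'xt' differ
```

Deriving similarity and the normalized values from a single distance plus an upper bound is what lets every measure share the same four-method surface.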

Why Use TextDistance?

There are several reasons developers and data scientists reach for TextDistance:

  • Versatility: Its 30+ algorithms range from simple character-level differences to phonetic matching, so a single library can cover most sequence-comparison needs.
  • Performance: TextDistance always has a pure Python fallback, but it can use faster, C-optimized external libraries when they are installed. The project's benchmarks show substantial speedups from these backends, which matters for performance-critical applications.
  • Simplified Development: The unified API across all algorithms drastically reduces the learning curve and development time. You can switch between algorithms with minimal code changes.
  • Fit for Data Tasks: It is well suited to data cleaning, deduplication, fuzzy matching, natural language processing, and bioinformatics, all of which depend on accurate and efficient sequence comparison.
  • Open Development: The project is developed in the open on GitHub and welcomes contributions.
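The point about switching algorithms with minimal code changes is easiest to see when the comparison function is a parameter. The sketch below uses only the standard library: `best_match`, `levenshtein_similarity`, and `token_jaccard` are hypothetical helpers standing in for an edit-based and a token-based measure, and splitting on whitespace is a choice made for this sketch. With TextDistance installed, the same `best_match` could instead receive a bound method such as `textdistance.levenshtein.normalized_similarity` or `textdistance.jaccard.normalized_similarity`, though their scores will not necessarily match these helpers.

```python
from typing import Callable, Iterable

Similarity = Callable[[str, str], float]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit-based similarity in [0, 1]: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_jaccard(a: str, b: str) -> float:
    """Token-based similarity in [0, 1]: Jaccard index of whitespace-split word sets."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def best_match(query: str, choices: Iterable[str], similarity: Similarity) -> str:
    """Return the choice most similar to query; ties go to the earliest choice."""
    return max(choices, key=lambda choice: similarity(query, choice))


# Swapping the algorithm is a one-argument change.
print(best_match("pythn", ["python", "pithon", "java"], levenshtein_similarity))  # python
print(best_match("new york", ["york new city", "new jersey"], token_jaccard))  # york new city
```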

Related repositories

Similar repositories that may be relevant next.

OpenMontage: The First Open-Source, Agentic Video Production System

June 29, 2026

OpenMontage is the world's first open-source, agentic video production system, designed to transform your AI coding assistant into a full video production studio. It features 12 pipelines, 52 tools, and over 500 agent skills, enabling end-to-end video creation from a simple prompt. This powerful tool handles research, scripting, asset generation, editing, and final composition, including the unique ability to produce real video from stock footage.

Tags: agentic-ai, video-production, open-source

MarkLLM: An Open-Source Toolkit for LLM Watermarking

June 23, 2026

MarkLLM is an open-source toolkit designed to simplify the research and application of watermarking technologies for large language models (LLMs). It offers a unified framework for implementing various watermarking algorithms, alongside robust visualization and comprehensive evaluation tools. This toolkit helps researchers and the broader community understand and assess the authenticity and origin of machine-generated text.

Tags: large-language-models, llm, safety

Agent-Reach: Empower Your AI Agents with Internet Access, Zero API Fees

June 21, 2026

Agent-Reach is a powerful GitHub repository that equips AI agents with the ability to access and search the entire internet, including platforms like Twitter, Reddit, YouTube, and Bilibili. It provides a streamlined CLI experience, eliminating the need for complex API configurations and associated fees. This project ensures your AI agent can "see" and interact with web content effortlessly.

Tags: ai-agent, agent-infrastructure, ai-search

REAL Video Enhancer: AI-Powered Video Interpolation, Upscaling, and Denoising

June 19, 2026

REAL Video Enhancer is a powerful open-source application designed to enhance video quality across Linux, Windows, and macOS. It leverages AI models for advanced video processing tasks such as frame interpolation, upscaling, decompression, and denoising. This tool provides a modern alternative to older software, making high-quality video enhancement accessible to a wider audience.

Tags: video-enhancement, ai, upscaling

Source repository

The original repository is on GitHub: https://github.com/life4/textdistance

OSRepos

Analysis and discovery of open source repositories. Find interesting projects and follow their updates.

© 2025 OSRepos. Built with Nuxt 3 and lots of ❤️