TextDistance: A Comprehensive Python Library for Sequence Distance Calculation
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
TextDistance is a Python library for computing the distance and similarity between sequences using more than 30 algorithms. It provides a pure Python implementation behind a common interface and can optionally use external libraries for speed. Typical uses include string comparison tasks such as fuzzy matching and data cleaning.
Introduction
TextDistance computes the distance and similarity between sequences. It implements more than 30 algorithms, all accessible through one common interface, and it works on strings, lists, or any other sequence.
Key features include:
- Extensive Algorithm Collection: Over 30 algorithms covering edit-based, token-based, sequence-based, compression-based, phonetic, and simple comparisons.
- Pure Python Implementation: Ensures compatibility and ease of use across various environments.
- Common Interface: All algorithms share a consistent API for calculating distance, similarity, normalized distance, and normalized similarity.
- Optional External Libraries: For maximum speed, TextDistance can leverage highly optimized external C-based libraries like jellyfish and Levenshtein if installed.
- Multi-sequence Comparison: Supports comparing more than two sequences simultaneously.
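To make the multi-sequence idea concrete, here is a minimal standalone sketch of a Hamming-style count over any number of sequences. It does not use TextDistance and is not taken from its source; the function name `hamming_many` is hypothetical, and treating a length difference as a mismatch is an assumption of this sketch.

```python
from itertools import zip_longest


def hamming_many(*sequences):
    """Count positions where the given sequences do not all hold the same element.

    Hypothetical helper for illustration only; elements must be hashable.
    """
    mismatches = 0
    # zip_longest pads shorter sequences with None, so a missing element
    # at some position is counted as a mismatch (an assumption of this sketch).
    for column in zip_longest(*sequences):
        if len(set(column)) > 1:
            mismatches += 1
    return mismatches


# Only the third position differs across the three words (s, x, n).
print(hamming_many("test", "text", "tent"))  # 1
# The extra trailing character counts as one mismatch.
print(hamming_many("test", "tests"))  # 1
```

With exactly two sequences of equal length this reduces to the ordinary Hamming distance.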
Installation
Installing TextDistance is straightforward. You can choose between a pure Python version or include optional extra libraries for enhanced performance.
Stable Version (Pure Python):
pip install textdistance
Stable Version (With Extra Libraries for Maximum Speed):
pip install "textdistance[extras]"
Development Version:
You can install the development version directly from GitHub:
pip install -e git+https://github.com/life4/textdistance.git#egg=textdistance
Alternatively, clone the repository and install with benchmark extras:
git clone https://github.com/life4/textdistance.git
cd textdistance
pip install -e ".[benchmark]"
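After installing, it can be useful to check which optional speed-up libraries are importable in the current environment. The sketch below uses only the standard library; the helper name `available_backends` is hypothetical, and it assumes the import names are `jellyfish` and `Levenshtein`, matching the library names mentioned above. It says nothing about which backend TextDistance itself ends up using.

```python
from importlib.util import find_spec

# Import names assumed to match the library names mentioned in this profile.
OPTIONAL_BACKENDS = ("jellyfish", "Levenshtein")


def available_backends(names=OPTIONAL_BACKENDS):
    """Return a mapping of module name -> whether it can be imported here."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: find_spec(name) is not None for name in names}


print(available_backends())
```

If both values are False, only the pure Python implementations are available in that environment.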
Examples
TextDistance provides a simple and consistent interface across all its algorithms. Here's an example using the Hamming distance:
import textdistance
# Calculate Hamming distance
print(textdistance.hamming('test', 'text'))
# Output: 1
# Using the explicit distance method
print(textdistance.hamming.distance('test', 'text'))
# Output: 1
# Calculate similarity
print(textdistance.hamming.similarity('test', 'text'))
# Output: 3
# Calculate normalized distance (0 to 1, 0 means equal)
print(textdistance.hamming.normalized_distance('test', 'text'))
# Output: 0.25
# Calculate normalized similarity (0 to 1, 1 means equal)
print(textdistance.hamming.normalized_similarity('test', 'text'))
# Output: 0.75
# Using q-grams for comparison (e.g., qval=2 for bigrams)
print(textdistance.Hamming(qval=2).distance('test', 'text'))
# Output: 2
This consistent API makes it easy to experiment with different algorithms and find the best fit for your specific use case.
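The outputs in the example fit a simple relationship between the four measures: similarity is the maximum possible distance minus the distance, and the normalized values divide by that maximum. The standalone sketch below reproduces those numbers for Hamming distance, including the q-gram case. It is an illustration under these assumptions, not TextDistance's actual implementation; the class `HammingSketch` and its helper methods are hypothetical.

```python
from itertools import zip_longest


class HammingSketch:
    """Illustrative Hamming distance with the four measures (not the library's code)."""

    def __init__(self, qval=1):
        self.qval = qval  # size of the q-grams that are compared position by position

    def _units(self, seq):
        # Split the sequence into overlapping q-grams; qval=1 yields single elements.
        q = self.qval
        return [seq[i:i + q] for i in range(len(seq) - q + 1)]

    def maximum(self, a, b):
        # The largest possible distance: every compared position differs.
        return max(len(self._units(a)), len(self._units(b)))

    def distance(self, a, b):
        # Count positions whose units differ; padding (None) counts as a difference.
        return sum(x != y for x, y in zip_longest(self._units(a), self._units(b)))

    def similarity(self, a, b):
        return self.maximum(a, b) - self.distance(a, b)

    def normalized_distance(self, a, b):
        maximum = self.maximum(a, b)
        return self.distance(a, b) / maximum if maximum else 0.0

    def normalized_similarity(self, a, b):
        return 1 - self.normalized_distance(a, b)


h = HammingSketch()
print(h.distance("test", "text"))               # 1
print(h.similarity("test", "text"))             # 3
print(h.normalized_distance("test", "text"))    # 0.25
print(h.normalized_similarity("test", "text"))  # 0.75

# With bigrams, 'test' -> te, es, st and 'text' -> te, ex, xt: two positions differ.
print(HammingSketch(qval=2).distance("test", "text"))  # 2
```

The bigram case shows why a single changed character can raise the distance: it affects every q-gram that overlaps it.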
Why Use TextDistance?
TextDistance offers compelling reasons for developers and data scientists:
- Broad Algorithm Coverage: With over 30 algorithms, it covers a wide range of sequence comparison scenarios, from simple character differences to phonetic matching.
- Optimized Performance: TextDistance falls back to pure Python but uses faster external libraries when they are installed. According to the project's benchmarks, the external implementations are considerably faster than the pure Python ones, which matters for performance-critical applications.
- Simplified Development: The unified API across all algorithms drastically reduces the learning curve and development time. You can switch between algorithms with minimal code changes.
- Robustness for Data Tasks: It's an invaluable asset for data cleaning, deduplication, fuzzy matching, natural language processing, and bioinformatics, where accurate and efficient sequence comparison is crucial.
- Active Development and Community: The project is actively maintained, welcoming contributions and ensuring its continued evolution.
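As a rough illustration of the fuzzy matching and data cleaning use case, the sketch below maps a misspelled value to the closest entry in a list of known values using a normalized edit similarity. It is a standalone, pure Python example that does not call TextDistance; the function names, the lowercase normalization, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def best_match(query, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    match = max(candidates, key=lambda c: normalized_similarity(query.lower(), c.lower()))
    score = normalized_similarity(query.lower(), match.lower())
    return match if score >= threshold else None


cities = ["Berlin", "Munich", "Hamburg"]
print(best_match("Hamburgh", cities))  # Hamburg
print(best_match("Paris", cities))     # None
```

The threshold trades recall for precision: lowering it accepts more distant spellings but also more false matches, so it is usually tuned on a sample of real data.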
Links
- GitHub Repository: life4/textdistance
- PyPI Project Page: textdistance on PyPI
- Article: Guide to Fuzzy Matching with Python
- Article: String similarity, the basic know your algorithms guide!
- Article: Normalized compression distance