# TextDistance: A Comprehensive Python Library for Sequence Distance Calculation

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Repository profile: https://osrepos.com/repo/orsinium-textdistance

TextDistance is a powerful Python library designed to compute the distance and similarity between sequences using over 30 different algorithms. It offers a pure Python implementation with a common, easy-to-use interface, and can optionally leverage external libraries for maximum performance. This tool is ideal for tasks requiring robust string comparison, such as fuzzy matching and data cleaning.

GitHub: https://github.com/orsinium/textdistance

## Topics

- python
- textdistance
- algorithm
- string comparison
- fuzzy matching
- levenshtein
- damerau-levenshtein
- natural language processing

## Repository Information

Last analyzed by OSRepos: Fri Oct 31 2025 08:01:28 GMT+0000 (Western European Standard Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction

TextDistance is a comprehensive and versatile Python library designed for computing the distance and similarity between sequences. It stands out by offering implementations of over 30 different algorithms, all accessible through a common and intuitive interface. Whether you need to compare strings, lists, or any other sequence, TextDistance provides a robust solution.

Key features include:
*   **Extensive Algorithm Collection**: Over 30 algorithms covering edit-based, token-based, sequence-based, compression-based, phonetic, and simple comparisons.
*   **Pure Python Implementation**: Ensures compatibility and ease of use across various environments.
*   **Common Interface**: All algorithms share a consistent API for calculating distance, similarity, normalized distance, and normalized similarity.
*   **Optional External Libraries**: For maximum speed, TextDistance can delegate to optimized, compiled external libraries such as `jellyfish` and `Levenshtein` when they are installed.
*   **Multi-sequence Comparison**: Supports comparing more than two sequences simultaneously.
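
The relationship between the four methods of the common interface, and the idea of comparing more than two sequences, can be illustrated with a small stand-alone sketch. The class name `HammingSketch` and everything inside it are hypothetical and written only for this illustration; this is not TextDistance's own code. It assumes that similarity is the largest possible distance minus the actual distance, and that the normalized values divide by that maximum, which is consistent with the Hamming outputs shown in the Examples section.

```python
from itertools import zip_longest

# Unique padding marker so that shorter sequences never match by accident.
_PAD = object()


class HammingSketch:
    """Illustrative stand-in for one algorithm class (not the library's code)."""

    def distance(self, *sequences):
        # Count the positions at which the sequences do not all agree.
        mismatches = 0
        for column in zip_longest(*sequences, fillvalue=_PAD):
            first = column[0]
            if any(item != first for item in column[1:]):
                mismatches += 1
        return mismatches

    def maximum(self, *sequences):
        # The largest distance possible: every position differs.
        return max(len(sequence) for sequence in sequences)

    def similarity(self, *sequences):
        return self.maximum(*sequences) - self.distance(*sequences)

    def normalized_distance(self, *sequences):
        maximum = self.maximum(*sequences)
        if maximum == 0:
            return 0.0  # all sequences are empty, so they are equal
        return self.distance(*sequences) / maximum

    def normalized_similarity(self, *sequences):
        return 1 - self.normalized_distance(*sequences)


hamming_sketch = HammingSketch()
print(hamming_sketch.distance("test", "text"))             # two strings
print(hamming_sketch.distance("test", "text", "tent"))     # three strings
print(hamming_sketch.normalized_similarity([1, 2, 3], [1, 2, 4]))  # lists
```

Because the methods only rely on indexing, length, and equality, the same sketch works for strings, lists, or tuples.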

## Installation

Installing TextDistance is straightforward. You can install the pure Python package on its own, or add the optional extra libraries for better performance.

**Stable Version (Pure Python):**

```bash
pip install textdistance
```

**Stable Version (With Extra Libraries for Maximum Speed):**

```bash
pip install "textdistance[extras]"
```

**Development Version:**

You can install the development version directly from GitHub:

```bash
pip install -e git+https://github.com/life4/textdistance.git#egg=textdistance
```

Alternatively, clone the repository and install with benchmark extras:

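```bash
git clone https://github.com/life4/textdistance.git
# Install from inside the cloned directory so that "." resolves to the project
cd textdistance
pip install -e ".[benchmark]"
```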


## Examples

TextDistance provides a simple and consistent interface across all its algorithms. Here's an example using the Hamming distance:

```python
import textdistance

# Calculate Hamming distance
print(textdistance.hamming('test', 'text'))
# Output: 1

# Using the explicit distance method
print(textdistance.hamming.distance('test', 'text'))
# Output: 1

# Calculate similarity
print(textdistance.hamming.similarity('test', 'text'))
# Output: 3

# Calculate normalized distance (0 to 1, 0 means equal)
print(textdistance.hamming.normalized_distance('test', 'text'))
# Output: 0.25

# Calculate normalized similarity (0 to 1, 1 means equal)
print(textdistance.hamming.normalized_similarity('test', 'text'))
# Output: 0.75

# Using q-grams for comparison (e.g., qval=2 for bigrams)
print(textdistance.Hamming(qval=2).distance('test', 'text'))
# Output: 2
```

This consistent API makes it easy to experiment with different algorithms and find the best fit for your specific use case.
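
The `qval=2` result in the example above can be worked through by hand. The sketch below is a stand-alone illustration, not the library's implementation: the helper names `qgrams` and `positional_mismatches` are hypothetical. It assumes that q-gram mode splits each string into overlapping windows of length `qval` and then compares those windows position by position, which is consistent with the output of 2 shown above.

```python
from itertools import zip_longest


def qgrams(sequence, qval):
    """Return the overlapping windows of length qval, in order."""
    return [sequence[i:i + qval] for i in range(len(sequence) - qval + 1)]


def positional_mismatches(left, right):
    """Count positions where two lists differ (Hamming-style comparison)."""
    # zip_longest pads the shorter list with None, so any extra
    # trailing items are counted as mismatches.
    return sum(a != b for a, b in zip_longest(left, right))


left = qgrams("test", 2)   # ['te', 'es', 'st']
right = qgrams("text", 2)  # ['te', 'ex', 'xt']
print(left, right)
print(positional_mismatches(left, right))  # 'es' vs 'ex' and 'st' vs 'xt' differ
```

A single changed character affects every window that overlaps it, which is why the bigram distance (2) is larger than the character distance (1) for the same pair of words.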

## Why Use TextDistance?

TextDistance offers compelling reasons for developers and data scientists:

*   **Unmatched Versatility**: With over 30 algorithms, it covers almost every scenario for sequence comparison, from simple character differences to complex phonetic matching. This breadth makes it a go-to tool for diverse applications.
*   **Optimized Performance**: The pure Python implementations always work as a fallback, and TextDistance delegates to faster compiled libraries when they are installed. The project publishes benchmarks comparing these implementations, so the speedup for a given algorithm can be checked there before relying on it in performance-critical code.
*   **Simplified Development**: The unified API across all algorithms drastically reduces the learning curve and development time. You can switch between algorithms with minimal code changes.
*   **Robustness for Data Tasks**: It's an invaluable asset for data cleaning, deduplication, fuzzy matching, natural language processing, and bioinformatics, where accurate and efficient sequence comparison is crucial.
*   **Active Development and Community**: The project is actively maintained, welcoming contributions and ensuring its continued evolution.

## Links

*   **GitHub Repository**: [life4/textdistance](https://github.com/life4/textdistance)
*   **PyPI Project Page**: [textdistance on PyPI](https://pypi.org/project/textdistance/)
*   **Guide to Fuzzy Matching with Python**: [Read the article](http://theautomatic.net/2019/11/13/guide-to-fuzzy-matching-with-python/)
*   **String similarity, the basic know your algorithms guide!**: [Read the article](https://itnext.io/string-similarity-the-basic-know-your-algorithms-guide-3de3d7346227)
*   **Normalized compression distance**: [Read the article](https://articles.life4web.ru/other/ncd/)