{"name":"TextDistance: A Comprehensive Python Library for Sequence Distance Calculation","description":"TextDistance is a powerful Python library designed to compute the distance and similarity between sequences using over 30 different algorithms. It offers a pure Python implementation with a common, easy-to-use interface, and can optionally leverage external libraries for maximum performance. This tool is ideal for tasks requiring robust string comparison, such as fuzzy matching and data cleaning.","github":"https://github.com/orsinium/textdistance","url":"https://osrepos.com/repo/orsinium-textdistance","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/orsinium-textdistance","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/orsinium-textdistance.md","json":"https://osrepos.com/repo/orsinium-textdistance.json","topics":["python","textdistance","algorithm","string comparison","fuzzy matching","levenshtein","damerau-levenshtein","natural language processing"],"keywords":["python","textdistance","algorithm","string comparison","fuzzy matching","levenshtein","damerau-levenshtein","natural language processing"],"stars":null,"summary":"TextDistance is a powerful Python library designed to compute the distance and similarity between sequences using over 30 different algorithms. It offers a pure Python implementation with a common, easy-to-use interface, and can optionally leverage external libraries for maximum performance. This tool is ideal for tasks requiring robust string comparison, such as fuzzy matching and data cleaning.","content":"## Introduction\n\nTextDistance is a comprehensive and versatile Python library designed for computing the distance and similarity between sequences. 
It stands out by offering implementations of over 30 different algorithms, all accessible through a common and intuitive interface. Whether you need to compare strings, lists, or any other sequence, TextDistance provides a robust solution.\n\nKey features include:\n*   **Extensive Algorithm Collection**: Over 30 algorithms covering edit-based, token-based, sequence-based, compression-based, phonetic, and simple comparisons.\n*   **Pure Python Implementation**: Ensures compatibility and ease of use across various environments.\n*   **Common Interface**: All algorithms share a consistent API for calculating distance, similarity, normalized distance, and normalized similarity.\n*   **Optional External Libraries**: For maximum speed, TextDistance can leverage highly optimized external C-based libraries like `jellyfish` and `Levenshtein` if installed.\n*   **Multi-sequence Comparison**: Supports comparing more than two sequences simultaneously.\n\n## Installation\n\nInstalling TextDistance is straightforward. You can choose between a pure Python version or include optional extra libraries for enhanced performance.\n\n**Stable Version (Pure Python):**\n\n```bash\npip install textdistance\n```\n\n**Stable Version (With Extra Libraries for Maximum Speed):**\n\n```bash\npip install \"textdistance[extras]\"\n```\n\n**Development Version:**\n\nYou can install the development version directly from GitHub:\n\n```bash\npip install -e git+https://github.com/life4/textdistance.git#egg=textdistance\n```\n\nAlternatively, clone the repository and install with benchmark extras:\n\n```bash\ngit clone https://github.com/life4/textdistance.git\ncd textdistance\npip install -e \".[benchmark]\"\n```\n\n## Examples\n\nTextDistance provides a simple and consistent interface across all its algorithms. 
Here's an example using the Hamming distance:\n\n```python\nimport textdistance\n\n# Calculate Hamming distance\nprint(textdistance.hamming('test', 'text'))\n# Output: 1\n\n# Using the explicit distance method\nprint(textdistance.hamming.distance('test', 'text'))\n# Output: 1\n\n# Calculate similarity\nprint(textdistance.hamming.similarity('test', 'text'))\n# Output: 3\n\n# Calculate normalized distance (0 to 1, 0 means equal)\nprint(textdistance.hamming.normalized_distance('test', 'text'))\n# Output: 0.25\n\n# Calculate normalized similarity (0 to 1, 1 means equal)\nprint(textdistance.hamming.normalized_similarity('test', 'text'))\n# Output: 0.75\n\n# Using q-grams for comparison (e.g., qval=2 for bigrams)\nprint(textdistance.Hamming(qval=2).distance('test', 'text'))\n# Output: 2\n```\n\nThis consistent API makes it easy to experiment with different algorithms and find the best fit for your specific use case.\n\n## Why Use TextDistance?\n\nTextDistance offers compelling reasons for developers and data scientists:\n\n*   **Unmatched Versatility**: With over 30 algorithms, it covers almost every scenario for sequence comparison, from simple character differences to complex phonetic matching. This breadth makes it a go-to tool for diverse applications.\n*   **Optimized Performance**: While providing a pure Python fallback, TextDistance intelligently integrates with faster, C-optimized external libraries when available. Benchmarks clearly demonstrate significant speed improvements, making it suitable for performance-critical applications.\n*   **Simplified Development**: The unified API across all algorithms drastically reduces the learning curve and development time. 
You can switch between algorithms with minimal code changes.\n*   **Robustness for Data Tasks**: It's an invaluable asset for data cleaning, deduplication, fuzzy matching, natural language processing, and bioinformatics, where accurate and efficient sequence comparison is crucial.\n*   **Active Development and Community**: The project is actively maintained, welcoming contributions and ensuring its continued evolution.\n\n## Links\n\n*   **GitHub Repository**: [life4/textdistance](https://github.com/life4/textdistance){:target=\"_blank\"}\n*   **PyPI Project Page**: [textdistance on PyPI](https://pypi.org/project/textdistance/){:target=\"_blank\"}\n*   **Guide to Fuzzy Matching with Python**: [Read the article](http://theautomatic.net/2019/11/13/guide-to-fuzzy-matching-with-python/){:target=\"_blank\"}\n*   **String similarity, the basic know your algorithms guide!**: [Read the article](https://itnext.io/string-similarity-the-basic-know-your-algorithms-guide-3de3d7346227){:target=\"_blank\"}\n*   **Normalized compression distance**: [Read the article](https://articles.life4web.ru/other/ncd/){:target=\"_blank\"}","metrics":{"detailViews":5,"githubClicks":2},"dates":{"published":null,"modified":"2025-10-31T08:01:28.000Z"}}