Repository History
Explore all analyzed open source repositories

Jieba: The Leading Python Library for Chinese Text Segmentation
Jieba is a widely used Python library for Chinese word segmentation, the task of splitting unspaced Chinese text into words. It offers several cutting modes (accurate, full, and search engine), each suited to different NLP tasks. It also supports custom dictionaries and part-of-speech tagging, so it covers the common steps of preparing Chinese text for further processing.
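To make the difference between the modes concrete, here is a toy, stdlib-only sketch modeled on the approach Jieba's README describes for its default mode. It builds a directed acyclic graph (DAG) of every dictionary word that starts at each character, then uses dynamic programming to pick the highest-probability path based on word frequencies. The dictionary and its frequencies are made up for illustration, and the function names (`build_dag`, `cut_full`, `cut_accurate`, `cut_for_search`) are hypothetical. This is not Jieba's code, and it omits the HMM that Jieba uses for words missing from its dictionary. In the real library, the corresponding calls are `jieba.cut(text, cut_all=False)`, `jieba.cut(text, cut_all=True)` and `jieba.cut_for_search(text)`.

```python
import math

# Hypothetical toy dictionary: word -> made-up frequency count (illustration only).
TOY_DICT = {
    "我": 50, "来": 20, "到": 20, "来到": 30, "北": 5, "京": 5, "北京": 40,
    "清": 3, "华": 3, "大": 10, "学": 10, "清华": 15, "华大": 2,
    "大学": 25, "清华大学": 20,
}
TOTAL = sum(TOY_DICT.values())
MAX_LEN = max(len(w) for w in TOY_DICT)


def build_dag(sentence):
    """Map each start index to the end indices (inclusive) of dictionary words starting there."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, min(len(sentence), i + MAX_LEN))
                if sentence[i:j + 1] in TOY_DICT]
        dag[i] = ends or [i]  # fall back to a single character if nothing matches
    return dag


def cut_full(sentence):
    """Full mode: emit every multi-character dictionary word, plus uncovered single characters."""
    dag = build_dag(sentence)
    words, covered_to = [], -1
    for i in range(len(sentence)):
        multi = [j for j in dag[i] if j > i]
        if multi:
            for j in multi:
                words.append(sentence[i:j + 1])
                covered_to = max(covered_to, j)
        elif i > covered_to:
            words.append(sentence[i])
            covered_to = i
    return words


def cut_accurate(sentence):
    """Accurate mode: max log-probability path through the DAG, computed right to left."""
    dag = build_dag(sentence)
    n = len(sentence)
    log_total = math.log(TOTAL)
    # route[i] = (best log-probability of sentence[i:], end index of the first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(TOY_DICT.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def cut_for_search(sentence):
    """Search mode: accurate mode, plus dictionary 2- and 3-grams found inside longer words."""
    out = []
    for w in cut_accurate(sentence):
        for size in (2, 3):
            if len(w) > size:
                out.extend(w[k:k + size] for k in range(len(w) - size + 1)
                           if w[k:k + size] in TOY_DICT)
        out.append(w)
    return out


text = "我来到北京清华大学"
print(cut_full(text))        # ['我', '来到', '北京', '清华', '清华大学', '华大', '大学']
print(cut_accurate(text))    # ['我', '来到', '北京', '清华大学']
print(cut_for_search(text))  # ['我', '来到', '北京', '清华', '华大', '大学', '清华大学']
```

Because every word adds a `-log(TOTAL)` term, the accurate path tends to prefer fewer, longer words unless the shorter ones are much more frequent. Adding an entry to `TOY_DICT` plays roughly the role of a custom dictionary entry (in Jieba itself, `jieba.add_word` or `jieba.load_userdict`): the new word simply becomes another candidate edge in the DAG.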

python-ftfy: Effortlessly Fixing Mojibake and Unicode Glitches
ftfy ("fixes text for you") is a Python library that automatically corrects "mojibake" and other common glitches in Unicode text. It uses heuristics to detect text that was decoded with the wrong encoding and restores the characters to their intended form. This makes it useful for developers and data scientists working with messy text data, where it improves readability and data integrity.
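The most common kind of mojibake is easy to reproduce with the standard library: UTF-8 bytes decoded as Windows-1252 (`cp1252`). The sketch below, using hypothetical helper names `make_mojibake` and `naive_unmojibake`, simulates that mix-up and undoes one round of it. It only illustrates the problem and is not ftfy's algorithm: ftfy uses heuristics to decide whether text really is mojibake and handles many more encodings and glitches, and its usual entry point is `ftfy.fix_text`.

```python
def make_mojibake(text: str) -> str:
    """Simulate the classic mix-up: UTF-8 bytes wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def naive_unmojibake(text: str) -> str:
    """Undo one round of the UTF-8-as-cp1252 mix-up; return the text unchanged if it doesn't apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters cp1252 can't encode (e.g. Chinese),
        # or its cp1252 bytes aren't valid UTF-8 (e.g. already-correct "café").
        return text


print(make_mojibake("café"))        # cafÃ©
print(naive_unmojibake("cafÃ©"))    # café
print(naive_unmojibake("donâ€™t"))  # don’t
print(naive_unmojibake("café"))     # café (left alone)
```

This naive reversal would also "repair" text that legitimately contains sequences like `Ã©`, which is exactly why a tool like ftfy weighs how plausible the result is before changing anything.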