{"name":"sumy: Automatic Text Summarization for Documents and HTML Pages","description":"sumy is a robust Python module designed for automatic summarization of text documents and HTML pages. It provides various summarization methods, supports multiple natural languages, and offers both a command-line utility and a flexible Python API. This versatile tool enables users to efficiently extract concise summaries from lengthy content.","github":"https://github.com/miso-belica/sumy","url":"https://osrepos.com/repo/miso-belica-sumy","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/miso-belica-sumy","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/miso-belica-sumy.md","json":"https://osrepos.com/repo/miso-belica-sumy.json","topics":["Python","summarization","NLP","text-extraction","html-extraction","summarizer","document-processing"],"keywords":["Python","summarization","NLP","text-extraction","html-extraction","summarizer","document-processing"],"stars":null,"summary":"sumy is a robust Python module designed for automatic summarization of text documents and HTML pages. It provides various summarization methods, supports multiple natural languages, and offers both a command-line utility and a flexible Python API. This versatile tool enables users to efficiently extract concise summaries from lengthy content.","content":"## Introduction\n\n`sumy` is a powerful and easy-to-use Python library for automatic text summarization. It allows you to extract concise summaries from various sources, including plain text documents and HTML pages. Built with flexibility in mind, `sumy` supports several popular summarization algorithms, such as LexRank, LSA, Luhn, and Edmundson, making it adaptable to different summarization needs. 
Furthermore, it supports multiple languages and provides an extensible framework for adding new ones.\n\n## Installation\n\nGetting started with `sumy` is straightforward. Ensure you have Python 3.6+ and `pip` installed on your system.\n\nTo install the stable version:\n\n```sh\n$ pip install sumy\n```\n\nTo install the latest version directly from the GitHub repository:\n\n```sh\n$ pip install git+https://github.com/miso-belica/sumy.git\n```\n\nYou can also run `sumy` as a Docker container, which avoids a local installation:\n\n```sh\n$ docker run --rm misobelica/sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization\n```\n\n## Examples\n\n`sumy` provides both a command-line interface for quick summarization and a Python API for integration into your projects.\n\n### Command-Line Usage\n\nSummarize content directly from a URL:\n\n```sh\n$ sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization\n```\n\nGet help and explore more options:\n\n```sh\n$ sumy --help\n```\n\n`sumy` also includes a utility for evaluating summarization methods:\n\n```sh\n$ sumy_eval lex-rank reference_summary.txt --url=https://en.wikipedia.org/wiki/Automatic_summarization\n```\n\n### Python API\n\nIntegrate `sumy` into your Python applications as a library. 
Here's a basic example that summarizes an HTML page:\n\n```python\nfrom sumy.parsers.html import HtmlParser\nfrom sumy.parsers.plaintext import PlaintextParser\nfrom sumy.nlp.tokenizers import Tokenizer\nfrom sumy.summarizers.lsa import LsaSummarizer as Summarizer\nfrom sumy.nlp.stemmers import Stemmer\nfrom sumy.utils import get_stop_words\n\n\nLANGUAGE = \"english\"\nSENTENCES_COUNT = 10\n\n\nif __name__ == \"__main__\":\n    url = \"https://en.wikipedia.org/wiki/Automatic_summarization\"\n    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))\n    # or for plain text files\n    # parser = PlaintextParser.from_file(\"document.txt\", Tokenizer(LANGUAGE))\n    # parser = PlaintextParser.from_string(\"Check this out.\", Tokenizer(LANGUAGE))\n    stemmer = Stemmer(LANGUAGE)\n\n    summarizer = Summarizer(stemmer)\n    summarizer.stop_words = get_stop_words(LANGUAGE)\n\n    # Print the selected summary sentences\n    for sentence in summarizer(parser.document, SENTENCES_COUNT):\n        print(sentence)\n```\n\n## Why Use sumy?\n\n`sumy` stands out as an excellent choice for text summarization due to several key features:\n\n*   **Versatile Input:** It can process both plain text and HTML content, making it suitable for a wide range of applications, from local documents to web scraping.\n*   **Multiple Algorithms:** With implementations of various summarization techniques like LSA, LexRank, Luhn, and Edmundson, you can choose the method best suited for your specific summarization task.\n*   **Multi-language Support:** `sumy` is designed to support multiple natural languages, and its architecture makes it easy to extend support for new languages.\n*   **Ease of Use:** Whether you prefer a quick command-line summary or deep integration into a Python project, `sumy` offers intuitive interfaces for both.\n*   **Active Development:** The project is actively maintained and has a strong 
community, as evidenced by its significant number of stars and forks on GitHub.\n*   **Evaluation Framework:** It includes tools for evaluating the quality of generated summaries, which is crucial for research and fine-tuning.\n\n## Links\n\n*   **GitHub Repository:** [https://github.com/miso-belica/sumy](https://github.com/miso-belica/sumy){:target=\"_blank\"}\n*   **Hugging Face Demo:** [https://huggingface.co/spaces/issam9/sumy_space](https://huggingface.co/spaces/issam9/sumy_space){:target=\"_blank\"}\n*   **Documentation:** [Explore `sumy`'s documentation on GitHub](https://github.com/miso-belica/sumy/tree/main/docs){:target=\"_blank\"}","metrics":{"detailViews":5,"githubClicks":7},"dates":{"published":null,"modified":"2025-12-14T08:01:10.000Z"}}