# sumy: Automatic Text Summarization for Documents and HTML Pages

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/miso-belica-sumy
Generated for open source discovery and AI-assisted research.

sumy is a robust Python module designed for automatic summarization of text documents and HTML pages. It provides various summarization methods, supports multiple natural languages, and offers both a command-line utility and a flexible Python API. This versatile tool enables users to efficiently extract concise summaries from lengthy content.

GitHub: https://github.com/miso-belica/sumy

## Topics

- Python
- summarization
- NLP
- text-extraction
- html-extraction
- summarizer
- document-processing

## Repository Information

Last analyzed by OSRepos: Sun Dec 14 2025 08:01:10 GMT+0000 (Western European Standard Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Introduction

`sumy` is a Python library for automatic text summarization. It extracts concise summaries from plain text documents and HTML pages. It ships several well-known summarization algorithms, including LexRank, LSA, Luhn, and Edmundson, so you can pick the one that suits your content. It also supports multiple languages, and its framework is designed so that new languages can be added.
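
To give a feel for what "extracting" a summary means, here is a toy word-frequency scorer in plain Python. It is only a sketch under simplifying assumptions (naive sentence splitting, a tiny hand-written stop-word list) and is not `sumy`'s implementation of any of the algorithms named above. The helper names `split_sentences`, `content_words`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a per-language list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercase the sentence and keep word tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, sentences_count):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    frequencies = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentences_count])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = (
        "Cats sleep a lot. Cats chase mice and cats climb trees. "
        "The weather is mild today."
    )
    for sentence in summarize(sample, 2):
        print(sentence)
```

The output sentences are copied verbatim from the input, which is the defining property of extractive summarization. The algorithms in `sumy` differ mainly in how they score sentences.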

## Installation

Getting started with `sumy` is straightforward. Ensure you have Python 3.6+ and `pip` installed on your system.

To install the stable version:

```sh
$ pip install sumy
```

For the very latest version directly from the GitHub repository:

```sh
$ pip install git+https://github.com/miso-belica/sumy.git
```

You can also run `sumy` as a Docker container, avoiding local installation complexities:

```sh
$ docker run --rm misobelica/sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization
```

## Examples

`sumy` provides both a command-line interface for quick summarization and a Python API for integration into your projects.

### Command-Line Usage

Summarize content directly from a URL:

```sh
$ sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization
```
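
If you want to call the command-line tool from a script, one option is to build the same invocation in Python and run it with `subprocess`. The sketch below uses only the method name and the `--length` and `--url` options shown above. The helpers `build_sumy_command` and `run_sumy` are hypothetical names, and `run_sumy` assumes the `sumy` executable is on your `PATH` and the URL is reachable.

```python
import shlex
import subprocess


def build_sumy_command(method, length, url):
    """Build the argument list for a `sumy` CLI call (hypothetical helper)."""
    return ["sumy", method, f"--length={length}", f"--url={url}"]


def run_sumy(method, length, url):
    """Run the CLI and return its standard output as text.

    Assumes the `sumy` executable is on PATH and the URL is reachable.
    """
    command = build_sumy_command(method, length, url)
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    command = build_sumy_command(
        "lex-rank", 10, "https://en.wikipedia.org/wiki/Automatic_summarization"
    )
    # Print the shell-quoted command rather than executing it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `&` or `?`.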


Get help and explore more options:

```sh
$ sumy --help
```

`sumy` also includes a utility for evaluating summarization methods:

```sh
$ sumy_eval lex-rank reference_summary.txt --url=https://en.wikipedia.org/wiki/Automatic_summarization
```
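
To illustrate what comparing a generated summary against a reference summary can involve, here is a small word-overlap sketch in plain Python, similar in spirit to ROUGE-1. It is an assumption-laden illustration, not the code `sumy_eval` runs, and the metrics that tool reports may differ. The function names `tokenize` and `unigram_overlap_scores` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_overlap_scores(candidate, reference):
    """Return (precision, recall, f1) from clipped unigram overlap."""
    candidate_counts = Counter(tokenize(candidate))
    reference_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((candidate_counts & reference_counts).values())
    candidate_total = sum(candidate_counts.values())
    reference_total = sum(reference_counts.values())
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    scores = unigram_overlap_scores("the cat sat", "the cat sat on the mat")
    print("precision=%.2f recall=%.2f f1=%.2f" % scores)
```

Precision rewards summaries that contain little beyond the reference, while recall rewards summaries that cover most of it, so the two are usually reported together.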


### Python API

Integrate `sumy` into your Python applications as a library. Here's a basic example to summarize an HTML page:

```python
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "english"
SENTENCES_COUNT = 10


if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)
```
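
The example above hard-codes LSA through its import line. If you would rather choose the algorithm by name at runtime, a small lookup table is one way to do it. The sketch below is an assumption-based illustration: the module and class names follow the same pattern as the `sumy.summarizers.lsa` import above and should be checked against your installed version, and `SUMMARIZERS` and `load_summarizer_class` are hypothetical names.

```python
from importlib import import_module

# Short name -> (module path, class name). These paths are assumed from the
# sumy package layout; verify them against the version you have installed.
SUMMARIZERS = {
    "lsa": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "lex-rank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
    "edmundson": ("sumy.summarizers.edmundson", "EdmundsonSummarizer"),
}


def load_summarizer_class(name):
    """Import and return the summarizer class registered under `name`."""
    try:
        module_path, class_name = SUMMARIZERS[name]
    except KeyError:
        known = ", ".join(sorted(SUMMARIZERS))
        raise ValueError(f"Unknown summarizer {name!r}; expected one of: {known}") from None
    # Imported lazily so that only the chosen algorithm's module is loaded.
    return getattr(import_module(module_path), class_name)
```

With this in place, `Summarizer = load_summarizer_class("lex-rank")` can stand in for the import alias, and the rest of the example stays the same. Some algorithms may need extra configuration beyond stop words (Edmundson, for instance, works with additional word lists), so check the documentation for the method you pick.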


## Why Use sumy?

`sumy` stands out as an excellent choice for text summarization due to several key features:

*   **Versatile Input:** It can process both plain text and HTML content, making it suitable for a wide range of applications, from local documents to web scraping.
*   **Multiple Algorithms:** With implementations of various summarization techniques like LSA, LexRank, Luhn, and Edmundson, you can choose the method best suited for your specific summarization task.
*   **Multi-language Support:** `sumy` is designed to support multiple natural languages, and its architecture makes it easy to extend support for new languages.
*   **Ease of Use:** Whether you prefer a quick command-line summary or deep integration into a Python project, `sumy` offers intuitive interfaces for both.
*   **Active Development:** The project is actively maintained and has a strong community, as evidenced by its significant number of stars and forks on GitHub.
*   **Evaluation Framework:** It includes tools for evaluating the quality of generated summaries, which is crucial for research and fine-tuning.

## Links

*   **GitHub Repository:** [https://github.com/miso-belica/sumy](https://github.com/miso-belica/sumy)
*   **Hugging Face Demo:** [https://huggingface.co/spaces/issam9/sumy_space](https://huggingface.co/spaces/issam9/sumy_space)
*   **Documentation:** [Explore `sumy`'s documentation on GitHub](https://github.com/miso-belica/sumy/tree/main/docs)