sumy: Automatic Text Summarization for Documents and HTML Pages

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

sumy is a Python module for automatic summarization of text documents and HTML pages. It implements several summarization methods, supports multiple natural languages, and ships with both a command-line utility and a Python API.

Repository Information

Analyzed by OSRepos on December 14, 2025

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

sumy is a Python library for automatic text summarization. It extracts concise summaries from plain text documents and HTML pages by selecting the most relevant sentences. It implements several well-known summarization algorithms, including LexRank, LSA, Luhn, and Edmundson, so you can pick the method that suits your content. It also supports multiple languages, and its framework is designed so that new languages can be added.
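
To give a feel for how sentence-selection summarizers of this kind work, here is a minimal sketch in plain Python. It is not sumy's implementation of any of the algorithms named above; it is a simple word-frequency heuristic, and the function names, the tiny stop-word list, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real lists are much larger.
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, sentences_count):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    frequencies = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(frequencies[t] for t in tokens) / len(tokens) if tokens else 0.0

    # sorted() is stable, so tied sentences keep their document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:sentences_count])]


if __name__ == "__main__":
    for line in summarize("Cats chase mice. Cats sleep a lot. Dogs bark.", 2):
        print(line)
```

The algorithms in sumy use more refined scoring than this, but the overall shape (split into sentences, score each one, keep the top few in document order) is the same idea.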

Installation

Getting started with sumy is straightforward. Ensure you have Python 3.6+ and pip installed on your system.

To install the stable version:

$ pip install sumy

For the very latest version directly from the GitHub repository:

$ pip install git+https://github.com/miso-belica/sumy.git

You can also run sumy as a Docker container, avoiding local installation complexities:

$ docker run --rm misobelica/sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization

Examples

sumy provides both a command-line interface for quick summarization and a Python API for integration into your projects.

Command-Line Usage

Summarize content directly from a URL:

$ sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization

Get help and explore more options:

$ sumy --help
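
If you want to drive the command-line tool from a script, one option is to build the same command shown above and run it with the standard library. This is a rough sketch, assuming the sumy executable is on your PATH; the helper names sumy_command and run_sumy are hypothetical, and only the --length and --url options shown above are used.

```python
import subprocess


def sumy_command(method, url, length=10):
    """Build the argument list for a call like `sumy lex-rank --length=10 --url=...`."""
    return ["sumy", method, "--length={}".format(length), "--url={}".format(url)]


def run_sumy(method, url, length=10):
    """Run the sumy CLI and return its output as a list of lines.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    completed = subprocess.run(
        sumy_command(method, url, length),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    page = "https://en.wikipedia.org/wiki/Automatic_summarization"
    for line in run_sumy("lex-rank", page, length=5):
        print(line)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.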

sumy also includes a utility for evaluating summarization methods:

$ sumy_eval lex-rank reference_summary.txt --url=https://en.wikipedia.org/wiki/Automatic_summarization

Python API

Integrate sumy into your Python applications as a library. Here's a basic example to summarize an HTML page:

from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "english"
SENTENCES_COUNT = 10


if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)
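
The example above hard-codes the LSA summarizer and an HTML source. As a sketch of how the same pieces could be wrapped for plain-text input with a selectable algorithm, consider the helpers below. The names build_summarizer and summarize_text are hypothetical, and the method strings are simply borrowed from the command-line style used earlier. The sketch assumes sumy is installed; its tokenizer may also need NLTK sentence-tokenizer data to be downloaded separately.

```python
def build_summarizer(method="lsa", language="english"):
    """Return a sumy summarizer configured with a stemmer and stop words."""
    if method not in ("lsa", "lex-rank", "luhn"):
        raise ValueError("unknown summarization method: {!r}".format(method))

    # sumy is imported lazily so the argument check above runs even without it.
    from sumy.nlp.stemmers import Stemmer
    from sumy.utils import get_stop_words

    if method == "lsa":
        from sumy.summarizers.lsa import LsaSummarizer as summarizer_class
    elif method == "lex-rank":
        from sumy.summarizers.lex_rank import LexRankSummarizer as summarizer_class
    else:
        from sumy.summarizers.luhn import LuhnSummarizer as summarizer_class

    summarizer = summarizer_class(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)
    return summarizer


def summarize_text(text, method="lsa", sentences_count=3, language="english"):
    """Summarize a plain-text string and return the chosen sentences as strings."""
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser

    summarizer = build_summarizer(method, language)
    parser = PlaintextParser.from_string(text, Tokenizer(language))
    return [str(sentence) for sentence in summarizer(parser.document, sentences_count)]


if __name__ == "__main__":
    with open("document.txt", encoding="utf-8") as handle:
        for sentence in summarize_text(handle.read(), method="luhn", sentences_count=5):
            print(sentence)
```

Keeping the summarizer construction in one place makes it easy to compare methods on the same document by changing only the method argument.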

Why Use sumy?

Several features make sumy a practical choice for text summarization:

  • Versatile Input: It can process both plain text and HTML content, making it suitable for a wide range of applications, from local documents to web scraping.
  • Multiple Algorithms: With implementations of various summarization techniques like LSA, LexRank, Luhn, and Edmundson, you can choose the method best suited for your specific summarization task.
  • Multi-language Support: sumy is designed to support multiple natural languages, and its architecture makes it easy to extend support for new languages.
  • Ease of Use: Whether you prefer a quick command-line summary or deep integration into a Python project, sumy offers intuitive interfaces for both.
  • Active Development: The project is actively maintained and has a strong community, as evidenced by its significant number of stars and forks on GitHub.
  • Evaluation Framework: It includes tools for evaluating the quality of generated summaries, which is crucial for research and fine-tuning.
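
To illustrate what evaluating a summary against a reference can look like, here is a minimal unigram-overlap sketch in plain Python. It is not the code behind sumy_eval; the function names and the tokenization rule are assumptions, and the metric is a simplified precision/recall/F1 over shared words.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_overlap(candidate, reference):
    """Return (precision, recall, f1) based on words shared by both texts."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = unigram_overlap("the cat sat", "the cat sat on the mat")
    print("precision={:.2f} recall={:.2f} f1={:.2f}".format(p, r, f))
```

High precision with low recall, as in this example, suggests a summary that is accurate but leaves out much of the reference.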
