sumy: Automatic Text Summarization for Documents and HTML Pages
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
sumy is a Python module for automatic summarization of text documents and HTML pages. It provides several summarization methods, supports multiple natural languages, and offers both a command-line utility and a Python API.
Introduction
sumy is a powerful and easy-to-use Python library for automatic text summarization. It allows you to extract concise summaries from various sources, including plain text documents and HTML pages. Built with flexibility in mind, sumy supports several popular summarization algorithms, such as LexRank, LSA, Luhn, and Edmundson, making it adaptable to different summarization needs. Furthermore, it boasts multi-language support, with an extensible framework to add new languages easily.
Installation
Getting started with sumy is straightforward. Ensure you have Python 3.6+ and pip installed on your system.
To install the stable version:
$ pip install sumy
For the very latest version directly from the GitHub repository:
$ pip install git+https://github.com/miso-belica/sumy.git
You can also run sumy as a Docker container, avoiding local installation complexities:
$ docker run --rm misobelica/sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization
Examples
sumy provides both a command-line interface for quick summarization and a Python API for integration into your projects.
Command-Line Usage
Summarize content directly from a URL:
$ sumy lex-rank --length=10 --url=https://en.wikipedia.org/wiki/Automatic_summarization
Get help and explore more options:
$ sumy --help
sumy also includes a utility for evaluating summarization methods:
$ sumy_eval lex-rank reference_summary.txt --url=https://en.wikipedia.org/wiki/Automatic_summarization
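To give a rough sense of what comparing a generated summary against a reference can measure, here is a minimal sketch of sentence-level precision, recall, and F1. It is an illustration only, not sumy's implementation: the function name `sentence_overlap_scores` and the sample sentences are hypothetical, and the metrics reported by `sumy_eval` may be defined differently.

```python
def sentence_overlap_scores(candidate, reference):
    """Return (precision, recall, f1) over the sets of sentences.

    Hypothetical helper for illustration; not part of sumy.
    """
    cand = set(candidate)
    ref = set(reference)
    if not cand or not ref:
        raise ValueError("both summaries must contain at least one sentence")

    common = len(cand & ref)
    precision = common / len(cand)  # share of selected sentences that are in the reference
    recall = common / len(ref)      # share of reference sentences that were selected
    f1 = 0.0 if common == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


candidate = ["A is true.", "B follows.", "C is unrelated."]
reference = ["A is true.", "B follows.", "D matters.", "E matters."]

precision, recall, f1 = sentence_overlap_scores(candidate, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.67 recall=0.50 f1=0.57
```

Exact sentence matching like this only makes sense for extractive summaries, where every summary sentence is copied verbatim from the source document.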
Python API
Integrate sumy into your Python applications as a library. Here's a basic example to summarize an HTML page:
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "english"
SENTENCES_COUNT = 10


if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)
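Because the example aliases the LSA class as `Summarizer`, trying another algorithm is mostly a matter of changing one import. The sketch below wraps that idea in a small lookup. The helper `build_summarizer` and the `SUMMARIZERS` table are hypothetical names introduced here, and the module paths reflect sumy's package layout as commonly documented, so check them against your installed version.

```python
import importlib

# Method name -> (module path, class name). Assumed to match sumy's layout;
# verify against the version you have installed.
SUMMARIZERS = {
    "lex-rank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "lsa": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
}


def build_summarizer(method, language="english"):
    """Hypothetical helper: return a configured summarizer for `method`."""
    if method not in SUMMARIZERS:
        raise ValueError(f"unknown method: {method!r}")

    module_name, class_name = SUMMARIZERS[method]
    # sumy is imported lazily, so this file still loads where sumy is absent.
    summarizer_class = getattr(importlib.import_module(module_name), class_name)

    from sumy.nlp.stemmers import Stemmer
    from sumy.utils import get_stop_words

    # Same configuration steps as in the basic example.
    summarizer = summarizer_class(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)
    return summarizer
```

With a `parser` built as in the basic example, `build_summarizer("luhn")(parser.document, 5)` would yield up to five sentences. Edmundson is left out of the table because it appears to need extra word lists (bonus, stigma, and null words) before it can run.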
Why Use sumy?
sumy stands out as an excellent choice for text summarization due to several key features:
- Versatile Input: It can process both plain text and HTML content, making it suitable for a wide range of applications, from local documents to web scraping.
- Multiple Algorithms: With implementations of various summarization techniques like LSA, LexRank, Luhn, and Edmundson, you can choose the method best suited for your specific summarization task.
- Multi-language Support: sumy is designed to support multiple natural languages, and its architecture makes it easy to extend support for new languages.
- Ease of Use: Whether you prefer a quick command-line summary or deep integration into a Python project, sumy offers intuitive interfaces for both.
- Active Development: The project is actively maintained and has a strong community, as evidenced by its significant number of stars and forks on GitHub.
- Evaluation Framework: It includes tools for evaluating the quality of generated summaries, which is crucial for research and fine-tuning.
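The algorithms listed above are extractive: they pick existing sentences from the document rather than writing new ones. As a rough illustration of that idea, the sketch below scores each sentence by the average frequency of its non-stop words and keeps the top ones. It is a deliberately simplified stand-in, not sumy's implementation of Luhn or any other method, and the `summarize` function, the tiny `STOP_WORDS` set, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real tools ship per-language lists.
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "the", "to", "in"}


def content_words(sentence):
    """Lowercased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, sentence_count):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence splitting on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(frequencies[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentence_count])  # restore document order
    return [sentences[i] for i in chosen]


text = (
    "Cats sleep a lot. Cats and dogs are popular pets. "
    "The weather was mild. Dogs need daily walks."
)
print(summarize(text, 2))
# prints: ['Cats sleep a lot.', 'Cats and dogs are popular pets.']
```

The sentences mentioning the most frequent content words ("cats", "dogs") win, which is the basic intuition behind frequency-based extractive methods.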
Links
- GitHub Repository: https://github.com/miso-belica/sumy
- Hugging Face Demo: https://huggingface.co/spaces/issam9/sumy_space
- Documentation: Explore sumy's documentation on GitHub