Speakr: Self-Hosted AI Transcription and Intelligent Note-Taking Platform

Summary
Speakr is a self-hosted web application that transcribes audio recordings into organized, searchable notes. It keeps your data private by running on your own infrastructure and adds AI features such as speaker identification, interactive chat, and semantic search. It suits individuals and groups who want to turn audio into actionable notes while keeping full control of their data.
Introduction
Speakr is a self-hosted web application for managing audio recordings. It turns raw audio into organized, searchable notes and keeps them private by running entirely on your own infrastructure. Built with Python and Vue.js, Speakr covers transcription, collaboration, and tag-based organization, which makes it a good fit for anyone who needs to convert spoken words into actionable text.
Installation
The quickest way to get Speakr running is with Docker, which packages the application together with its dependencies.
First, create a project directory and navigate into it:
mkdir speakr && cd speakr
Next, download the docker-compose.yml configuration and the environment template:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.transcription.example -O .env
Before launching, you'll need to configure your API keys in the .env file. Open it with a text editor like nano:
nano .env
Set either TRANSCRIPTION_API_KEY (for OpenAI or a similar hosted service) or ASR_BASE_URL (for self-hosted WhisperX). In addition, set TEXT_MODEL_API_KEY, which is used for summaries, titles, and chat.
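Before launching, it can help to confirm that the settings named above are actually filled in. The following is a minimal Python sketch, not part of Speakr itself: the helper names parse_env and missing_settings are hypothetical, the parser only handles plain KEY=VALUE lines, and the sample values are placeholders.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env: Dict[str, str]) -> List[str]:
    """Describe each required setting that is unset or empty."""
    problems: List[str] = []
    # Transcription needs a hosted API key OR a self-hosted ASR endpoint.
    if not (env.get("TRANSCRIPTION_API_KEY") or env.get("ASR_BASE_URL")):
        problems.append("TRANSCRIPTION_API_KEY or ASR_BASE_URL")
    # Summaries, titles, and chat need the text model key.
    if not env.get("TEXT_MODEL_API_KEY"):
        problems.append("TEXT_MODEL_API_KEY")
    return problems


# Placeholder content for illustration only; not the real template.
sample = """
# transcription via a hosted API
TRANSCRIPTION_API_KEY=sk-example
TEXT_MODEL_API_KEY=
"""
print(missing_settings(parse_env(sample)))  # ['TEXT_MODEL_API_KEY']
```

To check a real file, pass the contents of .env to parse_env instead of the sample string. The sketch only detects empty values; it cannot tell whether a key is valid or still set to a template placeholder.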
Finally, launch Speakr using Docker Compose:
docker compose up -d
You can then access Speakr at http://localhost:8899. For detailed installation options, including self-hosted WhisperX with GPU support, refer to the official documentation.
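If the page does not load right away, the container may still be starting. As a rough check, the Python sketch below reports whether anything answers HTTP requests at a given address. The function name is_reachable is hypothetical, and the check says nothing about whether Speakr itself is healthy, only that a server responded.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (for example 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar problems.
        return False
```

Calling is_reachable("http://localhost:8899") a few seconds after docker compose up -d should return True once the web server is listening.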
Examples
The use cases and tag prompts below show how Speakr can be tailored to specific needs.
Real-World Use Cases:
- Family Memories: Create a "Family" group with a protected tag to automatically share and preserve recordings of trips and events indefinitely.
- Book Club Discussions: Use a "Book Club" group and tag monthly meetings so that they are shared automatically with members, each of whom can keep personal notes.
- Work Project Groups: Share recordings individually with teammates for temporary collaboration, with easy revocation of access when projects conclude.
- Daily Standups: Implement a group tag with a 14-day retention policy for automatic sharing and cleanup of routine meetings.
- Legal Consultations: Utilize a group tag with a 7-year retention policy for automatic sharing with a legal group, ensuring compliance-based retention.
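The retention policies in these use cases come down to simple date arithmetic. The sketch below is an illustrative Python model and not Speakr's actual implementation: the function names, the protected flag, and the approximation of 7 years as 7 * 365 days (ignoring leap days) are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def expiry_date(recorded_on: date, retention_days: Optional[int],
                protected: bool = False) -> Optional[date]:
    """Return the date a recording becomes eligible for cleanup.

    None means the recording is kept indefinitely, either because it is
    protected or because no retention period applies.
    """
    if protected or retention_days is None:
        return None
    return recorded_on + timedelta(days=retention_days)


def is_expired(recorded_on: date, retention_days: Optional[int],
               today: date, protected: bool = False) -> bool:
    """Return True once today has reached the recording's expiry date."""
    expires = expiry_date(recorded_on, retention_days, protected)
    return expires is not None and today >= expires


today = date(2025, 3, 1)
recorded = date(2025, 2, 10)

print(is_expired(recorded, 14, today))                  # standup, 14 days: True
print(is_expired(recorded, 7 * 365, today))             # legal, about 7 years: False
print(is_expired(recorded, 14, today, protected=True))  # protected tag: False
```

Modeling a protected tag as "no expiry date" keeps the cleanup check to a single comparison, which is one reasonable way to express indefinite retention.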
Creative Tag Prompt Examples:
Speakr's smart tagging system allows for custom AI prompts, transforming raw recordings into structured outputs:
- Recipe Recordings: Tag recordings of yourself cooking with "Recipe" to convert spoken instructions into formatted recipes with ingredient lists and numbered steps.
- Lecture Notes: Students can tag lectures with "Study Notes" to generate organized outlines with concepts, examples, and definitions.
- Meeting Summaries: An "Action Items" tag can filter discussions to return only decisions, tasks, and deadlines.
- Tag Stacking: Combine multiple tags, such as "Recipe" + "Gluten Free", to layer their AI instructions; this pair yields a formatted recipe with gluten substitution suggestions.
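One way to picture tag stacking is as the concatenation of each tag's prompt into a single set of instructions sent along with the transcript. The Python sketch below illustrates that idea only; how Speakr actually combines prompts is not described here, and the prompt wording, the TAG_PROMPTS table, and the build_prompt function are hypothetical.

```python
from typing import Dict, List

# Hypothetical tag prompts; the wording is illustrative, not taken from Speakr.
TAG_PROMPTS: Dict[str, str] = {
    "Recipe": "Format the transcript as a recipe with an ingredient list and numbered steps.",
    "Gluten Free": "Suggest a gluten-free substitution for each ingredient that contains gluten.",
}


def build_prompt(tags: List[str], transcript: str) -> str:
    """Layer the prompts of the given tags, in order, ahead of the transcript."""
    # Tags without a custom prompt contribute no instruction.
    instructions = [TAG_PROMPTS[tag] for tag in tags if tag in TAG_PROMPTS]
    numbered = "\n".join(
        f"{i}. {text}" for i, text in enumerate(instructions, start=1)
    )
    return f"Instructions:\n{numbered}\n\nTranscript:\n{transcript}"


print(build_prompt(["Recipe", "Gluten Free"], "First, mix two cups of flour with..."))
```

Numbering the instructions preserves the order in which the tags were applied, so a later tag can refine the output requested by an earlier one.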
Why Use Speakr
Speakr offers several advantages for audio transcription and note-taking:
- Privacy-First & Self-Hosted: It runs entirely on your own infrastructure, ensuring sensitive conversations remain private and under your control.
- Advanced AI Capabilities: Features high-accuracy AI transcription with speaker identification, voice profiles, interactive chat to query recordings, and semantic search across all your notes.
- Robust Collaboration: Offers internal sharing with granular permissions, group management with automatic sharing via group-scoped tags, and secure public sharing options.
- Intelligent Organization: Utilizes smart tagging with custom AI prompts and tag stacking for powerful transformations, alongside flexible retention policies and automatic cleanup.
- Seamless Integrations: Supports automated exports to note-taking systems such as Obsidian and Logseq and to documentation wikis, and integrates with project management workflows through its REST API.
- User-Friendly Interface: Provides a beautiful, internationalized interface with light/dark modes, audio-transcript synchronization, and performance optimizations for large transcripts.