TL;DR

Handy is a free, open-source desktop app that transcribes speech locally and pastes the text into any active text field. Built with Tauri (Rust + React/TypeScript), it runs on macOS, Windows, and Linux and supports Whisper and Parakeet models for transcription without sending audio to the cloud.

What happened

Handy is a new open-source application that provides a privacy-focused, offline speech-to-text workflow for desktop users. A cross-platform Tauri app, it starts recording on a configurable global shortcut (or push-to-talk), filters silence with Silero voice-activity detection (VAD), then transcribes locally with either Whisper models or the CPU-optimized Parakeet V3. The resulting text is pasted automatically into whichever application the user was typing in.

The project includes build instructions, a debug mode, and tooling for manual model installation in environments that block automatic downloads. Handy’s code combines a React/TypeScript frontend with a Rust backend and integrates libraries such as whisper-rs, transcription-rs, cpal, vad-rs, rdev and rubato. The project is MIT licensed and under active development. It lists known issues and platform-specific notes, most notably intermittent Whisper crashes on some Windows/Linux configurations and limited Wayland support on Linux.
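The record, filter, transcribe, paste flow described above can be pictured as a small pipeline. The following is only an illustrative sketch in Python, not Handy's actual Rust code: the function `run_dictation` and the three callables passed into it are hypothetical names, and the stand-in stages exist only so the example runs without a microphone or a model.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one short chunk of audio samples


def run_dictation(
    frames: List[Frame],
    is_speech: Callable[[Frame], bool],
    transcribe: Callable[[List[Frame]], str],
    paste: Callable[[str], None],
) -> str:
    """Run one filter -> transcribe -> paste cycle on already-recorded frames."""
    # Stage 1: drop frames the voice-activity detector marks as silence.
    speech = [frame for frame in frames if is_speech(frame)]
    if not speech:
        return ""  # nothing was said, so skip the model and paste nothing

    # Stage 2: run the local speech model on what is left.
    text = transcribe(speech).strip()

    # Stage 3: hand the text to the application that has focus.
    if text:
        paste(text)
    return text


# Stand-in stages (assumptions for the demo, not real VAD or transcription).
pasted: List[str] = []
frames = [[0.0, 0.0], [0.4, -0.3], [0.0, 0.0], [0.2, 0.5]]
result = run_dictation(
    frames,
    is_speech=lambda f: max(abs(s) for s in f) > 0.1,
    transcribe=lambda speech: f"{len(speech)} speech frames",
    paste=pasted.append,
)
print(result)
```

Passing the stages in as callables mirrors the article's point that the transcription engine is swappable (Whisper or Parakeet) while the surrounding flow stays the same.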

Why it matters

  • Local transcription keeps audio data on the user’s machine, addressing privacy concerns tied to cloud speech services.
  • Open-source licensing and modular architecture make the tool extensible for developers and organizations.
  • Support for both GPU-accelerated Whisper models and CPU-optimized Parakeet gives users options across varied hardware.
  • Cross-platform availability with no subscription fees lowers the barrier to desktop accessibility tooling.

Key facts

  • Handy is free and released under the MIT license.
  • The app runs entirely offline; audio is processed locally rather than in the cloud.
  • Built with Tauri: frontend in React + TypeScript, backend in Rust for system integration and ML inference.
  • Uses VAD (Silero) to trim silence and offers transcription via Whisper (Small/Medium/Turbo/Large) or Parakeet V3 models.
  • Supports macOS (Intel and Apple Silicon), 64-bit Windows, and 64-bit Linux.
  • Manual model installation is supported; Whisper .bin files and Parakeet .tar.gz archives can be placed in the app data models directory.
  • Known issues include Whisper model crashes on certain Windows/Linux setups and limited Wayland support; text input on Linux may require helper tools (xdotool on X11, wtype or dotool on Wayland).
  • Handy provides a debug mode (Cmd/Ctrl+Shift+D) and includes platform-specific troubleshooting tips such as an environment variable to try when rendering issues occur.
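For the manual-installation fact above, a short sketch can show how model files might be told apart by extension. It assumes only what the article states (Whisper models ship as .bin files, Parakeet models as .tar.gz archives). The function name, the file names, and the temporary directory are hypothetical; the real models directory path and whether archives must be extracted are not given here.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def classify_model_files(models_dir: Path) -> Dict[str, List[str]]:
    """Group files in a models directory by the engine their extension suggests."""
    found: Dict[str, List[str]] = {"whisper": [], "parakeet": [], "unrecognized": []}
    for path in sorted(models_dir.iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        name = path.name.lower()
        if name.endswith(".bin"):
            found["whisper"].append(path.name)
        elif name.endswith(".tar.gz"):
            found["parakeet"].append(path.name)
        else:
            found["unrecognized"].append(path.name)
    return found


# Demo against a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)
    for name in ("whisper-example.bin", "parakeet-example.tar.gz", "notes.txt"):
        (models_dir / name).touch()
    report = classify_model_files(models_dir)

print(report)
```

A check like this can be handy before launching the app on a machine where automatic downloads are blocked, since a misnamed file is an easy mistake to make when copying models by hand.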

What to watch next

  • Addition of file-based debug logging to aid diagnosis and developer contributions.
  • macOS keyboard improvements: support for the Globe key and a rewrite of global shortcut handling.
  • Settings refactor and cleanup of Tauri command patterns (including evaluation of tauri-specta).
  • Opt-in analytics for anonymous usage data (privacy-first, with a clear opt-in); details and implementation are pending.

Quick glossary

  • Tauri: A framework for building lightweight desktop applications using a web frontend and a Rust backend.
  • Whisper: A family of speech-recognition models originally from OpenAI; used here for local transcription with optional GPU acceleration.
  • Parakeet: A CPU-optimized, on-device speech recognition model family referenced in the project (Parakeet V2/V3).
  • VAD (Voice Activity Detection): A processing step that detects when a user is speaking so non-speech segments can be suppressed before transcription.
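To make the VAD entry concrete, here is a deliberately simple energy-based sketch that trims leading and trailing silence from a recording. This is not how Silero works (Silero is a neural model), and it is not Handy's code; the function names and the 0.02 threshold are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def trim_silence(frames: List[Sequence[float]], threshold: float = 0.02) -> List[Sequence[float]]:
    """Keep everything from the first to the last frame at or above the threshold."""
    voiced = [i for i, frame in enumerate(frames) if rms(frame) >= threshold]
    if not voiced:
        return []  # the whole recording was below the threshold
    return list(frames[voiced[0] : voiced[-1] + 1])


frames = [
    [0.0] * 4,                 # silence
    [0.001] * 4,               # faint noise, below threshold
    [0.3, -0.2, 0.25, -0.3],   # speech
    [0.0] * 4,                 # short pause inside the utterance (kept)
    [0.1] * 4,                 # speech
    [0.0] * 4,                 # trailing silence
]
trimmed = trim_silence(frames)
print(len(trimmed))
```

An energy threshold is easily fooled by background noise, which is one reason a trained detector is a common choice for this step.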

Reader FAQ

Does Handy send audio to the cloud for transcription?
No. Handy processes audio locally on the user’s machine.

Which operating systems are supported?
macOS (Intel and Apple Silicon), x64 Windows, and x64 Linux are supported.

Do I need a GPU to run Handy?
Not necessarily. Parakeet V3 is CPU-optimized; Whisper models can use GPU acceleration when available but may run on CPU with reduced performance.

Is Wayland fully supported on Linux?
Not fully. Wayland support is limited: users may need tools such as wtype or dotool for reliable text input, and further improvements are noted as work in progress.

Can I contribute or report issues?
Yes. The project accepts contributions via its GitHub repository and lists guidance for reporting issues and submitting pull requests.
