Real-time OCR mode
Use a movable OCR capture area and a separate translation window for games, videos, subtitles, and live dialogue.
Capabilities
Thaluna is not only a live OCR window. It combines real-time OCR, beta Audio Translation, Lens Mode, backlog history, transcript saving, built-in local models, Ollama, OpenAI-compatible local APIs, optional cloud models, hardware controls, presets, shortcuts, and manga-focused reading tools inside one Windows app.




Use a movable OCR capture area and a separate translation window for games, videos, subtitles, and live dialogue.
Use live loopback/system audio transcription in Real-Time mode when OCR is unreliable or subtitles are unavailable.
Switch to a snapshot-based overlay workflow for full-screen scans, selected areas, and on-image translation placement.
Inside Lens Mode, monitor a selected area and refresh overlay translations automatically when the content changes.
Lens Mode includes manga-focused grouping for speech bubbles and comic-style layouts.
Snapshot mode includes a vertical-text option for harder Japanese layouts instead of assuming only left-to-right text.
OCR language is selected separately from the translation target, so you can match recognition to what is actually on screen.
Lens overlays support font-size changes plus text color, background color, and background opacity customization for readability on complex artwork.
The default workflow includes local translation, so the app can work offline without forcing a separate cloud account.
Refresh and use your own Ollama models inside the same app when you want a local model stack beyond the built-in workflow.
Connect a local OpenAI-compatible server such as llama.cpp, LM Studio, or another compatible endpoint with a base URL, model ID, and optional API key.
Add an API key, choose a model, and use cloud translation when you want more quality or broader model choice.
Audio Translation can use OpenRouter STT models to transcribe live system audio before sending recognized text through the normal translation backend.
Use your existing OpenRouter API key to read translated text or original OCR text aloud with selectable voice and speech speed.
Built-in languages cover the normal workflow, with additional flexibility available through Ollama, Custom Local API, and OpenRouter.
Review previous translated lines in a separate panel, similar to VN-style dialogue history.
Save sessions to TXT and choose between translation-only or original-plus-translation output.
Session recording filters repeated lines so long transcripts stay readable instead of filling with repeated OCR noise.
Save and load named layouts for different games, videos, emulators, or reading setups.
Hide most of the interface and keep only the translated output visible when you want less UI clutter.
Switch between standard, game/pixel-text, and manga-oriented OCR behavior depending on the content.
Optionally trim excessive empty margins before OCR when subtitles occupy only a small part of the capture region.
Audio Translation includes latency presets, with Stable recommended by default for better transcription quality.
Choose CPU, GPU, or auto for OCR instead of relying on a hidden hardware choice.
Choose CPU, GPU, or auto for local translation models when you want speed or compatibility control.
Reduce refresh pressure when you are playing or reading on weaker hardware.
Pause processing automatically when the text stops changing instead of wasting resources on static scenes.
An experimental alternate capture path can improve real-time OCR alignment above 100% scaling on supported Windows setups.
Adjust sensitivity, minimum auto-refresh delay, and anti-flicker behavior instead of relying on hidden defaults.
Adjust theme, custom text color, background color, and UI color for the real-time translation window.
Lens Mode has separate text color, background color, and background opacity settings so overlay reading stays legible.
Manual Lens OCR behavior, Continuous Lens controls, Hide After timing, and overlay readability controls now live in dedicated settings instead of being buried in general performance options.
Trigger Minimal Mode, hide icon, toggle mode, quick snapshot, select area, toggle overlay, and load presets through hotkeys.
Font size and outline thickness are exposed so the translated output remains readable over bright or noisy content.
This page is based on Thaluna's actual current feature set as implemented and documented in the app itself: beta Audio Translation, session transcript recording, snapshot/Lens controls, presets, hardware settings, shortcut handling, Ollama/OpenRouter configuration, and backlog behavior. It is meant to describe the real product surface, not generic marketing copy.
The best way to judge Thaluna is to test the exact workflow you care about: live dialogue, snapshots, backlog, transcripts, local models, and hardware settings on your own machine.