1. Start your local server
Start llama.cpp, LM Studio, or another OpenAI-compatible server before selecting Local API in Thaluna.
Thaluna 3.0 Preview
Thaluna 3.0 Preview can connect to a local OpenAI-compatible chat completion server. This is useful when you want to use llama.cpp, LM Studio, or another local model stack instead of the built-in models, Ollama, or OpenRouter.
http://127.0.0.1:8080/v14096-81920.1-0.3512-2048Example:
llama-server -m model.gguf --port 8080 --ctx-size 8192
Thaluna sends short OCR translation requests, so very large context windows are usually not needed.
Start llama.cpp, LM Studio, or another OpenAI-compatible server before selecting Local API in Thaluna.
In Thaluna, open Settings -> Ollama/Cloud and set the Custom Local API base URL. For llama.cpp on port 8080, use http://127.0.0.1:8080/v1.
Keep the model ID matching your local server. The default local-model is fine for many local servers.
Most localhost servers do not require an API key. Only fill the key field if your server expects one.
Open the translation model selector, choose Local API, pick your target language, and confirm.
If translation stalls, check the local server console first. A runaway generation can keep the request open until the model reaches its token or context limit.
131072 unless you know you need them, because they can increase VRAM/RAM usage and slow down local inference.