Requires Ollama: Slave nodes rely on Ollama for inference. Each translation machine needs Ollama installed with a translation model pulled.
What it solves: Local batch translation where data never leaves your LAN and multiple machines work in parallel. (Chinese version: README_zh.md)
- Privacy: Contracts, medical records, technical docs — you don't want them on the cloud
- Too slow: Batch translating hundreds of pages on one machine takes hours
- Old hardware: A single weak machine can't handle a decent LLM fast enough
Multiple machines translate in parallel. Master routes. Data stays local.
- Machine A runs Master: receives requests → round-robins to idle workers
- Machines B/C/D run Slave: each uses Ollama to load a translation model, translates in parallel
- All communication is LAN-only REST. No external network needed for translation.
Client ──POST /translate──> Master :8000 ──round-robin──> Slave :8001 (Ollama)
                                                          Slave :8002 (Ollama)
                                                          Slave :8003 (Ollama)
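
The routing described above can be sketched as follows. This is a minimal illustration, not the project's actual Master code: the class name `RoundRobinRouter`, its methods, and the in-memory busy set are assumptions made for the example.

```python
from typing import Optional, Sequence


class RoundRobinRouter:
    """Rotate through workers in order, skipping any that are busy."""

    def __init__(self, workers: Sequence[str]) -> None:
        self.workers = list(workers)
        self.busy = set()   # workers currently handling a request
        self._next = 0      # index where the next search starts

    def acquire(self) -> Optional[str]:
        """Return the next idle worker and mark it busy, or None if all are busy."""
        n = len(self.workers)
        for offset in range(n):
            idx = (self._next + offset) % n
            worker = self.workers[idx]
            if worker not in self.busy:
                self.busy.add(worker)
                self._next = (idx + 1) % n  # resume after the chosen worker
                return worker
        return None

    def release(self, worker: str) -> None:
        """Mark a worker idle again once its translation has finished."""
        self.busy.discard(worker)
```

A caller would `acquire()` a Slave before forwarding a request and `release()` it when the response arrives. A real Master would also need locking if requests are handled concurrently.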
Each Slave machine needs the following steps:
Visit https://ollama.com/download to download the installer for your OS.
After installation, verify in your terminal:
ollama --version

If Ollama is installed but not running, install_slave.bat will try to start ollama serve automatically.
This project uses hunyuan-mt:1.8b-q4 by default (Tencent Hunyuan translation model, 1.8B params, Q4 quantized, ~1.1GB, good EN↔ZH quality).
Option A: Direct ollama pull (recommended)
ollama pull hunyuan-mt:1.8b-q4

Verify:
ollama list
# You should see: hunyuan-mt:1.8b-q4
Option B: Download GGUF from HuggingFace community, then import (if ollama pull doesn't work)
- Download the GGUF file from HuggingFace (use mirror if needed):
# Download GGUF (adjust URL if using mirror)
curl -L -o HY-MT1.5-1.8B-Q4_K_M.gguf ^
"https://hf-mirror.com/AngelSlim/Hy-MT1.5-1.8B-Q4_K_M/resolve/main/Hy-MT1.5-1.8B-Q4_K_M.gguf"
- Place the GGUF file in the project root (same folder as Modelfile.hunyuan), then import:
ollama create hunyuan-mt:1.8b-q4 -f Modelfile.hunyuan

Option C: Use a different model
You can use any model available on Ollama:
ollama pull qwen2.5:1.5b-instruct-q4_K_M # Qwen2.5 (general-purpose, also translates)
ollama pull llama3.2:3b-instruct-q4_K_M # Llama 3.2 (strong English)

If you change the model, update slave/config.yaml line 13:
ollama:
  model: "your-model-name" # e.g. "qwen2.5:1.5b-instruct-q4_K_M"

# Check model is listed
ollama list
# Quick test
ollama run hunyuan-mt:1.8b-q4 "Translate: Hello world -> Chinese"

Each translation machine: run install_slave.bat
↓
6-step self-test passes → shows local IP
↓
Copy the IP to the Master machine
↓
Master machine: run install_master.bat
↓
Enter Slave IPs line by line → Master verifies connections
↓
Ready
Make sure Ollama + hunyuan-mt:1.8b-q4 is ready (see above), then:
install_slave.bat

The script automatically:
- Checks Python 64-bit
- Installs pip dependencies (fastapi uvicorn httpx pydantic pyyaml)
- Detects/starts Ollama + confirms hunyuan-mt:1.8b-q4 model exists
- Starts Slave service in background (port 8001, logs in logs/slave.log)
- Runs 5-paragraph parallel translation self-test
- ✅ On success, displays the local IP
install_master.bat

The script automatically:
- Checks Python 64-bit
- Installs pip dependencies
- Starts Master in background (port 8000, logs in logs/master.log)
- Self-tests the /health endpoint
- Prompts for Slave IPs line by line (e.g. 192.168.1.101:8001, Enter to finish)
- Writes to master/config.yaml and pings each Slave
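
The line-by-line prompt for Slave addresses could be handled along these lines. This is a hedged sketch rather than what install_master.bat actually does: the function name `parse_slave_addresses` and its validation rules are assumptions, and only the `IP:port` format and the "empty line ends input" behavior come from the steps above.

```python
import ipaddress
from typing import Iterable, List


def parse_slave_addresses(lines: Iterable[str]) -> List[str]:
    """Collect 'IP:port' entries until the first empty line."""
    slaves = []
    for raw in lines:
        entry = raw.strip()
        if not entry:  # a bare Enter ends the list
            break
        host, sep, port = entry.rpartition(":")
        if not sep:
            raise ValueError(f"expected IP:port, got {entry!r}")
        ipaddress.ip_address(host)  # raises ValueError for a malformed IP
        if not port.isdecimal() or not 0 < int(port) < 65536:
            raise ValueError(f"invalid port in {entry!r}")
        slaves.append(f"{host}:{port}")
    return slaves
```

Rejecting malformed entries at input time means a typo surfaces immediately instead of as a failed ping later.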
# Install on other machines that need to access Master
pip install localtrans
localtrans-save http://192.168.1.100:8000

from localtrans import translate
print(translate("Hello world", target_lang="zh")) # 你好世界

from client import TranslatorClient
client = TranslatorClient("http://<Master_IP>:8000")
# Simple translation
result = client.translate("Hello world", target_lang="zh")
print(result.translated_text) # 你好世界
# With glossary (force specific translations)
result = client.translate(
    "We use PyTorch for deep learning research.",
    target_lang="zh",
    glossary={"PyTorch": "PyTorch框架", "deep learning": "深度学习"},
)
print(result.translated_text) # 我们使用PyTorch框架进行深度学习研究。
# Multi-language (French → Chinese)
result = client.translate("Bonjour monde", target_lang="zh", source_lang="fr")
# Batch translation
texts = ["Hello", "Good morning", "Thank you"]
for text in texts:
    r = client.translate(text, target_lang="zh")
    print(r.translated_text)

# Simple translation
curl -X POST http://localhost:8000/translate ^
-H "Content-Type: application/json" ^
-d "{\"text\":\"Hello world\",\"target_lang\":\"zh\"}"

http://localhost:8000/docs
FastAPI auto-generates an interactive docs page. Test it right in the browser.
| Endpoint | Method | Description |
|---|---|---|
| /translate | POST | Translate |
| /health | GET | All node health status (includes Ollama model readiness) |
| /slaves | GET | Slave node list |
| /models | GET | All models from all slaves |
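
For clients that cannot install the Python package, the /translate endpoint can be called with the standard library alone. The sketch below assumes the request body uses the `text` and `target_lang` fields from the curl example. The `source_lang` and `glossary` field names mirror the Python client's keyword arguments and are an assumption about the REST schema, as is the JSON response body. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_translate_request(
    master_url: str,
    text: str,
    target_lang: str,
    source_lang: Optional[str] = None,
    glossary: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /translate request."""
    payload = {"text": text, "target_lang": target_lang}
    if source_lang is not None:
        payload["source_lang"] = source_lang  # assumed field name
    if glossary:
        payload["glossary"] = glossary        # assumed field name
    return urllib.request.Request(
        url=master_url.rstrip("/") + "/translate",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(master_url: str, text: str, target_lang: str,
              timeout: float = 60.0, **options) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_translate_request(master_url, text, target_lang, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything leaves the machine.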
A: Slave depends on Ollama for inference — translation won't work without it. Install from https://ollama.com/download
A: Download GGUF via HuggingFace mirror and import:
curl -L -o HY-MT1.5-1.8B-Q4_K_M.gguf ^
"https://hf-mirror.com/AngelSlim/Hy-MT1.5-1.8B-Q4_K_M/resolve/main/Hy-MT1.5-1.8B-Q4_K_M.gguf"
ollama create hunyuan-mt:1.8b-q4 -f Modelfile.hunyuan

A: Make sure the model name is consistent in these 3 places:
- ollama list output
- slave/config.yaml line 13: ollama.model
- install_slave.bat line 19: set OLLAMA_MODEL=
Default model name: hunyuan-mt:1.8b-q4.
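
The first of these checks can be automated against Ollama's local REST API, which lists installed models at `/api/tags` (default port 11434). The sketch below is illustrative only: the helper names are hypothetical, and it compares names exactly, so a configured name without a tag will not match an installed `name:tag` entry.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the installed-model listing from a running Ollama instance."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def installed_models(tags_payload: dict) -> set:
    """Extract model names from an /api/tags response."""
    return {model["name"] for model in tags_payload.get("models", [])}


def model_is_installed(configured: str, tags_payload: dict) -> bool:
    """True if the configured model name matches an installed model exactly."""
    return configured in installed_models(tags_payload)
```

For example, `model_is_installed("hunyuan-mt:1.8b-q4", fetch_tags())` would confirm that the name in slave/config.yaml matches what Ollama has pulled.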
A: Make sure they're on the same LAN and firewall allows the ports:
New-NetFirewallRule -DisplayName "DistraPorts" -Direction Inbound -LocalPort 8000,8001,8002 -Protocol TCP -Action Allow

A: Try a smaller quantization:
| Quantization | Size | RAM needed |
|---|---|---|
| Q4_K_M (recommended) | 1.1GB | ~2GB |
| Q2_K | 0.6GB | ~1GB |
| FP16 | 3.6GB | ~4GB |
A: Check Ollama status:
ollama list # Is the model in the list?
ollama ps # Is the model loaded in memory?

A: Free up disk space:
# Anaconda pkgs cache (can free 10GB+)
Remove-Item "$env:USERPROFILE\.conda\pkgs" -Recurse -Force -ErrorAction SilentlyContinue
# Temp files
Remove-Item "$env:TEMP\*" -Recurse -Force -ErrorAction SilentlyContinue

Test: Intel i5-10210U + 8GB RAM + Ollama + hunyuan-mt:1.8b-q4
Throughput scales roughly linearly as you add Slave machines.
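
Extra Slaves only help if several requests are in flight at once. A sequential client loop sends one text at a time, so unless the Master splits the text itself, only one Slave is busy. A minimal sketch of concurrent submission follows. The function name `translate_many` is hypothetical, and `translate_one` stands for any callable that translates a single string, such as a wrapper around the client shown earlier. Whether that client is safe to share across threads is not confirmed here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def translate_many(
    texts: Sequence[str],
    translate_one: Callable[[str], str],
    max_workers: int = 3,
) -> List[str]:
    """Translate texts concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which request finishes first.
        return list(pool.map(translate_one, texts))
```

Setting `max_workers` to roughly the number of Slaves is a reasonable starting point. A call might look like `translate_many(texts, lambda t: client.translate(t, target_lang="zh").translated_text)`.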
分布式翻译/
├── master/ # Master (routing + health check)
├── slave/ # Slave (calls Ollama for translation)
├── common/ # Shared protocol (schemas.py)
├── localtrans/ # Python client package (pip install localtrans)
├── scripts/ # Self-test helpers
├── install_slave.bat # Slave 6-step self-test + start (requires Ollama)
├── install_master.bat # Master config + start
├── Modelfile.hunyuan # Ollama Hunyuan model import example
├── logs/ # Runtime logs (auto-created, git-ignored)
└── README_zh.md # Chinese version
