Sunbelt Computer Software

📄 DocTranslator - 文档翻译工具

将英文文档翻译成中文，保留原始格式、图片、表格和排版。

支持 PDF / DOCX / Markdown / TXT，提供 Web 界面上传即翻译。

✨ 特性

🎯 格式保留 — 粗体、斜体、颜色、表格、列表、页眉页脚全部保留
🖼️ 图片不丢失 — DOCX 和 PDF 中的图片、图形原封不动
🌐 多引擎支持 — DeepSeek（国内推荐）/ Claude / Google Translate
⚡ 批量翻译 — 一次 API 调用处理整页文本，42 页 PDF 仅需 90 秒
📂 多格式 — PDF、DOCX、Markdown、TXT

🚀 快速开始

1. 安装依赖

pip install -r requirements.txt

2. 配置 API Key

本项目默认使用 DeepSeek API（国内直连，新用户送免费额度）。

获取 Key：https://platform.deepseek.com/

# 方式 A：环境变量（推荐）
export DEEPSEEK_API_KEY="sk-your-key-here"

# 方式 B：修改 config_example.json 为 config.json 并填入 key

3. 启动

python app.py

浏览器打开 **http://127.0.0.1:5000**，拖拽文档即可翻译。

🔧 翻译引擎对比

📁 项目结构

├── app.py                  # Flask Web 应用
├── config_example.json     # 配置文件模板
├── requirements.txt        # Python 依赖
├── translators/            # 翻译引擎
│   ├── base.py            # 抽象基类
│   ├── deepseek_translator.py  # DeepSeek（默认）
│   ├── claude_translator.py    # Claude API
│   └── google_translator.py    # Google Translate
├── parsers/                # 文档解析器
│   ├── docx_parser.py     # DOCX 格式保留
│   ├── pdf_parser.py      # PDF 原位覆盖
│   ├── markdown_parser.py # Markdown 结构保留
│   └── txt_parser.py      # 纯文本
├── static/                 # CSS / JS
├── templates/              # HTML 模板
├── uploads/                # 上传暂存
└── outputs/                # 翻译输出

📝 技术实现

DOCX

使用 python-docx 原生 API 遍历每个 run，仅修改文本内容。对包含图片的 run 做 XML 级别保护，只替换 <w:t> 节点，不动 <w:drawing>。翻译后自动设置 eastAsia 字体确保中文正确渲染。

PDF

使用 PyMuPDF 将原页面复制到新 PDF → 按 y 坐标合并同行单词为完整句子 → DeepSeek 批量翻译 → 红action 覆盖原文 → insert_htmlbox 插入中文。所有图片、矢量图形、非文本元素完整保留。

Markdown

正则识别代码块、链接、图片、frontmatter 等结构，仅翻译自然语言文本行，inline code 和 URL 原样保留。

引擎	国内可用	需要 Key	费用	质量
DeepSeek	✅	需要	新用户免费额度	⭐⭐⭐⭐
Claude	✅	需要	付费	⭐⭐⭐⭐⭐
Google	❌	不需要	免费	⭐⭐⭐

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📄 DocTranslator - 文档翻译工具

✨ 特性

🚀 快速开始

1. 安装依赖

2. 配置 API Key

3. 启动

🔧 翻译引擎对比

📁 项目结构

📝 技术实现

DOCX

PDF

Markdown

📄 许可证

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
outputs		outputs
parsers		parsers
static		static
templates		templates
translators		translators
uploads		uploads
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
config_example.json		config_example.json
requirements.txt		requirements.txt

Sunbelt Computer Software

PL/B Language Development and Support

Folders and files

Latest commit

History

Repository files navigation

📄 DocTranslator - 文档翻译工具

✨ 特性

🚀 快速开始

1. 安装依赖

2. 配置 API Key

3. 启动

🔧 翻译引擎对比

📁 项目结构

📝 技术实现

DOCX

PDF

Markdown

📄 许可证

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages