Commit 4540e30 by 簡俊能 · 1 parent: ac50ff4
Update HF model card to bilingual EN+ZH with 3B + 8bit info
README.md CHANGED
|
@@ -17,9 +17,215 @@ license: agpl-3.0
|
|
| 17 |
|
| 18 |
# MLX-Video-OCR-DeepSeek-Apple-Silicon
|
| 19 |
|
| 20 |
-
|
| 21 |
|
| 22 |
-
|
| 23 |
基於 `deepseek-ai/DeepSeek-OCR` 與 MLX 生態,提供:
|
| 24 |
|
| 25 |
- 📹 **影片截圖 + OCR**(從影片自動抽幀再做 OCR)
|
|
@@ -31,7 +237,9 @@ license: agpl-3.0
|
|
| 31 |
|
| 32 |
> 權重不在本 repo 中,而是透過 `mlx-community/DeepSeek-OCR-8bit` 自動下載並快取到本機。
|
| 33 |
|
| 34 |
-
在目前以 `deepseek-ai/DeepSeek-OCR` 為基底的專案中,本方案聚焦於
|
|
|
|
|
|
|
| 35 |
|
| 36 |
---
|
| 37 |
|
|
@@ -44,7 +252,7 @@ license: agpl-3.0
|
|
| 44 |
|
| 45 |
也就是說:
|
| 46 |
|
| 47 |
-
- 🧠 **模型能力**:沿用 DeepSeek-OCR
|
| 48 |
- 💾 **儲存體積**:使用 8bit 量化,適合 Mac 本地環境
|
| 49 |
- ⚡ **執行效率**:在 Apple Silicon 上搭配 MLX + Metal GPU,加速推理
|
| 50 |
|
|
@@ -214,14 +422,3 @@ _model_instance, _processor_instance = load(model_path)
|
|
| 214 |
|
| 215 |
- 本 repo(GUI + backend)的 AGPL-3.0
|
| 216 |
- `deepseek-ai/DeepSeek-OCR` 與 `mlx-community/DeepSeek-OCR-8bit` 的授權條款
|
| 217 |
-
|
| 218 |
-
{
|
| 219 |
-
"cells": [],
|
| 220 |
-
"metadata": {
|
| 221 |
-
"language_info": {
|
| 222 |
-
"name": "python"
|
| 223 |
-
}
|
| 224 |
-
},
|
| 225 |
-
"nbformat": 4,
|
| 226 |
-
"nbformat_minor": 2
|
| 227 |
-
}
|
|
|
|
| 17 |
|
| 18 |
# MLX-Video-OCR-DeepSeek-Apple-Silicon
|
| 19 |
|
| 20 |
+
🎯 **One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI**
|
| 21 |
|
| 22 |
+
This is a local OCR application optimized for **Apple Silicon (M1/M2/M3/M4)**,
|
| 23 |
+
built on top of `deepseek-ai/DeepSeek-OCR` and the MLX ecosystem. It provides:
|
| 24 |
+
|
| 25 |
+
- 📹 **Video frame extraction + OCR** (automatically samples frames from videos, then runs OCR)
|
| 26 |
+
- 📄 **PDF batch OCR** (supports multi-page PDFs, batch mode and single-page mode)
|
| 27 |
+
- 🖼 **Image OCR** (documents, tables, handwriting, scene text)
|
| 28 |
+
- 🎨 **Image pre-processing** (auto-rotation, enhancement, de-shadow, background removal)
|
| 29 |
+
- 🖥 **Full Web GUI** (drag-and-drop upload, progress display, result preview)
|
| 30 |
+
- 🍎 **One-click Mac deployment** (`./start.sh` automatically sets up the environment and dependencies)
|
| 31 |
+
|
| 32 |
+
Weights are **not** re-uploaded in this repo. Instead, they are automatically downloaded and cached locally via `mlx-community/DeepSeek-OCR-8bit`.
|
| 33 |
+
|
| 34 |
+
In the current ecosystem of projects based on `deepseek-ai/DeepSeek-OCR`, this solution focuses on
|
| 35 |
+
**Mac Apple Silicon local deployment + unified Video/PDF/Image workflow + a complete GUI**,
|
| 36 |
+
acting as an application-layer integration rather than “just another weights-only model repo”.
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## 🧮 Precision & Weights (3B + 8bit)
|
| 41 |
+
|
| 42 |
+
This project **does not re-upload any weights**; it uses the following directly:
|
| 43 |
+
|
| 44 |
+
- Base model: `deepseek-ai/DeepSeek-OCR` (around **3B parameters**)
|
| 45 |
+
- MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
|
| 46 |
+
|
| 47 |
+
In practice, this means:
|
| 48 |
+
|
| 49 |
+
- 🧠 **Model capability**: Leverages the original DeepSeek-OCR architecture and performance
|
| 50 |
+
- 💾 **Storage footprint**: 8bit quantization makes it suitable for local Mac environments
|
| 51 |
+
- ⚡ **Runtime efficiency**: Uses MLX + Metal GPU on Apple Silicon for accelerated inference
|
| 52 |
+
|
| 53 |
+
If you need:
|
| 54 |
+
|
| 55 |
+
- **Maximum precision / research use** → Use [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR) directly
|
| 56 |
+
- **Practical Mac local tooling** → Use this project + `mlx-community/DeepSeek-OCR-8bit` to run the full Video/PDF/Image workflow via GUI
|
| 57 |
+
|
| 58 |
+
---
|
| 59 |
+
|
| 60 |
+
## 🔗 Project & Base Models
|
| 61 |
+
|
| 62 |
+
- **Base model**: [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
|
| 63 |
+
- **MLX quantized version**: [`mlx-community/DeepSeek-OCR-8bit`](https://huggingface.co/mlx-community/DeepSeek-OCR-8bit)
|
| 64 |
+
- **Local application source code (GUI + backend)**:
|
| 65 |
+
[`matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`](https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon)
|
| 66 |
+
|
| 67 |
+
This Hugging Face model card is mainly intended to:
|
| 68 |
+
|
| 69 |
+
- Document this project as a **local GUI / deployment example** built on `deepseek-ai/DeepSeek-OCR`
|
| 70 |
+
- Make it easy to discover this **Mac GUI solution** when searching for `base_model: deepseek-ai/DeepSeek-OCR`
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
|
| 74 |
+
## ✨ Features
|
| 75 |
+
|
| 76 |
+
### 🎬 Video OCR
|
| 77 |
+
|
| 78 |
+
- Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
|
| 79 |
+
- Sends all extracted frames to DeepSeek-OCR in batches
|
| 80 |
+
- Supports:
|
| 81 |
+
- Frame preview
|
| 82 |
+
- Batch download of frames
|
| 83 |
+
- “Frames → OCR” one-click workflow
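
The frame-extraction step can be pictured with a minimal sketch. It assumes uniform sampling at a fixed time interval; the function name `sample_frame_indices` and the `every_seconds` parameter are hypothetical, and the project's actual key-frame selection may work differently.

```python
def sample_frame_indices(total_frames: int, fps: float, every_seconds: float = 1.0) -> list[int]:
    """Return indices of frames spaced roughly `every_seconds` apart.

    Hypothetical helper: uniform sampling stands in for whatever
    key-frame strategy the application really uses.
    """
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Number of frames to skip between samples; never less than one frame.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 4-second clip at 25 fps, sampled once per second.
    print(sample_frame_indices(total_frames=100, fps=25.0, every_seconds=1.0))
```

The resulting indices could then be handed to a video reader to decode only those frames before they are sent to OCR in batches.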
|
| 84 |
+
|
| 85 |
+
### 📄 PDF OCR (Multi-page Batch)
|
| 86 |
+
|
| 87 |
+
- Supports **multi-page PDF batch processing**
|
| 88 |
+
- Two modes:
|
| 89 |
+
- **Batch mode**: process the document in batches of N pages
|
| 90 |
+
- **Single-page mode**: precisely select specific pages
|
| 91 |
+
- Provides:
|
| 92 |
+
- PDF thumbnail preview
|
| 93 |
+
- Page selection, progress display, pause/resume/cancel controls
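
The two PDF modes can be sketched as simple page planning. The helper names `plan_batches` and `select_pages` are hypothetical and are not taken from the project's source; they only illustrate the difference between batching all pages and picking specific ones.

```python
def plan_batches(page_count: int, batch_size: int) -> list[list[int]]:
    """Batch mode: split pages 1..page_count into consecutive groups of `batch_size`."""
    if page_count < 0 or batch_size < 1:
        raise ValueError("page_count must be >= 0 and batch_size must be >= 1")
    pages = list(range(1, page_count + 1))
    return [pages[i:i + batch_size] for i in range(0, page_count, batch_size)]


def select_pages(page_count: int, wanted: list[int]) -> list[int]:
    """Single-page mode: keep valid page numbers, de-duplicated and in ascending order."""
    return sorted({p for p in wanted if 1 <= p <= page_count})


if __name__ == "__main__":
    print(plan_batches(page_count=7, batch_size=3))
    print(select_pages(page_count=5, wanted=[3, 9, 1, 3]))
```

Processing one batch at a time is also what would make pause, resume, and cancel controls straightforward: the loop can check a flag between batches.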
|
| 94 |
+
|
| 95 |
+
### 🖼 Image OCR
|
| 96 |
+
|
| 97 |
+
- Supports common formats: PNG / JPG / JPEG
|
| 98 |
+
- Multiple scenarios:
|
| 99 |
+
- Documents, tables, academic content
|
| 100 |
+
- Handwriting
|
| 101 |
+
- Street signs / shop signs / product packaging
|
| 102 |
+
- Output formats:
|
| 103 |
+
- Markdown
|
| 104 |
+
- LaTeX (math formulas)
|
| 105 |
+
- Plain text
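
Combined with the video and PDF formats listed earlier, a unified upload handler needs to decide which workflow a file belongs to. The following is a minimal sketch that routes by file extension; `classify_upload` and the two extension sets are hypothetical names, and only the formats named in this README are included.

```python
from pathlib import Path

# Formats named in this README; the set names themselves are illustrative.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def classify_upload(filename: str) -> str:
    """Return "video", "pdf", or "image" based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"unsupported file type: {ext or filename}")


if __name__ == "__main__":
    for name in ("lecture.MP4", "paper.pdf", "receipt.jpeg"):
        print(name, "->", classify_upload(name))
```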
|
| 106 |
+
|
| 107 |
+
### 🎨 Image Pre-processing
|
| 108 |
+
|
| 109 |
+
Built-in presets (scan optimization, photo enhancement, background removal, etc.) include:
|
| 110 |
+
|
| 111 |
+
- Auto-rotation (deskew)
|
| 112 |
+
- Contrast enhancement + sharpening
|
| 113 |
+
- Shadow removal
|
| 114 |
+
- Binarization
|
| 115 |
+
- Background removal (via `rembg` or an OpenCV-based fallback pipeline)
|
| 116 |
+
|
| 117 |
+
Pre-processing can:
|
| 118 |
+
|
| 119 |
+
- Batch process multiple images
|
| 120 |
+
- Package processed results into a ZIP file for download
|
| 121 |
+
- Send processed images **directly into the OCR pipeline** with one click
|
| 122 |
+
|
| 123 |
+
---
|
| 124 |
+
|
| 125 |
+
## 🖥 GUI Overview
|
| 126 |
+
|
| 127 |
+
- ✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
|
| 128 |
+
- ✅ Drag-and-drop upload
|
| 129 |
+
- ✅ Thumbnails for images / PDFs
|
| 130 |
+
- ✅ Batch progress bar and status text
|
| 131 |
+
- ✅ Result panel supports:
|
| 132 |
+
- One-click copy
|
| 133 |
+
- Downloading result files
|
| 134 |
+
- ✅ Responsive design: works well on desktops, laptops, and tablets
|
| 135 |
+
|
| 136 |
+
---
|
| 137 |
+
|
| 138 |
+
## 🍎 Mac One-click Deployment
|
| 139 |
+
|
| 140 |
+
### Requirements
|
| 141 |
+
|
| 142 |
+
- macOS 13.0+
|
| 143 |
+
- Apple Silicon (M1 / M2 / M3 / M4)
|
| 144 |
+
- Python 3.11+
|
| 145 |
+
- Recommended RAM: ≥ 16GB
|
| 146 |
+
|
| 147 |
+
### One-click Install & Run (Recommended)
|
| 148 |
+
|
| 149 |
+
```bash
|
| 150 |
+
# 1. Choose an install directory
|
| 151 |
+
cd ~/Downloads # or cd ~ / cd ~/Documents / any location you prefer
|
| 152 |
+
|
| 153 |
+
# 2. Clone the project
|
| 154 |
+
git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
|
| 155 |
+
cd MLX-Video-OCR-DeepSeek-Apple-Silicon
|
| 156 |
+
|
| 157 |
+
# 3. One-click start (creates venv, installs deps, finds a free port)
|
| 158 |
+
./start.sh
|
| 159 |
+
```
|
| 160 |
+
|
| 161 |
+
After startup, open your browser at:
|
| 162 |
+
|
| 163 |
+
- `http://localhost:5000` (or another port between 5000–5010 if 5000 is taken)
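
The port fallback can be approximated as follows. This is a sketch in Python using only the standard library; `find_free_port` is a hypothetical helper, and `start.sh` may implement the search differently (for example, in shell).

```python
import socket


def find_free_port(start: int = 5000, end: int = 5010, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, end] that can be bound on `host`."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken; try the next one
            return port
    raise RuntimeError(f"no free port between {start} and {end}")


if __name__ == "__main__":
    print(f"http://localhost:{find_free_port()}")
```

Note that the probe socket is closed before the port is returned, so another process could still claim the port before the server binds it.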
|
| 164 |
+
|
| 165 |
+
---
|
| 166 |
+
|
| 167 |
+
## ⚙️ Model Download & Caching
|
| 168 |
+
|
| 169 |
+
Internally, the application loads the model as follows:
|
| 170 |
+
|
| 171 |
+
```python
|
| 172 |
+
os.environ["HF_HOME"] = str(Path.home() / "hf_cache")
|
| 173 |
+
model_path = "mlx-community/DeepSeek-OCR-8bit"
|
| 174 |
+
_model_instance, _processor_instance = load(model_path)
|
| 175 |
+
```
|
| 176 |
+
|
| 177 |
+
This means:
|
| 178 |
+
|
| 179 |
+
- On **first use**, it downloads `mlx-community/DeepSeek-OCR-8bit` from Hugging Face
|
| 180 |
+
- Download location: `~/hf_cache/`
|
| 181 |
+
- Subsequent runs, even from different project directories, **reuse the same local model cache** and do not re-download
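
To check whether the model is already cached, one can look for its folder under the cache root. The sketch below assumes the directory layout that `huggingface_hub` uses by default (`<HF_HOME>/hub/models--<org>--<name>/snapshots/...`); the helper names are hypothetical and the layout could change between library versions.

```python
from pathlib import Path


def expected_cache_dir(repo_id: str, hf_home: Path) -> Path:
    """Assumed hub cache layout: <HF_HOME>/hub/models--<org>--<name>."""
    return hf_home / "hub" / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id: str, hf_home: Path) -> bool:
    """True if at least one snapshot directory exists for the repo."""
    snapshots = expected_cache_dir(repo_id, hf_home) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    hf_home = Path.home() / "hf_cache"  # same location the application sets as HF_HOME
    repo = "mlx-community/DeepSeek-OCR-8bit"
    print(expected_cache_dir(repo, hf_home))
    print("cached" if is_cached(repo, hf_home) else "will download on first use")
```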
|
| 182 |
+
|
| 183 |
+
---
|
| 184 |
+
|
| 185 |
+
## 🔒 Privacy & Local Execution
|
| 186 |
+
|
| 187 |
+
- All inference (video frame extraction, PDF processing, image OCR) runs **entirely on your machine**
|
| 188 |
+
- No documents or images are uploaded to any external servers
|
| 189 |
+
- Weights and cache are stored under your user directory (e.g. `~/hf_cache/`)
|
| 190 |
+
|
| 191 |
+
---
|
| 192 |
+
|
| 193 |
+
## 📦 Use Cases
|
| 194 |
+
|
| 195 |
+
This project is ideal if you want to run DeepSeek-OCR on **Mac + Apple Silicon** and:
|
| 196 |
+
|
| 197 |
+
- Prefer a **visual GUI** instead of pure scripts
|
| 198 |
+
- Want **one-click startup** without manual environment setup
|
| 199 |
+
- Need to handle **Video + PDF + Image** in a single workflow
|
| 200 |
+
- Require all data to remain **on-device**, with no cloud dependency
|
| 201 |
+
|
| 202 |
+
---
|
| 203 |
+
|
| 204 |
+
## 🧩 Development & Contributions
|
| 205 |
+
|
| 206 |
+
Source code and issues:
|
| 207 |
+
|
| 208 |
+
- GitHub: [`matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`](https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon)
|
| 209 |
+
- Issues: bug reports and feature requests are welcome
|
| 210 |
+
|
| 211 |
+
---
|
| 212 |
+
|
| 213 |
+
## 📜 License
|
| 214 |
+
|
| 215 |
+
This application is licensed under **AGPL-3.0**.
|
| 216 |
+
|
| 217 |
+
Please also respect the licenses of:
|
| 218 |
+
|
| 219 |
+
- This repo (GUI + backend) under AGPL-3.0
|
| 220 |
+
- `deepseek-ai/DeepSeek-OCR` and `mlx-community/DeepSeek-OCR-8bit` as published on Hugging Face
|
| 221 |
+
|
| 222 |
+
---
|
| 223 |
+
|
| 224 |
+
## 繁體中文說明
|
| 225 |
+
|
| 226 |
+
🎯 **Mac 一鍵部署 · 📹 影片 / 📄 PDF / 🖼 圖片 三合一 OCR · 🖥 完整本地 GUI**
|
| 227 |
+
|
| 228 |
+
這是一個針對 **Apple Silicon (M1/M2/M3/M4)** 優化的本地 OCR 應用,
|
| 229 |
基於 `deepseek-ai/DeepSeek-OCR` 與 MLX 生態,提供:
|
| 230 |
|
| 231 |
- 📹 **影片截圖 + OCR**(從影片自動抽幀再做 OCR)
|
|
|
|
| 237 |
|
| 238 |
> 權重不在本 repo 中,而是透過 `mlx-community/DeepSeek-OCR-8bit` 自動下載並快取到本機。
|
| 239 |
|
| 240 |
+
在目前以 `deepseek-ai/DeepSeek-OCR` 為基底的專案中,本方案聚焦於
|
| 241 |
+
**Mac Apple Silicon 本地部署 + 影片/PDF/圖片 三合一工作流 + 完整 GUI 介面**,
|
| 242 |
+
屬於偏應用層的整合解決方案,而非單純「只提供模型權重」的 repo。
|
| 243 |
|
| 244 |
---
|
| 245 |
|
|
|
|
| 252 |
|
| 253 |
也就是說:
|
| 254 |
|
| 255 |
+
- 🧠 **模型能力**:沿用 DeepSeek-OCR 的構造與效果
|
| 256 |
- 💾 **儲存體積**:使用 8bit 量化,適合 Mac 本地環境
|
| 257 |
- ⚡ **執行效率**:在 Apple Silicon 上搭配 MLX + Metal GPU,加速推理
|
| 258 |
|
|
|
|
| 422 |
|
| 423 |
- 本 repo(GUI + backend)的 AGPL-3.0
|
| 424 |
- `deepseek-ai/DeepSeek-OCR` 與 `mlx-community/DeepSeek-OCR-8bit` 的授權條款
|