簡俊能 committed on
Commit
4540e30
·
1 Parent(s): ac50ff4

Update HF model card to bilingual EN+ZH with 3B + 8bit info

Files changed (1)
  1. README.md +212 -15
README.md CHANGED
@@ -17,9 +17,215 @@ license: agpl-3.0
17
 
18
  # MLX-Video-OCR-DeepSeek-Apple-Silicon
19
 
20
- **🎯 Mac 一鍵部署 · 📹 影片 / 📄 PDF / 🖼 圖片 三合一 OCR · 🖥 完整本地 GUI**
21
 
22
- 這是一個針對 **Apple Silicon (M1/M2/M3/M4)** 優化的本地 OCR 應用,
23
  基於 `deepseek-ai/DeepSeek-OCR` 與 MLX 生態,提供:
24
 
25
  - 📹 **影片截圖 + OCR**(從影片自動抽幀再做 OCR)
@@ -31,7 +237,9 @@ license: agpl-3.0
31
 
32
  > 權重不在本 repo 中,而是透過 `mlx-community/DeepSeek-OCR-8bit` 自動下載並快取到本機。
33
 
34
- 在目前以 `deepseek-ai/DeepSeek-OCR` 為基底的專案中,本方案聚焦於 **Mac Apple Silicon 本地部署 + 影片/PDF/圖片 三合一工作流 + 完整 GUI 介面**,屬於偏應用層的整合解決方案,而非單純「只提供模型權重」的 repo。
 
 
35
 
36
  ---
37
 
@@ -44,7 +252,7 @@ license: agpl-3.0
44
 
45
  也就是說:
46
 
47
- - 🧠 **模型能力**:沿用 DeepSeek-OCR 的結構與效果
48
  - 💾 **儲存體積**:使用 8bit 量化,適合 Mac 本地環境
49
  - ⚡ **執行效率**:在 Apple Silicon 上搭配 MLX + Metal GPU,加速推理
50
 
@@ -214,14 +422,3 @@ _model_instance, _processor_instance = load(model_path)
214
 
215
  - 本 repo(GUI + backend)的 AGPL-3.0
216
  - `deepseek-ai/DeepSeek-OCR` 與 `mlx-community/DeepSeek-OCR-8bit` 的授權條款
217
-
218
- {
219
- "cells": [],
220
- "metadata": {
221
- "language_info": {
222
- "name": "python"
223
- }
224
- },
225
- "nbformat": 4,
226
- "nbformat_minor": 2
227
- }
 
17
 
18
  # MLX-Video-OCR-DeepSeek-Apple-Silicon
19
 
20
+ 🎯 **One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI**
21
 
22
+ This is a local OCR application optimized for **Apple Silicon (M1/M2/M3/M4)**,
23
+ built on top of `deepseek-ai/DeepSeek-OCR` and the MLX ecosystem. It provides:
24
+
25
+ - 📹 **Video frame extraction + OCR** (automatically samples frames from videos, then runs OCR)
26
+ - 📄 **PDF batch OCR** (supports multi-page PDFs, batch mode and single-page mode)
27
+ - 🖼 **Image OCR** (documents, tables, handwriting, scene text)
28
+ - 🎨 **Image pre-processing** (auto-rotation, enhancement, de-shadow, background removal)
29
+ - 🖥 **Full Web GUI** (drag-and-drop upload, progress display, result preview)
30
+ - 🍎 **One-click Mac deployment** (`./start.sh` automatically sets up the environment and dependencies)
31
+
32
+ Weights are **not** re-uploaded in this repo. Instead, they are automatically downloaded and cached locally via `mlx-community/DeepSeek-OCR-8bit`.
33
+
34
+ In the current ecosystem of projects based on `deepseek-ai/DeepSeek-OCR`, this solution focuses on
35
+ **Mac Apple Silicon local deployment + unified Video/PDF/Image workflow + a complete GUI**,
36
+ acting as an application-layer integration rather than “just another weights-only model repo”.
37
+
38
+ ---
39
+
40
+ ## 🧮 Precision & Weights (3B + 8bit)
41
+
42
+ This project **does not re-upload any weights**, but directly uses:
43
+
44
+ - Base model: `deepseek-ai/DeepSeek-OCR` (around **3B parameters**)
45
+ - MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
46
+
47
+ In practice, this means:
48
+
49
+ - 🧠 **Model capability**: Reuses the original DeepSeek-OCR architecture; output quality should stay close to the base model, though 8-bit quantization can cost some precision
50
+ - 💾 **Storage footprint**: 8-bit quantization keeps the weights small enough for local Mac environments
51
+ - ⚡ **Runtime efficiency**: Uses MLX + Metal GPU on Apple Silicon for accelerated inference
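
To make the storage point above concrete, here is a back-of-the-envelope estimate of raw weight size at different bit widths. It is only a sketch: it assumes exactly 3 billion parameters (the card says "around 3B"), and it ignores quantization metadata such as scales as well as any layers left unquantized, so the real download size will differ. The helper name `approx_weight_size_gb` is made up for this example.

```python
def approx_weight_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Rough size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# "around 3B parameters" per the model card; treated as exactly 3e9 here.
params = 3e9

for bits in (16, 8):
    size = approx_weight_size_gb(params, bits)
    print(f"{bits}-bit weights: ~{size:.1f} GB")
```

Under these assumptions, halving the bit width halves the raw weight size, which is the main reason the 8-bit build is the more practical choice on a Mac with 16 GB of RAM.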
52
+
53
+ If you need:
54
+
55
+ - **Maximum precision / research use** → Use [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR) directly
56
+ - **Practical Mac local tooling** → Use this project + `mlx-community/DeepSeek-OCR-8bit` to run the full Video/PDF/Image workflow via GUI
57
+
58
+ ---
59
+
60
+ ## 🔗 Project & Base Models
61
+
62
+ - **Base model**: [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
63
+ - **MLX quantized version**: [`mlx-community/DeepSeek-OCR-8bit`](https://huggingface.co/mlx-community/DeepSeek-OCR-8bit)
64
+ - **Local application source code (GUI + backend)**:
65
+ [`matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`](https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon)
66
+
67
+ This Hugging Face model card is mainly intended to:
68
+
69
+ - Document this project as a **local GUI / deployment example** built on `deepseek-ai/DeepSeek-OCR`
70
+ - Make it easy to discover this **Mac GUI solution** when searching for `base_model: deepseek-ai/DeepSeek-OCR`
71
+
72
+ ---
73
+
74
+ ## ✨ Features
75
+
76
+ ### 🎬 Video OCR
77
+
78
+ - Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
79
+ - Sends all extracted frames to DeepSeek-OCR in batches
80
+ - Supports:
81
+ - Frame preview
82
+ - Batch download of frames
83
+ - “Frames → OCR” one-click workflow
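
As a rough illustration of the frame-extraction step in the Video OCR list, the sketch below picks evenly spaced frame indices from a video. The README does not say how key frames are actually chosen, so uniform sampling, the function name `sample_frame_indices`, and its parameters are all assumptions made for this example.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Return the indices of frames spaced roughly `every_seconds` apart.

    Hypothetical helper: the project's real selection logic is not shown
    in the README and may differ (for example, scene-change detection).
    """
    if fps <= 0 or every_seconds <= 0:
        raise ValueError("fps and every_seconds must be positive")
    # Convert the time interval into a whole number of frames (at least 1).
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0))
```

The selected indices could then be decoded with a video library and passed to OCR in batches.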
84
+
85
+ ### 📄 PDF OCR (Multi-page Batch)
86
+
87
+ - Supports **multi-page PDF batch processing**
88
+ - Two modes:
89
+ - **Batch mode**: process the document in batches of N pages
90
+ - **Single-page mode**: precisely select specific pages
91
+ - Provides:
92
+ - PDF thumbnail preview
93
+ - Page selection, progress display, pause/resume/cancel controls
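
The batch mode described above ("batches of N pages") can be pictured as simple chunking of page numbers. The following is a minimal sketch under that assumption; `page_batches` is a hypothetical helper, not a function taken from the project.

```python
from typing import Iterator, List


def page_batches(total_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Yield 1-based page numbers in consecutive groups of `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, total_pages + 1, batch_size):
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size, total_pages + 1)
        yield list(range(start, end))


if __name__ == "__main__":
    for batch in page_batches(total_pages=10, batch_size=4):
        print(batch)
```

Processing one batch at a time is also what makes controls like pause, resume, and cancel straightforward: the application only needs to check for a stop request between batches.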
94
+
95
+ ### 🖼 Image OCR
96
+
97
+ - Supports common formats: PNG / JPG / JPEG
98
+ - Multiple scenarios:
99
+ - Documents, tables, academic content
100
+ - Handwriting
101
+ - Street signs / shop signs / product packaging
102
+ - Output formats:
103
+ - Markdown
104
+ - LaTeX (math formulas)
105
+ - Plain text
106
+
107
+ ### 🎨 Image Pre-processing
108
+
109
+ Built-in presets (scan optimization, photo enhancement, background removal, etc.) cover:
110
+
111
+ - Auto-rotation (deskew)
112
+ - Contrast enhancement + sharpening
113
+ - Shadow removal
114
+ - Binarization
115
+ - Background removal (via `rembg` or an OpenCV-based fallback pipeline)
116
+
117
+ Pre-processing can:
118
+
119
+ - Batch process multiple images
120
+ - Package processed results into a ZIP file for download
121
+ - Send processed images **directly into the OCR pipeline** with one click
122
+
123
+ ---
124
+
125
+ ## 🖥 GUI Overview
126
+
127
+ - ✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
128
+ - ✅ Drag-and-drop upload
129
+ - ✅ Thumbnails for images / PDFs
130
+ - ✅ Batch progress bar and status text
131
+ - ✅ Result panel supports:
132
+ - One-click copy
133
+ - Downloading result files
134
+ - ✅ Responsive design: works well on desktops, laptops, and tablets
135
+
136
+ ---
137
+
138
+ ## 🍎 Mac One-click Deployment
139
+
140
+ ### Requirements
141
+
142
+ - macOS 13.0+
143
+ - Apple Silicon (M1 / M2 / M3 / M4)
144
+ - Python 3.11+
145
+ - Recommended RAM: ≥ 16GB
146
+
147
+ ### One-click Install & Run (Recommended)
148
+
149
+ ```bash
150
+ # 1. Choose an install directory
151
+ cd ~/Downloads # or cd ~ / cd ~/Documents / any location you prefer
152
+
153
+ # 2. Clone the project
154
+ git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
155
+ cd MLX-Video-OCR-DeepSeek-Apple-Silicon
156
+
157
+ # 3. One-click start (creates venv, installs deps, finds a free port)
158
+ ./start.sh
159
+ ```
160
+
161
+ After startup, open your browser at:
162
+
163
+ - `http://localhost:5000` (or another port between 5000–5010 if 5000 is taken)
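
The port fallback mentioned above (5000, then up to 5010) could work along these lines. This is a sketch of one common approach using Python's standard `socket` module; the actual logic inside `start.sh` is not shown in the README, and `find_free_port` is a name invented for this example.

```python
import socket
from typing import Optional


def find_free_port(start: int = 5000, end: int = 5010,
                   host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is already in use (or not permitted); try the next one.
                continue
            return port
    return None


if __name__ == "__main__":
    port = find_free_port()
    if port is None:
        print("No free port between 5000 and 5010")
    else:
        print(f"Would start the server on http://localhost:{port}")
```

Note that a port found this way can still be taken by another process before the server binds it, so a real launcher should also handle a bind failure at startup.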
164
+
165
+ ---
166
+
167
+ ## ⚙️ Model Download & Caching
168
+
169
+ Internally, the application sets the cache location and loads the model like this:
170
+
171
+ ```python
172
+ os.environ["HF_HOME"] = str(Path.home() / "hf_cache")
173
+ model_path = "mlx-community/DeepSeek-OCR-8bit"
174
+ _model_instance, _processor_instance = load(model_path)
175
+ ```
176
+
177
+ This means:
178
+
179
+ - On **first use**, it downloads `mlx-community/DeepSeek-OCR-8bit` from Hugging Face
180
+ - Download location: `~/hf_cache/`
181
+ - Subsequent runs, even from different project directories, **reuse the same local model cache** and do not re-download
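
The caching behavior described above can be reproduced with the standard library alone. The sketch below sets `HF_HOME` the same way the README snippet does and checks whether the cache directory already has content. The helper names `configure_hf_cache` and `cache_looks_populated` are hypothetical, and the `load` call is left out because the README does not show where it is imported from.

```python
import os
from pathlib import Path
from typing import Optional


def configure_hf_cache(cache_dir: Optional[Path] = None) -> Path:
    """Point Hugging Face tooling at a shared cache directory.

    Should run before importing any library that reads HF_HOME,
    since such libraries typically read it once at import time.
    """
    target = cache_dir if cache_dir is not None else Path.home() / "hf_cache"
    os.environ["HF_HOME"] = str(target)
    return target


def cache_looks_populated(cache_dir: Path) -> bool:
    """Heuristic only: True if the directory exists and is not empty."""
    return cache_dir.is_dir() and any(cache_dir.iterdir())


if __name__ == "__main__":
    cache = configure_hf_cache()
    state = "reusing existing cache" if cache_looks_populated(cache) else "first run, expect a download"
    print(f"HF_HOME={cache} ({state})")
```

Because the path is derived from the home directory rather than the project directory, every checkout of the project on the same user account resolves to the same cache.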
182
+
183
+ ---
184
+
185
+ ## 🔒 Privacy & Local Execution
186
+
187
+ - All inference (video frame extraction, PDF processing, image OCR) runs **entirely on your machine**
188
+ - No documents or images are uploaded to any external servers
189
+ - Weights and cache are stored under your user directory (e.g. `~/hf_cache/`)
190
+
191
+ ---
192
+
193
+ ## 📦 Use Cases
194
+
195
+ This project is ideal if you want to run DeepSeek-OCR on **Mac + Apple Silicon** and:
196
+
197
+ - Prefer a **visual GUI** instead of pure scripts
198
+ - Want **one-click startup** without manual environment setup
199
+ - Need to handle **Video + PDF + Image** in a single workflow
200
+ - Require all data to remain **on-device**, with no cloud dependency
201
+
202
+ ---
203
+
204
+ ## 🧩 Development & Contributions
205
+
206
+ Source code and issues:
207
+
208
+ - GitHub: [`matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`](https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon)
209
+ - Issues: bug reports and feature requests are welcome
210
+
211
+ ---
212
+
213
+ ## 📜 License
214
+
215
+ This application is licensed under **AGPL-3.0**.
216
+
217
+ When using it, please respect the licenses of:
218
+
219
+ - This repo (GUI + backend) under AGPL-3.0
220
+ - `deepseek-ai/DeepSeek-OCR` and `mlx-community/DeepSeek-OCR-8bit` as published on Hugging Face
221
+
222
+ ---
223
+
224
+ ## 繁體中文說明
225
+
226
+ 🎯 **Mac 一鍵部署 · 📹 影片 / 📄 PDF / 🖼 圖片 三合一 OCR · 🖥 完整本地 GUI**
227
+
228
+ 這是一個針對 **Apple Silicon (M1/M2/M3/M4)** 優化的本地 OCR 應用,
229
  基於 `deepseek-ai/DeepSeek-OCR` 與 MLX 生態,提供:
230
 
231
  - 📹 **影片截圖 + OCR**(從影片自動抽幀再做 OCR)
 
237
 
238
  > 權重不在本 repo 中,而是透過 `mlx-community/DeepSeek-OCR-8bit` 自動下載並快取到本機。
239
 
240
+ 在目前以 `deepseek-ai/DeepSeek-OCR` 為基底的專案中,本方案聚焦於
241
+ **Mac Apple Silicon 本地部署 + 影片/PDF/圖片 三合一工作流 + 完整 GUI 介面**,
242
+ 屬於偏應用層的整合解決方案,而非單純「只提供模型權重」的 repo。
243
 
244
  ---
245
 
 
252
 
253
  也就是說:
254
 
255
+ - 🧠 **模型能力**:沿用 DeepSeek-OCR 的構造與效果
256
  - 💾 **儲存體積**:使用 8bit 量化,適合 Mac 本地環境
257
  - ⚡ **執行效率**:在 Apple Silicon 上搭配 MLX + Metal GPU,加速推理
258
 
 
422
 
423
  - 本 repo(GUI + backend)的 AGPL-3.0
424
  - `deepseek-ai/DeepSeek-OCR` 與 `mlx-community/DeepSeek-OCR-8bit` 的授權條款