
@@ -17,9 +17,215 @@ license: agpl-3.0
 # MLX-Video-OCR-DeepSeek-Apple-Silicon
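-**🎯 One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI**
-This is a local OCR application optimized for **Apple Silicon (M1/M2/M3/M4)**,
 built on `deepseek-ai/DeepSeek-OCR` and the MLX ecosystem. It provides:
 - 📹 **Video frame capture + OCR** (automatically extracts frames from a video, then runs OCR)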
@@ -31,7 +237,9 @@ license: agpl-3.0
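 > The weights are not stored in this repo; they are downloaded automatically from `mlx-community/DeepSeek-OCR-8bit` and cached locally.
-Among current projects based on `deepseek-ai/DeepSeek-OCR`, this one focuses on **local deployment on Apple Silicon Macs + a 3-in-1 video/PDF/image workflow + a complete GUI**. It is an application-layer integration rather than a repo that only provides model weights.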
 ---
@@ -44,7 +252,7 @@ license: agpl-3.0
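 In other words:
-- 🧠 **Model capability**: reuses the structure and performance of DeepSeek-OCR
 - 💾 **Storage footprint**: uses 8-bit quantization, suitable for local Mac environments
 - ⚡ **Runtime efficiency**: accelerates inference on Apple Silicon with MLX + the Metal GPU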
@@ -214,14 +422,3 @@ _model_instance, _processor_instance = load(model_path)
 - AGPL-3.0 for this repo (GUI + backend)
 - The license terms of `deepseek-ai/DeepSeek-OCR` and `mlx-community/DeepSeek-OCR-8bit`
-{
-  "cells": [],
-  "metadata": {
-    "language_info": {
-      "name": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 2
-}

 # MLX-Video-OCR-DeepSeek-Apple-Silicon
+🎯 **One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI**
+This is a local OCR application optimized for **Apple Silicon (M1/M2/M3/M4)**,
+built on top of `deepseek-ai/DeepSeek-OCR` and the MLX ecosystem. It provides:
+- 📹 **Video frame extraction + OCR** (automatically samples frames from videos, then runs OCR)
+- 📄 **PDF batch OCR** (supports multi-page PDFs, batch mode and single-page mode)
+- 🖼 **Image OCR** (documents, tables, handwriting, scene text)
+- 🎨 **Image pre-processing** (auto-rotation, enhancement, de-shadow, background removal)
+- 🖥 **Full Web GUI** (drag-and-drop upload, progress display, result preview)
+- 🍎 **One-click Mac deployment** (`./start.sh` automatically sets up the environment and dependencies)
+Weights are **not** re-uploaded in this repo. Instead, they are downloaded automatically from `mlx-community/DeepSeek-OCR-8bit` on first use and cached locally.
+In the current ecosystem of projects based on `deepseek-ai/DeepSeek-OCR`, this solution focuses on
+**Mac Apple Silicon local deployment + unified Video/PDF/Image workflow + a complete GUI**,
+acting as an application-layer integration rather than “just another weights-only model repo”.
+---
+## 🧮 Precision & Weights (3B + 8-bit)
+This project **does not re-upload any weights**; it uses the following directly:
+- Base model: `deepseek-ai/DeepSeek-OCR` (around **3B parameters**)
+- MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
+In practice, this means:
+- 🧠 **Model capability**: Reuses the original DeepSeek-OCR architecture; 8-bit quantization may cost a small amount of accuracy relative to the full-precision weights
+- 💾 **Storage footprint**: 8-bit quantization keeps the weights small enough for local Mac environments
+- ⚡ **Runtime efficiency**: Uses MLX + Metal GPU on Apple Silicon for accelerated inference
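As a rough back-of-the-envelope check on the storage point above, the sketch below estimates weight size from a parameter count and a bit width. The `3e9` figure is the approximate parameter count quoted above, and `estimate_weight_size_gb` is an illustrative name, not part of this project. The estimate ignores quantization scales, any layers kept at higher precision, and runtime activation memory, so real file sizes will differ.

```python
def estimate_weight_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    approx_params = 3e9  # "around 3B parameters", as stated above
    for bits in (16, 8):
        size = estimate_weight_size_gb(approx_params, bits)
        print(f"{bits}-bit: ~{size:.1f} GB")
```

Under these assumptions, halving the bit width halves the raw weight size, which is the main reason the 8-bit variant is the more comfortable fit for a 16 GB machine.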
+If you need:
+- **Maximum precision / research use** → Use [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR) directly
+- **Practical Mac local tooling** → Use this project + `mlx-community/DeepSeek-OCR-8bit` to run the full Video/PDF/Image workflow via GUI
+---
+## 🔗 Project & Base Models
+- **Base model**: [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
+- **MLX quantized version**: [`mlx-community/DeepSeek-OCR-8bit`](https://huggingface.co/mlx-community/DeepSeek-OCR-8bit)
+- **Local application source code (GUI + backend)**:
+  [`matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`](https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon)
+This Hugging Face model card is mainly intended to:
+- Document this project as a **local GUI / deployment example** built on `deepseek-ai/DeepSeek-OCR`
+- Make it easy to discover this **Mac GUI solution** when searching for `base_model: deepseek-ai/DeepSeek-OCR`
+---
+## ✨ Features
+### 🎬 Video OCR
+- Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
+- Sends all extracted frames to DeepSeek-OCR in batches
+- Supports:
+  - Frame preview
+  - Batch download of frames
+  - “Frames → OCR” one-click workflow
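The card does not say how key frames are chosen. As a stand-in, the sketch below uses plain fixed-interval sampling: given a frame count and frame rate, it picks one frame index every few seconds. The function name and the interval parameter are hypothetical, and the application's actual selection logic may be different (for example, scene-change detection).

```python
def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> list[int]:
    """Return frame indices spaced roughly `every_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Convert the time interval into a whole number of frames (at least 1).
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0))
```

The resulting indices could then be read with a video library and handed to the OCR step in batches.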
+### 📄 PDF OCR (Multi-page Batch)
+- Supports **multi-page PDF batch processing**
+- Two modes:
+  - **Batch mode**: process the document in batches of N pages
+  - **Single-page mode**: precisely select specific pages
+- Provides:
+  - PDF thumbnail preview
+  - Page selection, progress display, pause/resume/cancel controls
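To make the two modes concrete, here is a minimal sketch of how page numbers might be grouped. Batch mode splits the whole document into consecutive groups of N pages, while single-page mode keeps only the pages the user picked. Both function names are illustrative and are not taken from the project's backend.

```python
def page_batches(page_count: int, batch_size: int) -> list[list[int]]:
    """Batch mode: split 1-based page numbers into groups of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pages = range(1, page_count + 1)
    return [list(pages[i:i + batch_size]) for i in range(0, page_count, batch_size)]


def selected_pages(page_count: int, wanted: list[int]) -> list[int]:
    """Single-page mode: keep valid, de-duplicated page numbers in ascending order."""
    return sorted({p for p in wanted if 1 <= p <= page_count})


if __name__ == "__main__":
    print(page_batches(page_count=7, batch_size=3))
    print(selected_pages(page_count=7, wanted=[5, 2, 2, 9]))
```

Processing one batch at a time also gives natural checkpoints for the pause, resume, and cancel controls.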
+### 🖼 Image OCR
+- Supports common formats: PNG / JPG / JPEG
+- Multiple scenarios:
+  - Documents, tables, academic content
+  - Handwriting
+  - Street signs / shop signs / product packaging
+- Output formats:
+  - Markdown
+  - LaTeX (math formulas)
+  - Plain text
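Since the app accepts video, PDF, and image uploads, one simple way to route a file to the right workflow is by extension. The sketch below uses the formats listed in this card; the function and constant names are hypothetical, and the real backend may also inspect MIME types or file contents.

```python
from pathlib import Path

# Extensions taken from the format lists in this card.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def classify_upload(filename: str) -> str:
    """Map a filename to 'video', 'pdf', 'image', or 'unsupported'."""
    suffix = Path(filename).suffix.lower()
    if suffix in VIDEO_EXTS:
        return "video"
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTS:
        return "image"
    return "unsupported"


if __name__ == "__main__":
    for name in ("lecture.MP4", "paper.pdf", "scan.jpeg", "notes.docx"):
        print(name, "->", classify_upload(name))
```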
+### 🎨 Image Pre-processing
+Built-in presets (scan optimization, photo enhancement, background removal, etc.) combine steps such as:
+- Auto-rotation (deskew)
+- Contrast enhancement + sharpening
+- Shadow removal
+- Binarization
+- Background removal (via `rembg` or an OpenCV-based fallback pipeline)
+Pre-processing can:
+- Batch process multiple images
+- Package processed results into a ZIP file for download
+- Send processed images **directly into the OCR pipeline** with one click
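The ZIP download step can be sketched with the standard library alone. The example below packs already-processed image bytes into an in-memory archive that a web backend could return as a download. The function name and the sample file names are illustrative, and how the project actually builds its archive is not shown in this card.

```python
import io
import zipfile


def zip_results(files: dict[str, bytes]) -> bytes:
    """Pack {archive_name: file_bytes} into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = zip_results({"page_001.png": b"fake-png-bytes", "page_002.png": b"more-bytes"})
    print(f"archive size: {len(payload)} bytes")
```

Building the archive in memory avoids temporary files, which is fine for modest batches; very large batches may be better streamed to disk.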
+---
+## 🖥 GUI Overview
+- ✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
+- ✅ Drag-and-drop upload
+- ✅ Thumbnails for images / PDFs
+- ✅ Batch progress bar and status text
+- ✅ Result panel supports:
+  - One-click copy
+  - Downloading result files
+- ✅ Responsive design: works well on desktops, laptops, and tablets
+---
+## 🍎 Mac One-click Deployment
+### Requirements
+- macOS 13.0+
+- Apple Silicon (M1 / M2 / M3 / M4)
+- Python 3.11+
+- Recommended RAM: ≥ 16GB
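The first three requirements can be checked from Python before anything is installed. This is a minimal sketch with an illustrative function name; it is not what `start.sh` does internally. Note that a Python build running under Rosetta reports `x86_64` even on Apple Silicon hardware, and that RAM is not checked here.

```python
import platform
import sys


def unmet_requirements(system: str, machine: str, macos_version: str,
                       python_version: tuple) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    else:
        major = int(macos_version.split(".")[0]) if macos_version else 0
        if major < 13:
            problems.append("macOS 13.0 or newer is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if tuple(python_version) < (3, 11):
        problems.append("Python 3.11 or newer is required")
    return problems


if __name__ == "__main__":
    issues = unmet_requirements(
        platform.system(), platform.machine(),
        platform.mac_ver()[0], sys.version_info[:2],
    )
    print("OK" if not issues else "\n".join(issues))
```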
+### One-click Install & Run (Recommended)
+```bash
+# 1. Choose an install directory
+cd ~/Downloads          # or cd ~ / cd ~/Documents / any location you prefer
+# 2. Clone the project
+git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
+cd MLX-Video-OCR-DeepSeek-Apple-Silicon
+# 3. One-click start (creates venv, installs deps, finds a free port)
+./start.sh
+```
+After startup, open your browser at:
+- `http://localhost:5000` (or another port between 5000 and 5010 if 5000 is already taken)
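The port fallback can be pictured as a simple probe over the 5000 to 5010 range. The sketch below tries to bind each port in turn and returns the first one that is free. It illustrates the idea only: `find_free_port` is a hypothetical helper, and the actual logic in `start.sh` is not reproduced here.

```python
import socket
from typing import Optional


def find_free_port(start: int = 5000, end: int = 5010,
                   host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound, or None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted); try the next one
            return port
    return None


if __name__ == "__main__":
    port = find_free_port()
    print(f"http://localhost:{port}" if port else "no free port in 5000-5010")
```

There is a small window between probing a port and the server actually binding it, so a robust launcher would still handle a bind failure at startup.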
+---
+## ⚙️ Model Download & Caching
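+Internally, the application does roughly the following:
+```python
+import os
+from pathlib import Path
+
+# Set the cache root before importing anything that pulls in huggingface_hub
+os.environ["HF_HOME"] = str(Path.home() / "hf_cache")
+
+from mlx_vlm import load  # assumption: `load` comes from the mlx-vlm package
+
+model_path = "mlx-community/DeepSeek-OCR-8bit"
+_model_instance, _processor_instance = load(model_path)
+```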
+This means:
+- On **first use**, it downloads `mlx-community/DeepSeek-OCR-8bit` from Hugging Face
+- Download location: `~/hf_cache/` (with `HF_HOME` set this way, hub downloads are stored under `~/hf_cache/hub/`)
+- Subsequent runs, even from different project directories, **reuse the same local model cache** and do not re-download
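To see whether the model is already cached, you can look for its folder under the cache root. The sketch below relies on the Hugging Face hub cache layout (`<HF_HOME>/hub/models--<org>--<name>/snapshots/...`); the helper names are illustrative and not part of this project, and the check only looks for a snapshot folder rather than verifying that every file finished downloading.

```python
from pathlib import Path


def expected_model_dir(repo_id: str, hf_home: Path) -> Path:
    """Folder where the hub cache keeps a model repo under the given HF_HOME."""
    return hf_home / "hub" / ("models--" + repo_id.replace("/", "--"))


def looks_cached(repo_id: str, hf_home: Path) -> bool:
    """True if at least one downloaded snapshot folder exists for the repo."""
    snapshots = expected_model_dir(repo_id, hf_home) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    hf_home = Path.home() / "hf_cache"  # same root the application sets as HF_HOME
    repo = "mlx-community/DeepSeek-OCR-8bit"
    print(expected_model_dir(repo, hf_home))
    print("cached" if looks_cached(repo, hf_home) else "will be downloaded on first use")
```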
+---
+## 🔒 Privacy & Local Execution
+- All processing (video frame extraction, PDF handling, OCR inference) runs **entirely on your machine**
+- No documents or images are uploaded to external servers; the only network access described here is the first-run model download from Hugging Face
+- Weights and cache are stored under your user directory (e.g. `~/hf_cache/`)
+---
+## 📦 Use Cases
+This project is a good fit if you want to run DeepSeek-OCR on an **Apple Silicon Mac** and:
+- Prefer a **GUI** over standalone scripts
+- Want **one-click startup** without manual environment setup
+- Need to handle **Video + PDF + Image** in a single workflow
+- Require all data to remain **on-device**, with no cloud dependency
+---
+## 🧩 Development & Contributions
+Source code and issues:
+- GitHub: [`matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`](https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon)
+- Issues: bug reports and feature requests are welcome
+---
+## 📜 License
+This application (GUI + backend) is licensed under **AGPL-3.0**.
+Please also respect the licenses of the upstream models:
+- `deepseek-ai/DeepSeek-OCR`, as published on Hugging Face
+- `mlx-community/DeepSeek-OCR-8bit`, as published on Hugging Face
+---
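+## Traditional Chinese Summary (translated)
+🎯 **One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI**
+This is a local OCR application optimized for **Apple Silicon (M1/M2/M3/M4)**,
 built on `deepseek-ai/DeepSeek-OCR` and the MLX ecosystem. It provides:
 - 📹 **Video frame capture + OCR** (automatically extracts frames from a video, then runs OCR)
 > The weights are not stored in this repo; they are downloaded automatically from `mlx-community/DeepSeek-OCR-8bit` and cached locally.
+Among current projects based on `deepseek-ai/DeepSeek-OCR`, this one focuses on
+**local deployment on Apple Silicon Macs + a 3-in-1 video/PDF/image workflow + a complete GUI**.
+It is an application-layer integration rather than a repo that only provides model weights.
 ---
 In other words:
+- 🧠 **Model capability**: reuses the architecture and performance of DeepSeek-OCR
 - 💾 **Storage footprint**: uses 8-bit quantization, suitable for local Mac environments
 - ⚡ **Runtime efficiency**: accelerates inference on Apple Silicon with MLX + the Metal GPU
 - AGPL-3.0 for this repo (GUI + backend)
 - The license terms of `deepseek-ai/DeepSeek-OCR` and `mlx-community/DeepSeek-OCR-8bit`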