MLX-Video-OCR-DeepSeek-Apple-Silicon

🎯 One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI

This is a local OCR application optimized for Apple Silicon (M1/M2/M3/M4),
built on top of deepseek-ai/DeepSeek-OCR and the MLX ecosystem. It provides:

📹 Video frame extraction + OCR (automatically samples frames from videos, then runs OCR)
📄 PDF batch OCR (supports multi-page PDFs, batch mode and single-page mode)
🖼 Image OCR (documents, tables, handwriting, scene text)
🎨 Image pre-processing (auto-rotation, enhancement, de-shadow, background removal)
🖥 Full Web GUI (drag-and-drop upload, progress display, result preview)
🍎 One-click Mac deployment (./start.sh automatically sets up the environment and dependencies)

Weights are not re-uploaded in this repo. Instead, they are downloaded automatically from mlx-community/DeepSeek-OCR-8bit on first use and cached locally.

Among projects based on deepseek-ai/DeepSeek-OCR, this one focuses on
local deployment on Apple Silicon Macs, a unified Video/PDF/Image workflow, and a complete GUI.
It is an application-layer integration rather than “just another weights-only model repo”.

🧮 Precision & Weights (3B + 8bit)

This project does not re-upload any weights, but directly uses:

Base model: deepseek-ai/DeepSeek-OCR (around 3B parameters)
MLX quantized version: mlx-community/DeepSeek-OCR-8bit

In practice, this means:

🧠 Model capability: Same architecture as the original DeepSeek-OCR; 8-bit quantization may cost some accuracy compared with the full-precision weights
💾 Storage footprint: 8-bit quantization shrinks the weights enough for practical local use on a Mac
⚡ Runtime efficiency: MLX runs inference on the Apple Silicon GPU via Metal

If you need:

Maximum precision / research use → Use deepseek-ai/DeepSeek-OCR directly
Practical Mac local tooling → Use this project + mlx-community/DeepSeek-OCR-8bit to run the full Video/PDF/Image workflow via GUI

🔗 Project & Base Models

Base model: deepseek-ai/DeepSeek-OCR
MLX quantized version: mlx-community/DeepSeek-OCR-8bit
Local application source code (GUI + backend):
matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon

This Hugging Face model card is mainly intended to:

Document this project as a local GUI / deployment example built on deepseek-ai/DeepSeek-OCR
Make it easy to discover this Mac GUI solution when searching for base_model: deepseek-ai/DeepSeek-OCR

✨ Features

🎬 Video OCR

Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
Sends all extracted frames to DeepSeek-OCR in batches
Supports:
- Frame preview
- Batch download of frames
- “Frames → OCR” one-click workflow
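
The sampling and batching steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes fixed-interval sampling (one frame every few seconds) instead of true key-frame detection, and the names `sample_frame_indices` and `batched` are hypothetical.

```python
def sample_frame_indices(total_frames: int, fps: float, every_seconds: float = 1.0) -> list[int]:
    """Return the indices of frames to extract, one per `every_seconds` of video."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))  # number of frames between two samples
    return list(range(0, total_frames, step))


def batched(items: list[int], batch_size: int) -> list[list[int]]:
    """Split the sampled frame indices into fixed-size batches for OCR."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# A 10-second clip at 30 fps, sampled every 2 seconds (hypothetical values).
indices = sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0)
print(indices)              # [0, 60, 120, 180, 240]
print(batched(indices, 2))  # [[0, 60], [120, 180], [240]]
```

A real pipeline would read `total_frames` and `fps` from the video file and decode only the selected indices before handing each batch to the OCR model.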

📄 PDF OCR (Multi-page Batch)

Supports multi-page PDF batch processing
Two modes:
- Batch mode: process the document in batches of N pages
- Single-page mode: precisely select specific pages
Provides:
- PDF thumbnail preview
- Page selection, progress display, pause/resume/cancel controls
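
The two modes differ only in how the list of pages is built. The sketch below is an assumption about how that could look, not the application's implementation: the function names are hypothetical, and the `"1,3,5-7"` selection syntax is invented for illustration.

```python
def page_batches(page_count: int, batch_size: int) -> list[list[int]]:
    """Batch mode: split pages 1..page_count into groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pages = list(range(1, page_count + 1))
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


def parse_page_selection(spec: str, page_count: int) -> list[int]:
    """Single-page mode: turn a spec such as "1,3,5-7" into sorted page numbers."""
    selected: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start > end:
            raise ValueError(f"invalid range: {part!r}")
        selected.update(range(start, end + 1))
    # Pages outside the document are dropped rather than treated as errors.
    return sorted(p for p in selected if 1 <= p <= page_count)


print(page_batches(7, 3))                  # [[1, 2, 3], [4, 5, 6], [7]]
print(parse_page_selection("1,3,5-7", 6))  # [1, 3, 5, 6]
```

Processing one batch at a time is also what makes pause, resume, and cancel straightforward: the controller only needs to check a flag between batches.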

🖼 Image OCR

Supports common formats: PNG / JPG / JPEG
Multiple scenarios:
- Documents, tables, academic content
- Handwriting
- Street signs / shop signs / product packaging
Output formats:
- Markdown
- LaTeX (math formulas)
- Plain text

🎨 Image Pre-processing

Built-in presets (scan optimize, photo enhance, background removal, etc.) apply operations including:

Auto-rotation (deskew)
Contrast enhancement + sharpening
Shadow removal
Binarization
Background removal (via rembg or an OpenCV-based fallback pipeline)
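
The "rembg or fallback" choice for background removal can be made at runtime by checking whether the optional package is importable. This is only a sketch of that pattern: `choose_background_remover` is a hypothetical helper, and the returned strings are labels for illustration, not names used by the project.

```python
from importlib.util import find_spec


def choose_background_remover(is_installed=lambda name: find_spec(name) is not None) -> str:
    """Return "rembg" when that package is importable, otherwise "opencv"."""
    return "rembg" if is_installed("rembg") else "opencv"


print(choose_background_remover())
```

Passing the availability check as a parameter keeps the decision easy to test without installing or uninstalling anything.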

Pre-processing can:

Batch process multiple images
Package processed results into a ZIP file for download
Send processed images directly into the OCR pipeline with one click
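
Packaging processed results for download needs nothing beyond the standard library. The following is a minimal sketch under the assumption that processed images are already available as encoded bytes; `zip_images` and the file names are hypothetical.

```python
import io
import zipfile


def zip_images(images: dict[str, bytes]) -> bytes:
    """Pack {filename: encoded image bytes} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in images.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder bytes stand in for real PNG data.
payload = zip_images({"page_1.png": b"fake png bytes", "page_2.png": b"more bytes"})
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    print(archive.namelist())  # ['page_1.png', 'page_2.png']
```

Building the archive in memory avoids temporary files; a web backend can return the resulting bytes directly as the download response.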

🖥 GUI Overview

✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
✅ Drag-and-drop upload
✅ Thumbnails for images / PDFs
✅ Batch progress bar and status text
✅ Result panel supports:
- One-click copy
- Downloading result files
✅ Responsive design: works well on desktops, laptops, and tablets

🍎 Mac One-click Deployment

Requirements

macOS 13.0+
Apple Silicon (M1 / M2 / M3 / M4)
Python 3.11+
Recommended RAM: ≥ 16GB

One-click Install & Run (Recommended)

# 1. Choose an install directory
cd ~/Downloads          # or cd ~ / cd ~/Documents / any location you prefer

# 2. Clone the project
git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
cd MLX-Video-OCR-DeepSeek-Apple-Silicon

# 3. One-click start (creates venv, installs deps, finds a free port)
./start.sh

After startup, open your browser at:

http://localhost:5000 (or another port between 5000–5010 if 5000 is taken)
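
The port fallback can be pictured as a simple scan over the range. This sketch assumes the scan tries each port in order and takes the first one that can be bound; `find_free_port` is a hypothetical name, and the real start.sh may do this differently. On recent macOS versions port 5000 is often already occupied, for example by AirPlay Receiver, which is one reason such a scan is useful.

```python
import socket


def find_free_port(first: int = 5000, last: int = 5010, host: str = "127.0.0.1") -> int:
    """Return the first port in [first, last] that can be bound on `host`."""
    for port in range(first, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError(f"no free port between {first} and {last}")


print(f"http://localhost:{find_free_port()}")
```

The probe socket is closed before the function returns, so another process could still claim the port before the server starts; for a single-user local tool that race is usually acceptable.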

⚙️ Model Download & Caching

Internally, the application does:

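import os
from pathlib import Path
os.environ["HF_HOME"] = str(Path.home() / "hf_cache")  # set before any Hugging Face import reads it
from mlx_vlm import load  # assumed source of load(); the original snippet omits its imports
model_path = "mlx-community/DeepSeek-OCR-8bit"
_model_instance, _processor_instance = load(model_path)  # downloads on first call, then reuses the cache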

This means:

On first use, it downloads mlx-community/DeepSeek-OCR-8bit from Hugging Face
Download location: ~/hf_cache/ (model files are stored under its hub/ subdirectory)
Subsequent runs, even from different project directories, reuse the same local model cache and do not re-download
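
To check whether the model is already cached, you can compute where the Hugging Face hub cache would place it. This sketch relies on the standard hub cache layout (`$HF_HOME/hub/models--<org>--<name>`) and deliberately ignores other overrides such as `HF_HUB_CACHE`; the helper names are hypothetical.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Resolve the hub cache directory implied by HF_HOME."""
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    return hf_home / "hub"


def cached_model_dir(repo_id: str) -> Path:
    """Return the folder where the hub cache stores a model repository."""
    return hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


os.environ["HF_HOME"] = str(Path.home() / "hf_cache")
model_dir = cached_model_dir("mlx-community/DeepSeek-OCR-8bit")
print(model_dir, "cached" if model_dir.exists() else "not downloaded yet")
```

Deleting that folder forces a fresh download on the next start, which can help if a download was interrupted.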

🔒 Privacy & Local Execution

All inference (video frame extraction, PDF processing, image OCR) runs entirely on your machine
No documents or images are uploaded to any external servers
Weights and cache are stored under your user directory (e.g. ~/hf_cache/)

📦 Use Cases

This project is ideal if you want to run DeepSeek-OCR on an Apple Silicon Mac and:

Prefer a visual GUI instead of pure scripts
Want one-click startup without manual environment setup
Need to handle Video + PDF + Image in a single workflow
Require all data to remain on-device, with no cloud dependency

🧩 Development & Contributions

Source code and issues:

GitHub: matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon
Issues: bug reports, feature requests, and pull requests are welcome

📜 License

This application (GUI + backend) is licensed under AGPL-3.0.

Please also respect the licenses of the upstream models:

deepseek-ai/DeepSeek-OCR and mlx-community/DeepSeek-OCR-8bit, as published on Hugging Face

