MLX-Video-OCR-DeepSeek-Apple-Silicon

🎯 One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI

This is a local OCR application optimized for Apple Silicon (M1/M2/M3/M4),
built on top of deepseek-ai/DeepSeek-OCR and the MLX ecosystem. It provides:

  • 📹 Video frame extraction + OCR (automatically samples frames from videos, then runs OCR)
  • 📄 PDF batch OCR (supports multi-page PDFs, batch mode and single-page mode)
  • 🖼 Image OCR (documents, tables, handwriting, scene text)
  • 🎨 Image pre-processing (auto-rotation, enhancement, de-shadow, background removal)
  • 🖥 Full Web GUI (drag-and-drop upload, progress display, result preview)
  • 🍎 One-click Mac deployment (./start.sh automatically sets up the environment and dependencies)

Weights are not re-uploaded in this repo. Instead, they are automatically downloaded from mlx-community/DeepSeek-OCR-8bit on Hugging Face and cached locally.

Among projects based on deepseek-ai/DeepSeek-OCR, this one focuses on local deployment on
Apple Silicon Macs, a unified Video/PDF/Image workflow, and a complete GUI. It is an
application-layer integration rather than “just another weights-only model repo”.


🧮 Precision & Weights (3B + 8bit)

This project does not re-upload any weights, but directly uses:

  • Base model: deepseek-ai/DeepSeek-OCR (around 3B parameters)
  • MLX quantized version: mlx-community/DeepSeek-OCR-8bit

In practice, this means:

  • 🧠 Model capability: Uses the original DeepSeek-OCR architecture; the 8-bit weights may give up a little accuracy relative to the full-precision model
  • 💾 Storage footprint: 8-bit quantization stores each weight in roughly one byte, which keeps the download small enough for local Mac use
  • Runtime efficiency: MLX runs inference on the Apple Silicon GPU through Metal
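
As a rough back-of-the-envelope check on the storage point, the sketch below estimates weight size from parameter count and bit width. The 3B figure comes from the list above. The 16-bit comparison, and the choice to ignore quantization scales, unquantized layers, and file metadata, are simplifying assumptions, so real file sizes will differ somewhat.

```python
def approx_weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Estimate raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 3e9  # "around 3B parameters", per the model description above

# Compare a 16-bit copy of the weights (assumed baseline) with the 8-bit version.
print(f"16-bit: ~{approx_weight_size_gb(params, 16):.1f} GB")
print(f" 8-bit: ~{approx_weight_size_gb(params, 8):.1f} GB")
```

Under these assumptions the 8-bit weights take about half the space of a 16-bit copy.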

If you need:

  • Maximum precision / research use → Use deepseek-ai/DeepSeek-OCR directly
  • Practical Mac local tooling → Use this project + mlx-community/DeepSeek-OCR-8bit to run the full Video/PDF/Image workflow via GUI

🔗 Project & Base Models

This Hugging Face model card is mainly intended to:

  • Document this project as a local GUI / deployment example built on deepseek-ai/DeepSeek-OCR
  • Make it easy to discover this Mac GUI solution when searching for base_model: deepseek-ai/DeepSeek-OCR

✨ Features

🎬 Video OCR

  • Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
  • Sends all extracted frames to DeepSeek-OCR in batches
  • Supports:
    • Frame preview
    • Batch download of frames
    • “Frames → OCR” one-click workflow
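
Frame sampling can be pictured as picking evenly spaced frame indices. The sketch below is only an illustration under assumptions: the function name, the fixed time interval, and the frame cap are hypothetical and not taken from the project, which may select key frames differently (for example by scene change).

```python
from typing import List, Optional


def sample_frame_indices(total_frames: int, fps: float,
                         every_seconds: float = 1.0,
                         max_frames: Optional[int] = None) -> List[int]:
    """Return indices of frames spaced roughly `every_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))  # frames between two samples
    indices = list(range(0, total_frames, step))
    if max_frames is not None:
        indices = indices[:max_frames]  # hypothetical cap on the frame count
    return indices


# A 10-second clip at 30 fps, sampled every 2 seconds.
print(sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0))
```

With OpenCV, for instance, each index could be read by setting cv2.CAP_PROP_POS_FRAMES on a cv2.VideoCapture before calling read().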

📄 PDF OCR (Multi-page Batch)

  • Supports multi-page PDF batch processing
  • Two modes:
    • Batch mode: process the document in batches of N pages
    • Single-page mode: precisely select specific pages
  • Provides:
    • PDF thumbnail preview
    • Page selection, progress display, pause/resume/cancel controls
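
The two PDF modes boil down to two small pieces of page arithmetic. The sketch below assumes 1-based page numbers and a selection syntax such as "1,3,5-7". Both helper names and that syntax are hypothetical, chosen for illustration rather than taken from the project.

```python
from typing import Iterator, List


def page_batches(num_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Batch mode: yield 1-based page numbers in consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, num_pages + 1, batch_size):
        yield list(range(start, min(start + batch_size, num_pages + 1)))


def parse_page_selection(spec: str, num_pages: int) -> List[int]:
    """Single-page mode: parse "1,3,5-7" into sorted, unique page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        lo = int(first)
        hi = int(last) if last else lo
        if lo > hi:
            lo, hi = hi, lo  # accept reversed ranges such as "7-5"
        # Keep only pages that exist in the document.
        pages.update(p for p in range(lo, hi + 1) if 1 <= p <= num_pages)
    return sorted(pages)


print(list(page_batches(num_pages=7, batch_size=3)))
print(parse_page_selection("1, 3, 5-7", num_pages=6))
```

Working batch by batch also gives natural checkpoints at which a pause or cancel request could be honored.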

🖼 Image OCR

  • Supports common formats: PNG / JPG / JPEG
  • Multiple scenarios:
    • Documents, tables, academic content
    • Handwriting
    • Street signs / shop signs / product packaging
  • Output formats:
    • Markdown
    • LaTeX (math formulas)
    • Plain text

🎨 Image Pre-processing

Built-in presets (scan optimize, photo enhance, background removal, etc.) combine the following operations:

  • Auto-rotation (deskew)
  • Contrast enhancement + sharpening
  • Shadow removal
  • Binarization
  • Background removal (via rembg or an OpenCV-based fallback pipeline)
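
To make the binarization step concrete, here is a minimal pure-Python sketch. It assumes Otsu's method for choosing the threshold. The list above does not say which algorithm the project uses, so treat this as one common option rather than the project's implementation. Pixels are assumed to be 8-bit grayscale values in a flat sequence.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]           # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map every pixel to 0 (at or below threshold) or 255 (above it)."""
    return [255 if p > threshold else 0 for p in pixels]
```

In practice a library routine (OpenCV offers Otsu thresholding, for example) would be much faster than a Python loop. The sketch only shows the idea.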

Pre-processing can:

  • Batch process multiple images
  • Package processed results into a ZIP file for download
  • Send processed images directly into the OCR pipeline with one click
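
Packaging processed results into a ZIP file needs nothing beyond the standard library. The sketch below assumes the processed images are already available as encoded bytes keyed by filename. The function name and that data shape are hypothetical.

```python
import io
import zipfile
from typing import Dict


def zip_images(images: Dict[str, bytes]) -> bytes:
    """Pack {filename: encoded image bytes} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in images.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder bytes stand in for real PNG data.
archive_bytes = zip_images({"page_001.png": b"...", "page_002.png": b"..."})
print(len(archive_bytes), "bytes")
```

Building the archive in memory avoids temporary files. In a Flask backend the bytes could then be returned as a download, for example through send_file with a BytesIO wrapper.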

🖥 GUI Overview

  • ✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
  • ✅ Drag-and-drop upload
  • ✅ Thumbnails for images / PDFs
  • ✅ Batch progress bar and status text
  • ✅ Result panel supports:
    • One-click copy
    • Downloading result files
  • ✅ Responsive design: works well on desktops, laptops, and tablets

🍎 Mac One-click Deployment

Requirements

  • macOS 13.0+
  • Apple Silicon (M1 / M2 / M3 / M4)
  • Python 3.11+
  • Recommended RAM: ≥ 16GB
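
The first three requirements can be checked from Python before installing anything. This is a minimal sketch with a hypothetical function name. It is not part of start.sh, and it skips the RAM recommendation because the standard library has no portable way to query it.

```python
import platform
import sys
from typing import List, Tuple


def check_requirements(system: str, machine: str, macos_version: str,
                       python_version: Tuple[int, int]) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    elif int(macos_version.split(".")[0] or 0) < 13:
        problems.append("macOS 13.0 or newer is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if python_version < (3, 11):
        problems.append("Python 3.11 or newer is required")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
        sys.version_info[:2],
    )
    print("\n".join(issues) if issues else "All checks passed")
```

Note that a Python interpreter running under Rosetta reports x86_64 even on Apple Silicon hardware, so a failed architecture check may point at the interpreter rather than the machine.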

One-click Install & Run (Recommended)

# 1. Choose an install directory
cd ~/Downloads          # or cd ~ / cd ~/Documents / any location you prefer

# 2. Clone the project
git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
cd MLX-Video-OCR-DeepSeek-Apple-Silicon

# 3. One-click start (creates venv, installs deps, finds a free port)
./start.sh

After startup, open your browser at:

  • http://localhost:5000 (or another port between 5000–5010 if 5000 is taken)
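
The port fallback can be sketched as a simple probe over the 5000–5010 range. The code below is an assumption about how such a search might work, written in Python for illustration. start.sh may implement it differently.

```python
import socket
from typing import Optional


def find_free_port(start: int = 5000, end: int = 5010,
                   host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted); try the next one
            return port
    return None


port = find_free_port()
print(f"http://localhost:{port}" if port else "No free port between 5000 and 5010")
```

The probe socket is closed before the port is returned, so another process could still claim the port before the server binds it. For a local tool that small race is usually acceptable.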

⚙️ Model Download & Caching

Internally, the application does:

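import os
from pathlib import Path
os.environ["HF_HOME"] = str(Path.home() / "hf_cache")  # set before importing Hugging Face libraries so they pick it up
from mlx_vlm import load  # assumption: load() is the mlx-vlm loader; the original snippet omits its import
model_path = "mlx-community/DeepSeek-OCR-8bit"
_model_instance, _processor_instance = load(model_path)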

This means:

  • On first use, it downloads mlx-community/DeepSeek-OCR-8bit from Hugging Face
  • Download location: ~/hf_cache/ (the value of HF_HOME; the model files themselves sit in the hub cache under ~/hf_cache/hub/)
  • Subsequent runs, even from different project directories, reuse the same local model cache and do not re-download
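
To see whether the model is already cached, one can look for it in the standard Hugging Face hub cache layout. The helper names below are hypothetical, and the sketch assumes the cache location is not overridden by other settings such as HF_HUB_CACHE.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir_for(repo_id: str, hf_home: Optional[Path] = None) -> Path:
    """Return the directory where the Hugging Face hub cache keeps a model repo."""
    if hf_home is None:
        default_home = Path.home() / ".cache" / "huggingface"
        hf_home = Path(os.environ.get("HF_HOME", default_home))
    # Hub cache layout: <HF_HOME>/hub/models--<org>--<name>
    return Path(hf_home) / "hub" / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id: str, hf_home: Optional[Path] = None) -> bool:
    """True if at least one snapshot of the repo exists in the local cache."""
    snapshots = hub_cache_dir_for(repo_id, hf_home) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


repo = "mlx-community/DeepSeek-OCR-8bit"
home = Path.home() / "hf_cache"  # the HF_HOME value set by the application
print(hub_cache_dir_for(repo, home))
print("cached" if is_cached(repo, home) else "not downloaded yet")
```

Deleting that directory forces a fresh download on the next start.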

🔒 Privacy & Local Execution

  • All inference (video frame extraction, PDF processing, image OCR) runs entirely on your machine
  • No documents or images are uploaded to any external servers; the first run does need network access to download the model weights
  • Weights and cache are stored under your user directory (e.g. ~/hf_cache/)

📦 Use Cases

This project is ideal if you want to run DeepSeek-OCR on Mac + Apple Silicon and:

  • Prefer a visual GUI instead of pure scripts
  • Want one-click startup without manual environment setup
  • Need to handle Video + PDF + Image in a single workflow
  • Require all data to remain on-device, with no cloud dependency

🧩 Development & Contributions

Source code and issues: https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon


📜 License

This application (GUI + backend) is licensed under AGPL-3.0.

Please also respect the licenses of the models it uses:

  • deepseek-ai/DeepSeek-OCR, as published on Hugging Face
  • mlx-community/DeepSeek-OCR-8bit, as published on Hugging Face

繁體中文說明

🎯 Mac 一鍵部署 · 📹 影片 / 📄 PDF / 🖼 圖片 三合一 OCR · 🖥 完整本地 GUI

這是一個針對 Apple Silicon (M1/M2/M3/M4) 優化的本地 OCR 應用,
基於 deepseek-ai/DeepSeek-OCR 與 MLX 生態,提供:

  • 📹 影片截圖 + OCR(從影片自動抽幀再做 OCR)
  • 📄 PDF 批次 OCR(支援多頁 PDF、批次/單頁模式)
  • 🖼 圖片 OCR(含文件、表格、手寫、場景文字)
  • 🎨 照片前處理(自動旋轉、增強、去陰影、去背)
  • 🖥 完整 Web GUI(拖放上傳、進度條、結果預覽)
  • 🍎 Mac 一鍵部署(./start.sh 自動完成環境與依賴)

權重不在本 repo 中,而是透過 mlx-community/DeepSeek-OCR-8bit 自動下載並快取到本機。

在目前以 deepseek-ai/DeepSeek-OCR 為基底的專案中,本方案聚焦於
Mac Apple Silicon 本地部署 + 影片/PDF/圖片 三合一工作流 + 完整 GUI 介面,
屬於偏應用層的整合解決方案,而非單純「只提供模型權重」的 repo。


🧮 精度與權重說明(3B + 8bit)

本專案並不重新上傳權重,而是直接使用:

  • 基底模型:deepseek-ai/DeepSeek-OCR(約 3B 參數)
  • MLX 量化版本:mlx-community/DeepSeek-OCR-8bit

也就是說:

  • 🧠 模型能力:沿用 DeepSeek-OCR 的構造與效果
  • 💾 儲存體積:使用 8bit 量化,適合 Mac 本地環境
  • 執行效率:在 Apple Silicon 上搭配 MLX + Metal GPU,加速推理

如果你需要:

  • 最高精度 / 研究用途 → 建議直接使用 deepseek-ai/DeepSeek-OCR
  • 實務應用 / Mac 本地工具 → 建議使用本專案 + mlx-community/DeepSeek-OCR-8bit,在 GUI 中完成影片/PDF/圖片工作流

🔗 專案與基底模型

本 Hugging Face model 卡主要用來:

  • 說明此專案是基於 deepseek-ai/DeepSeek-OCR 的本地 GUI / 部署範例
  • 讓使用者在搜尋 base_model: deepseek-ai/DeepSeek-OCR 時,可以找到這個 Mac GUI 解決方案

✨ 功能特色

🎬 影片 OCR(Video OCR)

  • 從影片(MP4 / AVI / MOV / MKV / WebM)中 自動抽取關鍵幀
  • 以批次方式將所有截圖送入 DeepSeek-OCR 做文字辨識
  • 支援:
    • 幀預覽
    • 批次下載截圖
    • 截圖 → 直接送往 OCR 流程

📄 PDF OCR(多頁批次)

  • 支援 多頁 PDF 批次處理
  • 兩種模式
    • 批次模式:每批 N 頁,一次跑完整份文件
    • 單頁模式:精準選擇特定頁面
  • 提供:
    • PDF 縮圖預覽
    • 頁面選擇、進度顯示、暫停/繼續/取消

🖼 圖片 OCR(Image OCR)

  • 支援 PNG / JPG / JPEG 等常見圖片格式
  • 多場景:
    • 文檔、表格、學術內容
    • 手寫文字
    • 街景 / 招牌 / 產品包裝
  • 可輸出:
    • Markdown
    • LaTeX(數學公式)
    • 純文字

🎨 照片前處理

內建多種前處理 preset(掃描優化、照片優化、去背等),包含:

  • 自動旋轉(校正傾斜)
  • 對比度增強 + 銳化
  • 去陰影
  • 二值化
  • 去背景(rembg 或 fallback OpenCV pipeline)

前處理可以:

  • 批次處理多張圖片
  • 處理後打包成 ZIP 下載
  • 一鍵「送到 OCR」直接進入識別流程

🖥 GUI 介面概覽

  • ✅ 單一頁面 Web GUI(Flask + 原生 JS + Tailwind)
  • ✅ 拖放上傳區塊(Drag & Drop)
  • ✅ 圖片 / PDF 縮圖預覽
  • ✅ 批次進度條與文字狀態
  • ✅ 結果區支援:
    • 一鍵複製
    • 下載結果檔案
  • ✅ 響應式設計:桌機 / 筆電 / 平板 皆可舒適使用

🍎 Mac 一鍵部署

系統需求

  • macOS 13.0+
  • Apple Silicon(M1 / M2 / M3 / M4)
  • Python 3.11+
  • RAM 建議 ≥ 16GB

一鍵安裝與啟動(推薦)

# 1. 選擇安裝目錄
cd ~/Downloads          # 或 cd ~ / cd ~/Documents / 任何你習慣放專案的位置

# 2. 克隆專案
git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
cd MLX-Video-OCR-DeepSeek-Apple-Silicon

# 3. 一鍵啟動(自動建立 venv、安裝依賴、尋找可用端口)
./start.sh

啟動成功後,瀏覽器打開:

  • http://localhost:5000(或自動選擇 5000–5010 之間的可用端口)

⚙️ 模型下載與快取行為

程式內部使用:

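import os
from pathlib import Path
os.environ["HF_HOME"] = str(Path.home() / "hf_cache")  # 需在匯入 Hugging Face 相關套件之前設定
from mlx_vlm import load  # 假設:load() 來自 mlx-vlm;原始片段未列出 import
model_path = "mlx-community/DeepSeek-OCR-8bit"
_model_instance, _processor_instance = load(model_path)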

這代表:

  • 第一次使用時,會從 Hugging Face 下載 mlx-community/DeepSeek-OCR-8bit
  • 下載位置:~/hf_cache/
  • 之後再次啟動或在不同專案目錄執行時,都會共用同一份本地模型快取,不會重複下載

🔒 隱私與本地運行

  • 所有推理(影片截圖、PDF 處理、圖片 OCR)皆在本地完成
  • 不會將你的文件或圖片上傳到伺服器
  • 模型權重與快取均存在你的使用者目錄下(例如:~/hf_cache/)

📦 適用情境

  • 想要在 Mac + Apple Silicon 上跑 DeepSeek-OCR,並且:
    • 希望有 可視化 GUI
    • 希望 一鍵啟動,不想手動配環境
    • 希望同時處理 影片 / PDF / 圖片
    • 希望所有資料留在本機,不上雲

🧩 開發與貢獻

原始碼與 issue 請參考 GitHub repo:https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon


📜 License

本應用程式使用 AGPL-3.0 授權。
請同時遵守:

  • 本 repo(GUI + backend)的 AGPL-3.0
  • deepseek-ai/DeepSeek-OCR 與 mlx-community/DeepSeek-OCR-8bit 的授權條款