---
license: apache-2.0
language:
- de
- en
base_model: Qwen/Qwen2-VL-2B-Instruct
tags:
- ocr
- german
- vision
- document-understanding
- invoice
- qwen2-vl
pipeline_tag: image-text-to-text
library_name: transformers
datasets:
- neuralabs/german-synth-ocr
---

# German-OCR

High-performance German document OCR using fine-tuned Qwen2-VL-2B and Qwen2.5-VL-3B vision-language models

## Model Description

German-OCR is trained specifically to extract text from German documents, including invoices, receipts, forms, and other business documents. It outputs structured text in Markdown format.

- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Fine-tuning**: QLoRA (4-bit quantization)
- **Training Data**: German invoices and business documents
- **Output Format**: Structured Markdown text

## Model Variants

| Model | Size | Base | HuggingFace |
|-------|------|------|-------------|
| german-ocr | 4.4 GB | Qwen2-VL-2B | [Keyven/german-ocr](https://huggingface.co/Keyven/german-ocr) |
| german-ocr-3b | 7.5 GB | Qwen2.5-VL-3B | [Keyven/german-ocr-3b](https://huggingface.co/Keyven/german-ocr-3b) |

## Usage

### Option 1: Python Package (Recommended)

```bash
pip install german-ocr
```

```python
from german_ocr import GermanOCR

# Using Ollama (fast, local)
ocr = GermanOCR(backend="ollama")
result = ocr.extract("document.png")
print(result)

# Using Transformers (more accurate)
ocr = GermanOCR(backend="transformers")
result = ocr.extract("document.png")
print(result)
```

### Option 2: Ollama

> [!WARNING]
> **In development** - vision adapter compatibility is still being worked on. For stable use, the [HuggingFace version](https://huggingface.co/Keyven/german-ocr) is recommended.

```bash
ollama run Keyvan/german-ocr "Extrahiere den Text: image.png"
```

### Option 3: Transformers

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

# Load the fine-tuned model and its processor
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")

# Build a chat message containing the document image and the extraction prompt
image = Image.open("document.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extrahiere den Text aus diesem Dokument."}
    ]
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

# Generate and decode only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=512)
result = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(result)
```

## Performance

| Metric | Value |
|--------|-------|
| Base Model | Qwen2-VL-2B-Instruct |
| Model Size | 4.4 GB |
| VRAM (4-bit) | 1.5 GB |
| Inference Time | ~15 s (GPU) |

The Transformers example above loads the model at its default precision; a 4-bit loading sketch is given in the appendix at the end of this card.

## Training

- **Method**: QLoRA (4-bit quantization)
- **Epochs**: 3
- **Learning Rate**: 2e-4
- **LoRA Rank**: 64
- **Target Modules**: All linear layers

A QLoRA configuration sketch matching these hyperparameters is also given in the appendix at the end of this card.

## Limitations

- Optimized for German documents
- Best results with clear, high-resolution images
- May struggle with handwritten text

## License

Apache 2.0

## Author

**Keyvan Hardani**

- Website: [keyvan.ai](https://keyvan.ai)
- LinkedIn: [linkedin.com/in/keyvanhardani](https://www.linkedin.com/in/keyvanhardani/)
- GitHub: [@Keyvanhardani](https://github.com/Keyvanhardani)

## Links

- [GitHub](https://github.com/Keyvanhardani/german-ocr)
- [Ollama](https://ollama.com/Keyvan/german-ocr)
- [HuggingFace](https://huggingface.co/Keyven/german-ocr)
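
## Appendix: Configuration Sketches

The VRAM figure in the Performance table assumes 4-bit quantized loading, which the Transformers example above does not show. The following is a minimal sketch of how the checkpoint could be loaded in 4-bit with a `BitsAndBytesConfig`; it assumes the `bitsandbytes` package and a CUDA GPU, and is an illustration rather than part of the official package.

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

# 4-bit NF4 quantization settings (assumed; the card only states "4-bit")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the checkpoint with 4-bit weights to reduce VRAM usage
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")
```

The rest of the inference code is identical to the Transformers example in the Usage section.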
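
The hyperparameters listed under Training correspond roughly to the following QLoRA setup using the `peft` library. The LoRA alpha and dropout values are assumptions (they are not stated on this card), and the actual training script is not published here.

```python
from peft import LoraConfig

# LoRA settings from the card: rank 64, all linear layers as targets.
# lora_alpha and lora_dropout are assumptions and are not stated on the card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,          # assumption
    lora_dropout=0.05,       # assumption
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# The remaining stated hyperparameters (3 epochs, learning rate 2e-4) would be
# passed to the trainer, e.g. transformers.TrainingArguments(num_train_epochs=3,
# learning_rate=2e-4, ...), together with the 4-bit base model loaded as above.
```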