Historical Document Layout Detection Model (Co-DETR / DINO)

A fine-tuned Co-DINO (Vision Transformer-based detector via MMDetection) model for detecting layout elements in historical Swedish medical journal pages.

This model is a successor to the earlier Mask R-CNN-based model cdhu-uu/SweMPer-layout-lite, offering improved detection performance and robustness on complex layouts.

This model was developed as part of the research project:
Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011
(Project ID: IN22-0017), funded by Riksbankens Jubileumsfond.

Project page:
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

Model Details

  • Model type: Co-DINO (Vision Transformer backbone)
  • Framework: MMDetection
  • Fine-tuned for: Historical document layout analysis
  • Language of source documents: Swedish
  • Strengths: Improved detection precision on complex layouts

Supported Labels

  • Advertisement
  • Author
  • Header or Footer
  • Image
  • List
  • Page Number
  • Table
  • Text
  • Title
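
In downstream pipelines it is common to filter detections by label and order them for reading. A minimal sketch in plain Python, independent of any detection framework (the block list and coordinates below are made-up illustrations, not model output):

```python
# Hypothetical detections: (label, score, (x1, y1, x2, y2)) tuples
detections = [
    ("page number",   0.98, (450, 20, 480, 40)),
    ("title",         0.95, (60, 60, 440, 110)),
    ("text",          0.92, (60, 380, 440, 700)),
    ("text",          0.90, (60, 130, 440, 360)),
    ("advertisement", 0.88, (60, 720, 440, 900)),
]

# Keep only body-text regions and sort them top-to-bottom by y1
text_blocks = sorted(
    (d for d in detections if d[0] == "text"),
    key=lambda d: d[2][1],
)

for label, score, box in text_blocks:
    print(label, score, box)
```

The same pattern applies to any of the labels above, e.g. selecting "table" regions for table extraction.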

Evaluation Metrics

The model achieves the following COCO-style average precision (AP) scores:

AP     AP50   AP75   APs    APm    APl
80.7   98.4   87.4   51.5   69.6   88.2
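
AP50 and AP75 are average precision at intersection-over-union (IoU) thresholds of 0.50 and 0.75, while APs/APm/APl split results by object size (COCO convention). To make the thresholds concrete, a minimal IoU computation:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# A prediction shifted 10 px against a 100x100 ground-truth box
print(iou((0, 0, 100, 100), (10, 0, 110, 100)))  # ~0.818: a hit at both the 0.50 and 0.75 thresholds
```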

Usage

Installation

Find installation and fine-tuning instructions at:
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file

Inference

import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)

# Get class names from model
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None

classes = get_classes(model)

# Convert MMDet results to a LayoutParser layout
# (assumes the MMDetection 2.x result format used by Co-DETR:
#  a list of per-class detection arrays, or a (bbox, mask) tuple)
def mmdet_to_layout(result, classes, thr=0.50):
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)

# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
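
The detected boxes can also be cropped out for OCR or further analysis. A minimal sketch using plain NumPy slicing; with a real layout you would pass `block.coordinates` from each TextBlock (the image and box values here are stand-ins):

```python
import numpy as np

def crop_box(image, box, pad=0):
    """Crop an (x1, y1, x2, y2) box from an HxWxC image, with optional padding."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    # Clamp padded coordinates to the image bounds
    x1 = max(int(x1) - pad, 0)
    y1 = max(int(y1) - pad, 0)
    x2 = min(int(x2) + pad, w)
    y2 = min(int(y2) + pad, h)
    return image[y1:y2, x1:x2]

page = np.zeros((1000, 800, 3), dtype=np.uint8)  # stand-in page image
crop = crop_box(page, (60, 130, 440, 360), pad=5)
print(crop.shape)  # (240, 390, 3)
```

A few pixels of padding often help OCR engines, which is why `pad` is clamped rather than applied blindly at the page edges.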

Acknowledgements

This work was carried out within the project:
Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011
(Project ID: IN22-0017), funded by Riksbankens Jubileumsfond.

We gratefully acknowledge the support of the funder and project collaborators.

This model builds upon the excellent work of:

  • Co-DETR / Co-DINO: https://github.com/Sense-X/Co-DETR
  • MMDetection: https://github.com/open-mmlab/mmdetection
  • LayoutParser: https://github.com/Layout-Parser/layout-parser

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.

