Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs
Abstract
A hardware-software co-design framework for on-device large language model deployment that establishes accuracy-latency relationships through training loss modeling and roofline analysis, enabling rapid architecture selection and improved performance.
Vision-Language-Action Models (VLAs) have emerged as a key paradigm of Physical AI and are increasingly deployed in autonomous vehicles, robots, and smart spaces. In these resource-constrained on-device settings, selecting an appropriate large language model (LLM) backbone is a critical challenge: models must balance accuracy with strict inference latency and hardware efficiency constraints. This makes hardware-software co-design an essential requirement for on-device LLM deployment, where each hardware platform demands a tailored architectural solution. We propose a hardware co-design law that jointly captures model accuracy and inference performance. Specifically, we model training loss as an explicit function of architectural hyperparameters and characterise inference latency via roofline modelling. We empirically evaluate 1,942 candidate architectures on NVIDIA Jetson Orin, training 170 selected models for 10B tokens each to fit a scaling law relating architecture to training loss. By coupling this scaling law with latency modelling, we establish a direct accuracy-latency correspondence and identify the Pareto frontier for hardware co-designed LLMs. We further formulate architecture search as a joint optimisation over precision and performance, deriving feasible design regions under industrial hardware and application budgets. Our approach reduces architecture selection from months to days. At the same latency as Qwen2.5-0.5B on the target hardware, our co-designed architecture achieves 19.42% lower perplexity on WikiText-2. To our knowledge, this is the first principled and operational framework for hardware co-design scaling laws in on-device LLM deployment. We will make the code and related checkpoints publicly available.
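The roofline characterisation of inference latency described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's model: the peak-FLOPs and bandwidth numbers are illustrative placeholders (roughly edge-GPU-class, not measured Jetson Orin figures), and the per-layer cost counts only the dominant GEMMs of a decoder layer.

```python
def roofline_latency(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline estimate: an op is bounded by either compute or memory traffic."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bw
    return max(compute_time, memory_time)


def decoder_layer_latency(d_model, batch=1,
                          peak_flops=5e12, peak_bw=2e11, bytes_per_param=2):
    """Rough decode-step latency for one transformer decoder layer.

    Counts the dominant weight matrices: QKV + output projections (4*d^2)
    and an MLP with 4x hidden expansion (8*d^2), i.e. ~12*d^2 weights.
    Hardware peaks here are placeholder values, not Jetson Orin specs.
    """
    params = 12 * d_model ** 2               # weights touched per layer
    flops = 2 * params * batch               # one multiply-add = 2 FLOPs
    bytes_moved = params * bytes_per_param   # fp16 weight traffic dominates at batch 1
    return roofline_latency(flops, bytes_moved, peak_flops, peak_bw)
```

At batch size 1 the estimate is memory-bound, which is why on-device decode latency tracks weight bytes rather than FLOPs; this is the regime where the paper's architecture/latency coupling matters most.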
Community
🚩 Hardware is like a box — different sizes of boxes can accommodate different sizes of objects. The same holds for large language models: different hardware platforms impose different architectural constraints and opportunities.
This paper introduces Hardware Co-Design Scaling Law — a principled framework for designing large language models under strict edge-device constraints.
Instead of treating model accuracy and system latency as separate problems, the paper connects:
1️⃣ Architecture -> Training Loss
2️⃣ Architecture -> Inference Latency (via roofline modelling)
3️⃣ and, from these two, derives the accuracy-latency Pareto frontier.
This transforms architecture selection from heuristic trial-and-error into a hardware-aware optimisation problem, reducing design cycles from months to days.
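The co-design loop just described can be sketched as: predict loss from architecture via a scaling law, predict latency via the roofline model, then keep only non-dominated designs. The power-law coefficients below are hypothetical placeholders, not the paper's fitted values, and `pareto_frontier` is a generic helper, not the authors' implementation.

```python
def predicted_loss(n_params, a=2.5, b=0.35, c=1.7):
    """Hypothetical scaling-law fit: loss = a * (N / 1e9)^-b + c.

    Illustrative coefficients only; the paper fits its law from 170
    trained models, with architectural hyperparameters as inputs.
    """
    return a * (n_params / 1e9) ** (-b) + c


def pareto_frontier(candidates):
    """Keep (latency, loss, arch) points not dominated in both objectives.

    Sort by latency; a point joins the frontier only if it improves on
    the best loss seen so far (lower is better for both axes).
    """
    frontier = []
    for lat, loss, arch in sorted(candidates):
        if not frontier or loss < frontier[-1][1]:
            frontier.append((lat, loss, arch))
    return frontier
```

In use, one would sweep the candidate pool (1,942 architectures in the paper), attach each architecture's roofline latency and predicted loss, and read the accuracy-latency trade-off directly off the returned frontier.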
As on-device AI becomes increasingly important for autonomous systems, embodied intelligence, and smart environments, model-system co-design is no longer optional; it is foundational.
The following similar papers were recommended by the Semantic Scholar API:
- RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis (2026)
- Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference (2026)
- Configuration-to-Performance Scaling Law with Neural Ansatz (2026)
- From Bits to Chips: An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs (2026)
- Quantifying Edge Intelligence: Inference-Time Scaling Formalisms for Heterogeneous Computing (2026)
- Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference (2025)
- Pareto-guided Pipeline for Distilling Featherweight AI Agents in Mobile MOBA Games (2026)