
SmolVLA

Run SmolVLA optimized for the NPU on Qualcomm Dragonwing IQ9 devices with NexaSDK.

Quickstart

  1. Install NexaSDK and create a free account at sdk.nexa.ai

  2. Activate your device with your access token:

    nexa config set license '<access_token>'
    
  3. Run the model on Qualcomm NPU in one line:

    nexa infer NexaAI/smolVLA-npu
    
  • Input: enter the path to a folder containing the required input files when prompted
  • Output: returns the result as an .npy file, or reports an error if any required input cannot be found (see the sketch below for reading the result)
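The result is a NumPy .npy file. As a minimal sketch (the output path below is an assumption; use whatever path the nexa infer run reports), the predicted action vector can be inspected with NumPy:

    import numpy as np

    # Assumed output location -- replace with the path reported by `nexa infer`.
    result_path = "outputs/result.npy"

    # Load the action vector predicted by SmolVLA and inspect it.
    action = np.load(result_path)
    print("Action shape:", action.shape)
    print("Action values:", action)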

Model Description

SmolVLA is a lightweight Vision-Language-Action (VLA) model built for efficient multimodal understanding and real-time control.
Developed by the Hugging Face Smol team, it unifies vision, language, and action into one coherent model that can perceive, reason, and act, enabling autonomous agents and robots to run entirely on local hardware.

Features

  • 🧠 Unified Perception-to-Action — Combines visual understanding, natural language reasoning, and control generation.
  • Lightweight & Fast — Designed for real-time inference on laptops, edge boards, and NPUs.
  • 👁️ Grounded Visual Reasoning — Links language instructions with specific visual elements and spatial context.
  • 🧩 Zero-Shot Multimodal Tasks — Performs visual question answering, task planning, and grounding without retraining.
  • 🔧 Extensible & Open — Compatible with robotics frameworks and multimodal datasets for custom fine-tuning.

Use Cases

  • Embodied AI: End-to-end perception-action loops for robotics and simulation.
  • On-Device Agents: Multimodal assistants that process camera feeds locally.
  • Autonomous Systems: Real-time visual reasoning in automotive or IoT devices.
  • Research: Alignment studies and grounded reasoning experiments.
  • Simulation Control: Vision-driven policy generation for digital twins or VR.

Inputs and Outputs

Input

  • Image(s) or video frames
  • Optional text instruction or query

Output

  • Action vector or control command
  • Optional textual reasoning or visual grounding map
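The exact file names the NPU runtime expects inside the input folder are not documented on this card; as an illustrative sketch only (image.png and instruction.txt are assumed names), an input folder could be prepared like this before running nexa infer:

    from pathlib import Path
    from PIL import Image

    # Assumed input layout -- the expected file names are not confirmed by this card;
    # check the model documentation for the real schema.
    input_dir = Path("smolvla_inputs")
    input_dir.mkdir(exist_ok=True)

    # Camera frame the model should act on.
    Image.open("camera_frame.jpg").save(input_dir / "image.png")

    # Optional natural-language instruction for the task.
    (input_dir / "instruction.txt").write_text("pick up the red cube")

    print(f"Prepared inputs in {input_dir.resolve()}")

The folder path is then supplied when nexa infer NexaAI/smolVLA-npu prompts for the input folder.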

License

This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact [email protected].
