
SmolVLA

Run SmolVLA optimized for the NPU on Qualcomm Dragonwing IQ9 devices with NexaSDK.

Quickstart

  1. Install NexaSDK and create a free account at sdk.nexa.ai

  2. Activate your device with your access token:

    nexa config set license '<access_token>'
    
  3. Run the model on Qualcomm NPU in one line:

    nexa infer NexaAI/smolVLA-npu
    
  • Input: enter the path to a folder containing the required input files when prompted
  • Output: returns the result as an .npy file, or reports an error if any required input cannot be found (see the sketch below for reading the result)
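The result is a NumPy .npy file. As a minimal sketch (the output path below is an assumption; use whatever path the nexa infer run reports), the predicted action vector can be inspected with NumPy:

    import numpy as np

    # Assumed output location -- replace with the path reported by `nexa infer`.
    result_path = "outputs/result.npy"

    # Load the action vector predicted by SmolVLA and inspect it.
    action = np.load(result_path)
    print("Action shape:", action.shape)
    print("Action values:", action)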

Model Description

SmolVLA is a lightweight Vision-Language-Action (VLA) model built for efficient multimodal understanding and real-time control.
Developed by the Hugging Face Smol team, it unifies vision, language, and action into one coherent model that can perceive, reason, and act, enabling autonomous agents and robots to run entirely on local hardware.

Features

  • 🧠 Unified Perception-to-Action — Combines visual understanding, natural language reasoning, and control generation.
  • Lightweight & Fast — Designed for real-time inference on laptops, edge boards, and NPUs.
  • 👁️ Grounded Visual Reasoning — Links language instructions with specific visual elements and spatial context.
  • 🧩 Zero-Shot Multimodal Tasks — Performs visual question answering, task planning, and grounding without retraining.
  • 🔧 Extensible & Open — Compatible with robotics frameworks and multimodal datasets for custom fine-tuning.

Use Cases

  • Embodied AI: End-to-end perception-action loops for robotics and simulation.
  • On-Device Agents: Multimodal assistants that process camera feeds locally.
  • Autonomous Systems: Real-time visual reasoning in automotive or IoT devices.
  • Research: Alignment studies and grounded reasoning experiments.
  • Simulation Control: Vision-driven policy generation for digital twins or VR.

Inputs and Outputs

Input

  • Image(s) or video frames
  • Optional text instruction or query

Output

  • Action vector or control command
  • Optional textual reasoning or visual grounding map
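The exact file names the NPU runtime expects inside the input folder are not documented on this card; as an illustrative sketch only (image.png and instruction.txt are assumed names), an input folder could be prepared like this before running nexa infer:

    from pathlib import Path
    from PIL import Image

    # Assumed input layout -- the expected file names are not confirmed by this card;
    # check the model documentation for the real schema.
    input_dir = Path("smolvla_inputs")
    input_dir.mkdir(exist_ok=True)

    # Camera frame the model should act on.
    Image.open("camera_frame.jpg").save(input_dir / "image.png")

    # Optional natural-language instruction for the task.
    (input_dir / "instruction.txt").write_text("pick up the red cube")

    print(f"Prepared inputs in {input_dir.resolve()}")

The folder path is then supplied when nexa infer NexaAI/smolVLA-npu prompts for the input folder.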

License

This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact [email protected].
