Spaces:
Runtime error
Runtime error
| license: mit | |
| title: SmolVLM2 Real-Time Captioning Demo | |
| sdk: gradio | |
| colorFrom: green | |
| colorTo: blue | |
| short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp | |
| sdk_version: 5.44.1 | |
| # SmolVLM2 Real-Time Captioning Demo | |
| This Hugging Face Spaces app uses **Gradio v5 Blocks** to capture your webcam feed every *N* milliseconds and run it through the SmolVLM2 model on your CPU, displaying live captions below each frame. | |
| ## Features | |
| * **CPU-only inference** via `llama-cpp-python` wrapping `llama.cpp`. | |
| * **Gradio live streaming** for low-latency, browser-native video input. | |
| * **Adjustable interval slider** (100 ms to 10 s) for frame capture frequency. | |
| * **Automatic GGUF model download** from Hugging Face Hub when missing. | |
| * **Debug logging** in the terminal for tracing each inference step. | |
| ## Setup | |
| 1. **Clone this repository** | |
| ```bash | |
| git clone <your-space-repo-url> | |
| cd <your-space-repo-name> | |
| ``` | |
| 2. **Install dependencies** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. **(Optional) Pre-download model files** | |
| These will be automatically downloaded if absent: | |
| * `SmolVLM2-500M-Video-Instruct.Q8_0.gguf` | |
| * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf` | |
| To skip downloads, place both GGUF files in the repo root. | |
| ## Usage | |
| 1. **Launch the app**: | |
| ```bash | |
| python app.py | |
| ``` | |
| 2. **Open your browser** at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`). | |
| 3. **Allow webcam access** when prompted. | |
| 4. **Adjust the capture interval** using the slider in the UI. | |
| 5. **Live captions** will appear below each video frame. | |
| ## File Structure | |
| * `app.py` β Main Gradio v5 Blocks application. | |
| * `requirements.txt` β Python dependencies. | |
| * `.gguf` model files (auto-downloaded or user-provided). | |
| ## License |