Post
81
Try CUA GUI Operator 🖥️ Space, the demo of some interesting multimodal ultra-compact Computer Use Agent (CUA) models in a single app, including Fara-7B, UI-TARS-1.5-7B, and Holo models, to perform GUI localization tasks.
● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Other related multimodal spaces
● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en
I have planned to add Chrome sandboxes to streamline it and turn it into a browser based CUA multimodal tool, which will be added to the same space soon.
To know more about it, visit the app page or the respective model page!
● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Other related multimodal spaces
● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en
I have planned to add Chrome sandboxes to streamline it and turn it into a browser based CUA multimodal tool, which will be added to the same space soon.
To know more about it, visit the app page or the respective model page!