Try the CUA GUI Operator 🖥️ Space, a demo that brings several interesting ultra-compact multimodal Computer Use Agent (CUA) models together in a single app, including Fara-7B, UI-TARS-1.5-7B, and the Holo models, to perform GUI localization tasks.
- CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
- Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Other related multimodal spaces
- Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
- Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
- Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en
I plan to add Chrome sandboxes to streamline it and turn it into a browser-based multimodal CUA tool; this will be added to the same Space soon.
To learn more, visit the app page or the respective model pages!