prithivMLmods posted an update 5 days ago
Post · 2107
Dropping Image Edit (Object Manipulator): Add or remove specified objects/designs, with flexible support for both single-image and multi-image modes.
🤗 Demo: prithivMLmods/Qwen-Image-Edit-Object-Manipulator
Qwen-Image-Edit-2511-Object-Remover is an adapter (LoRA) developed for Qwen’s Qwen-Image-Edit-2511 image-to-image model. It is specifically designed for precise object removal from images.
⭐ Model: prithivMLmods/Qwen-Image-Edit-2511-Object-Remover
Qwen-Image-Edit-2511-Object-Adder is an adapter (LoRA) developed for Qwen’s Qwen-Image-Edit-2511 image-to-image model. It is specifically designed for precise object addition to images.
⭐ Model: prithivMLmods/Qwen-Image-Edit-2511-Object-Adder
🕹️ Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-object-manipulator
🕹️ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-Object-Manipulator
To learn more, visit the app page or the respective model pages.
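For local use, adapters like these typically plug into diffusers as in the minimal sketch below. The pipeline class and calling convention are assumptions based on standard diffusers LoRA loading, not the model card's exact recipe.

```python
# Minimal sketch: applying the Object Remover LoRA to Qwen-Image-Edit-2511.
# QwenImageEditPipeline and the call signature are assumptions based on
# standard diffusers LoRA usage; confirm against the model card.
import torch
from diffusers import QwenImageEditPipeline
from PIL import Image

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/Qwen-Image-Edit-2511-Object-Remover")

image = Image.open("input.jpg").convert("RGB")
edited = pipe(image=image, prompt="remove the coffee cup from the table").images[0]
edited.save("edited.jpg")
```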
Post · 223
If you are interested in HUB (https://saemi410.github.io/HUB/), I recommend the fork I created, which includes some updates that make it smooth to run a smoke test: git@github.com:javadtaghia/HUB.git. If you want to run UCE (https://unified.baulab.info), please check:
- Model weights for UCE here: telcom/uce_NSFW
- Model weights for ESD here: telcom/esd_NSFW
- Datasets and more download materials: telcom/HUB_reference_dataset
Please read the notes in the model card.
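If it helps, all three repos can be fetched from the Hub with huggingface_hub, as in this small sketch (repo ids from the post; everything else is standard API):

```python
# Download the UCE/ESD weights and the HUB reference dataset from the Hub.
from huggingface_hub import snapshot_download

uce_dir = snapshot_download("telcom/uce_NSFW")
esd_dir = snapshot_download("telcom/esd_NSFW")
data_dir = snapshot_download("telcom/HUB_reference_dataset", repo_type="dataset")
print(uce_dir, esd_dir, data_dir)  # local cache paths
```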
prithivMLmods posted an update 12 days ago
Post · 4077
Update: The TRELLIS.2 (Text-to-3D, Image-to-3D) Gradio demo with embedded Rerun and improved visualization in the 3D model previewer is now available on Hugging Face. Generate assets and view them in the 3D viewer, powered and streamlined by Microsoft’s TRELLIS.2 and Tongyi-MAI’s Z-Image-Turbo models.
🤗 TRELLIS.2 (Demo): prithivMLmods/TRELLIS.2-Text-to-3D
🕹️ GitHub: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D-RERUN
🕹️ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
To know more about it, visit the app page or the respective model page!
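For a sense of how the Rerun previewer works, the sketch below logs a stand-in point cloud to a standalone Rerun viewer; the Space itself embeds the viewer inside Gradio, so its actual wiring differs.

```python
# Minimal sketch: previewing 3D output with the Rerun SDK (standalone viewer).
# The random points stand in for TRELLIS.2 output; the Space embeds the viewer in Gradio.
import numpy as np
import rerun as rr

rr.init("trellis2_preview", spawn=True)   # spawns the Rerun viewer window
points = np.random.rand(1000, 3)          # stand-in for a generated asset
colors = (points * 255).astype(np.uint8)
rr.log("asset/points", rr.Points3D(points, colors=colors))
```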
prithivMLmods posted an update 13 days ago
Post · 4170
Introducing the Qwen-Image-Edit-2511-LoRAs-Fast demo, featuring image property comparison and contrast, built on top of Gradio combined with the Rerun SDK. It supports single- and multi-image edits with existing LoRAs that are lazily loaded. (Note: this is still an experimental Space for Qwen-Image-Edit-2511.)
⭐ Space Demo: prithivMLmods/Qwen-Image-Edit-2511-LoRAs-Fast
⭐ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-2511-LoRAs-Fast-Multi-Image-Rerun
⭐ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
To know more about it, visit the app page or the respective model page!
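Lazy loading here means the base pipeline is created once and each LoRA is fetched and registered only on first use. A hedged sketch of that pattern with diffusers (pipeline class and repo ids are illustrative, not the Space's exact code):

```python
# Hedged sketch of lazy LoRA loading; not the Space's exact implementation.
import torch
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511", torch_dtype=torch.bfloat16
).to("cuda")
_loaded: set[str] = set()

def edit(image, prompt: str, lora_repo: str):
    name = lora_repo.split("/")[-1]
    if lora_repo not in _loaded:              # fetch + register only once
        pipe.load_lora_weights(lora_repo, adapter_name=name)
        _loaded.add(lora_repo)
    pipe.set_adapters([name])                 # activate just this adapter
    return pipe(image=image, prompt=prompt).images[0]
```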
Post · 260
NVIDIA’s Groq deal ... I think inference efficiency is becoming the main driver of profitability, and NVIDIA’s Groq deal is evidence that the market is moving from “who can train biggest” to “who can serve cheapest and fastest at scale.” That points to a maturing phase of AI: not necessarily the end of a bubble, but definitely a correction in what “wins” long-term.
What do you think?
Post · 180
CIFAR-10, your handy image dataset ...
CIFAR-10 is a small, standard computer-vision dataset used to quickly test and compare ideas.
- 60,000 color images, each 32×32 pixels, labeled into 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
- Label mapping (important):
- 0 airplane
- 1 automobile
- 2 bird
- 3 cat
- 4 deer
- 5 dog
- 6 frog
- 7 horse
- 8 ship
- 9 truck
- Split: 50,000 train and 10,000 test.
- Why people use it: fast benchmarking for image classifiers (small CNNs, ResNet, ViT), and quick experiments for training pipelines, augmentation, regularization, pruning, distillation, and demos.
- Sizes (downloads): Python version about 163 MB, binary about 162 MB. Hugging Face shows about 144 MB for the dataset files.
- Where to get it: the official CIFAR page (University of Toronto) and the Hugging Face CIFAR-10 dataset page.
uoft-cs/cifar10
If you want something more, check the table below:
| Dataset | Resolution | Classes | Best For |
| --- | --- | --- | --- |
| ImageNet-1K | 224–256×256 | 1000 | Real-world large-scale classification |
| ImageNet-256 | 256×256 | 1000 | Direct high-res training |
| TinyImageNet | 64×64 | 200 | Mid-range benchmark |
| UC Merced Land Use | 256×256 | ~21 | Higher-resolution small-scale classification |
| MS COCO | >256×256 | ~80 objects | Detection / segmentation |
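Loading CIFAR-10 from the Hub takes a couple of lines with the datasets library; the sketch below also prints the label names so you can verify the mapping above.

```python
# Load CIFAR-10 from the Hub and sanity-check the 0-9 label mapping.
from datasets import load_dataset

ds = load_dataset("uoft-cs/cifar10")            # 50,000 train / 10,000 test
print(ds["train"].features["label"].names)       # ['airplane', ..., 'truck']

example = ds["train"][0]
print(example["img"].size, example["label"])     # (32, 32) PIL image, int label
```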
Post · 2049
arXiv CS endorsement
It's Javad. My Google Scholar profile:
https://scholar.google.com/citations?user=bja6GwoAAAAJ&hl=en
I would like to share my articles with you on Hugging Face, so I'm asking for an endorsement* in the Computer Science section of arxiv.org.
If you would like to endorse me, please visit the following URL:
https://arxiv.org/auth/endorse?x=NVUAPL
If that URL does not work for you, please visit
http://arxiv.org/auth/endorse.php
and enter the following six-digit alphanumeric string:
Endorsement Code: NVUAPL
Thank you in advance.
Javad Taghia
* Who is qualified to endorse?
To endorse another user to submit to the cs.AI (Artificial Intelligence) subject class, an arXiv submitter must have submitted 3 papers to any of cs.AI, cs.AR, cs.CC, cs.CE, cs.CG, cs.CL, cs.CR, cs.CV, cs.CY, cs.DB, cs.DC, cs.DL, cs.DM, cs.DS, cs.ET, cs.FL, cs.GL, cs.GR, cs.GT, cs.HC, cs.IR, cs.IT, cs.LG, cs.LO, cs.MA, cs.MM, cs.MS, cs.NA, cs.NE, cs.NI, cs.OH, cs.OS, cs.PF, cs.PL, cs.RO, cs.SC, cs.SD, cs.SE, cs.SI or cs.SY earlier than three months ago and less than five years ago.
prithivMLmods posted an update 20 days ago
Post · 3689
Introducing demos for new SOTA models from AI2: SAGE-MM (Smart Any-Horizon Agents for Long-Video Reasoning) and Molmo-2, an open vision-language model that supports multi-image (QA and pointing) and video (QA, pointing, and tracking). The respective demo-related collections are listed below. 🎃🔥
✨ SAGE-MM [Video-Reasoning]: prithivMLmods/SAGE-MM-Video-Reasoning
✨ Molmo2 [Demo]: prithivMLmods/Molmo2-HF-Demo
🎃 GitHub[SAGE-MM]: https://github.com/PRITHIVSAKTHIUR/SAGE-MM-Video-Reasoning
🎃 GitHub[Molmo2]: https://github.com/PRITHIVSAKTHIUR/Molmo2-HF-Demo
🎃 Multimodal Implementations: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 21 days ago
Post · 2054
Introducing TRELLIS.2 Text-to-3D. The demo for the TRELLIS.2-4B (Image-to-3D) model is streamlined with the Z-Image-Turbo image generation model to enable Text-to-3D functionality. No input assets are needed, which is a small leap forward for ideation. It also keeps default support for Image-to-3D inference from direct image assets. Find the demo and related collections below... 🤗🔥
✨ TRELLIS.2-Text-to-3D [Demo]: prithivMLmods/TRELLIS.2-Text-to-3D
✨ Multimodal Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ GitHub: https://github.com/PRITHIVSAKTHIUR/TRELLIS.2-Text-to-3D
To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 23 days ago
Post · 2017
The Molmo2 demo on Hugging Face is now live, covering Single/Multi-Image VQA, Visual Pointing/Grounding, Video VQA, and Video Point Tracking. Find the demo and related collections below. 🔥🤗
● Molmo2 HF Demo🖥️: prithivMLmods/Molmo2-HF-Demo
● Model Collection: https://huggingface.co/collections/allenai/molmo2
● Related Multimodal Space Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
To know more about it, visit the app page or the respective model page!
prithivMLmods posted an update 24 days ago
Post · 5545
Introducing the Z Image Turbo LoRA DLC App, a gallery space for plug-and-play Z-Image-Turbo LoRAs. It features a curated collection of impressive LoRAs for generating high-quality images. By default, it runs on the base model. Simply choose a LoRA, type your prompt, and generate images. You can find the app and more details below. 🤗🧪
● Space [Demo]: prithivMLmods/Z-Image-Turbo-LoRA-DLC
● Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
● Check the list of Z-Image LoRAs: https://huggingface.co/models?other=base_model:adapter:Tongyi-MAI/Z-Image-Turbo
● GitHub: https://github.com/PRITHIVSAKTHIUR/Z-Image-Turbo-LoRA-DLC
Other related image generation Spaces:
● FLUX-LoRA-DLC2: prithivMLmods/FLUX-LoRA-DLC2
● FLUX-LoRA-DLC: prithivMLmods/FLUX-LoRA-DLC
● Qwen-Image-LoRA-DLC: prithivMLmods/Qwen-Image-LoRA-DLC
● Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
● Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
& more...
To know more about it, visit the app page or the respective model page!
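For local experiments, plugging one of the listed LoRAs into Z-Image-Turbo with diffusers should look roughly like this (the LoRA repo id is a placeholder, and DiffusionPipeline resolving the right class for this checkpoint is an assumption):

```python
# Hedged sketch: a community LoRA on top of Z-Image-Turbo via diffusers.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("<z-image-lora-repo>")   # placeholder: pick one from the list

image = pipe("a cinematic portrait, golden hour",
             num_inference_steps=8).images[0]   # turbo models need few steps
image.save("out.png")
```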
Post · 265
Recently I was playing with my model. What is your take on "unlearning"? I need it 😀
telcom/deewaiREALCN: I have the original on the main branch and the trained versions "cp550" and "n_680" on another branch.
Both were trained on telcom/deewaiREALCN-training.
I got three results with the prompt:
"Athlete portrait, 26-year-old woman, post-training sweat, gym ambient light, chalk dust particles, intense gaze, crisp detail."
Apparently, the model is sensitive to the word "old".
You can see that training on more faces improved over main; however, it is still not ideal...
I am now working on unlearning and would like to hear your opinions.
#unlearning
prithivMLmods posted an update about 1 month ago
Post · 2739
Introducing the D.Markdown Experimental Models: Proxima and Epsilon, OCR models built on top of Qwen3-VL and Qwen2.5-VL respectively. Proxima is optimized for Markdown generation and is capable of embedding inline programming code snippets and generating rich nodes such as HTML, XML, JSON, and YAML. Epsilon is optimized for reconstructing complex layouts, including tables, forms, and mathematical content. 🌌✨
● proxima-ocr-d.markdown-post3.0.l: prithivMLmods/proxima-ocr-d.markdown-post3.0.l
● epsilon-ocr-d.markdown-post3.0.m: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m
● proxima-ocr-d.markdown-post3.0.l-gguf: prithivMLmods/proxima-ocr-d.markdown-post3.0.l-GGUF
● epsilon-ocr-d.markdown-post3.0.m-gguf: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m-GGUF
● Collection: https://huggingface.co/collections/prithivMLmods/dynamic-markdowns
● Multimodal Apps: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
👉 These models are stage progression models, and currently they may contain artifacts.
To know more about it, visit the app page or the respective model page!
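As a rough idea of local usage, the transformers image-text-to-text pipeline should cover these Qwen-VL-based checkpoints, as sketched below; out-of-the-box pipeline support for this exact repo is an assumption, so check the model card.

```python
# Hedged sketch: Markdown OCR with Proxima via the transformers pipeline API.
from transformers import pipeline

ocr = pipeline("image-text-to-text",
               model="prithivMLmods/proxima-ocr-d.markdown-post3.0.l")
messages = [{"role": "user", "content": [
    {"type": "image", "image": "document.png"},
    {"type": "text", "text": "Convert this page to Markdown."},
]}]
result = ocr(text=messages, max_new_tokens=1024)
print(result[0]["generated_text"])
```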
prithivMLmods posted an update about 1 month ago
Post · 1136
Try the CUA GUI Operator 🖥️ Space, a demo that brings several interesting ultra-compact multimodal Computer Use Agent (CUA) models, including Fara-7B, UI-TARS-1.5-7B, and the Holo models, into a single app for GUI localization tasks.
● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Other related multimodal spaces
● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en
I plan to add Chrome sandboxes to streamline it into a browser-based multimodal CUA tool; this will be added to the same Space soon.
To know more about it, visit the app page or the respective model page!
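GUI localization models generally return a point or box for the requested element; a small sketch of turning a normalized prediction into screen pixels (the normalized-coordinate output format is an assumption, and it differs between Fara, UI-TARS, and Holo):

```python
# Hedged sketch: converting a normalized GUI-localization point to pixels.
# Output formats vary per model (Fara / UI-TARS / Holo); adapt the parsing.
from PIL import Image

def to_pixels(norm_x: float, norm_y: float, screenshot_path: str):
    w, h = Image.open(screenshot_path).size
    return round(norm_x * w), round(norm_y * h)

print(to_pixels(0.42, 0.87, "screenshot.png"))  # pixel location of the target element
```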
prithivMLmods posted an update about 1 month ago
Post · 3571
One speech model with seven voices, streamlined with multimodal capabilities for vision tasks. It performs vision (image-text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. Vision to VibeVoice (EN): the demo is live. 🗣️🔥
🤗 Vision-to-VibeVoice-en [Demo]: prithivMLmods/Vision-to-VibeVoice-en
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Speech [VibeVoice-Realtime-0.5B]: microsoft/VibeVoice-Realtime-0.5B
✨ Vision [Qwen2.5-VL]: Qwen/Qwen2.5-VL-7B-Instruct
To know more about it, visit the app page or the respective model page!
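The vision half of the pipeline looks roughly like the sketch below: Qwen2.5-VL describes the image, and the resulting text is handed to VibeVoice for synthesis. This follows the standard transformers chat-template path; the exact prompt and the VibeVoice hand-off in the Space are assumptions.

```python
# Hedged sketch of the vision half: caption an image with Qwen2.5-VL-7B-Instruct;
# the generated text would then be passed to VibeVoice-Realtime-0.5B for speech.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "photo.jpg"},
    {"type": "text", "text": "Describe this image in two sentences."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
text = processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)[0]
print(text)  # feed this to the speech model
```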
prithivMLmods posted an update about 1 month ago
Post · 3721
Hello everyone,
The strangerzonehf [HF] Community / Organization Page, which I maintain, has reached 6th place in the Top 10 Developer Pages ranking, contributing 3.4% over the calendar cycle from August 2024 to August 2025. It is also the only South Asian / Indian page on the list. I could not be more proud to be doing things for the community. ❤️🤗
Source: https://www.dataprovenance.org/economies-of-open-intelligence.pdf
It is a pleasure to be a part of it.
Thank you!
@prithivMLmods
prithivMLmods posted an update about 1 month ago
Post · 10698
Introducing the Super-OCRs Demo, a comparison of state-of-the-art multimodal OCR VLMs, including HunyuanOCR, DeepSeekOCR, Dots, and Nanonets in one space for performing OCR, rendering LaTeX and Markdown, and visual grounding (layout). Find the related Spaces and models below.🤗🔥
✨Super-OCRs[Demo]: prithivMLmods/Super-OCRs-Demo
✨Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨GitHub: https://github.com/PRITHIVSAKTHIUR/Super-OCRs-Demo
⭐ Models Used:
✦ HunyuanOCR: tencent/HunyuanOCR
✦ DeepSeek-OCR: (-) deepseek-ai/DeepSeek-OCR (+) prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
✦ Dots.OCR: (-) rednote-hilab/dots.ocr (+) prithivMLmods/Dots.OCR-Latest-BF16
✦ Nanonets-OCR2-3B: nanonets/Nanonets-OCR2-3B
⭐ Some Other Relevant Apps:
✦ Qwen3-VL-HF-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✦ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
✦ Multimodal-OCR: prithivMLmods/Multimodal-OCR
✦ Multimodal-OCR2: prithivMLmods/Multimodal-OCR2
✦ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
✦ DeepSeek-OCR-experimental: prithivMLmods/DeepSeek-OCR-experimental
To know more about it, visit the app page or the respective model page!
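The Space can also be driven programmatically with gradio_client, as in this hedged sketch; the endpoint names and argument layout are not guaranteed, so list them with view_api() first.

```python
# Hedged sketch: calling the Super-OCRs Space from Python.
from gradio_client import Client, handle_file

client = Client("prithivMLmods/Super-OCRs-Demo")
client.view_api()  # prints the real endpoints and parameter names
# Hypothetical call shape; replace with what view_api() reports:
# result = client.predict(handle_file("page.png"), "HunyuanOCR", api_name="/ocr")
```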
prithivMLmods posted an update about 2 months ago
Post · 3234
Introducing the advanced sketch-board editor "Nano-Banana-Pro-Sketch-Board" powered by the Gemini 2.5 Flash Image and Gemini 3 Pro Preview Image models through the Gemini API. This version includes more features than the Nano-Banana-AIO app for drawing and prompt-based concept transformation of freestyle sketches. 🔥🍌
✨Nano-Banana-Pro-Sketch-Board: prithivMLmods/Nano-Banana-Pro-Sketch-Board
✨Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
✨GitHub: https://github.com/PRITHIVSAKTHIUR/Nano-Banana-Pro-Sketch-Board
✨Model-Garden: https://tinyurl.com/4xxs9dvy
Some Other Relevant Apps [OSS]
⭐Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⭐Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⭐Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⭐Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2
Note: The Nano-Banana-Pro-Sketch-Board demo requires a Gemini API key for the editing process. Your API key will be removed when the app is reloaded or closed. Your key remains safe and will not be exposed to any medium. Also, the Gemini 3 Pro Preview Image model may require a paid API key from a Google Cloud project with billing enabled.
To know more about it, visit the app info section or the respective Model Garden page!
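Under the hood, the Space calls the Gemini API; a minimal sketch of the same kind of call with the google-genai SDK (the model id matches the post's Gemini 2.5 Flash Image; the prompt and file names are illustrative):

```python
# Minimal sketch: a sketch-to-image edit through the Gemini API (google-genai SDK).
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")       # the Space asks for this key
sketch = Image.open("sketch.png")
resp = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[sketch, "Turn this rough sketch into a polished concept render."],
)
for part in resp.candidates[0].content.parts:
    if part.inline_data:                             # image bytes are returned inline
        open("render.png", "wb").write(part.inline_data.data)
```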
prithivMLmods posted an update about 2 months ago
Post · 1340
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It performs comprehensive text understanding and document-structure analysis and can provide bounding boxes with coordinates.
⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-OCR
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
Some relevant Spaces
⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
Check out the other spaces in the multimodal implementation collection.
To know more about it, visit the app page or the respective model page!
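Since the model returns boxes with coordinates, overlaying them for inspection is straightforward; a small sketch with PIL (the (x0, y0, x1, y1) pixel box format is an assumption about the output, so adapt it to what the model actually emits):

```python
# Hedged sketch: drawing Nemotron-Parse-style boxes on a page image.
from PIL import Image, ImageDraw

page = Image.open("page.png").convert("RGB")
draw = ImageDraw.Draw(page)
detections = [((40, 60, 520, 110), "title")]        # illustrative model output
for (x0, y0, x1, y1), label in detections:
    draw.rectangle((x0, y0, x1, y1), outline="red", width=2)
    draw.text((x0, max(0, y0 - 12)), label, fill="red")
page.save("page_annotated.png")
```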