A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens Paper • 2604.04913 • Published 17 days ago • 10
PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders Paper • 2603.25398 • Published 28 days ago • 3
DeltaTok Collection DeltaTok tokenizer, DeltaWorld predictor, and evaluation heads. https://github.com/amazon-far/deltatok • 7 items • Updated 15 days ago • 8
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model Paper • 2602.17807 • Published Feb 19 • 7
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published Sep 17, 2024 • 30