-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2406.07550
-
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper • 2406.07550 • Published • 60 -
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
Paper • 2407.20729 • Published • 28
-
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper • 2412.06781 • Published • 24 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 301 -
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Paper • 2502.20256 • Published -
High-Resolution Building and Road Detection from Sentinel-2
Paper • 2310.11622 • Published
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 132 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
-
Guiding a Diffusion Model with a Bad Version of Itself
Paper • 2406.02507 • Published • 17 -
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Paper • 2406.04314 • Published • 30 -
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper • 2406.07550 • Published • 60 -
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Paper • 2406.07546 • Published • 9
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper • 2412.06781 • Published • 24 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 301 -
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Paper • 2502.20256 • Published -
High-Resolution Building and Road Detection from Sentinel-2
Paper • 2310.11622 • Published
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 132 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
-
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper • 2406.07550 • Published • 60 -
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
Paper • 2407.20729 • Published • 28
-
Guiding a Diffusion Model with a Bad Version of Itself
Paper • 2406.02507 • Published • 17 -
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Paper • 2406.04314 • Published • 30 -
An Image is Worth 32 Tokens for Reconstruction and Generation
Paper • 2406.07550 • Published • 60 -
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Paper • 2406.07546 • Published • 9