Add model card for GVFDiffusion
#1 by nielsr (HF Staff) - opened

README.md ADDED
@@ -0,0 +1,45 @@
---
pipeline_tag: image-to-3d
---

# Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

This repository contains the model and code for the paper [Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis](https://huggingface.co/papers/2507.23785).

This work presents a novel framework for video-to-4D generation that creates high-quality dynamic 3D content from single video inputs. It introduces a *Direct 4DMesh-to-GS Variation Field VAE* to encode canonical Gaussian Splats (GS) and their temporal variations into a compact latent space. Building on this, a *Gaussian Variation Field diffusion model* is trained with a temporal-aware Diffusion Transformer, conditioned on input videos and canonical GS. The model demonstrates superior generation quality and remarkable generalization to in-the-wild video inputs.

Project Page: [https://gvfdiffusion.github.io/](https://gvfdiffusion.github.io/)
Code: [https://github.com/ForeverFancy/GVFDiffusion](https://github.com/ForeverFancy/GVFDiffusion)

## Abstract

We present a novel framework for video-to-4D generation that creates high-quality dynamic 3D content from single video inputs. Direct 4D diffusion modeling is extremely challenging due to costly data construction and the high-dimensional nature of jointly representing 3D shape, appearance, and motion. We address these challenges by introducing a Direct 4DMesh-to-GS Variation Field VAE that directly encodes canonical Gaussian Splats (GS) and their temporal variations from 3D animation data without per-instance fitting, and compresses high-dimensional animations into a compact latent space. Building upon this efficient representation, we train a Gaussian Variation Field diffusion model with temporal-aware Diffusion Transformer conditioned on input videos and canonical GS. Trained on carefully-curated animatable 3D objects from the Objaverse dataset, our model demonstrates superior generation quality compared to existing methods. It also exhibits remarkable generalization to in-the-wild video inputs despite being trained exclusively on synthetic data, paving the way for generating high-quality animated 3D content.

## Installation and Quick Start

For detailed installation instructions and how to run a minimal inference example, please refer to the [GitHub repository](https://github.com/ForeverFancy/GVFDiffusion).

```bash
# Clone the repository
git clone https://github.com/ForeverFancy/GVFDiffusion.git
cd GVFDiffusion

# Set up the environment and dependencies
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast

# Run a minimal inference example
accelerate launch --num_processes 1 inference_dpm_latent.py --batch_size 1 --exp_name /path/to/your/output --config configs/diffusion.yml --start_idx 0 --end_idx 2 --txt_file ./assets/in_the_wild.txt --use_fp16 --num_samples 2 --adaptive --data_dir ./assets/ --num_timesteps 32 --download_assets --in_the_wild
```
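For scripted experiments, the launch command above can be assembled programmatically instead of being edited by hand. A minimal sketch (the flag names mirror the example command in this card; `exp_name` and the index range are placeholders you must set for your own run, and this helper is illustrative rather than part of the released code):

```python
import shlex

def build_inference_cmd(exp_name, start_idx=0, end_idx=2,
                        num_samples=2, num_timesteps=32, in_the_wild=True):
    """Assemble the `accelerate launch` inference command from this card.

    Flag names follow the example command above; `exp_name` is a
    placeholder output directory, not a fixed value.
    """
    cmd = [
        "accelerate", "launch", "--num_processes", "1",
        "inference_dpm_latent.py",
        "--batch_size", "1",
        "--exp_name", exp_name,
        "--config", "configs/diffusion.yml",
        "--start_idx", str(start_idx),
        "--end_idx", str(end_idx),
        "--txt_file", "./assets/in_the_wild.txt",
        "--use_fp16",
        "--num_samples", str(num_samples),
        "--adaptive",
        "--data_dir", "./assets/",
        "--num_timesteps", str(num_timesteps),
        "--download_assets",
    ]
    if in_the_wild:
        cmd.append("--in_the_wild")
    return cmd

# Print a shell-safe version of the command for inspection
print(shlex.join(build_inference_cmd("/path/to/your/output")))
```

Returning the command as a list (rather than one string) keeps it safe to pass directly to `subprocess.run` without shell quoting issues.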

## Citation

If you find the work useful, please consider citing:

```bibtex
@misc{zhang2025gaussianvariationfielddiffusion,
    title={Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis},
    author={Bowen Zhang and Sicheng Xu and Chuxin Wang and Jiaolong Yang and Feng Zhao and Dong Chen and Baining Guo},
    year={2025},
    eprint={2507.23785},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2507.23785},
}
```