---
license: other
license_name: nvidia-source-code-license-v1
license_link: https://huggingface.co/nvidia/finite-difference-flow-optimization/resolve/main/LICENSE.txt
pipeline_tag: text-to-image
library_name: diffusers
base_model: stabilityai/stable-diffusion-3.5-medium
inference: false
tags:
- text-to-image
- post-training
- reinforcement-learning
- stable-diffusion
- diffusers
language:
- en
---
# FDFO: Finite Difference Flow Optimization
This repository contains the official pretrained checkpoints for FDFO, a method for fine-tuning flow-based diffusion models using finite difference gradient estimation. We fine-tune [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) using reward signals from VLM-based scoring and/or PickScore. Please see the [GitHub repository](https://github.com/NVlabs/finite-difference-flow-optimization) for further details.
# Model overview
## Description
Finite Difference Flow Optimization is an algorithm for reinforcement learning (RL) post-training of text-to-image diffusion models to improve output quality. We provide a set of Low-Rank Adaptation (LoRA) checkpoints that modify the behavior of the off-the-shelf Stable Diffusion 3.5 Medium text-to-image model.
This model is for research and development only.
## License
The FDFO source code and pretrained checkpoints are licensed under the [NVIDIA Source Code License v1 (Non-Commercial)](LICENSE.txt). Copyright © 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
The Stable Diffusion 3.5 Medium Model is licensed under the [Stability AI Community License](https://stability.ai/community-license-agreement). Copyright © Stability AI Ltd. All rights reserved. Powered by Stability AI.
## Deployment geography
Global
## Use case
The checkpoints are intended for academic researchers who want to reproduce the results from the following NVIDIA research paper.
**Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models**
David McAllister, Miika Aittala, Tero Karras, Janne Hellsten, Angjoo Kanazawa, Timo Aila, Samuli Laine
https://arxiv.org/abs/TODO.TODO
## Release date
TODO
## References
**Research paper:** https://arxiv.org/abs/TODO.TODO
**Source code:** https://github.com/NVlabs/finite-difference-flow-optimization
**Checkpoints:** https://huggingface.co/nvidia/finite-difference-flow-optimization
## Model architecture
**Architecture type:** Transformer LoRA
**Network architecture:** Low-rank adapter for Stable Diffusion 3.5 Medium
**Number of model parameters:** 1.9 × 10⁷ (approximately 19 million)
The low-rank adapter was initialized to zero and trained using Finite Difference Flow Optimization for 1000 RL epochs, where one RL epoch corresponds to 864 reward evaluations. See the associated [research paper](https://arxiv.org/abs/TODO.TODO) for further details.
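FDFO's exact estimator is specified in the research paper. Purely to illustrate the general idea of estimating gradients from reward evaluations alone via finite differences, here is a generic SPSA-style sketch; the function name, probe count, and `d / n_probes` scaling are illustrative assumptions, not the authors' algorithm:

```python
import numpy as np


def fd_reward_gradient(reward_fn, theta, eps=1e-3, n_probes=64, rng=None):
    """Estimate the gradient of a scalar reward w.r.t. parameters theta
    using antithetic finite differences along random unit directions.
    Each probe costs two reward evaluations (this is NOT the FDFO
    estimator; see the paper for the actual method)."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = theta.size
    grad = np.zeros_like(theta)
    for _ in range(n_probes):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # unit direction
        delta = (reward_fn(theta + eps * u) - reward_fn(theta - eps * u)) / (2 * eps)
        grad += delta * u
    # E[(g @ u) u] = g / d for u uniform on the unit sphere, hence the d factor.
    return grad * d / n_probes
```

In this picture, the per-epoch budget of reward evaluations quoted above corresponds to the total number of perturbed forward passes scored by the reward model.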
## Input
**Input type:** Text
**Input format:** String
**Input parameters:** One-dimensional (1D)
**Other properties related to input:** Typically around 50–200 characters; up to 2K tokens
## Output
**Output type:** Image
**Output format:** Red, green, blue (RGB)
**Output parameters:** Two-dimensional (2D)
**Other properties related to output:** 512×512 pixels, 24 bits per pixel
## Software integration
**Runtime engine:** PyTorch 2.7.0
**Supported hardware microarchitecture compatibility:** NVIDIA Ampere, NVIDIA Hopper
**Preferred/supported operating system:** Linux
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
## Model version
1.0
## Training and evaluation datasets
**Training dataset:** Pick-a-Pic
**Link:** https://stability.ai/research/pick-a-pic
**Data modality:** Text prompts (25,432 prompts)
**Data collection method:** Human
Web-based human preference collection run by Stability AI via the Pick-a-Pic interface with explicit user consent.
**Text training data size:** Less than one billion tokens
**Labeling method:** Not applicable
**Evaluation dataset:** HPDv2
**Link:** https://huggingface.co/datasets/ymhao/HPDv2
**Data collection method:** Hybrid (Human, Synthetic)
The data consists of text prompts from MS COCO and DiffusionDB, cleaned by ChatGPT. The dataset collection method is described in detail in the [Human Preference Score v2](https://arxiv.org/abs/2306.09341) paper.
**Properties:** 3,200 text prompts (evaluation set)
**Labeling method:** Not applicable
## Inference
**Acceleration engine:** PyTorch 2.7.0
**Test hardware:** NVIDIA Hopper
## Ethical considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).