# Fine-tuned Stable Diffusion 2 Model for Time Series Image Inpainting
This directory contains a fine-tuned Stable Diffusion 2 inpainting model specialized for reconstructing mathematical time series visualizations (GAF, MTF, RP, Spectrogram).
## Model Overview
| Property | Value |
|---|---|
| Base Model | stabilityai/stable-diffusion-2-inpainting |
| Specialized For | Mathematical time series image reconstruction |
| Image Types | GAF, MTF, RP, Spectrogram |
| Input Size | 512×512 pixels |
| Architecture | UNet2DConditionModel (fine-tuned) |
| Training Method | Cross-validation with early stopping |
| Best Validation Loss | 0.03623 |
| Training Date | September 2024 |
## Purpose
This model was fine-tuned to perform inpainting on time series visualizations. It can:
- Reconstruct missing regions in GAF (Gramian Angular Field) images
- Fill gaps in MTF (Markov Transition Field) images
- Complete RP (Recurrence Plot) images
- Restore Spectrogram images
The model is specifically trained on synthetic time series data with controlled missing patterns (random, block, periodic, edge).
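As a concrete illustration of the input format, a time series can be encoded as a GAF image before inpainting. This is a minimal sketch assuming the `pyts` package; the project's own encoders live in `ts_image_inpainting.py` and may differ:

```python
import numpy as np
from PIL import Image
from pyts.image import GramianAngularField

# Example series; real inputs come from your data
series = np.sin(np.linspace(0, 8 * np.pi, 512))

gaf = GramianAngularField(image_size=512, method="summation")
matrix = gaf.fit_transform(series.reshape(1, -1))[0]  # values in [-1, 1]

# Map [-1, 1] to 8-bit grayscale and convert to the 512x512 RGB input the model expects
img = Image.fromarray(((matrix + 1) / 2 * 255).astype(np.uint8)).convert("RGB")
```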
## Model Structure
```text
best_model/
├── README.md          # This file
├── model_index.json   # Model configuration
├── scheduler/         # DDPM scheduler
├── text_encoder/      # CLIP text encoder (frozen)
├── tokenizer/         # CLIP tokenizer
├── unet/              # Fine-tuned UNet (main trainable component)
└── vae/               # VAE encoder/decoder (frozen)
```
**Note:** Only the UNet component was fine-tuned. All other components (VAE, text encoder) remain frozen from the base Stable Diffusion 2 model.
## Quick Start
### Installation
```bash
pip install torch torchvision
pip install diffusers transformers accelerate
pip install pillow numpy
```
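Before loading the pipeline, it is worth confirming that PyTorch can see a CUDA device, since the examples below assume one:

```python
import torch

# The usage examples below assume a CUDA GPU; check before loading the model
print(torch.__version__, "| CUDA available:", torch.cuda.is_available())
```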
### Basic Usage
```python
from diffusers import StableDiffusionInpaintPipeline
import torch
from PIL import Image

# Load the fine-tuned model
model_path = "models/stable_diffusion_2_all_4/best_model"
pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16
).to("cuda")

# Load images
image = Image.open("missing_image.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

# Inpaint with appropriate prompt
prompt = "high quality gramian angular field mathematical visualization"
result = pipeline(
    prompt=prompt,
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]

result.save("reconstructed_image.png")
```
### Image Type-Specific Prompts
Use these prompts for best results:
```python
prompts = {
    "gaf": "high quality gramian angular field mathematical visualization",
    "mtf": "high quality markov transition field mathematical visualization",
    "rp": "high quality recurrence plot mathematical visualization",
    "spec": "high quality spectrogram mathematical visualization"
}
```
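With this table, the prompt for a given encoder can be looked up by key. A minimal sketch reusing `pipeline`, `image`, and `mask` from the Basic Usage example:

```python
enc_name = "mtf"  # e.g. parsed from the filename
result = pipeline(
    prompt=prompts[enc_name],
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]
```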
## Training Details
### Training Configuration
| Parameter | Value |
|---|---|
| Training Samples | ~3,000 per fold |
| Validation Samples | ~1,000 per fold |
| Batch Size | 4 |
| Learning Rate | 1e-5 |
| Optimizer | AdamW (weight_decay=0.01) |
| Training Runs | 2 folds |
| Early Stopping | Patience = 5 epochs |
| Mixed Precision | No (FP32) |
| Epochs Trained | 9-13 per fold |
### Training Results
**Fold 1:**
- Best Validation Loss: 0.03802
- Epochs: 13
- Final Training Loss: 0.03784

**Fold 2 (Best Model):**
- Best Validation Loss: 0.03623
- Epochs: 9
- Final Training Loss: 0.03872

**Overall Performance:**
- Mean Validation Loss: 0.03713 ± 0.00089
- This model represents Fold 2 (the best performer)
### Memory Optimizations Used
- Attention slicing enabled
- Gradient checkpointing
- Only UNet fine-tuned (VAE and text encoder frozen)
## Training Dataset
The model was trained on synthetically generated time series with:
- 2,000 base samples × 4 image types = 8,000 images
- Pattern types: sine, cosine, trend, seasonal, noise, spikes, mixed
- Missing data types: random, block, periodic, edge
- Missing rates: 5% - 30%
- Time series lengths: 100 - 1,000 points
Dataset generation script: `generate_training_dataset.py`
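For illustration only, the four missing-data patterns could be produced along these lines (a hedged sketch; the authoritative logic is in `generate_training_dataset.py`, and the function name here is hypothetical):

```python
import numpy as np

def make_missing_mask(n: int, kind: str, rate: float = 0.2) -> np.ndarray:
    """Return a boolean mask of length n where True marks missing points."""
    mask = np.zeros(n, dtype=bool)
    if kind == "random":    # scattered single points
        mask[np.random.choice(n, int(rate * n), replace=False)] = True
    elif kind == "block":   # one contiguous gap
        start = np.random.randint(0, n - int(rate * n))
        mask[start:start + int(rate * n)] = True
    elif kind == "periodic":  # every k-th point dropped
        mask[::max(int(1 / rate), 1)] = True
    elif kind == "edge":    # gap at the end of the series
        mask[n - int(rate * n):] = True
    return mask
```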
## System Requirements
### Minimum Requirements
- GPU: NVIDIA GPU with 8+ GB VRAM
- CUDA: 11.0+
- RAM: 16 GB
- Storage: ~20 GB for model
### Recommended Requirements
- GPU: NVIDIA GPU with 16+ GB VRAM (RTX 3090, A5000, or better)
- CUDA: 11.8+
- RAM: 32 GB
- Storage: 50 GB (for model + datasets)
### Tested On
- GPU: NVIDIA TITAN RTX (24 GB GDDR6)
- Driver: 525.147.05
- CUDA: 12.0
- OS: Ubuntu 18.04.1 LTS
## Example Use Cases
### 1. Recovering Missing Time Series Data
```python
# Load a corrupted time series image and its mask
# (load_corrupted_timeseries_image / create_missing_data_mask are
# placeholders for your own loading code)
corrupted_img = load_corrupted_timeseries_image()
mask = create_missing_data_mask()

# Reconstruct using the fine-tuned model
reconstructed = pipeline(
    prompt="high quality gramian angular field mathematical visualization",
    image=corrupted_img,
    mask_image=mask,
    guidance_scale=7.5,
    num_inference_steps=50
).images[0]
```
### 2. Batch Processing Multiple Images
```python
import glob
from pathlib import Path

# Process all GAF images
for img_path in glob.glob("data/missing/*.png"):
    image = Image.open(img_path).convert("RGB").resize((512, 512))
    mask = generate_mask_from_image(image)  # placeholder for your mask logic
    result = pipeline(
        prompt="high quality gramian angular field mathematical visualization",
        image=image,
        mask_image=mask
    ).images[0]
    output_path = Path("data/reconstructed") / Path(img_path).name
    result.save(output_path)
```
### 3. Integration with Forecasting Pipeline
```python
# 1. Time series → image (GAF/MTF/RP/SPEC)
image = time_series_to_gaf(corrupted_series)

# 2. Create a mask for the missing regions
mask = create_mask_from_nan(corrupted_series)

# 3. Inpaint the image
reconstructed_image = pipeline(
    prompt="high quality gramian angular field mathematical visualization",
    image=image,
    mask_image=mask
).images[0]

# 4. Image → time series
reconstructed_series = gaf_to_time_series(reconstructed_image)

# 5. Use for forecasting
forecast = xgboost_model.predict(reconstructed_series)
```
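Step 4 relies on the project's `gaf_to_time_series`. For a *summation* GAF (GASF) of a series rescaled to [0, 1] before encoding, the main diagonal satisfies G_ii = 2x_i² - 1, so the series can be read back from the diagonal. A sketch under those assumptions (the project's actual decoder may differ):

```python
import numpy as np
from PIL import Image

def gaf_diagonal_to_series(img: Image.Image) -> np.ndarray:
    # Map pixel values back to the GAF range [-1, 1]
    g = np.asarray(img.convert("L"), dtype=np.float32) / 255.0 * 2.0 - 1.0
    diag = np.clip(np.diagonal(g), -1.0, 1.0)
    # Invert G_ii = 2*x_i**2 - 1 for x_i in [0, 1]
    return np.sqrt((diag + 1.0) / 2.0)
```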
## Advanced Configuration
### Inference Parameters
```python
result = pipeline(
    prompt=prompt,
    image=image,
    mask_image=mask,
    # Quality settings
    num_inference_steps=50,           # 20-100, higher = better quality
    guidance_scale=7.5,               # 1-20, higher = closer to prompt
    # Generation settings
    num_images_per_prompt=1,
    generator=torch.manual_seed(42),  # for reproducibility
    # Advanced
    eta=0.0,                          # DDIM eta parameter
    output_type="pil"                 # "pil" or "np"
)
```
### Memory Management
```python
# For limited VRAM
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

# Use CPU offloading (do not also call .to("cuda") when offloading)
pipeline.enable_sequential_cpu_offload()

# Lower precision
pipeline = pipeline.to(torch.float16)
```
## Performance Metrics
### Validation Performance
| Metric | Value |
|---|---|
| Best Val Loss | 0.03623 |
| Mean Val Loss | 0.03713 ± 0.00089 |
| Training Stability | High (early stopping at epoch 9) |
### Inference Speed (TITAN RTX)
| Inference Steps | Time per Image |
|---|---|
| 20 steps | ~2.5 seconds |
| 50 steps | ~5.5 seconds |
| 100 steps | ~11 seconds |
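These timings can be reproduced with a simple wall-clock measurement, reusing `prompt`, `image`, and `mask` from the Basic Usage example (a sketch; absolute numbers depend on GPU, precision, and scheduler):

```python
import time

start = time.perf_counter()
_ = pipeline(prompt=prompt, image=image, mask_image=mask,
             num_inference_steps=50).images[0]
print(f"{time.perf_counter() - start:.1f} s per image at 50 steps")
```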
## Technical Details
### Model Architecture
**Base:** Stable Diffusion 2 Inpainting

**UNet:** 859M parameters (fine-tuned)
- Channels: 320, 640, 1280, 1280
- Attention layers: self-attention + cross-attention
- Conditioning: text embeddings (CLIP)

**VAE:** 83M parameters (frozen)
- Encoder: 512×512 → 64×64 latents
- Decoder: 64×64 latents → 512×512
- Latent channels: 4

**Text Encoder:** OpenCLIP-ViT-H/14 (frozen)
- 354M parameters
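The component sizes can be checked directly on a loaded pipeline:

```python
# Count parameters per pipeline component
for name in ("unet", "vae", "text_encoder"):
    module = getattr(pipeline, name)
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```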
### Training Objective
The model minimizes the MSE between true and predicted noise in latent space:

L = E[||ε - ε_θ(z_t, t, c)||²]

where:

- ε: the true noise
- ε_θ: the model's noise prediction
- z_t: noisy latents at timestep t
- c: text conditioning (the prompt)
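In `diffusers` terms, one training step computes roughly the following (a schematic sketch that omits the inpainting-specific mask channels and assumes `latents`, `encoder_hidden_states`, `unet`, and `scheduler` are already prepared; the project's actual loop is in `finetune_stable_diffusion.py`):

```python
import torch
import torch.nn.functional as F

# Sample noise and a random timestep per example, then noise the latents
noise = torch.randn_like(latents)
timesteps = torch.randint(
    0, scheduler.config.num_train_timesteps,
    (latents.shape[0],), device=latents.device
)
noisy_latents = scheduler.add_noise(latents, noise, timesteps)

# Predict the noise and take the MSE against the true noise:
# L = E[||eps - eps_theta(z_t, t, c)||^2]
noise_pred = unet(noisy_latents, timesteps,
                  encoder_hidden_states=encoder_hidden_states).sample
loss = F.mse_loss(noise_pred, noise)
```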
## How to Retrain or Fine-tune Further
### Re-training from Scratch
```bash
python finetune_stable_diffusion.py \
    --data_dir stdiff_training_data \
    --output_dir models/my_new_model \
    --max_samples 4000 \
    --batch_size 4 \
    --learning_rate 1e-5 \
    --max_epochs 300 \
    --n_folds 2 \
    --train_ratio 0.75 \
    --early_stop_patience 5
```
### Continuing Training from This Model
```bash
# Load this model and continue training
python finetune_stable_diffusion.py \
    --data_dir stdiff_training_data \
    --output_dir models/stable_diffusion_2_all_4_continued \
    --resume_fold 2 \
    --max_epochs 50
```
### Training on Custom Dataset
1. Generate your dataset:

   ```bash
   python generate_training_dataset.py \
       --samples 5000 \
       --output my_custom_data
   ```

2. Train the model:

   ```bash
   python finetune_stable_diffusion.py \
       --data_dir my_custom_data \
       --output_dir models/custom_model
   ```
## Troubleshooting
### Out of Memory (OOM)
```python
# Reduce the batch size and enable attention/VAE slicing
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()

# Or use float16
pipeline = pipeline.to(torch.float16)
```
### Poor Reconstruction Quality
- Increase inference steps: 20 → 50 or 100
- Adjust the guidance scale: try 5.0-10.0 (see the sweep sketch below)
- Check the prompt: use the prompt matching the image type
- Check mask quality: ensure the mask accurately covers the missing regions
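A quick way to apply the first two suggestions is a small parameter sweep (a sketch reusing names from the Basic Usage example):

```python
# Save one reconstruction per guidance scale for visual comparison
for gs in (5.0, 7.5, 10.0):
    out = pipeline(prompt=prompt, image=image, mask_image=mask,
                   guidance_scale=gs, num_inference_steps=50).images[0]
    out.save(f"reconstructed_gs{gs}.png")
```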
### Slow Inference
```python
# Use fewer inference steps (trade quality for speed)
result = pipeline(..., num_inference_steps=20)

# Or switch to the faster DPM-Solver++ scheduler
from diffusers import DPMSolverMultistepScheduler

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config
)
```
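In practice, DPM-Solver++ typically reaches comparable quality in roughly 20-25 steps, so the scheduler swap is usually combined with a lower `num_inference_steps`.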
## Related Files
- Training script: `finetune_stable_diffusion.py`
- Dataset generator: `generate_training_dataset.py`
- Integration script: `integrate_custom_model.py`
- Main experiment: `iterative_experiment.py`
- Image encoders: `ts_image_inpainting.py`
## References
### Base Model
```bibtex
@article{rombach2022high,
  title={High-resolution image synthesis with latent diffusion models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn},
  journal={CVPR},
  year={2022}
}
```
### Stable Diffusion 2
- HuggingFace: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
- Paper: https://arxiv.org/abs/2112.10752
## Integration with Experimental Pipeline
This model can be used in the main experimental pipeline:
```python
# In iterative_experiment.py or ts_image_inpainting.py
from models.stdiff import StableDiffusion2MathInpainter

# Initialize the inpainter with this model
inpainter = StableDiffusion2MathInpainter(
    model_path="models/stable_diffusion_2_all_4/best_model"
)

# Use for inpainting
reconstructed = inpainter.inpaint(
    image=corrupted_gaf_image,
    mask=missing_mask,
    enc_name="gaf"
)
```
## License
This model is a fine-tuned version of Stable Diffusion 2, which is released under the CreativeML OpenRAIL-M license.
- Base model license: CreativeML OpenRAIL-M
- Fine-tuned weights: same license as the base model
- License text: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/LICENSE
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{ts_sd2_finetuned_2024,
  title={Fine-tuned Stable Diffusion 2 for Time Series Image Inpainting},
  author={Kobiela, Dariusz and Kobiela, Jaros{\l}aw and Kurowski, Adam and Landowska, Agnieszka},
  year={2025},
  howpublished={Trained on a synthetic time series dataset},
  note={Fine-tuned from stabilityai/stable-diffusion-2-inpainting}
}
```
## Support
For questions or issues:

- Check the troubleshooting section above
- Review the training logs in `../cross_validation_results.json`
- Consult the `finetune_stable_diffusion.py` documentation
- Check the original Stable Diffusion 2 documentation
**Model Version:** 1.0
**Last Updated:** 1.12.2025
**Status:** Production ready