1. βœ… Minimalist T2I Workflow for WAN Video 2.2
  2. βœ… this one β†’ Minimalist First-Last Frame to Video Workflow for WAN Video 2.2
  3. πŸ”œ Minimalist FMLF (First-Middle-Last Frame) + Multi Frame Ref To Video Workflow for WAN Video 2.2 (Coming Soon)
  4. πŸ”œ Join Videos (Snippets) – Track Operations, Python Scripting: Be a Storyteller - Seamless Narrative Chain (Coming Soon)

Minimalist First-Last Frame to Video Workflow for WAN Video 2.2

A streamlined image-to-video workflow using WAN Video 2.2's I2V High and Low Noise models (14B fp8) together with the specialized "WanFirstLastFrameToVideo" node. It generates a smooth video transition between two static images (the first and last frame).

Workflow preview: First-Last-Frame

Workflow file: First-Last-Frame.json

Workflow Structure

  • Frame Conditioning: the two images (first and last frame) are loaded via LoadImage nodes; the "WanFirstLastFrameToVideo" node processes both frames and creates the conditioned latent
  • Text Conditioning: Positive and negative prompts guide the transition behavior
  • Optional Post-Processing: Upscaling β†’ RIFE Frame Interpolation β†’ Video Combine β†’ Audio integration
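
For orientation, here is a minimal sketch of the values typically set on the "WanFirstLastFrameToVideo" node; the input names are assumed from the ComfyUI core node and should be checked against your ComfyUI version:

```python
# Sketch: key inputs of the "WanFirstLastFrameToVideo" node (names assumed
# from the ComfyUI core node; verify against your ComfyUI version).
wan_flf2v_settings = {
    "width": 832,      # should match the resolution of both input images
    "height": 1216,    # portrait example from the Performance section below
    "length": 81,      # 81 frames ~= 5 s at WAN's native 16 fps
    "batch_size": 1,
    # "start_image" / "end_image" are wired from the two LoadImage nodes;
    # "positive" / "negative" come from the CLIP Text Encode nodes.
}
```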

Key Features

  • First-Last Frame conditioning: Uses the "WanFirstLastFrameToVideo" node to create seamless transitions between two input images. Default 81 frames (~5s at 16fps).

  • Dual-stage sampling (see the sampler sketch after this list):

    • Sequential High Noise → Low Noise processing using the 4-step Lightning LoRAs
    • High Noise model: 4 steps, start_at_step: 0, end_at_step: 2
    • Low Noise model: 4 steps, start_at_step: 2, end_at_step: 10000

  • Precise control: ModelSamplingSD3 nodes (shift parameter: 3-10) feeding KSampler (Advanced) with 4 steps, CFG 1.0, Euler sampler and simple scheduler for fast generation.

    Perfect for creating morphing transitions, emotional expression changes, or any transformation between two carefully crafted frames.

  • Optional Post-processing pipeline:

    • 4x upscaling with the UltraSharp model (or any upscale model), optionally scaled back down to the target size
    • RIFE frame interpolation (2x multiplier doubles the frame count, 81 → 162 frames, for smoother motion: ~5s at 30fps)
    • Video Combine merges the image sequence into a video (frame_rate: 30)
    • Audio integration via "LoadAudio" node
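
For wiring reference, here is a hedged sketch of the sampler split described under "Dual-stage sampling" above; the add_noise / return_with_leftover_noise flags follow the usual ComfyUI two-pass pattern and are assumptions to verify against the workflow JSON:

```python
# Sketch: one 4-step schedule shared by two KSampler (Advanced) nodes.
# Values mirror the Key Features list; flag names assumed from ComfyUI.
common = dict(steps=4, cfg=1.0, sampler_name="euler", scheduler="simple")

high_noise_pass = dict(common,
    add_noise="enable",                   # only the first pass injects noise
    start_at_step=0, end_at_step=2,       # High Noise model: steps 0-1
    return_with_leftover_noise="enable",  # pass the half-denoised latent on
)

low_noise_pass = dict(common,
    add_noise="disable",                  # continue on the leftover noise
    start_at_step=2, end_at_step=10000,   # 10000 simply means "to the end"
    return_with_leftover_noise="disable",
)
```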

Usage Tips

Image Preparation

  • Both images should have the same resolution (see the sketch after this list)
  • Ensure consistent composition and framing between first and last frame
  • Consider upscaling input images beforehand for better quality
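
If the two frames differ in size, here is a quick, hedged way to match them before loading (Pillow; the file names are placeholders):

```python
# Sketch: force the last frame to the exact resolution of the first frame.
from PIL import Image, ImageOps

first = Image.open("first_frame.png")
last = Image.open("last_frame.png")

if last.size != first.size:
    # Scale and center-crop the last frame to the first frame's size.
    last = ImageOps.fit(last, first.size, method=Image.Resampling.LANCZOS)
    last.save("last_frame_matched.png")
```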

Prompting for WAN Video 2.2 (First-Last Frame)

  • Single Positive Prompt β†’ Must describe both frames (start and end) within one coherent scene – not two separate scenes
  • Dynamics through Verbs & Motion Descriptions β†’ Use active, temporal verbs: "rising, expanding, swirling, rotating, billowing, flickering, zooming in" β†’ Avoid static terms!
  • Clear Development from Beginning to End β†’ Describe the transition: (Start state), then (End state) β†’ e.g.: "transforms from dark void into...", "emotional expression shift from neutral to gentle smile"
  • Focus on Central Animated Elements β†’ What should move? (hair, eyes, lips?) β†’ Explicitly name and animate these
  • Emotion & Atmosphere as Guiding Motif β†’ Mood (e.g. dark fantasy horror, apocalyptic, cinematic) ties everything together – not just objects
  • Negative Prompt as Safeguard β†’ Block static, unwanted styles

Core Rule: Do not describe what happens – describe how the scene changes.

A good prompt lets the model infer the animation itself by linking the initial state and the final state in one flowing, visually coherent sentence.
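
As a hedged illustration of this pattern (the wording is only an example built from the fragments above):

```text
Positive: a pale face emerges from a dark void, mist swirling and expanding,
her expression shifts from neutral to a gentle smile, strands of hair
drifting upward, dark fantasy, cinematic lighting

Negative: static, still image, frozen, motionless, watermark, text
```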

Hel Portrait | First-Last-Frame 1664x2496px - Dare to click β€” opens fixed-size copy.
Hello Jay | 1664x2496px 30fps

Pure image-to-image animation / optical flow

Core Rule II: describe nothing - an empty prompt. 😉

To work with an empty prompt (pure image-to-image animation / optical flow), both frames must have:

  • identical composition and perspective,
  • consistent style,
  • a clear visual difference,
  • and a well-defined transformation or motion.

Dare to click β€” opens fixed-size copy, First-Last-Frame 3144x1375px

πŸ”— External link (right-click to open in new tab)
β†’ show Banshee (HR Ginger) 3144x1375px, 30fps

Be a Storyteller - Seamless Narrative Chain

A series of videos linked so that the last frame becomes the first of the next, forming a continuous visual story.

This point is meant only as a supplement; it will (at some point) become Part 4, “Join Videos (Snippets) – Track Operations,” of this small how-to.

Core Rule: Every video must begin where the previous one ends β€” in composition, perspective, light, and narrative state.

As a checklist for each new segment, ask yourself before generating Video n+1 (a frame-extraction sketch follows the list):

  • Is its first frame identical to the last frame of Video n?
  • Is the direction of motion consistent?
  • Does the mood remain intact?
  • Does the composition (zoom, framing, perspective) stay stable?
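
One way to enforce the first point mechanically is to export the final frame of segment n and feed it to the next run's LoadImage node. A minimal sketch, assuming OpenCV and placeholder file names:

```python
# Sketch: grab the last frame of segment n as the first frame of segment n+1.
import cv2

cap = cv2.VideoCapture("segment_n.mp4")
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.set(cv2.CAP_PROP_POS_FRAMES, total - 1)  # seek to the final frame
ok, frame = cap.read()                       # some codecs seek imprecisely;
cap.release()                                # if so, read frames sequentially

if not ok:
    raise RuntimeError("could not read the last frame")
cv2.imwrite("next_segment_first_frame.png", frame)
```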

πŸ”— External link (right-click to open in new tab)
β†’ Hel unhallowed 1664x2496px, 30fps
There's an artificial pause of 0.23 seconds between each video snippet.

Requirements

⚠️ Note: All model links below are direct download links. Clicking them will immediately start downloading the files.

Models

  • Diffusion models: wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors, wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
  • Lightning LoRAs (4-step): wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors, wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
  • Text encoder: umt5_xxl_fp8_e4m3fn_scaled.safetensors
  • VAE: wan_2.1_vae.safetensors

Optional (Post-processing)

  • Upscale model: 4x-UltraSharp.pth
  • Frame interpolation: rife47.pth (ComfyUI-Frame-Interpolation)
  • Video Combine and audio: ComfyUI-VideoHelperSuite

Installation

  1. Download all required model files (see Requirements section)
  2. Place files in their respective ComfyUI directories:
   ComfyUI/
   ├── models/
   │   ├── diffusion_models/
   │   │   ├── wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
   │   │   └── wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
   │   ├── loras/
   │   │   ├── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
   │   │   └── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
   │   ├── text_encoders/
   │   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
   │   ├── vae/
   │   │   └── wan_2.1_vae.safetensors
   │   ├── upscale_models/ (optional)
   │   │   └── 4x-UltraSharp.pth
   │   └── RIFE/ (optional)
   │       └── rife47.pth
   (The "Video Combine" node comes from the ComfyUI-VideoHelperSuite custom node pack and needs no model file.)
  3. Load the workflow JSON file in ComfyUI
  4. Load your first and last frame images
  5. Adjust the resolution in the "WanFirstLastFrameToVideo" node based on your VRAM
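
To confirm everything landed in the right place, here is a quick hedged check (paths follow the tree above; adjust COMFY to your install directory):

```python
# Sketch: verify the required model files before starting ComfyUI.
import os

COMFY = "ComfyUI/models"  # assumption: default install layout
required = [
    "diffusion_models/wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors",
    "diffusion_models/wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors",
    "loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors",
    "loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan_2.1_vae.safetensors",
]

for rel in required:
    status = "OK     " if os.path.isfile(os.path.join(COMFY, rel)) else "MISSING"
    print(f"{status} {rel}")
```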

Performance

⚠️ Using the Lightning LoRAs significantly reduces generation time but may reduce video dynamics compared to the base models; without the LoRAs, however, generation times are unbearable 😬

24GB VRAM (Tested on RTX 3090|4090)

Standard resolutions:

  • Portrait: 720x1280px, 832x1216px, 832x1248px, 1024x1536px
  • Landscape: 1280x720px, 1216x832px, 1248x832px, 1536x1024px
  • Ultra-wide (21:9): 1392x592px, 1536x672px

Credits

  • WAN Video 2.2 models by Alibaba Group
  • Post-processing: UltraSharp by Kim2091, ComfyUI-Frame-Interpolation by Fannovel16, ComfyUI-VideoHelperSuite by Kosinkadink