- Minimalist T2I Workflow for WAN Video 2.2
- → this one: Minimalist First-Last Frame to Video Workflow for WAN Video 2.2
- Minimalist FMLF (First-Middle-Last Frame) + Multi Frame Ref To Video Workflow for WAN Video 2.2 (Coming Soon)
- Join Videos (Snippets): Track Operations, Python Scripting: Be a Storyteller - Seamless Narrative Chain (Coming Soon)
Minimalist First-Last Frame to Video Workflow for WAN Video 2.2
A streamlined image-to-video workflow utilizing WAN Video 2.2's I2V High and Low Noise models (14B fp8) with the specialized "WanFirstLastFrameToVideo" node. This workflow generates smooth video transitions between two static images (first and last frame).
Workflow Structure
- Frame Conditioning: Load two images (first and last frame) via LoadImage nodes; the "WanFirstLastFrameToVideo" node processes both frames and creates the conditioned latent space
- Text Conditioning: Positive and negative prompts guide the transition behavior
- Optional Post-Processing: Upscaling → RIFE Frame Interpolation → Video Combine → Audio integration
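For orientation, here is a minimal sketch of that chain in ComfyUI's API ("prompt") JSON form, written as a Python dict. The node IDs are illustrative, and the exact input names of "WanFirstLastFrameToVideo" are an assumption based on its typical signature; check the node in your ComfyUI build before relying on them:

```python
# Minimal sketch of the core node chain (illustrative IDs, assumed input names).
# "clip" and "vae" would come from CLIPLoader / VAELoader nodes, omitted here.
graph = {
    "first": {"class_type": "LoadImage", "inputs": {"image": "first_frame.png"}},
    "last":  {"class_type": "LoadImage", "inputs": {"image": "last_frame.png"}},
    "pos":   {"class_type": "CLIPTextEncode", "inputs": {"clip": ["clip", 0], "text": "positive prompt"}},
    "neg":   {"class_type": "CLIPTextEncode", "inputs": {"clip": ["clip", 0], "text": "negative prompt"}},
    "flf": {
        "class_type": "WanFirstLastFrameToVideo",
        "inputs": {
            "positive": ["pos", 0], "negative": ["neg", 0], "vae": ["vae", 0],
            "width": 832, "height": 1216, "length": 81, "batch_size": 1,
            "start_image": ["first", 0], "end_image": ["last", 0],
        },
    },
    # -> KSamplerAdvanced (high noise) -> KSamplerAdvanced (low noise) -> VAEDecode
}
```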
Key Features
First-Last Frame conditioning: Uses the "WanFirstLastFrameToVideo" node to create seamless transitions between two input images. Default 81 frames (~5s at 16fps).
Dual-stage sampling:
- Sequential processing with High Noise → Low Noise models using Lightning LoRAs (4-step)
- High Noise model (4 steps, start_at_step: 0, end_at_step: 2)
- Low Noise model (4 steps, start_at_step: 2, end_at_step: 10000)
Precise control: ModelSamplingSD3 nodes (shift parameter: 3-10) paired with KSampler Advanced (4 steps, CFG 1.0, Euler sampler, simple scheduler) for fast generation.
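Put together, the two passes look roughly like this. The keys mirror the KSampler Advanced widgets; the add_noise / return_with_leftover_noise settings are the usual two-pass convention rather than something this workflow spells out:

```python
# The same 4-step schedule is shared by both passes: the high-noise model
# denoises steps 0-2, the low-noise model finishes from step 2 onward
# (end_at_step 10000 simply means "run to the end of the schedule").
high_pass = dict(model="high_noise + Lightning LoRA", steps=4, cfg=1.0,
                 sampler_name="euler", scheduler="simple",
                 start_at_step=0, end_at_step=2,
                 add_noise="enable", return_with_leftover_noise="enable")
low_pass = dict(model="low_noise + Lightning LoRA", steps=4, cfg=1.0,
                sampler_name="euler", scheduler="simple",
                start_at_step=2, end_at_step=10000,
                add_noise="disable", return_with_leftover_noise="disable")
```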
Perfect for creating morphing transitions, emotional expression changes, or any transformation between two carefully crafted frames.
Optional Post-processing pipeline:
- 4x upscaling with the UltraSharp model (or any upscale model); optionally scale back down to the desired size
- RIFE frame interpolation (2x multiplier, doubles frame count 81 → 162 frames for smoother motion, ~5s at 30fps)
- Video Combine merges the image sequence into a video (frame_rate: 30)
- Audio integration via "LoadAudio" node
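The timings above are simple arithmetic; a quick sanity check, assuming RIFE's 2x mode exactly doubles the frame count as described:

```python
frames = 81                  # WAN default clip length
print(frames / 16)           # ~5.06 s at the model's native 16 fps
interpolated = frames * 2    # 162 frames after 2x RIFE
print(interpolated / 30)     # ~5.4 s when combined at 30 fps
```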
Usage Tips
Image Preparation
- Both images should have the same resolution
- Ensure consistent composition and framing between first and last frame
- Consider upscaling input images beforehand for better quality
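A tiny helper for the first point, assuming Pillow and hypothetical filenames, that resizes the last frame to match the first frame's resolution:

```python
from PIL import Image  # pip install pillow

# Hypothetical filenames: force both input frames to the same resolution.
first = Image.open("first_frame.png")
last = Image.open("last_frame.png")
if last.size != first.size:
    last = last.resize(first.size, Image.LANCZOS)
    last.save("last_frame_resized.png")
```

If the aspect ratios differ, prefer cropping over stretching so the composition stays consistent between the two frames.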
Prompting for WAN Video 2.2 (First-Last Frame)
- Single Positive Prompt: must describe both frames (start and end) within one coherent scene, not two separate scenes
- Dynamics through Verbs & Motion Descriptions: use active, temporal verbs ("rising, expanding, swirling, rotating, billowing, flickering, zooming in"); avoid static terms!
- Clear Development from Beginning to End: describe the transition, (start state) then (end state), e.g. "transforms from dark void into...", "emotional expression shift from neutral to gentle smile"
- Focus on Central Animated Elements: what should move (hair, eyes, lips)? Explicitly name and animate these
- Emotion & Atmosphere as Guiding Motif: the mood (e.g. dark fantasy horror, apocalyptic, cinematic) ties everything together, not just the objects
- Negative Prompt as Safeguard: block static, unwanted styles
Core Rule: Do not describe what happens; describe how the scene changes.
A good prompt lets the model infer the animation itself by linking the initial state and the final state in one flowing, visually coherent sentence.
Pure image-to-image animation / optical flow
Core Rule II: describe nothing, use an empty prompt.
To work with an empty prompt (pure image-to-image animation / optical flow), both frames must have:
- identical composition and perspective,
- consistent style,
- a clear visual difference,
- and a well-defined transformation or motion.
External link (right-click to open in new tab):
→ show Banshee (HR Ginger) 3144x1375px, 30fps
Be a Storyteller - Seamless Narrative Chain
A series of videos linked so that the last frame becomes the first of the next, forming a continuous visual story.
This point is meant only as a supplement; it will (at some point) become Part 4, "Join Videos (Snippets): Track Operations," of this small how-to.
Core Rule: Every video must begin where the previous one ends: in composition, perspective, light, and narrative state.
As a checklist for each new segment, before generating Video n+1 ask yourself (a frame-extraction sketch follows the list):
- Is its first frame identical to the last frame of Video n?
- Is the direction of motion consistent?
- Does the mood remain intact?
- Does the composition (zoom, framing, perspective) stay stable?
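One practical way to satisfy the first checklist item is to extract the final frame of Video n and feed it to the next run as the new first-frame input. A sketch using OpenCV (an assumption; any frame-accurate extractor works, ffmpeg included):

```python
import cv2  # pip install opencv-python

# Grab the last frame of the previous snippet (hypothetical filename)
# to use as the first-frame input of the next generation.
cap = cv2.VideoCapture("video_n.mp4")
last_index = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last_index)
ok, frame = cap.read()
cap.release()
assert ok, "could not read the final frame"
cv2.imwrite("next_first_frame.png", frame)
```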
External link (right-click to open in new tab):
→ Hel unhallowed 1664x2496px, 30fps
There's an artificial pause of 0.23 seconds between each video snippet.
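A rough sketch of how such a chain could be joined, assuming ffmpeg is installed and using hypothetical filenames: the tpad filter clones each clip's last frame for 0.23 s, then the padded clips are stitched with the concat demuxer:

```python
import subprocess

# Hold the final frame of each snippet for 0.23 s (tpad clones the last
# frame), then list the padded clips for ffmpeg's concat demuxer.
snippets = ["part1.mp4", "part2.mp4", "part3.mp4"]  # hypothetical names
for clip in snippets:
    subprocess.run(["ffmpeg", "-y", "-i", clip,
                    "-vf", "tpad=stop_mode=clone:stop_duration=0.23",
                    f"padded_{clip}"], check=True)
with open("list.txt", "w") as f:
    f.writelines(f"file 'padded_{clip}'\n" for clip in snippets)
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "list.txt", "-c", "copy", "joined.mp4"], check=True)
```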
Requirements
⚠️ Note: All model links below are direct download links. Clicking them will immediately start downloading the files.
Models
- wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
- wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
- wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
- wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
- wan_2.1_vae.safetensors
- umt5_xxl_fp8_e4m3fn_scaled.safetensors
Optional (Post-processing)
External links (right-click to open in new tab):
- Upscaling: 4x-UltraSharp.pth - OpenModelDB
- Frame Interpolation: https://github.com/Fannovel16/ComfyUI-Frame-Interpolation
- Video Combine: https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite
Installation
- Download all required model files (see Requirements section)
- Place files in their respective ComfyUI directories:
ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   ├── wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
│   │   └── wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
│   ├── loras/
│   │   ├── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
│   │   └── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   ├── vae/
│   │   └── wan_2.1_vae.safetensors
│   └── upscale_models/ (optional)
│       └── 4x-UltraSharp.pth
└── custom_nodes/
    ├── ComfyUI-Frame-Interpolation/ (optional; RIFE checkpoints such as rife47.pth live inside this node pack)
    └── ComfyUI-VideoHelperSuite/ (optional; provides the Video Combine node)
- Load the workflow JSON file in ComfyUI
- Load your first and last frame images
- Adjust resolution in the "WanFirstLastFrameToVideo" node based on your VRAM
Performance
⚠️ Using the Lightning LoRAs significantly reduces generation time but may result in reduced video dynamics compared to the base models; without the LoRAs, however, generation is unbearably slow 😬
24GB VRAM (Tested on RTX 3090|4090)
- Standard resolutions:
- Portrait: 720x1280px, 832x1216px, 832x1248px, 1024x1536px
- Landscape: 1280x720px, 1216x832px, 1248x832px, 1536x1024px
- Ultra-wide (21:9): 1392x592px, 1536x672px
Credits
- WAN Video 2.2 models by Alibaba Group
- Post-processing: UltraSharp by Kim2091, ComfyUI-Frame-Interpolation by Fannovel16, ComfyUI-VideoHelperSuite by Kosinkadink