On February 25, 2025, Alibaba Cloud shook up the AI landscape by open-sourcing Wan 2.1, a cutting-edge video generation model from the acclaimed Tongyi series. This breakthrough technology transforms text prompts into mesmerizing videos, capturing intricate movements and spatial details with outstanding precision. With a VBench score of 84.7%, robust multilingual support, and free access, Wan 2.1 is rapidly earning its place among industry heavyweights such as OpenAI’s Sora, Minimax, Kling from Kuaishou, and Google’s Veo 2.
Adding to the excitement, ComfyUI recently announced native support for Wan 2.1. As shared on Twitter, the update brings a range of options—including 14B and 1.3B model variants—capable of high-quality 720p generation on systems with 40GB VRAM, and optimized down to 15GB VRAM for the 1.3B model, all while maintaining a strong success rate. Get ready for a new wave of open video model releases that promise to redefine what’s possible in AI video creation.
If you don't want to go through all this hassle, try Anakin AI. This all-in-one platform lets you jump straight into AI video creation, and not only supports WAN 2.1 but also models like Runway ML, Minimax, LTX Video, and more. Whether you're in the mood for a quick start or eager to experiment with different models, Anakin AI streamlines everything so you can generate stunning videos right away. If you prefer a hands-on approach, this guide will walk you through using WAN 2.1 with Comfy UI on Mac, Windows, and Linux—from installation and configuration to advanced video generation techniques. Join me as we explore the exciting future of AI-driven video creation!
Let's Talk About The System Preparations
Before you start, ensure your system meets the necessary hardware and software requirements.
Hardware Specifications
Minimum:
GPU: NVIDIA GTX 1080 (8GB VRAM) or Apple M1
RAM: 16GB DDR4
Storage: 15GB SSD space for models and dependencies
Recommended:
GPU: NVIDIA RTX 4090 (24GB VRAM) or Apple M3 Max
RAM: 32GB DDR5
Storage: NVMe SSD with at least 50GB capacity
Software Dependencies
Python: Versions 3.10 to 3.11 (3.11.6 works best for Apple Silicon)
PyTorch: Version 2.2+ with CUDA 12.1 (for Windows/Linux) or Metal support (for macOS)
FFmpeg: Version 6.1 for video encoding/decoding
Drivers: NVIDIA Studio Drivers 550+ for Windows/Linux
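Once your environment is set up (see the installation sections below), you can sanity-check these dependencies from a terminal. A minimal sketch, assuming the environment you will run ComfyUI from is already active; the CUDA line applies to NVIDIA systems and the MPS line to Apple Silicon:

```bash
# Verify Python, PyTorch, GPU backend, and FFmpeg versions
python --version                                                     # expect 3.10.x or 3.11.x
python -c "import torch; print(torch.__version__)"                   # expect 2.2+
python -c "import torch; print(torch.cuda.is_available())"           # NVIDIA (Windows/Linux)
python -c "import torch; print(torch.backends.mps.is_available())"   # Apple Silicon (macOS)
ffmpeg -version | head -n 1                                          # expect 6.1
```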
Installing ComfyUI on Different Platforms
Follow these detailed steps to set up ComfyUI, an essential part of learning how to use WAN 2.1 with Comfy UI.
Windows Installation
Method A: ComfyUI Desktop (Official Beta)
- Download: Get `ComfyUI_Desktop_Windows_0.9.3b.exe` from comfyui.org/downloads.
- Run Installer: Execute the installer, ensuring NVIDIA GPU acceleration is enabled.
- Verification: Open a command prompt and run `.\run_nvidia_gpu.bat`.
Method B: Manual Build
- Clone the Repository: `git clone https://github.com/comfyanonymous/ComfyUI`, then `cd ComfyUI`
- Setup Virtual Environment: `python -m venv venv`, then `venv\Scripts\activate`
- Install PyTorch: `pip install torch==2.2.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html`
- Install Requirements: `pip install -r requirements.txt`
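Once the requirements finish installing, it is worth a quick launch check from the repo root. A minimal sketch; the port shown is ComfyUI's default.

```bash
# From the ComfyUI folder, with the virtual environment still active
python main.py
# then open http://127.0.0.1:8188 in your browser
```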
macOS Installation (Apple Silicon: M1/M2/M3 and later)
- Install Homebrew (if needed): `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
- Install Python & FFmpeg: `brew install python@3.11 ffmpeg`
- Clone and Setup ComfyUI: `git clone https://github.com/comfyanonymous/ComfyUI`, then `cd ComfyUI`, then `python3.11 -m pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/torch_stable.html`, and finally `pip3 install -r requirements.txt`
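With the dependencies in place, start ComfyUI from the repo directory. The fallback variable below is optional; it is a standard PyTorch switch that routes any operator lacking a Metal kernel back to the CPU, which can help on Apple Silicon.

```bash
# Launch from the ComfyUI folder on Apple Silicon
PYTORCH_ENABLE_MPS_FALLBACK=1 python3.11 main.py
# the UI is served at http://127.0.0.1:8188
```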
Linux Installation (Native/WSL2)
For WSL2:
- Install WSL2 with Ubuntu 22.04: `wsl --install -d Ubuntu-22.04`
- Update and Upgrade: `sudo apt update && sudo apt full-upgrade -y`
Deploying ComfyUI:
- Clone the Repository: `git clone https://github.com/comfyanonymous/ComfyUI`, then `cd ComfyUI`
- Setup Conda Environment (Recommended): `conda create -n comfy python=3.10`, then `conda activate comfy`
- Install PyTorch with CUDA: `pip install torch==2.2.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html`
- Install Requirements: `pip install -r requirements.txt`
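After the requirements install, launch ComfyUI from inside the repo. The `--listen` flag binds the server to all interfaces, which helps under WSL2 if localhost forwarding to your Windows browser gives you trouble.

```bash
# From inside the ComfyUI folder, with the conda environment active
python main.py --listen
# open http://127.0.0.1:8188 (or the WSL2 address) in your browser
```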
Integrating the WAN 2.1 Model
With ComfyUI up and running, it's time to integrate the WAN 2.1 model.
Model Acquisition and Setup
- Download Weights:
  - `wan_2.1_base.safetensors` (approx. 8.4GB)
  - `wan_2.1_vae.pth` (approx. 1.2GB)

  Use your preferred download method (e.g., `wget`); a scripted sketch follows this list.
- File Placement:
  - Place `wan_2.1_base.safetensors` in `ComfyUI/models/checkpoints/`
  - Place `wan_2.1_vae.pth` in `ComfyUI/models/vae/`
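If you want to script the download, the sketch below mirrors the placement above. The URLs are placeholders only; substitute the actual links from wherever you obtained the WAN 2.1 weights.

```bash
# Hypothetical URLs for illustration; replace them with your real download links
cd ComfyUI
wget -O models/checkpoints/wan_2.1_base.safetensors \
  https://example.com/path/to/wan_2.1_base.safetensors
wget -O models/vae/wan_2.1_vae.pth \
  https://example.com/path/to/wan_2.1_vae.pth
```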
Custom Nodes Installation
Enhance your workflow by installing essential custom nodes:
- Navigate to the Custom Nodes Directory: `cd ComfyUI/custom_nodes`
- Clone Essential Extensions: `git clone https://github.com/WASasquatch/was-node-suite-comfyui` and `git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite`
These extensions add features like video frame interpolation and batch processing, streamlining your video creation process.
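Many custom node packs also bring their own Python dependencies. A minimal follow-up, assuming each cloned extension ships a `requirements.txt` (both of the above generally do):

```bash
# Run from ComfyUI/custom_nodes with your ComfyUI environment active
pip install -r was-node-suite-comfyui/requirements.txt
pip install -r ComfyUI-VideoHelperSuite/requirements.txt
```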
Configuring Your Workflow for WAN 2.1
Learn how to build the ideal pipeline for generating stunning videos with WAN 2.1.
Setting Up the Text-to-Video Pipeline
A simplified pipeline might include the following nodes (a minimal API sketch follows the list):
- Load Checkpoint Node: Loads your WAN 2.1 model weights.
- CLIPTextEncode Node: Converts text prompts (e.g., “A cybernetic dragon soaring through nebula clouds”) into conditioning data.
- WANSampler Node: Samples the latent space with parameters such as:
  - Resolution: 1024×576 frames
  - Frames: 48 (adjustable)
  - Motion Scale: Typically between 1.2 and 2.5 for smooth transitions.
- VAEDecode Node: Decodes the latent data into a final video output.
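If you prefer to queue jobs programmatically, ComfyUI also exposes an HTTP endpoint (`/prompt`, port 8188 by default) that accepts a workflow in API format. The sketch below wires up the four nodes above; the `WANSampler` class name and its input fields follow this article's workflow and are assumptions, so export your own working graph via "Save (API Format)" to confirm the exact names in your install.

```bash
# Minimal sketch: queue a text-to-video job through ComfyUI's HTTP API.
# The WAN sampler's class/input names are assumptions; verify them against an
# API-format export of your own workflow before relying on this.
curl -s -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "prompt": {
    "1": { "class_type": "CheckpointLoaderSimple",
           "inputs": { "ckpt_name": "wan_2.1_base.safetensors" } },
    "2": { "class_type": "CLIPTextEncode",
           "inputs": { "text": "A cybernetic dragon soaring through nebula clouds",
                       "clip": ["1", 1] } },
    "3": { "class_type": "WANSampler",
           "inputs": { "model": ["1", 0], "positive": ["2", 0],
                       "width": 1024, "height": 576,
                       "frames": 48, "motion_scale": 1.8 } },
    "4": { "class_type": "VAEDecode",
           "inputs": { "samples": ["3", 0], "vae": ["1", 2] } }
  }
}
EOF
```

A successful request returns a `prompt_id` that you can poll through the `/history` endpoint to check progress.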
Parameter Tweaks & Optimization
- Motion Scale: Experiment with values around 1.8 for balanced transitions.
- Temporal Attention: Settings between 0.85 and 0.97 maintain consistency.
- Noise Schedule & Frame Interpolation: Utilize options like Karras and FilmNet to reduce artifacts.
- Hybrid Inputs: Combine reference images and depth maps to enhance style transfer and create a pseudo-3D effect.
Advanced Video Generation Techniques
Elevate your projects with these advanced methods:
Multi-Image Referencing
- Style Transfer: Use multiple reference images to alter the art style.
- Depth Map Conditioning: Introduce depth maps to simulate a 3D effect.
- ControlNet & Pose Estimation: Guide the model using human poses or object positioning for refined results.
Camera Motion Simulation
Simulate dynamic camera movements using the `CameraController` node:
- Orbit Speed: e.g., 0.12
- Dolly Zoom: e.g., -0.05
- Roll Variance: e.g., 2.7

These settings add a cinematic flair to your videos, making them truly captivating.
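In API form these knobs simply become inputs on the node. The fragment below is illustrative only: the snake_case names are guesses at how Orbit Speed, Dolly Zoom, and Roll Variance might be exposed, so check the widget names on the node in your own install.

```bash
# Hypothetical API-format fragment for the CameraController node (input names are guesses)
cat <<'EOF'
"5": { "class_type": "CameraController",
       "inputs": { "samples": ["3", 0],
                   "orbit_speed": 0.12,
                   "dolly_zoom": -0.05,
                   "roll_variance": 2.7 } }
EOF
```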
Performance Optimization & Troubleshooting
VRAM Management Techniques
Keep your system efficient:
- Frame Caching: Enable by setting `enable_offload_technique = True` and using an aggressive VRAM optimization mode.
- Mixed Precision: Boost performance with `torch.set_float32_matmul_precision('medium')` (a quick check is sketched after this list).
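The matmul-precision call is plain PyTorch, so you can confirm it takes effect from the shell before wiring it into your launch script. A minimal check, run in the same environment ComfyUI uses:

```bash
# Set the precision hint and read it back (valid values: 'highest', 'high', 'medium')
python -c "import torch; torch.set_float32_matmul_precision('medium'); print(torch.get_float32_matmul_precision())"
```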
Troubleshooting Common Issues
- Black Frame Output: Verify that the VAE file (`wan_2.1_vae.pth`) matches your model version and check your temporal attention settings.
- VRAM Overflow: Launch ComfyUI with flags like `--medvram` and `--xformers` to ease memory usage.
- Log Analysis: Inspect `comfy.log` for any ERROR or CRITICAL messages to diagnose problems quickly (see the filter sketch below).
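For the log check, a simple filter keeps just the lines that matter; the log file name follows this article and may differ in your setup.

```bash
# Show the most recent serious entries from ComfyUI's log
grep -E "ERROR|CRITICAL" comfy.log | tail -n 20
```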
Platform-Specific Installation Differences
Understanding the nuances across platforms is key when exploring how to use WAN 2.1 with Comfy UI.
Windows
Traditional Method:
- Involves a portable ZIP extraction, manual Python environment setup, and batch file execution (e.g., `run_nvidia_gpu.bat`).
- Requires a separate 7‑Zip installation and manual CUDA toolkit configuration.

V1 Desktop App:
- A one-click installer (approximately a 200MB bundled package) that automates dependency resolution and setup.
macOS
Traditional Method:
- Uses Homebrew to install core packages and requires manual Python/MPS configuration.
- Launched via Terminal with a mandatory Python 3.11+ for optimal Apple Silicon performance.

V1 Desktop App:
- Delivered as a universal .dmg package with an integrated Python environment, simplifying the installation process.
Linux
Traditional Method:
- Relies on terminal-based cloning, conda or pip management, and manual installation of NVIDIA/AMD drivers.
- May require adjustments for AppArmor/SELinux policies.

V1 Desktop App:
- Offers code-signed binaries (via AppImage/DEB packages) that streamline dependency management and updates.
The V1 Desktop App significantly reduces cross-platform installation headaches by providing automatic dependency resolution and a unified model library.
Final Thoughts
WAN 2.1, paired with Comfy UI, opens up a world of creative possibilities for AI-driven video generation. Whether you're setting up on Windows, macOS, or Linux, this guide has provided everything you need—from hardware requirements to advanced techniques—to help you create stunning videos. Embrace this breakthrough technology, experiment with various models, and transform your creative visions into reality. Happy video making!