On February 25, 2025, Alibaba Cloud shook up the AI landscape by open-sourcing Wan 2.1, a cutting-edge video generation model from the acclaimed Tongyi series. This breakthrough technology transforms text prompts into mesmerizing videos, capturing intricate movements and spatial details with outstanding precision. With a VBench score of 84.7%, robust multilingual support, and free access, Wan 2.1 is rapidly earning its place among industry heavyweights such as OpenAI’s Sora, Minimax, Kling from Kuaishou, and Google’s Veo 2.
Adding to the excitement, ComfyUI recently announced native support for Wan 2.1. As shared on Twitter, the update brings a range of options—including 14B and 1.3B model variants—capable of high-quality 720p generation on systems with 40GB VRAM, and optimized down to 15GB VRAM for the 1.3B model, all while maintaining a strong success rate. Get ready for a new wave of open video model releases that promise to redefine what’s possible in AI video creation.
If you don't want to go through all this hassle, try Anakin AI. This all-in-one platform lets you jump straight into AI video creation, and not only supports WAN 2.1 but also models like Runway ML, Minimax, LTX Video, and more. Whether you're in the mood for a quick start or eager to experiment with different models, Anakin AI streamlines everything so you can generate stunning videos right away. If you prefer a hands-on approach, this guide will walk you through using WAN 2.1 with Comfy UI on Mac, Windows, and Linux—from installation and configuration to advanced video generation techniques. Join me as we explore the exciting future of AI-driven video creation!
Let's Talk About The System Preparations
Before you start, ensure your system meets the necessary hardware and software requirements.
Hardware Specifications
Minimum:
GPU: NVIDIA GTX 1080 (8GB VRAM) or Apple M1
RAM: 16GB DDR4
Storage: 15GB SSD space for models and dependencies
Recommended:
GPU: NVIDIA RTX 4090 (24GB VRAM) or Apple M3 Max
RAM: 32GB DDR5
Storage: NVMe SSD with at least 50GB capacity
Software Dependencies
Python: Versions 3.10 to 3.11 (3.11.6 works best for Apple Silicon)
PyTorch: Version 2.2+ with CUDA 12.1 (for Windows/Linux) or Metal support (for macOS)
FFmpeg: Version 6.1 for video encoding/decoding
Drivers: NVIDIA Studio Drivers 550+ for Windows/Linux
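Once your environment is set up (see the installation sections below), you can sanity-check these dependencies from a terminal. A minimal sketch, assuming the environment you will run ComfyUI from is already active; the CUDA line applies to NVIDIA systems and the MPS line to Apple Silicon:

```bash
# Verify Python, PyTorch, GPU backend, and FFmpeg versions
python --version                                                     # expect 3.10.x or 3.11.x
python -c "import torch; print(torch.__version__)"                   # expect 2.2+
python -c "import torch; print(torch.cuda.is_available())"           # NVIDIA (Windows/Linux)
python -c "import torch; print(torch.backends.mps.is_available())"   # Apple Silicon (macOS)
ffmpeg -version | head -n 1                                          # expect 6.1
```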
Installing ComfyUI on Different Platforms
Follow these detailed steps to set up ComfyUI, an essential part of learning how to use WAN 2.1 with Comfy UI.
Windows Installation
Method A: ComfyUI Desktop (Official Beta)
- Download: Get `ComfyUI_Desktop_Windows_0.9.3b.exe` from comfyui.org/downloads.
- Run Installer: Execute the installer, ensuring NVIDIA GPU acceleration is enabled.
- Verification: Open a command prompt and run `.\run_nvidia_gpu.bat`.
Method B: Manual Build
- Clone the Repository: `git clone https://github.com/comfyanonymous/ComfyUI`, then `cd ComfyUI`
- Setup Virtual Environment: `python -m venv venv`, then `venv\Scripts\activate`
- Install PyTorch: `pip install torch==2.2.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html`
- Install Requirements: `pip install -r requirements.txt`
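Once the requirements finish installing, it is worth a quick launch check from the repo root. A minimal sketch; the port shown is ComfyUI's default.

```bash
# From the ComfyUI folder, with the virtual environment still active
python main.py
# then open http://127.0.0.1:8188 in your browser
```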
macOS Installation (Apple Silicon: M1/M2/M3 and later)
- Install Homebrew (if needed): `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
- Install Python & FFmpeg: `brew install python@3.11 ffmpeg`
- Clone and Setup ComfyUI: `git clone https://github.com/comfyanonymous/ComfyUI`, then `cd ComfyUI`, then `python3.11 -m pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/torch_stable.html`, and finally `pip3 install -r requirements.txt`
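With the dependencies in place, start ComfyUI from the repo directory. The fallback variable below is optional; it is a standard PyTorch switch that routes any operator lacking a Metal kernel back to the CPU, which can help on Apple Silicon.

```bash
# Launch from the ComfyUI folder on Apple Silicon
PYTORCH_ENABLE_MPS_FALLBACK=1 python3.11 main.py
# the UI is served at http://127.0.0.1:8188
```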
Linux Installation (Native/WSL2)
For WSL2:
- Install WSL2 with Ubuntu 22.04: `wsl --install -d Ubuntu-22.04`
- Update and Upgrade: `sudo apt update && sudo apt full-upgrade -y`
Deploying ComfyUI:
- Clone the Repository: `git clone https://github.com/comfyanonymous/ComfyUI`, then `cd ComfyUI`
- Setup Conda Environment (Recommended): `conda create -n comfy python=3.10`, then `conda activate comfy`
- Install PyTorch with CUDA: `pip install torch==2.2.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html`
- Install Requirements: `pip install -r requirements.txt`
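After the requirements install, launch ComfyUI from inside the repo. The `--listen` flag binds the server to all interfaces, which helps under WSL2 if localhost forwarding to your Windows browser gives you trouble.

```bash
# From inside the ComfyUI folder, with the conda environment active
python main.py --listen
# open http://127.0.0.1:8188 (or the WSL2 address) in your browser
```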
Integrating the WAN 2.1 Model
With ComfyUI up and running, it's time to integrate the WAN 2.1 model.
Model Acquisition and Setup
- Download Weights:
  - `wan_2.1_base.safetensors` (approx. 8.4GB)
  - `wan_2.1_vae.pth` (approx. 1.2GB)

  Use your preferred download method (e.g., `wget`); a scripted sketch follows this list.
- File Placement:
  - Place `wan_2.1_base.safetensors` in `ComfyUI/models/checkpoints/`
  - Place `wan_2.1_vae.pth` in `ComfyUI/models/vae/`
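If you want to script the download, the sketch below mirrors the placement above. The URLs are placeholders only; substitute the actual links from wherever you obtained the WAN 2.1 weights.

```bash
# Hypothetical URLs for illustration; replace them with your real download links
cd ComfyUI
wget -O models/checkpoints/wan_2.1_base.safetensors \
  https://example.com/path/to/wan_2.1_base.safetensors
wget -O models/vae/wan_2.1_vae.pth \
  https://example.com/path/to/wan_2.1_vae.pth
```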
Custom Nodes Installation
Enhance your workflow by installing essential custom nodes:
- Navigate to the Custom Nodes Directory: `cd ComfyUI/custom_nodes`
- Clone Essential Extensions: `git clone https://github.com/WASasquatch/was-node-suite-comfyui` and `git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite`
These extensions add features like video frame interpolation and batch processing, streamlining your video creation process.
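Many custom node packs also bring their own Python dependencies. A minimal follow-up, assuming each cloned extension ships a `requirements.txt` (both of the above generally do):

```bash
# Run from ComfyUI/custom_nodes with your ComfyUI environment active
pip install -r was-node-suite-comfyui/requirements.txt
pip install -r ComfyUI-VideoHelperSuite/requirements.txt
```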
Configuring Your Workflow for WAN 2.1
Learn how to build the ideal pipeline for generating stunning videos with WAN 2.1.
Setting Up the Text-to-Video Pipeline
A simplified pipeline might include the following nodes (a minimal API sketch follows the list):
- Load Checkpoint Node: Loads your WAN 2.1 model weights.
- CLIPTextEncode Node: Converts text prompts (e.g., “A cybernetic dragon soaring through nebula clouds”) into conditioning data.
- WANSampler Node: Samples the latent space with parameters such as:
  - Resolution: 1024×576 frames
  - Frames: 48 (adjustable)
  - Motion Scale: Typically between 1.2 and 2.5 for smooth transitions.
- VAEDecode Node: Decodes the latent data into a final video output.
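If you prefer to queue jobs programmatically, ComfyUI also exposes an HTTP endpoint (`/prompt`, port 8188 by default) that accepts a workflow in API format. The sketch below wires up the four nodes above; the `WANSampler` class name and its input fields follow this article's workflow and are assumptions, so export your own working graph via "Save (API Format)" to confirm the exact names in your install.

```bash
# Minimal sketch: queue a text-to-video job through ComfyUI's HTTP API.
# The WAN sampler's class/input names are assumptions; verify them against an
# API-format export of your own workflow before relying on this.
curl -s -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "prompt": {
    "1": { "class_type": "CheckpointLoaderSimple",
           "inputs": { "ckpt_name": "wan_2.1_base.safetensors" } },
    "2": { "class_type": "CLIPTextEncode",
           "inputs": { "text": "A cybernetic dragon soaring through nebula clouds",
                       "clip": ["1", 1] } },
    "3": { "class_type": "WANSampler",
           "inputs": { "model": ["1", 0], "positive": ["2", 0],
                       "width": 1024, "height": 576,
                       "frames": 48, "motion_scale": 1.8 } },
    "4": { "class_type": "VAEDecode",
           "inputs": { "samples": ["3", 0], "vae": ["1", 2] } }
  }
}
EOF
```

A successful request returns a `prompt_id` that you can poll through the `/history` endpoint to check progress.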
Parameter Tweaks & Optimization
- Motion Scale: Experiment with values around 1.8 for balanced transitions.
- Temporal Attention: Settings between 0.85 and 0.97 maintain consistency.
- Noise Schedule & Frame Interpolation: Utilize options like Karras and FilmNet to reduce artifacts.
- Hybrid Inputs: Combine reference images and depth maps to enhance style transfer and create a pseudo-3D effect.
Advanced Video Generation Techniques
Elevate your projects with these advanced methods:
Multi-Image Referencing
- Style Transfer: Use multiple reference images to alter the art style.
- Depth Map Conditioning: Introduce depth maps to simulate a 3D effect.
- ControlNet & Pose Estimation: Guide the model using human poses or object positioning for refined results.
Camera Motion Simulation
Simulate dynamic camera movements using the `CameraController` node:
- Orbit Speed: e.g., 0.12
- Dolly Zoom: e.g., -0.05
- Roll Variance: e.g., 2.7

These settings add a cinematic flair to your videos, making them truly captivating.
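In API form these knobs simply become inputs on the node. The fragment below is illustrative only: the snake_case names are guesses at how Orbit Speed, Dolly Zoom, and Roll Variance might be exposed, so check the widget names on the node in your own install.

```bash
# Hypothetical API-format fragment for the CameraController node (input names are guesses)
cat <<'EOF'
"5": { "class_type": "CameraController",
       "inputs": { "samples": ["3", 0],
                   "orbit_speed": 0.12,
                   "dolly_zoom": -0.05,
                   "roll_variance": 2.7 } }
EOF
```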
Performance Optimization & Troubleshooting
VRAM Management Techniques
Keep your system efficient:
- Frame Caching: Enable by setting `enable_offload_technique = True` and using an aggressive VRAM optimization mode.
- Mixed Precision: Boost performance with `torch.set_float32_matmul_precision('medium')` (a quick check is sketched after this list).
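The matmul-precision call is plain PyTorch, so you can confirm it takes effect from the shell before wiring it into your launch script. A minimal check, run in the same environment ComfyUI uses:

```bash
# Set the precision hint and read it back (valid values: 'highest', 'high', 'medium')
python -c "import torch; torch.set_float32_matmul_precision('medium'); print(torch.get_float32_matmul_precision())"
```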
Troubleshooting Common Issues
- Black Frame Output: Verify that the VAE file (`wan_2.1_vae.pth`) matches your model version and check your temporal attention settings.
- VRAM Overflow: Launch ComfyUI with flags like `--medvram` and `--xformers` to ease memory usage.
- Log Analysis: Inspect `comfy.log` for any ERROR or CRITICAL messages to diagnose problems quickly (see the filter sketch below).
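For the log check, a simple filter keeps just the lines that matter; the log file name follows this article and may differ in your setup.

```bash
# Show the most recent serious entries from ComfyUI's log
grep -E "ERROR|CRITICAL" comfy.log | tail -n 20
```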
Platform-Specific Installation Differences
Understanding the nuances across platforms is key when exploring how to use WAN 2.1 with Comfy UI.
Windows
Traditional Method:
- Involves a portable ZIP extraction, manual Python environment setup, and batch file execution (e.g., `run_nvidia_gpu.bat`).
- Requires a separate 7‑Zip installation and manual CUDA toolkit configuration.

V1 Desktop App:
- A one-click installer (approximately a 200MB bundled package) that automates dependency resolution and setup.
macOS
Traditional Method:
- Uses Homebrew to install core packages and requires manual Python/MPS configuration.
- Launched via Terminal with a mandatory Python 3.11+ for optimal Apple Silicon performance.

V1 Desktop App:
- Delivered as a universal .dmg package with an integrated Python environment, simplifying the installation process.
Linux
Traditional Method:
- Relies on terminal-based cloning, conda or pip management, and manual installation of NVIDIA/AMD drivers.
- May require adjustments for AppArmor/SELinux policies.

V1 Desktop App:
- Offers code-signed binaries (via AppImage/DEB packages) that streamline dependency management and updates.
The V1 Desktop App significantly reduces cross-platform installation headaches by providing automatic dependency resolution and a unified model library.
Final Thoughts
WAN 2.1, paired with Comfy UI, opens up a world of creative possibilities for AI-driven video generation. Whether you're setting up on Windows, macOS, or Linux, this guide has provided everything you need—from hardware requirements to advanced techniques—to help you create stunning videos. Embrace this breakthrough technology, experiment with various models, and transform your creative visions into reality. Happy video making!