AOpen
Wan 2.1
Alibaba (Tongyi Lab)
The foundation that started it all. 1.3B variant runs on virtually any GPU. First open model to beat closed-source across benchmarks.
AOpen
LTX-2
Lightricks
The predecessor to 2.3. 19B params, native 4K + audio, 30 camera moves. Still excellent and widely deployed.
Pick Wan 2.1 if…
You want consumer GPU workflows, academic research, or Chinese + English text-in-video.
Pick LTX-2 if…
You want production video pipelines, camera-controlled generation, or depth/pose-driven workflows.
Specifications
Maker
Alibaba (Tongyi Lab)
Lightricks
Source Type
Open Source
Open Source
License
Apache 2.0
Apache 2.0 (<$10M rev)
Architecture
Flow Matching DiT + 3D Causal VAE
DiT + 3D Causal VAE
Parameters
14B (also 1.3B variant)
19B (14B video + 5B audio)
Max Resolution
720p
4K
Max Duration
5s
20s
FPS
24
Up to 50
Native Audio
No
Yes
ComfyUI Support
Yes
Yes
Fine-tunable
Yes
Yes
Min VRAM
8.19GB (1.3B) / 24GB+ (14B)
12GB+ (distilled) / 24GB+ (full)
Cost / Second
Self-host
$0.04
Inputs
T2V (14B/1.3B), I2V (14B), FLF2V, VACE, V2A
T2V, I2V, V2V, Audio-to-Video, Depth, OpenPose, Camera Control
On Floyo
Yes
Yes
Strengths & Trade-offs
Wan 2.1
Strengths
- +SOTA open-source at launch
- +1.3B model runs on any consumer GPU (8.19GB VRAM)
- +first video model with Chinese + English text generation
- +Wan-VAE encodes unlimited-length 1080P
- +T2V/I2V/Video Editing/T2I/V2A all supported
Trade-offs
- -720p max
- -5s duration
- -1.3B quality limited
- -no native audio generation
- -superseded by 2.2 on quality
Best For
- →Budget local deployment
- →consumer GPU workflows
- →academic research
- →Chinese + English text-in-video
LTX-2
Strengths
- +19B params (14B video + 5B audio)
- +native 4K at 50fps
- +first open model with unified audio-video
- +30 cinematic camera moves
- +depth-aware generation
Trade-offs
- -Superseded by 2.3 on detail and audio quality
- -LoRAs not compatible with 2.3
- -texture drift every 8-10 frames
- -in-scene text issues
Best For
- →Production video pipelines
- →camera-controlled generation
- →depth/pose-driven workflows
- →budget 4K content