Wan 2.1
Alibaba (Tongyi Lab)
The foundation that started it all. The 1.3B variant runs on virtually any consumer GPU. The first open model to outperform closed-source alternatives across standard video benchmarks.
Wan 2.2
Alibaba (Tongyi Lab)
MoE architecture with 27B total parameters but only 14B active per step. Trained on 65.6% more images and 83.2% more video than 2.1. Outperforms leading closed-source models on Wan-Bench 2.0.
Pick Wan 2.1 if…
You want consumer GPU workflows, academic research, or Chinese + English text-in-video.
Pick Wan 2.2 if…
You want cinematic style control, speech-to-video, or consumer GPU deployment (TI2V-5B).
Specifications
Strengths & Trade-offs
Wan 2.1
Strengths
- +SOTA open-source at launch
- +1.3B model runs on virtually any consumer GPU (8.19 GB VRAM)
- +first video model with Chinese + English text-in-video generation
- +Wan-VAE encodes and decodes 1080p video of unlimited length
- +T2V / I2V / video editing / T2I / V2A all supported
Trade-offs
- -720p max resolution
- -5s max duration
- -quality limited on the 1.3B variant
- -no native audio generation
- -superseded by 2.2 on quality
Best For
- →Budget local deployment
- →consumer GPU workflows
- →academic research
- →Chinese + English text-in-video
Wan 2.2
Strengths
- +first MoE architecture in video diffusion
- +27B total parameters but only 14B active per step
- +high-noise expert for layout, low-noise expert for detail
- +trained on 65.6% more images and 83.2% more video than 2.1
- +cinematic aesthetic control (lighting, composition, contrast, color tone)
Trade-offs
- -720p cap
- -MoE expert hand-off needs careful SNR-based threshold tuning
- -no native audio in base model (S2V is separate)
- -newer ecosystem than 2.1
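The high-noise/low-noise expert hand-off described above can be sketched as a simple timestep router. This is a minimal illustration only: the expert names, the boundary value, and its derivation from an SNR threshold are assumptions for clarity, not Wan 2.2's actual configuration.

```python
def select_expert(timestep: int, t_boundary: int = 875, t_max: int = 1000) -> str:
    """Route one denoising step to one of two experts (Wan 2.2-style MoE).

    Early, high-noise steps (large timestep) go to the expert that lays out
    global structure; late, low-noise steps go to the detail-refinement
    expert. In practice the boundary is derived from an SNR threshold on the
    noise schedule; the value used here is illustrative.
    """
    assert 0 <= timestep <= t_max
    return "high_noise_expert" if timestep >= t_boundary else "low_noise_expert"

# Only one expert runs per step, so per-step compute matches a 14B dense
# model even though 27B parameters exist in total.
schedule = [1000, 900, 800, 500, 100, 10]
routing = [select_expert(t) for t in schedule]
```

Because routing depends only on the timestep (not the input), every sample follows the same expert schedule, which is why the threshold choice matters so much in practice.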
Best For
- →Self-hosted production
- →cinematic style control
- →speech-to-video
- →consumer GPU deployment (TI2V-5B)
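The 27B-total / 14B-active trade-off is easiest to see as a back-of-envelope weight-footprint calculation, assuming fp16 weights at 2 bytes per parameter (real deployments vary with quantization and offloading):

```python
def fp16_gib(params_billion: float) -> float:
    """Approximate fp16 weight footprint in GiB (2 bytes per parameter)."""
    return params_billion * 1e9 * 2 / 2**30

total_gib = fp16_gib(27)     # ~50.3 GiB across both experts
per_step_gib = fp16_gib(14)  # ~26.1 GiB actually touched per denoising step
```

Note that the saving is in per-step compute and memory bandwidth; both experts still need to be resident (or offloaded) somewhere for the full denoising run.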
Run these models on Floyo
Browser-based ComfyUI. No setup, no GPU required.
Wan 2.2 Animate Preprocess (Kijai)
Wan 2.2 + Qwen V2V Restyle
Wan 2.2 T2V with UnifiedRew
Wan 2.2 Animate Character Replacement