Open

LTX-2

Lightricks

The predecessor to LTX 2.3: 19B parameters, native 4K with audio, and 30 camera moves. Still excellent and widely deployed.

Open

Wan 2.1

Alibaba (Tongyi Lab)

The foundation of the Wan family. The 1.3B variant runs on virtually any consumer GPU, and Wan 2.1 was the first open model to beat closed-source rivals across benchmarks.

Pick LTX-2 if…

You want production video pipelines, camera-controlled generation, or depth/pose-driven workflows.

Pick Wan 2.1 if…

You want consumer GPU workflows, academic research, or Chinese + English text-in-video.

Specifications

Spec            | LTX-2                              | Wan 2.1
Maker           | Lightricks                         | Alibaba (Tongyi Lab)
Source Type     | Open Source                        | Open Source
License         | Apache 2.0 (<$10M rev)             | Apache 2.0
Architecture    | DiT + 3D Causal VAE                | Flow Matching DiT + 3D Causal VAE
Parameters      | 19B (14B video + 5B audio)         | 14B (also 1.3B variant)
Max Resolution  | 4K                                 | 720p
Max Duration    | 20s                                | 5s
FPS             | Up to 50                           | 24
Native Audio    | Yes                                | No
ComfyUI Support | Yes                                | Yes
Fine-tunable    | Yes                                | Yes
Min VRAM        | 12GB+ (distilled) / 24GB+ (full)   | 8.19GB (1.3B) / 24GB+ (14B)
Cost / Second   | $0.04                              | Self-host
Inputs          | T2V, I2V, V2V, Audio-to-Video, Depth, OpenPose, Camera Control | T2V (14B/1.3B), I2V (14B), FLF2V, VACE, V2A
On Floyo        | Yes                                | Yes
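The Min VRAM rows track the raw weight footprint of each parameter count. A back-of-envelope sketch (fp16/bf16 weights only; activations, text encoder, and VAE are ignored, so real requirements are higher):

```python
def weight_gib(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough weight-only memory in GiB for fp16/bf16 parameters."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# Wan 2.1 1.3B: ~2.4 GiB of weights, which is why it fits 8 GB cards.
print(round(weight_gib(1.3), 1))   # 2.4
# Wan 2.1 14B: ~26 GiB, so near 24 GB it needs quantization or offloading.
print(round(weight_gib(14), 1))    # 26.1
# LTX-2 19B (video + audio towers): ~35 GiB before distillation/quantization.
print(round(weight_gib(19), 1))    # 35.4
```

This is only the lower bound; it explains the gap between the 8.19GB figure for the 1.3B model and the 24GB+ guidance for the full-size checkpoints.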

Strengths & Trade-offs

LTX-2

Strengths

  • 19B parameters (14B video + 5B audio)
  • Native 4K at 50 fps
  • First open model with unified audio-video generation
  • 30 cinematic camera moves
  • Depth-aware generation

Trade-offs

  • Superseded by 2.3 on detail and audio quality
  • LoRAs not compatible with 2.3
  • Texture drift every 8-10 frames
  • In-scene text rendering issues

Best For

  • Production video pipelines
  • Camera-controlled generation
  • Depth/pose-driven workflows
  • Budget 4K content

Wan 2.1

Strengths

  • SOTA open-source at launch
  • 1.3B model runs on almost any consumer GPU (8.19GB VRAM)
  • First video model with Chinese + English text generation
  • Wan-VAE encodes unlimited-length 1080p video
  • T2V, I2V, video editing, T2I, and V2A all supported

Trade-offs

  • 720p max resolution
  • 5s max duration
  • Limited quality from the 1.3B variant
  • No native audio generation
  • Superseded by 2.2 on quality

Best For

  • Budget local deployment
  • Consumer GPU workflows
  • Academic research
  • Chinese + English text-in-video
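For local use, the workflows above typically run the 1.3B text-to-video checkpoint through Hugging Face diffusers. A minimal sketch, assuming the public `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` release and its `WanPipeline`/`AutoencoderKLWan` classes (verify names against your installed diffusers version; a CUDA GPU is required to actually generate):

```python
MODEL_ID = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # assumed 1.3B T2V repo id

def generate(prompt: str, out_path: str = "wan_t2v.mp4") -> str:
    """Generate a short clip with Wan 2.1 1.3B. Requires a CUDA GPU."""
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    # fp32 VAE avoids decode artifacts; bf16 transformer keeps VRAM low.
    vae = AutoencoderKLWan.from_pretrained(
        MODEL_ID, subfolder="vae", torch_dtype=torch.float32
    )
    pipe = WanPipeline.from_pretrained(
        MODEL_ID, vae=vae, torch_dtype=torch.bfloat16
    ).to("cuda")

    frames = pipe(
        prompt=prompt,
        height=480, width=832,   # 480p-class output suits the 1.3B model
        num_frames=81,           # roughly a 5 s clip
        guidance_scale=5.0,
    ).frames[0]
    export_to_video(frames, out_path, fps=16)  # fps of the saved file
    return out_path
```

The same script runs unchanged in a hosted ComfyUI/diffusers environment; only the checkpoint download and VRAM budget change between the 1.3B and 14B variants.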

Run these models on Floyo

Browser-based ComfyUI. No setup, no GPU required.

Open Floyo →