Floyo Model Compare — Find the right AI video model

S | Closed

Veo 3.1

Google DeepMind

The most complete video generation system: native audio, ingredients-to-video, style matching, character consistency, scene extension, object manipulation, and camera/motion controls. State-of-the-art on MovieGenBench. The Swiss Army knife of AI video, but at a premium price.

S | Closed

Wan 2.6

Alibaba (Tongyi Lab)

Closed-source evolution of Wan. Adds reference-to-video for character consistency, multi-shot narratives, five aspect ratios, and clips up to 15s. Native audio with synced dialogue carries over from Wan 2.5.

Pick Veo 3.1 if…

You want audio-native cinematic content: dialogue scenes with natural lip sync, sound design-heavy pieces, cinematic one-takes with ambient audio, style-matched content (from a reference image), character-consistent series, professional 4K deliverables, or VFX work (add/remove objects, outpainting).

Pick Wan 2.6 if…

You want cross-platform content (all aspect ratios), character-consistent narratives (ref-to-video), or audio-synced social content.

Specifications

Specification | Veo 3.1 | Wan 2.6
--- | --- | ---
Maker | Google DeepMind | Alibaba (Tongyi Lab)
Source Type | Closed Source | Closed Source
License | Commercial (subscription) | Alibaba Commercial
Architecture | Proprietary (state-of-the-art T2V, I2V, T2A+V) | DiT + MoE (evolved)
Parameters | Undisclosed | Undisclosed
Max Resolution | 1080p and 4K | 720p / 1080p
Max Duration | 8s base (extendable via scene extension) | Up to 15s
FPS | 24-30 | 24-30
Native Audio | Yes | Yes
ComfyUI Support | Yes | Yes
Fine-tunable | |
Min VRAM | Cloud only | Cloud / API
Cost / Second | $0.20 | $0.05
Inputs | T2V, I2V (ingredients-to-video), Style Reference, Character Reference, Scene Extension, First+Last Frame, Outpainting, Add/Remove Object, Camera Controls, Motion Controls, Character Controls (body/face/voice drive) | T2V, I2V, Reference-to-Video (1-3 refs via @Video1/@Video2/@Video3)
On Floyo | Yes | Yes
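To make the per-second prices in the specifications concrete, here is a minimal Python sketch that estimates what a clip would cost on each model. It assumes simple linear per-second billing, and that a Veo 3.1 clip longer than the 8s base is billed at the same rate when it is lengthened through scene extension; neither assumption is confirmed by the listing, and the function name `estimate_cost` is purely illustrative.

```python
# Per-second list prices from the specifications (USD).
COST_PER_SECOND = {"Veo 3.1": 0.20, "Wan 2.6": 0.05}


def estimate_cost(model: str, seconds: float) -> float:
    """Estimate clip cost in USD, ASSUMING linear per-second billing."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return round(COST_PER_SECOND[model] * seconds, 2)


if __name__ == "__main__":
    # 8s is Veo 3.1's base clip length; 15s is Wan 2.6's maximum.
    for seconds in (8, 15):
        for model in COST_PER_SECOND:
            print(f"{model}: {seconds}s -> ${estimate_cost(model, seconds):.2f}")
```

Under these assumptions an 8s clip comes to about $1.60 on Veo 3.1 versus $0.40 on Wan 2.6, a 4x difference that holds at any length because both rates are flat.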

Strengths & Trade-offs

Veo 3.1

Strengths

+Best native audio (dialogue, SFX, ambient, and music generated natively in the same pass)
+State-of-the-art T2V per Meta MovieGenBench
+Ingredients-to-video (1-3 reference images for scene/character/object)
+Style reference (match the aesthetic of a reference image)
+Character consistency across scenes
+Scene extension with visual and audio consistency
+First+last frame transitions
+Outpainting for aspect ratio adaptation
+Add/remove objects with physics-aware placement
+Camera controls (dolly, zoom, pan)
+Motion controls (draw object paths)
+Character controls (body + face + voice drive animation)
+1080p and 4K output
+SynthID watermarking

Trade-offs

-8s base clips (longer videos need scene extension)
-Most expensive per second ($0.20)
-Short speech segments still being refined
-Cloud-only (no self-hosting)
-No open weights
-Limited to the Google ecosystem (Gemini, Flow, AI Studio, Vertex AI)

Best For

→Audio-native cinematic content
→Dialogue scenes with natural lip sync
→Sound design-heavy pieces
→Cinematic one-takes with ambient audio
→Style-matched content (reference image)
→Character-consistent series
→Professional 4K deliverables
→VFX (add/remove objects, outpainting)

Wan 2.6

Strengths

+Fastest inference
+Native audio with synced dialogue
+Reference-to-video for character consistency (1-3 video refs)
+Multi-shot with structured prompt syntax [0-3s]/[3-5s]
+Expanded aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4)

Trade-offs

-Closed source (not self-hostable)
-Reference-to-video limited to 5s or 10s clips (no 15s)
-800-character prompt limit
-Multi-shot timing depends on prompt expansion quality
-Check regional license terms
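The multi-shot syntax and the limits listed for Wan 2.6 can be checked before a prompt is submitted. The Python sketch below builds a prompt from timed shots and validates it against the 800-character limit and the 5s/10s restriction on reference-to-video. The listing only hints at the syntax with `[0-3s]/[3-5s]`, so the exact segment layout here is an assumption, as are the rule that shots must be contiguous from 0s and the names `Shot`, `build_multishot_prompt`, and `check_ref_to_video`; none of them are part of any official API.

```python
from dataclasses import dataclass

MAX_PROMPT_CHARS = 800            # prompt limit listed under Trade-offs
REF_TO_VIDEO_DURATIONS = (5, 10)  # seconds; 15s is not offered with references
MAX_REFS = 3                      # @Video1 .. @Video3


@dataclass
class Shot:
    start: int        # seconds
    end: int          # seconds
    description: str


def build_multishot_prompt(shots: list[Shot]) -> str:
    """Join shots as '[0-3s] ... [3-5s] ...' (ASSUMED layout, not official)."""
    if not shots:
        raise ValueError("at least one shot is required")
    previous_end = 0
    parts = []
    for shot in shots:
        # Assumption: shots start at 0s and follow each other without gaps.
        if shot.start != previous_end or shot.end <= shot.start:
            raise ValueError(f"shots must be contiguous and non-empty: {shot}")
        parts.append(f"[{shot.start}-{shot.end}s] {shot.description.strip()}")
        previous_end = shot.end
    prompt = " ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} chars; limit is {MAX_PROMPT_CHARS}")
    return prompt


def check_ref_to_video(duration: int, num_refs: int) -> None:
    """Raise ValueError if a reference-to-video request exceeds the listed limits."""
    if duration not in REF_TO_VIDEO_DURATIONS:
        raise ValueError(f"reference-to-video supports {REF_TO_VIDEO_DURATIONS}s, got {duration}s")
    if not 1 <= num_refs <= MAX_REFS:
        raise ValueError(f"expected 1-{MAX_REFS} references, got {num_refs}")


if __name__ == "__main__":
    shots = [
        Shot(0, 3, "@Video1 walks into a cafe"),
        Shot(3, 5, "close-up as she smiles"),
    ]
    check_ref_to_video(duration=shots[-1].end, num_refs=1)
    print(build_multishot_prompt(shots))
    # prints: [0-3s] @Video1 walks into a cafe [3-5s] close-up as she smiles
```

Validating locally like this catches over-long prompts and unsupported durations before a paid generation is attempted; it cannot address the separate trade-off that shot timing depends on how the model expands the prompt.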

Best For

→Cross-platform content (all aspect ratios)
→Character-consistent narratives (ref-to-video)
→Audio-synced social content
→Multilingual production

Run these models on Floyo

Browser-based ComfyUI. No setup, no GPU required.

I2V | Veo 3.1

Veo 3.1 Image to Video (First + Last Frame)

1.5k runs

Ref-to-Video | Wan 2.6

Wan 2.6 Reference to Video
