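We basically have two types of upscaling options available to us: single-image super-resolution (SISR), which upscales each frame on its own, and video super-resolution (VSR), which also uses information from neighbouring frames.

Think of it as StreamDiffusion v1 (SISR) vs StreamDiffusion v2 (VSR).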

For the specific use case of real-time AI video streaming, the open-source solutions fall into two camps: high fidelity (too slow/expensive) or high speed (low stability).

Note: Most of the research/models out there target 4x scaling rather than 2x. Also, BasicVSR++, RealBasicVSR and other real-time VSR solutions are no longer maintained and have proven extremely difficult to set up/install, as their last commits are almost 3-4 years old.

Why FlashVSR is worth it

THE MATH

If we have to upscale with quality, we would ideally have two options (for this comparison let's take 512x512 → 1024x1024 with the LongLive pipeline):

Case 1:

For 512x512 generation on an H100 with the LongLive pipeline we get about 25 FPS, i.e. a latency of about 40 ms per frame (1000 / 25).

FlashVSR upscales from 512x512 to 1024x1024 at 31.2 FPS, i.e. a latency of about 32 ms per frame (1000 / 31.2).

So, assuming the two stages run back to back for each frame with no overlap, we end up with about 72 ms latency per frame, which is 13.9 FPS for 1024x1024.
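The arithmetic for Case 1 can be checked with a short sketch. It assumes the two stages run strictly one after the other for every frame, so per-frame latencies simply add. The helper names `fps_to_latency_ms` and `sequential_fps` are made up for illustration and are not part of LongLive or FlashVSR.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a stage running at `fps`."""
    return 1000.0 / fps


def sequential_fps(*stage_fps: float) -> float:
    """Effective FPS when stages run back to back (assumption: no overlap)."""
    total_ms = sum(fps_to_latency_ms(f) for f in stage_fps)
    return 1000.0 / total_ms


# Figures quoted above: LongLive 512x512 on an H100, then FlashVSR to 1024x1024.
longlive_fps = 25.0
flashvsr_fps = 31.2

gen_ms = fps_to_latency_ms(longlive_fps)
vsr_ms = fps_to_latency_ms(flashvsr_fps)
total_ms = gen_ms + vsr_ms
combined_fps = sequential_fps(longlive_fps, flashvsr_fps)

print(f"generation: {gen_ms:.1f} ms, upscaling: {vsr_ms:.1f} ms")
print(f"total: {total_ms:.1f} ms per frame, about {combined_fps:.1f} FPS")
```

If the two stages could overlap (for example, upscaling frame N while frame N+1 is being generated), throughput would instead be bounded by the slower stage, so the 13.9 FPS figure is specific to the sequential assumption.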

[LongLive 512x512 output](attachment:4fed76ca-00d2-474c-a480-844dec3db79c:output_512x512.mp4)