Matiur Rahman Minar¹, Seunghun Oh², Ganghyeon Jeong², Unsang Park¹˒²
¹Department of Computer Science and Engineering, Sogang University
²Department of Artificial Intelligence, Sogang University
Evaluating Steady-Forcing across fixed-camera real-world streams at 60s, 120s, and 240s durations.
A 60-second fixed-camera street food scene with steady background composition.
A calm 60-second sea beach stream with minimal motion and strong spatial persistence.
A 60-second mountain stream showcasing continuous river motion under a fixed viewpoint.
A 120-second forest river sequence with sustained water motion and static framing.
A 120-second waterfall stream demonstrating continuous fluid dynamics from a fixed camera.
A 120-second nighttime rainstorm stream with wet urban details and stationary framing.
A full 240-second fixed-camera smoke sequence with continuous motion and solid spatial consistency.
A 240-second forest river with a consistent steady view and continuous water motion.
A 240-second fixed-camera ocean wave sequence with persistent continuity.
Standard autoregressive video generation models face a fundamental trade-off over extended horizons: they exhibit either severe background drift or complete motion collapse. Steady-Forcing breaks this trade-off by decoupling spatial persistence from motion continuity. Operating entirely at inference time, without retraining, it uses a structural dual-memory protocol to lock down static background anchors while sustaining continuous, natural fluid motion.
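This page does not spell out the dual-memory protocol, so the following is only a minimal sketch of the general idea, under stated assumptions: one memory holds a few early frames as fixed background anchors that are never evicted, and a second memory holds a sliding window of the most recent frames to carry motion forward. The names `DualMemory`, `rollout`, `num_anchor`, `window`, and `step_fn` are hypothetical and are not taken from the Steady-Forcing implementation.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence


class DualMemory:
    """Hypothetical two-part context buffer (illustrative assumption only).

    - anchors: the first `num_anchor` frames, kept forever (spatial persistence)
    - recent:  a sliding window of the latest frames (motion continuity)
    """

    def __init__(self, num_anchor: int, window: int) -> None:
        if num_anchor < 0 or window <= 0:
            raise ValueError("num_anchor must be >= 0 and window must be > 0")
        self.num_anchor = num_anchor
        self.anchors: List[object] = []
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.recent: Deque[object] = deque(maxlen=window)

    def push(self, frame: object) -> None:
        """Store a frame: fill the anchor slots first, then the rolling window."""
        if len(self.anchors) < self.num_anchor:
            self.anchors.append(frame)
        else:
            self.recent.append(frame)

    def context(self) -> List[object]:
        """Return the conditioning context: fixed anchors, then recent frames."""
        return self.anchors + list(self.recent)


def rollout(
    step_fn: Callable[[List[object]], object],
    seed_frames: Sequence[object],
    num_steps: int,
    num_anchor: int,
    window: int,
) -> List[object]:
    """Autoregressive loop: each new frame is produced from the dual context.

    `step_fn` stands in for a pretrained generator called at inference time;
    no weights are updated anywhere in this loop.
    """
    memory = DualMemory(num_anchor=num_anchor, window=window)
    for frame in seed_frames:
        memory.push(frame)

    generated: List[object] = []
    for _ in range(num_steps):
        frame = step_fn(memory.context())
        memory.push(frame)
        generated.append(frame)
    return generated
```

With integers standing in for frames, pushing ten frames into `DualMemory(num_anchor=2, window=3)` leaves the two earliest frames permanently in the context while the rolling window keeps only the three latest. A plain sliding window would have discarded the early frames, which is one way background information can be lost over a long rollout.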
Baseline rollout suffers from gradual structural warping and background element drifting.
Our dual-memory pipeline maintains rigorous background stability alongside ongoing fluid animation.
Qualitative evaluations demonstrating Steady-Forcing's ability to maintain high-fidelity static structural layouts and non-decaying natural motion paths.
Showcasing long-horizon generative streams that preserve background geometry across hundreds of frames.
Fine-grained fluid and motion transitions applied seamlessly over stationary background compositions.
Coordinated landscape rollouts demonstrating multi-component conditioning, localized motion boundaries, and synchronized ecosystem behaviors.
Progressive atmospheric variations and environmental states introduced over a fixed background layout.
Modulating global illumination, dynamic cloud occlusion, and time-of-day progression across continuous nature rollouts.
New user-uploaded fixed-tripod scenes demonstrating stable background persistence across real-world coastal, urban, and forest environments.
@article{minar2025steady,
title={Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion},
author={Minar, Matiur Rahman and Oh, Seunghun and Jeong, Ganghyeon and Park, Unsang},
journal={arXiv preprint arXiv:2606.7661673},
year={2026}
}