this post was submitted on 06 Jun 2026

Abstract

Long video generation still suffers from error accumulation, weak temporal coherence, and prohibitive latency, limiting its applicability to interactive scenarios. We present JoyAI-Echo, a framework that breaks these barriers through four key advances. Central to its performance, a cross-modal audio-visual memory bank preserves character appearance and voice timbre consistently over five-minute videos, while a post-training pipeline combines memory-based reinforcement learning with distribution matching distillation, yielding a 7.5× speedup and substantially boosting visual quality and alignment. Empowered by these two components, JoyAI-Echo decisively outperforms HappyOyster (directing mode) on long-form generation and even surpasses the short-video specialist Wan 2.6 on human-centric tasks. Beyond raw generation quality, an interactive agent enables real-time user editing through conversational instructions, and a lightweight super-resolution module maintains high definition under streaming latency, further elevating the overall experience and delivering instantly editable, conversation-speed video creation. For the first time, JoyAI-Echo simultaneously achieves long-range cross-modal consistency, real-time inference for minute-long video, conversational interactivity, and high-resolution output without compromise, inaugurating a new era of interactive video generation. Code and weights will be open-sourced.

Paper: https://www.researchgate.net/publication/405770309_JoyAI-Echo_Pushing_the_Frontier_of_Long_Audio-Visual_Generation

Code: https://github.com/jd-opensource/JoyAI-Echo

Hugging Face: https://huggingface.co/jdopensource/JoyAI-Echo

Project Page: https://echo-team-joy-future-academy-jd.github.io/Echo-LongVideo-Page/
