Stable Diffusion

5677 readers

16 users here now

Discuss matters related to our favourite AI Art generation technology

Also see

Stable Diffusion Art (See its sidebar for more GenAI Art comms)
!aihorde@lemmy.dbzer0.com

Other communities

founded 2 years ago

MODERATORS

db0@lemmy.dbzer0.com

Even_Adder@lemmy.dbzer0.com

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers (arxiv.org)

submitted 5 hours ago by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

0 comments fedilink hide all child comments

Abstract

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.

Paper: https://arxiv.org/abs/2605.22668

Hugging Face: https://huggingface.co/papers/2605.22668

Blog:

Project Page: https://rajabi2001.github.io/sega/

no comments (yet)

sorted by: hot top controversial new old

there doesn't seem to be anything here