this post was submitted on 23 May 2026


Abstract

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.
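
To make the contrast in the abstract concrete, here is a rough, pure-Python sketch of the difference between one uniform attention scale and a per-component scale across RoPE frequencies. It is not SEGA's algorithm, which this post does not spell out. The log-ratio form of `uniform_scale`, the `high_freq_ratio` statistic, and the interpolation rule in `adaptive_scales` are all assumptions made up for illustration, and every function name here is hypothetical. Only the RoPE frequency formula and the rotation itself are standard.

```python
import cmath
import math


def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per pair of channels."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def component_logits(q, k, pos_q, pos_k, freqs):
    """Contribution of each RoPE component to the attention logit.

    q and k are lists of complex numbers; each one packs a pair of real channels.
    """
    out = []
    for qi, ki, w in zip(q, k, freqs):
        rq = qi * cmath.exp(1j * w * pos_q)  # rotate query by its position
        rk = ki * cmath.exp(1j * w * pos_k)  # rotate key by its position
        out.append((rq * rk.conjugate()).real)
    return out


def uniform_scale(train_len, target_len):
    """Content-agnostic baseline: one scale for all components.

    ASSUMPTION: a log-ratio form, chosen only for illustration.
    """
    return math.log(target_len) / math.log(train_len)


def high_freq_ratio(signal):
    """Share of non-DC spectral energy in the upper half of the band.

    ASSUMPTION: a 1D stand-in for the latent's spatial-frequency structure.
    """
    n = len(signal)
    energy = []
    for f in range(1, n // 2 + 1):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(signal))
        energy.append(abs(c) ** 2)
    total = sum(energy)
    if total == 0:
        return 0.0
    return sum(energy[len(energy) // 2:]) / total


def adaptive_scales(freqs, base_scale, hf_ratio):
    """HYPOTHETICAL rule, not the paper's: shift scaling toward high-frequency
    RoPE components as the latent gains high-frequency content."""
    m = len(freqs)
    scales = []
    for i in range(m):
        r = i / max(m - 1, 1)  # 0 = highest RoPE frequency, 1 = lowest
        weight = hf_ratio * (1.0 - r) + (1.0 - hf_ratio) * r
        scales.append(1.0 + (base_scale - 1.0) * weight)
    return scales


def scaled_logit(contribs, scales):
    """Attention logit with a separate scale per RoPE component."""
    return sum(s * c for s, c in zip(scales, contribs))
```

In this toy setup, calling `scaled_logit` with the same value of `uniform_scale(...)` for every component reproduces the content-agnostic baseline. Passing the output of `adaptive_scales(...)`, recomputed from the current latent at each denoising step, gives the content-dependent variant the abstract alludes to.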

Paper: https://arxiv.org/abs/2605.22668

Hugging Face: https://huggingface.co/papers/2605.22668

Blog:

Project Page: https://rajabi2001.github.io/sega/
