PixelDiT: Pixel Diffusion Transformers for Image Generation (arxiv.org)

submitted 4 days ago by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

Abstract

Latent-space modeling has been the standard for Diffusion Transformers (DiTs). However, it relies on a two-stage pipeline where the pretrained autoencoder introduces lossy reconstruction, leading to error accumulation while hindering joint optimization. To address these issues, we propose PixelDiT, a single-stage, end-to-end model that eliminates the need for the autoencoder and learns the diffusion process directly in the pixel space. PixelDiT adopts a fully transformer-based architecture shaped by a dual-level design: a patch-level DiT that captures global semantics and a pixel-level DiT that refines texture details, enabling efficient training of a pixel-space diffusion model while preserving fine details. PixelDiT achieves 1.61 FID on ImageNet 256 and 1.81 FID on ImageNet 512, surpassing existing pixel generative models. We further extend PixelDiT to text-to-image generation and pretrain it at the 1024² resolution in pixel space. It achieves 0.74 on GenEval and 83.5 on DPG-bench, approaching the best latent diffusion models.
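To get a feel for the two levels the abstract describes, here is a rough sketch of how an image splits into patch tokens (the coarse level) and the pixels inside each patch (the fine level). This is not the authors' code: the function names and the patch size of 16 are assumptions made purely for illustration, and the sketch only shows the token bookkeeping, not any attention or diffusion step.

```python
# Hypothetical illustration only: the names and the patch size are assumptions,
# not taken from the PixelDiT paper or repository.


def token_counts(height, width, patch_size):
    """Return (num_patch_tokens, pixels_per_patch, total_pixels)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    num_patches = (height // patch_size) * (width // patch_size)
    pixels_per_patch = patch_size * patch_size
    return num_patches, pixels_per_patch, height * width


def patchify(image, patch_size):
    """Split an H x W grid (a list of rows) into row-major patches.

    Each patch is returned as a flat list of its pixels, so the result has
    shape [num_patches][patch_size * patch_size].
    """
    height, width = len(image), len(image[0])
    token_counts(height, width, patch_size)  # validates divisibility
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


if __name__ == "__main__":
    # 1024 x 1024 image with an assumed patch size of 16
    print(token_counts(1024, 1024, 16))
```

Under that assumed patch size, a 1024 x 1024 image gives 4096 patch tokens of 256 pixels each, versus roughly a million individual pixels. That gap is one plausible reading of why a coarse level for global structure plus a fine level for texture could be cheaper than treating every pixel as its own token.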

Project page: https://pixeldit.github.io/

Paper: https://arxiv.org/abs/2511.20645

Github page: https://github.com/NVlabs/PixelDiT

HuggingFace (diffusers): https://huggingface.co/nvidia/PixelDiT-1300M-1024px

ComfyUI version: https://huggingface.co/Comfy-Org/PixelDiT

Workflow: https://github.com/Comfy-Org/ComfyUI/pull/14103 (first comment)