Efficient Diffusion Transformer with Step-Wise Dynamic Attention Mediators
Yifan Pu, Zhuofan Xia, Jiayi Guo, et al.

Lecture Notes in Computer Science, year: 2024, issue: unknown, pp. 424-441

Published: Nov. 21, 2024

Language: English

Cited by: 1

Camera Settings as Tokens: Modeling Photography on Latent Diffusion Models (Creative Commons)
I-Sheng Fang, Yue-Hua Han, Jun-Cheng Chen, et al.

Published: Dec. 3, 2024

Text-to-image models have revolutionized content creation, enabling users to generate images from natural language prompts. While recent advancements in conditioning these models offer more control over the generated results, photography, a significant artistic domain, remains inadequately integrated into these systems. Our research identifies critical gaps in modeling camera settings and photographic terms within text-to-image synthesis. Vision-language models (VLMs) such as CLIP and OpenCLIP, which typically drive text conditions through the cross-attention mechanisms of conditional diffusion models, struggle to represent numerical data effectively in their textual space. To address these challenges, we present CameraSettings20k, a new dataset aggregated from RAISE [Dang-Nguyen et al. 2015], DDPD [Abuolaim and Brown 2020], and PPR10K [Liang et al. 2021]. The curated dataset offers normalized camera settings for 20,000 raw-format images, providing equivalent values standardized to a full-frame sensor. Furthermore, we introduce Camera Settings as Tokens, an embedding approach that leverages a LoRA adapter for Latent Diffusion Models (LDMs) to numerically control image generation based on photographic principles, using focal length, aperture, film speed, and exposure time. Our experimental results demonstrate the effectiveness of the proposed approach, which produces promising synthesized images that obey the specified camera settings. Overall, our work not only bridges the gap between camera settings and user-friendly image synthesis but also sets the stage for future explorations of physics-aware generative models.

Language: English

Cited by: 0

PSG-Adapter: Controllable Planning Scene Graph for Improving Text-to-Image Diffusion

Yi Gao

Lecture Notes in Computer Science, year: 2024, issue: unknown, pp. 205-221

Published: Dec. 7, 2024

Language: English

Cited by: 0