Lecture notes in computer science, Journal year: 2024, Issue: unknown, pp. 205-221
Published: Dec. 7, 2024
Language: English
Cited by: 0
Lecture notes in computer science, Journal year: 2024, Issue: unknown, pp. 424-441
Published: Nov. 21, 2024
Language: English
Cited by: 1
Published: Dec. 3, 2024
Text-to-image models have revolutionized content creation, enabling users to generate images from natural language prompts. While recent advancements in conditioning these models offer more control over the generated results, photography, a significant artistic domain, remains inadequately integrated into such systems. Our research identifies critical gaps in the modeling of camera settings and photographic terms within text-to-image synthesis. Vision-language models (VLMs) such as CLIP and OpenCLIP, which typically drive text conditioning through the cross-attention mechanisms of conditional diffusion models, struggle to represent numerical data effectively in their textual embedding space. To address these challenges, we present CameraSettings20k, a new dataset aggregated from RAISE [Dang-Nguyen et al. 2015], DDPD [Abuolaim and Brown 2020], and PPR10K [Liang et al. 2021]. The curated dataset offers normalized camera settings for 20,000 raw-format images, providing equivalent values standardized to a full-frame sensor. Furthermore, we introduce Camera Settings as Tokens, an embedding approach that leverages a LoRA adapter for Latent Diffusion Models (LDMs) to control image generation numerically, based on the photographic principles of focal length, aperture, film speed, and exposure time. Our experimental results demonstrate the effectiveness of the proposed approach, with promising synthesized images obeying the given camera settings. Our work not only bridges the gap between photographic control and user-friendly image synthesis but also sets the stage for future explorations of physics-aware generative models.
Language: English
Cited by: 0
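The abstract above describes two quantitative steps: normalizing heterogeneous camera settings to full-frame-equivalent values, and embedding those numeric settings as extra conditioning tokens for an LDM. The sketch below illustrates both under stated assumptions: the names to_full_frame_equivalent and CameraSettingTokens are hypothetical, the normalization uses the standard photographic crop-factor equivalence rules (which the abstract does not spell out), and the log2 "stops" projection is one plausible token design, not the paper's actual architecture.

import torch
import torch.nn as nn


def to_full_frame_equivalent(focal_mm, f_number, iso, crop_factor):
    # Standard equivalence rules (an assumption; the paper's exact
    # normalization is not given in the abstract): focal length and
    # f-number scale by the crop factor, ISO scales by its square,
    # and the exposure time is left unchanged.
    return focal_mm * crop_factor, f_number * crop_factor, iso * crop_factor ** 2


class CameraSettingTokens(nn.Module):
    # Hypothetical module: one learned token per camera setting.
    # Each positive scalar is mapped to log2 space (photographic
    # "stops"), projected to the text-embedding width, and offset by a
    # learned per-setting type embedding, so the resulting tokens can
    # be concatenated with the prompt embeddings that feed the LDM's
    # cross-attention layers.

    def __init__(self, embed_dim: int = 768, n_settings: int = 4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(1, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.type_embed = nn.Parameter(torch.zeros(n_settings, embed_dim))

    def forward(self, settings: torch.Tensor) -> torch.Tensor:
        # settings: (batch, 4) positive scalars in the order
        # [focal_mm, f_number, iso, exposure_s] -> (batch, 4, embed_dim)
        stops = torch.log2(settings).unsqueeze(-1)
        return self.proj(stops) + self.type_embed

For example, a 23 mm f/2 exposure at ISO 400 on an APS-C body (crop factor 1.5) normalizes to roughly 35 mm f/3 at ISO 900; the four resulting scalars would then be embedded and appended to the prompt tokens, with the LoRA adapter fine-tuning the cross-attention layers to respect them.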