
Transactions of the Association for Computational Linguistics, Journal Year: 2025, Volume and Issue: 13, P. 142 - 166
Published: Jan. 1, 2025
Abstract: Text-To-Image (TTI) models, such as DALL-E and StableDiffusion, have demonstrated remarkable prompt-based image generation capabilities. Multilingual encoders may have a substantial impact on the cultural agency of these models, as language is a conduit of culture. In this study, we explore the cultural perception embedded in TTI models by characterizing culture across three tiers: cultural dimensions, cultural domains, and cultural concepts. Based on this ontology, we derive prompt templates to unlock the cultural knowledge in TTI models, and propose a comprehensive suite of evaluation techniques, including intrinsic evaluations using the CLIP space, extrinsic evaluations with Visual-Question-Answer models and human assessments, to evaluate the cultural content of TTI-generated images. To bolster our research, we introduce the CulText2I dataset, based on six diverse TTI models and spanning ten languages. Our experiments provide insights regarding Do, What, Which, and How research questions about the nature of cultural encoding in TTI models, paving the way for cross-cultural applications of these models.
Language: English