Cited by RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

Parameter Efficient Fine-Tuning for Multi-modal Generative Vision Models with Möbius-Inspired Transformation

Haoran Duan, Shuai Shao, Bing Zhai et al.

International Journal of Computer Vision, Year: 2025, Issue: unknown

Published: March 13, 2025

Language: English

LEAD: Latent Realignment for Human Motion Diffusion

N. Andreou, Xi Wang, Victoria Abrevaya et al.

Computer Graphics Forum, Year: 2025, Issue: unknown

Published: April 9, 2025

Abstract: Our goal is to generate realistic human motion from natural language. Modern methods often face a trade-off between model expressiveness and text-to-motion (T2M) alignment. Some align text and motion latent spaces but sacrifice expressiveness; others rely on diffusion models producing impressive motions but lacking semantic meaning in their latent space. This may compromise realism, diversity, and applicability. Here, we address this by combining latent diffusion with a realignment mechanism, producing a novel, semantically structured space that encodes the semantics of language. Leveraging this capability, we introduce the task of textual motion inversion to capture novel motion concepts from a few examples. For motion synthesis, we evaluate LEAD on HumanML3D and KIT-ML and show comparable performance to the state-of-the-art in terms of realism, diversity, and text-motion consistency. Our qualitative analysis and user study reveal that our synthesised motions are sharper, more human-like, and comply better with the text compared to modern methods. For motion textual inversion (MTI), our method demonstrates improvements in capturing out-of-distribution characteristics in comparison to traditional VAEs.

Language: English

Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation

Jinpeng Liu, Wenxun Dai, Chunyu Wang et al.

Lecture Notes in Computer Science, Year: 2024, Issue: unknown, pp. 445-463

Published: Nov. 2, 2024

Language: English

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Dionis Totsila, Quentin Rouxel, Jean-Baptiste Mouret et al.

Published: Nov. 22, 2024

This paper presents Words2Contact, a language-guided multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the robots where to place their support contacts before whole-body reaching or manipulation using natural language. Words2Contact transforms the verbal instructions of a human operator into contact placement predictions; it also deals with iterative corrections, until the human is satisfied with the contact location identified in the robot's field of view. We benchmark state-of-the-art LLMs and VLMs for size and performance in contact prediction. We demonstrate the effectiveness of the iterative correction process, showing that users, even naive ones, quickly learn how to instruct the system to obtain accurate locations. Finally, we validate Words2Contact in real-world experiments with the Talos humanoid robot, instructed by human operators to place support contacts on different locations and surfaces to avoid falling when reaching for distant objects.

Language: English

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

Bowen Zhang, Yiji Cheng, Chunyu Wang et al.

Lecture Notes in Computer Science, Year: 2024, Issue: unknown, pp. 465-483

Published: Dec. 4, 2024

Language: English
