Lecture Notes in Computer Science, Year: 2024, Issue: unknown, pp. 465-483
Published: Dec. 4, 2024
Language: English
International Journal of Computer Vision, Year: 2025, Issue: unknown
Published: Mar. 13, 2025
Language: English
Cited by: 0
Computer Graphics Forum, Year: 2025, Issue: unknown
Published: Apr. 9, 2025
Abstract: Our goal is to generate realistic human motion from natural language. Modern methods often face a trade-off between model expressiveness and text-to-motion (T2M) alignment. Some align text and motion latent spaces but sacrifice expressiveness; others rely on diffusion models producing impressive motions but lacking semantic meaning in their latent space. This may compromise realism, diversity and applicability. Here, we address this by combining latent diffusion with a realignment mechanism, producing a novel, semantically structured space that encodes the semantics of language. Leveraging this capability, we introduce the task of textual motion inversion to capture novel motion concepts from a few examples. For motion synthesis, we evaluate LEAD on HumanML3D and KIT-ML and show comparable performance to the state of the art in terms of realism, diversity and text-motion consistency. Our qualitative analysis and user study reveal that our synthesised motions are sharper, more human-like and comply better with the text compared to modern methods. For motion textual inversion (MTI), our method demonstrates improvements in capturing out-of-distribution characteristics in comparison to traditional VAEs.
Language: English
Cited by: 0
Lecture Notes in Computer Science, Year: 2024, Issue: unknown, pp. 445-463
Published: Nov. 2, 2024
Language: English
Cited by: 3
Published: Nov. 22, 2024
This paper presents Words2Contact, a language-guided multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the robots where to place their support contacts before whole-body reaching or manipulation using natural language. Words2Contact transforms the verbal instructions of a human operator into contact placement predictions; it also deals with iterative corrections, until the human is satisfied with the contact location identified in the robot's field of view. We benchmark state-of-the-art LLMs and VLMs for size and performance in contact prediction. We demonstrate the effectiveness of the iterative correction process, showing that users, even naive, quickly learn how to instruct the system to obtain accurate locations. Finally, we validate Words2Contact in real-world experiments with the Talos humanoid robot, instructed by human operators to place support contacts on different locations and surfaces to avoid falling when reaching for distant objects.
Language: English
Cited by: 1
Lecture Notes in Computer Science, Year: 2024, Issue: unknown, pp. 465-483
Published: Dec. 4, 2024
Language: English
Cited by: 0