Knowledge-enhanced Agents for Interactive Text Games
Prateek Chhikara, Jiarui Zhang, Filip Ilievski

et al.

arXiv (Cornell University), Journal year: 2023, Issue: unknown

Published: Jan. 1, 2023

Communication via natural language is a key aspect of machine intelligence, and it requires computational models to learn and reason about world concepts, with varying levels of supervision. Significant progress has been made on fully-supervised non-interactive tasks, such as question-answering and procedural text understanding. Yet, various sequential interactive tasks, as in text-based games, have revealed limitations of existing approaches in terms of coherence, contextual awareness, and their ability to learn effectively from the environment. In this paper, we propose a knowledge-injection framework for improved functional grounding of agents in text-based games. Specifically, we consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment. Our framework supports two representative model classes: reinforcement learning agents and language model agents. Furthermore, we devise multiple injection strategies for the above domain knowledge types and agent architectures, including injection via knowledge graphs and augmentation of the existing input encoding strategies. We experiment with four models on the 10 tasks in the ScienceWorld text-based game environment, to illustrate the impact of knowledge injection on various model configurations and challenging task settings. Our findings provide crucial insights into the interplay between task properties, model architectures, and domain knowledge for interactive contexts.

Language: English

FIRE: Food Image to REcipe generation
Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang

et al.

2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Journal year: 2024, Issue: unknown, pp. 8169-8179

Published: Jan. 3, 2024

Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based, and their success depends heavily on the dataset size and diversity, as well as on the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision Transformer with a decoder for ingredient extraction, and employs the T5 model to generate recipes incorporating titles and ingredients as inputs. We showcase two practical applications that can benefit from integrating FIRE with large language model prompting: recipe customization to fit recipes to user preferences and recipe-to-code transformation to enable automated cooking processes. Our experimental findings validate the efficacy of our proposed approach, underscoring its potential for future advancements and widespread adoption in food computing.

Language: English

Cited by: 10
