FIRE: Food Image to REcipe generation
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV),
Journal year: 2024, Issue: unknown, pp. 8169-8179
Published: Jan. 3, 2024
Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based, and their success depends heavily on the dataset size and diversity, as well as the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision Transformer with a decoder for ingredient extraction, and employs the T5 model to generate recipes incorporating titles and ingredients as inputs. We showcase two practical applications that can benefit from integrating FIRE with large language model prompting: recipe customization to fit recipes to user preferences and recipe-to-code transformation to enable automated cooking processes. Our experimental findings validate the efficacy of our proposed approach, underscoring its potential for future advancements and widespread adoption in food computing.
Language: English
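
The three-stage design described in the FIRE abstract (a title model, an ingredient model, and an instruction model that takes the title and ingredients as input) can be sketched as a simple composition. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Recipe` dataclass, the function names, the prompt layout, and the stub models are all hypothetical stand-ins for the BLIP, Vision Transformer, and T5 components.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Recipe:
    title: str
    ingredients: List[str]
    instructions: str


def build_instruction_prompt(title: str, ingredients: List[str]) -> str:
    """Join title and ingredients into one text input (hypothetical layout)."""
    return f"title: {title} ingredients: {', '.join(ingredients)}"


def generate_recipe(
    image: Any,
    title_model: Callable[[Any], str],
    ingredient_model: Callable[[Any], List[str]],
    instruction_model: Callable[[str], str],
) -> Recipe:
    """Run the three stages in order; only the last stage sees text input."""
    title = title_model(image)              # stage 1: image -> title
    ingredients = ingredient_model(image)   # stage 2: image -> ingredients
    prompt = build_instruction_prompt(title, ingredients)
    instructions = instruction_model(prompt)  # stage 3: text -> instructions
    return Recipe(title, ingredients, instructions)


# Stub models (assumptions) so the sketch runs without any ML dependencies.
def stub_title(image: Any) -> str:
    return "Tomato Soup"


def stub_ingredients(image: Any) -> List[str]:
    return ["tomato", "onion", "salt"]


def stub_instructions(prompt: str) -> str:
    return f"Cook using [{prompt}]"


recipe = generate_recipe(b"", stub_title, stub_ingredients, stub_instructions)
print(recipe)
```

Passing the models as callables keeps the composition independent of any particular library, so each stub could be swapped for a real model wrapper without changing `generate_recipe`.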
Knowledge-enhanced Agents for Interactive Text Games
arXiv (Cornell University),
Journal year: 2023, Issue: unknown
Published: Jan. 1, 2023
Communication via natural language is a key aspect of machine intelligence, and it requires computational models to learn and reason about world concepts, with varying levels of supervision. Significant progress has been made on fully-supervised non-interactive tasks, such as question-answering and procedural text understanding. Yet, various sequential interactive tasks, as in text-based games, have revealed limitations of existing approaches in terms of coherence, contextual awareness, and their ability to learn effectively from the environment. In this paper, we propose a knowledge-injection framework for improved functional grounding of agents in text-based games. Specifically, we consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment. Our framework supports two representative model classes: reinforcement learning agents and language model agents. Furthermore, we devise multiple injection strategies for the above domain knowledge types and agent architectures, including injection via knowledge graphs and augmentation of the existing input encoding strategies. We experiment with four models on the 10 tasks in the ScienceWorld text-based game environment, to illustrate the impact of knowledge injection on various model configurations and challenging task settings. Our findings provide crucial insights into the interplay between task properties, model architectures, and domain knowledge for practical contexts.
Language: English
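
One of the injection strategies named in the abstract above, augmenting the agent's input encoding with memory of previous correct actions and affordances of relevant objects, can be illustrated with a small text-level sketch. The function name, the separator format, and the substring-based relevance check are assumptions made for illustration only; the paper's actual encoding is not given in this listing.

```python
from typing import Dict, List


def augment_observation(
    observation: str,
    memory: List[str],
    affordances: Dict[str, str],
    max_memory: int = 3,
) -> str:
    """Append recent correct actions and relevant affordances to an observation.

    Hypothetical format: sections are joined with " | ".
    """
    parts = [observation]

    # Keep only the most recent correct actions (guard against max_memory <= 0).
    recent = memory[-max_memory:] if max_memory > 0 else []
    if recent:
        parts.append("Previous correct actions: " + "; ".join(recent))

    # Treat an object as relevant if its name appears in the observation text.
    relevant = [
        f"{obj} {desc}"
        for obj, desc in sorted(affordances.items())
        if obj.lower() in observation.lower()
    ]
    if relevant:
        parts.append("Affordances: " + "; ".join(relevant))

    return " | ".join(parts)


augmented = augment_observation(
    "You see a stove and a metal pot.",
    ["open door", "go to kitchen"],
    {"stove": "can heat objects", "fridge": "can cool objects"},
)
print(augmented)
```

Here the fridge affordance is left out because the fridge is not mentioned in the observation, which mirrors the abstract's restriction to relevant objects.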