Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown, Published: May 9, 2024
Abstract
Recent advancements in vision-enabled large language models have prompted a renewed interest in evaluating their capabilities and limitations when interpreting complex visual data. The current research employs ImageNet-A, a dataset specifically designed with adversarially selected images that challenge standard AI models, to test the processing robustness of three prominent models: GPT-4 Vision, Google Gemini 1.5, and Anthropic Claude 3. Quantitative analyses revealed notable disparities in misclassification rates and types of errors among these models, indicating variation in their ability to handle adversarial inputs effectively. GPT-4 Vision demonstrated commendable robustness, whereas Gemini 1.5 excelled in speed and efficiency. Claude 3, while showing intermediate accuracy levels, displayed a significant propensity for contextual misinterpretations. Qualitative evaluations further assessed the relevance and plausibility of the models' hallucinations, uncovering challenges in achieving human-like understanding of ambiguous or adversarial scenes. The findings emphasize the necessity of improvements in semantic understanding. Future directions include enhancing this capability, refining evaluation metrics to better capture qualitative aspects of understanding, and fostering interdisciplinary collaborations to develop systems with more nuanced interpretive abilities. The study underscores the ongoing journey towards AI systems that can match human perceptual skills, highlighting both the progress made and the considerable challenges that remain.
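The quantitative comparison described in this abstract can be outlined with a small evaluation harness. The sketch below is illustrative only, not the authors' code: `query_model` is a hypothetical callable standing in for an API call to GPT-4 Vision, Gemini 1.5, or Claude 3, and the ImageNet-A image/label pairs are assumed to be available locally.

```python
from collections import Counter

def misclassification_report(model_name, query_model, samples, known_labels):
    """Tally the misclassification rate and coarse error types for one model.

    `samples` is an iterable of (image_path, true_label) pairs drawn from
    ImageNet-A; `query_model(model_name, image_path)` is a hypothetical
    callable returning the model's predicted label, or None on refusal.
    """
    errors = Counter()
    total = 0
    for image_path, true_label in samples:
        predicted = query_model(model_name, image_path)
        total += 1
        if predicted == true_label:
            continue
        # Coarse error taxonomy: refusal, hallucinated label, wrong class.
        if predicted is None:
            errors["refusal"] += 1
        elif predicted not in known_labels:
            errors["hallucinated_label"] += 1
        else:
            errors["wrong_class"] += 1
    rate = sum(errors.values()) / total if total else 0.0
    return {"model": model_name, "error_rate": rate, "error_types": dict(errors)}
```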
Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown, Published: June 7, 2024
Abstract
Natural language processing has seen substantial progress with the development of highly sophisticated models capable of understanding and generating human-like text. However, a persistent challenge remains in enhancing the accuracy of these models when dealing with domain-specific knowledge, particularly in avoiding hallucinations, or plausible but incorrect information. The dynamic domain knowledge injection mechanism introduced in this research represents a significant advancement by allowing continuous integration and prioritisation of specialised information, thereby improving the model's performance and reliability. By dynamically adjusting the hidden weights of GPT-Neo based on relevance and accuracy, the modified model achieved higher precision, recall, and F1-scores, and exhibited reduced hallucination rates across diverse domains such as cybersecurity, medical and financial data, and legal documents. A comprehensive evaluation framework, including benchmark creation and metrics, validated the effectiveness of the approach, demonstrating that it can substantially enhance the utility of large language models in specialised fields. The results highlight the transformative potential of the method, offering a robust pathway for more accurate and contextually aware models. Detailed analysis and ablation studies further elucidate the contributions of each component within the modification process, providing critical insights into the optimisation and future applications of this innovative approach.
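The abstract does not specify how the hidden-weight adjustment is computed, so the following is only a minimal sketch of one plausible reading: pre-computed, domain-specific weight deltas are blended into GPT-Neo's feed-forward parameters with a factor derived from relevance and accuracy estimates. Names such as `domain_delta` and the blending rule are assumptions for illustration, not the paper's method.

```python
import torch
from transformers import GPTNeoForCausalLM

def inject_domain_knowledge(model, domain_delta, relevance, accuracy):
    """Blend domain-specific weight updates into the model's MLP parameters.

    `domain_delta` maps parameter names to update tensors distilled from a
    specialised corpus (how those deltas are produced is outside this sketch);
    the blend factor grows with the estimated relevance and accuracy of the
    domain material, both assumed to lie in [0, 1].
    """
    alpha = max(0.0, min(1.0, relevance * accuracy))
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "mlp" in name and name in domain_delta:
                param.add_(alpha * domain_delta[name].to(param.dtype))
    return model

def hallucination_rate(generations, verifier):
    """Fraction of generated answers a domain fact-checker flags as unsupported."""
    flagged = sum(1 for text in generations if not verifier(text))
    return flagged / len(generations) if generations else 0.0

# Hypothetical usage: load a public GPT-Neo checkpoint before injection.
# model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
```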
The evaluation of visual hallucinations in multimodal AI models is novel and significant because it addresses a critical gap in understanding how these systems interpret deceptive inputs. The study systematically assessed ChatGPT's performance on a synthetic dataset of visually deceptive and non-deceptive images, employing both quantitative and qualitative analysis. Results revealed that while ChatGPT achieved high accuracy on standard recognition tasks, its accuracy diminished when faced with deceptive inputs, highlighting areas for further improvement. The analysis provided insights into the model's underlying mechanisms, such as extensive pretraining and sophisticated integration capabilities, which contribute to its robustness against deceptions. The study's findings have important implications for the development of more reliable and robust AI technologies, offering a benchmark for future evaluations and practical guidelines for enhancing such systems.
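As a rough illustration of the quantitative part of such an evaluation, the snippet below splits recognition accuracy by whether an image was deceptive. The record format and the way predictions are collected from ChatGPT are assumptions made for the example.

```python
def accuracy_by_condition(records):
    """Compare recognition accuracy on deceptive vs. non-deceptive images.

    `records` is a list of dicts with keys 'deceptive' (bool), 'label', and
    'prediction' (a hypothetical log of the model's answers); both subsets
    are assumed to be non-empty.
    """
    buckets = {True: [0, 0], False: [0, 0]}  # condition -> [correct, total]
    for record in records:
        bucket = buckets[record["deceptive"]]
        bucket[0] += int(record["prediction"] == record["label"])
        bucket[1] += 1
    return {
        "deceptive_accuracy": buckets[True][0] / buckets[True][1],
        "non_deceptive_accuracy": buckets[False][0] / buckets[False][1],
    }
```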
Evaluating the intelligence of multimodal large language models (LLMs) using adapted human IQ tests poses unique challenges and opportunities for understanding AI capabilities. By applying the Wechsler Adult Intelligence Scale (WAIS), customized to assess the cognitive functions of LLMs such as Baidu Ernie, Google Gemini, and Anthropic Claude, significant insights into the complex intellectual landscape of these systems were revealed. The study demonstrates that LLMs can exhibit sophisticated cognitive abilities, performing tasks requiring advanced verbal comprehension, perceptual reasoning, and problem-solving, which are traditionally considered within the purview of human cognition. The research also highlights the distinct cognitive profiles of each model, reflecting their specialized architectures and training. However, it acknowledges the inherent limitations of applying a human-oriented assessment, emphasizing the need for ongoing refinement of testing methodologies to keep pace with AI development. Future directions include the creation of dynamic and adaptive evaluation frameworks that better align with the capabilities of evolving AI systems, ensuring their integration into society remains aligned with human values and safety standards.
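One way to picture a WAIS-style adaptation is to score each model on hand-written items per subtest and read off a per-subtest profile. The sketch below is a simplified stand-in, assuming a hypothetical `ask_model` wrapper around each chat API and ignoring WAIS norm tables entirely.

```python
def administer_subtest(ask_model, items):
    """Fraction of adapted WAIS-style items a model answers correctly.

    `ask_model` is a hypothetical callable wrapping a chat API; `items` is a
    list of (prompt, expected_answer) pairs written for one subtest, and a
    simple substring check stands in for proper answer scoring.
    """
    correct = sum(1 for prompt, expected in items
                  if expected.lower() in ask_model(prompt).lower())
    return correct / len(items) if items else 0.0

def cognitive_profile(ask_model, subtests):
    """Per-subtest accuracy profile, e.g. verbal comprehension vs. reasoning."""
    return {name: round(administer_subtest(ask_model, items), 3)
            for name, items in subtests.items()}
```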
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1), Published: April 21, 2025
Large language models (LLMs) are artificial intelligence (AI) based computational models designed to understand and generate human-like text. With billions of training parameters, LLMs excel in identifying intricate patterns, enabling remarkable performance across a variety of natural language processing (NLP) tasks. Since the introduction of transformer architectures, they have been impacting industry with their text generation capabilities. LLMs play an innovative role in various industries by automating NLP tasks. In healthcare, they assist in diagnosing diseases, personalizing treatment plans, and managing patient data. They provide predictive maintenance in the automotive industry. They power recommendation systems and consumer behavior analyzers. They assist researchers and offer personalized learning experiences in education. In finance and banking, they are used for fraud detection, customer service automation, and risk management. They are driving significant advancements in these tasks, improving accuracy and providing deeper insights. Despite these advancements, LLMs face challenges such as ethical concerns, biases in data, and resource requirements, which must be addressed to ensure impartial and sustainable deployment. This study provides a comprehensive analysis of LLMs, their evolution, and their diverse applications across industries, offering valuable insights into their transformative potential and accompanying limitations.
Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown, Published: Aug. 7, 2024
Abstract
The ability to generate coherent and contextually relevant text is increasingly important in a variety of applications, prompting the need for more sophisticated language models. Our novel approach to next-phrase prediction within the Llama 2 model architecture significantly enhances both the accuracy and efficiency of text generation, setting it apart from traditional next-word prediction methods. Through the implementation of a dual-stage encoder-decoder framework, integrated attention mechanisms, and reinforcement learning techniques, the modified model achieves substantial improvements in BLEU and ROUGE scores, as well as reductions in perplexity, latency, and computational resource usage. Extensive evaluations across diverse datasets demonstrate the model's robustness and generalizability, showing its potential to advance applications reliant on advanced language modeling capabilities. The research highlights the importance of continual innovation in optimizing architectures and training methodologies to meet the growing demands of various natural language processing tasks. By systematically addressing the limitations of existing approaches, this study contributes valuable insights to the field, paving the way for more efficient and accurate models in real-time applications.
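The abstract does not spell out how next-phrase prediction is realised; one common formulation scores whole candidate phrases by their joint token log-probability under a causal language model. The sketch below assumes the Hugging Face interface to a Llama 2 checkpoint and illustrates that general idea, not the authors' dual-stage architecture.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; assumed accessible
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def phrase_log_prob(context, phrase):
    """Sum of log-probabilities the model assigns to `phrase` given `context`.

    Assumes the tokenization of `context` is unchanged when `phrase` is
    appended, which holds approximately for whitespace-separated phrases.
    """
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + phrase, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = model(full_ids).logits.log_softmax(dim=-1)
    score = 0.0
    for pos in range(ctx_len, full_ids.shape[1]):
        token_id = full_ids[0, pos]
        # The distribution over the token at `pos` comes from position pos - 1.
        score += log_probs[0, pos - 1, token_id].item()
    return score

def predict_next_phrase(context, candidate_phrases):
    """Pick the candidate phrase with the highest joint log-probability."""
    return max(candidate_phrases, key=lambda p: phrase_log_prob(context, p))
```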
Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown, Published: Aug. 16, 2024
Abstract
The complex nature of logographic writing systems, characterized by their visually intricate characters and context-dependent meanings, presents unique challenges for computational models designed primarily for alphabetic scripts. Understanding the ability of LLMs to process such scripts across visual and textual input modalities is essential for advancing their application in multilingual contexts. The novel approach presented in this study systematically compares model performance when interpreting logographic characters as both visual and textual data, offering new insights into the semantic consistency and accuracy of model outputs across these modalities. The findings reveal critical disparities in performance, particularly highlighting the models' tendency to favor one input modality over the other, which suggests a need for further refinement of multimodal processing capabilities. Through detailed analysis of error patterns, semantic similarity, and character complexity, the research demonstrates the importance of developing more robust and versatile LLM architectures capable of effectively managing the inherent complexities of logographic writing systems. The conclusions drawn from this work not only provide a deeper understanding of the limitations of current models but also set the stage for future innovations in the field, aiming to enhance and generalize performance across diverse linguistic structures and script types.
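The modality comparison can be sketched as follows: query the same model once with a rendered glyph and once with the raw character, then measure how consistent the two answers are. `ask_visual` and `ask_textual` are hypothetical hooks, and the string-ratio similarity is a crude stand-in for the semantic-consistency measures discussed above.

```python
from difflib import SequenceMatcher

def modality_consistency(characters, ask_visual, ask_textual):
    """Compare a model's answers when a character is shown as an image vs. text.

    `ask_visual(ch)` is assumed to send a rendered glyph image and
    `ask_textual(ch)` the character itself; each returns the model's
    description as a string.
    """
    rows = []
    for ch in characters:
        visual_answer = ask_visual(ch)
        textual_answer = ask_textual(ch)
        similarity = SequenceMatcher(None, visual_answer, textual_answer).ratio()
        rows.append({"character": ch, "consistency": round(similarity, 3)})
    return rows
```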
Authorea (Authorea), Journal Year: 2024, Volume and Issue: unknown, Published: Sept. 3, 2024
The growing reliance on AI-generated content across various industries necessitates robust methods for controlling the outputs of language models to ensure quality, relevance, and adherence to ethical guidelines. Introducing a novel game-theoretic framework, this research establishes a structured approach to controllable text generation, enabling strategic manipulation of model outputs through adaptive prompt interventions. The study employed the Mistral model, utilizing concepts of Nash equilibrium and feedback loops to dynamically adjust prompting strategies, optimizing the balance between alignment, diversity, and coherence. Experimental results demonstrated that different strategies distinctly influenced the generated text, with direct prompts enhancing relevance and interrogative prompts promoting creative expression. Case studies further illustrated the practical applications of the framework, showcasing its adaptability across generation tasks. The comparative analysis against traditional control methods highlighted the superiority of the game-theoretic approach in achieving high-quality, controlled outputs. These findings demonstrate the framework's potential to enhance AI-driven text generation, offering significant implications for human-AI collaboration, automated content creation, and the deployment of AI technologies.
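A simple way to picture the adaptive prompt loop is a multiplicative-weights update over a fixed set of prompt strategies, reweighted by a scored payoff. This is an illustrative approximation of a best-response dynamic, not the paper's formulation; `generate` and `score` are hypothetical hooks around the Mistral model and the alignment/diversity/coherence objective.

```python
import math
import random

STRATEGIES = ["direct", "interrogative", "constrained", "open_ended"]

def choose_strategy(weights):
    """Sample a prompt strategy in proportion to its current weight."""
    total = sum(weights.values())
    threshold, cumulative = random.uniform(0, total), 0.0
    for name, weight in weights.items():
        cumulative += weight
        if threshold <= cumulative:
            return name
    return name  # fallback for floating-point edge cases

def adapt_strategies(generate, score, rounds=20, lr=0.5):
    """Feedback loop that upweights prompt strategies yielding higher payoffs.

    `generate(strategy)` produces text under a prompt style and `score(text)`
    returns a payoff in [0, 1] blending relevance, diversity and coherence.
    """
    weights = {name: 1.0 for name in STRATEGIES}
    for _ in range(rounds):
        strategy = choose_strategy(weights)
        payoff = score(generate(strategy))
        weights[strategy] *= math.exp(lr * (payoff - 0.5))
    return weights
```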
Artificial intelligence continues to revolutionize various domains, with large language models (LLMs) pushing the boundaries of what machines can understand and generate. Evaluating the intellectual and linguistic capabilities of LLMs using standardized tests like the Wechsler Adult Intelligence Scale (WAIS) provides a novel and significant approach to understanding their cognitive strengths and limitations. This research presents a comprehensive evaluation of Baidu Ernie and OpenAI ChatGPT, comparing their performance in IQ and Chinese language tasks. The assessments revealed that ChatGPT achieved a marginally higher composite score, excelling particularly in verbal comprehension and working memory. Ernie demonstrated superior cultural appropriateness and accuracy, reflecting its strong alignment with the Chinese context. The study involved translating the WAIS into Chinese, integrating multimodal inputs, and applying rigorous statistical analyses to ensure robust and reliable results. The findings demonstrate the distinct strengths of each model, showing their versatility in handling diverse textual data and producing culturally relevant and grammatically precise responses. The implications for future development emphasize the importance of contextually grounded training and multimodal integration to enhance performance. The evaluation framework offers valuable insights for advancing artificial intelligence, guiding progress towards more intelligent, adaptable, and culturally aware models.
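For illustration, composite scores of the kind reported here can be derived by averaging per-subtest accuracies onto a common scale and comparing the two models subtest by subtest. The scale, the subtest names, and the example numbers below are placeholders, not results from the study.

```python
from statistics import mean

def composite_score(subtest_scores):
    """Average per-subtest accuracies (0-1) onto an illustrative 0-150 scale."""
    return 150 * mean(subtest_scores.values())

def compare_models(scores_a, scores_b):
    """Report composite scores and per-subtest gaps between two models."""
    return {
        "composite_a": round(composite_score(scores_a), 1),
        "composite_b": round(composite_score(scores_b), 1),
        "subtest_gaps": {name: round(scores_a[name] - scores_b[name], 3)
                         for name in scores_a},
    }

# Placeholder accuracies on translated WAIS-style subtests (not real data).
chatgpt = {"verbal_comprehension": 0.86, "working_memory": 0.78, "perceptual_reasoning": 0.69}
ernie = {"verbal_comprehension": 0.81, "working_memory": 0.72, "perceptual_reasoning": 0.70}
print(compare_models(chatgpt, ernie))
```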
In academic writing, citations play an essential role in ensuring the attribution of ideas, supporting scholarly claims, and enabling the traceability of knowledge across disciplines. However, the manual process of citation generation is often time-consuming and prone to errors, leading to inconsistencies that can undermine the credibility of scholarly work. The novel approach explored in this study leverages advanced machine learning techniques to automate the citation process, offering a significant improvement in both accuracy and efficiency. Through the integration of contextual and semantic features, the model demonstrates a superior ability to replicate complex citation patterns, adapt to various disciplines, and generate contextually appropriate citations with high precision. The results of rigorous experiments reveal that the model not only outperforms traditional citation tools but also exhibits robust scalability, making it well-suited for large-scale applications. This research contributes to the field of automated citation generation, providing a powerful tool that enhances the quality and integrity of scholarly communication.
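As a minimal sketch of the automation idea, the function below suggests a reference for a citing sentence and formats it. Token overlap with the title stands in for the contextual and semantic features described above, and the metadata schema ('authors', 'year', 'title') is assumed for the example.

```python
def suggest_citation(citing_sentence, candidate_references):
    """Pick the reference whose title best overlaps the citing sentence.

    `candidate_references` is a list of metadata dicts with 'authors', 'year',
    and 'title' keys (an assumed schema); the overlap score is a crude proxy
    for a learned relevance model.
    """
    sentence_tokens = set(citing_sentence.lower().split())

    def overlap(reference):
        title_tokens = set(reference["title"].lower().split())
        return len(sentence_tokens & title_tokens) / max(len(title_tokens), 1)

    best = max(candidate_references, key=overlap)
    return f'{best["authors"]} ({best["year"]}). {best["title"]}.'
```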