Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1), Published: March 22, 2025
YouTube has become a dominant source of medical information and health-related decision-making. Yet, many videos on this platform contain inaccurate or biased information. Although expert reviews could help mitigate this situation, the vast number of daily uploads makes this solution impractical.
In this study, we explored the potential of Large Language Models (LLMs) to assess the quality of medical content on YouTube. We collected a set of videos previously evaluated by experts and prompted twenty models to rate their quality using the DISCERN instrument.
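As an illustration of this kind of prompting step, here is a minimal sketch in Python, assuming an OpenAI-compatible chat API; the model name, the abbreviated DISCERN item, and the transcript placeholder are hypothetical and do not reflect the authors' exact protocol.

# Hypothetical sketch: ask a chat model to rate a video with a DISCERN-style
# item on a 1-5 scale. Assumes the official `openai` Python client; the model
# name and prompt wording are placeholders, not the study's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "..."  # transcript or description of the YouTube video (placeholder)

prompt = (
    "You are evaluating the quality of a health-related video using the "
    "DISCERN instrument. Rate the following item on a scale from 1 (no) to 5 (yes):\n"
    "Item 1: Are the aims clear?\n\n"
    f"Video transcript:\n{transcript}\n\n"
    "Answer with a single integer from 1 to 5."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the study compared twenty different models
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)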
We then analyzed the inter-rater agreement between the language models' and the experts' ratings using Brennan–Prediger's (BP) Kappa.
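For reference, a minimal sketch of the unweighted Brennan–Prediger kappa for two raters on a q-category scale; the 1-5 DISCERN-style scale and the example ratings below are hypothetical, and the study may have used a weighted variant.

# Minimal sketch of Brennan-Prediger (BP) kappa for two raters on a
# q-category scale. Unlike Cohen's kappa, chance agreement is fixed at 1/q.

def bp_kappa(ratings_a, ratings_b, q=5):
    """Unweighted BP kappa: (Po - 1/q) / (1 - 1/q)."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    po = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    pe = 1.0 / q
    return (po - pe) / (1 - pe)

# Hypothetical example: one expert vs. one LLM on ten videos for a single item.
expert = [3, 4, 2, 5, 3, 4, 1, 2, 3, 4]
llm    = [4, 4, 3, 5, 4, 4, 2, 2, 3, 5]
print(round(bp_kappa(expert, llm), 2))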
We found that the LLMs exhibited a wide range of agreement with the experts (ranging from −1.10 to 0.82). All models tended to give higher scores than the human experts.
The agreement on individual questions tended to be lower, with some showing significant disagreement.
Including scoring guidelines in the prompt improved model performance. We conclude that LLMs are capable of evaluating health-related videos. If used as stand-alone systems or embedded into traditional recommender systems, these models could help address the issue of inaccurate online health information.
Advances in educational technologies and instructional design book series, Journal Year: 2025, Volume and Issue: unknown, P. 137 - 154, Published: Jan. 17, 2025
This chapter examines the transformative impact of Artificial Intelligence (AI) on education and workforce preparation. It delves into how AI technologies are redefining teaching methods, learning experiences, and the skillsets necessary in today's job market. The chapter explores AI's potential to personalize learning, boost student engagement, and develop critical thinking and problem-solving skills. It also addresses the challenges and opportunities of integrating AI into education, including the need for comprehensive educator training, AI literacy programs, and adaptive regulatory frameworks. Ethical considerations related to the use of AI in educational settings are also discussed. Emphasizing the importance of balancing technological advancement with core values, the chapter advocates approaches that nurture essential skills while leveraging AI capabilities. It concludes by underscoring the need for continued research and adaptation to ensure that AI integration prepares students for an AI-driven future while preserving the fundamental objectives of education.
Professional Discourse & Communication, Journal Year: 2025, Volume and Issue: 7(1), P. 70 - 88, Published: March 17, 2025
The article aims to explore the potential of generative artificial intelligence (AI) for assessing written work and providing feedback on it. The goal of this research is to determine the possibilities and limitations of AI when used for evaluating students' written production and providing feedback. To accomplish this aim, a systematic review of twenty-two original studies was conducted. The selected studies were carried out in both Russian and international contexts, with results published between 2022 and 2025.
It was found that the criteria-based assessments made by AI models align with those of instructors, and that AI surpasses human evaluators in its ability to assess language and argumentation. However, the reliability of AI evaluation is negatively affected by the instability of sequential assessments, the hallucinations of the models, and their limited ability to account for contextual nuances.
Despite the detailed and constructive nature of the feedback from AI, it is often insufficiently specific and overly verbose, which can hinder student comprehension. AI feedback primarily targets local deficiencies, while paying less attention to global issues, such as incomplete alignment of the content with the assigned topic.
Unlike instructors, AI provides template-based feedback, avoiding the indirect phrasing and leading questions that contribute to the development of self-regulation skills.
Nevertheless, these shortcomings can be addressed through subsequent queries to the model.
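A minimal sketch of such a follow-up query, assuming an OpenAI-compatible chat API; the model name, the essay placeholder, and the follow-up wording are illustrative only, not taken from the reviewed studies.

# Hypothetical sketch: refine overly verbose AI feedback with a follow-up
# query in the same conversation. Assumes the official `openai` Python client.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Give feedback on this student essay: ..."},  # essay text is a placeholder
]
first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up query targeting the shortcomings noted above: ask for feedback
# that is more specific, more concise, and attentive to global issues.
messages.append({
    "role": "user",
    "content": (
        "Rewrite the feedback so it is more specific and concise, and comment "
        "on whether the essay as a whole addresses the assigned topic."
    ),
})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)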
It was also found that students are open to receiving feedback from AI; however, they prefer to receive it from instructors and peers.
The results are discussed in the context of using AI for formulating feedback by foreign language instructors.
The conclusion emphasises the necessity of a critical approach to AI-based assessment and the importance of training for effective interaction with these technologies.