Research on Intelligent Grading of Physics Problems Based on Large Language Models
Yanan Wei, Rui Zhang, Jianwei Zhang, et al.
Education Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 116 - 116
Published: Jan. 21, 2025
The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to the intelligent grading of tests. This study explores the automatic grading of physics problems through a combination of large language models and prompt engineering. By comparing the performance of four prompt strategies (one-shot, few-shot, chain of thought, tree of thought) within two large model frameworks, namely ERNIEBot-4-turbo and GPT-4o, this study finds that the tree of thought prompt can better assess calculation problems with complex ideas (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduce the performance gap between different models. This research provides valuable insights for the automation of assessments in physics education.
Language: English
Mapping artificial intelligence models in emergency medicine: A scoping review on artificial intelligence performance in emergency care and education
Turkish Journal of Emergency Medicine, Journal Year: 2025, Volume and Issue: 25(2), P. 67 - 91
Published: April 1, 2025
Artificial intelligence (AI) is increasingly improving processes such as emergency patient care and emergency medicine education. This scoping review aims to map the use and performance of AI models in emergency medicine regarding AI concepts. The findings show that AI-based medical imaging systems provide disease detection with 85%-90% accuracy in imaging techniques such as X-ray and computed tomography scans. In addition, AI-supported triage systems were found to be successful in correctly classifying low- and high-urgency patients. In education, large language models have provided high accuracy rates in evaluating emergency medicine exams. However, there are still challenges in the integration of AI into clinical workflows and in model generalization capacity. These findings demonstrate the potential of updated AI models, but larger-scale studies are still needed.
Language: English
AI-Driven Information for Relatives of Patients with Malignant Middle Cerebral Artery Infarction: A Preliminary Validation Study Using GPT-4o
Brain Sciences, Journal Year: 2025, Volume and Issue: 15(4), P. 391 - 391
Published: April 11, 2025
Purpose: This study examines GPT-4o’s ability to communicate effectively with relatives of patients undergoing decompressive hemicraniectomy (DHC) after malignant middle cerebral artery infarction (MMCAI). Methods: GPT-4o was asked 25 common questions from patients’ relatives about DHC for MMCAI, twice over a 7-day interval. Responses were rated for accuracy, clarity, relevance, completeness, sourcing, and usefulness by a board-certified intensivist (one), neurologists, and neurosurgeons using the Quality Analysis of Medical AI (QAMAI) tool. Interrater reliability and stability were measured using ICC and Pearson’s correlation. Results: The total QAMAI scores were 22.32 ± 3.08 for the intensivist, 24.68 ± 2.8 for the neurologist, and 23.36 ± 2.86 and 26.32 ± 2.91 for the neurosurgeons, representing moderate-to-high accuracy. The evaluators reported a moderate ICC (0.631, 95% CI: 0.321–0.821). The highest subscores were for the categories of accuracy, clarity, and relevance, while the poorest were associated with completeness, usefulness, and sourcing. GPT-4o did not systematically provide references for their responses. The stability analysis reported […] stability. The readability assessment revealed an FRE score of 7.23, an FKG score of 15.87, and a GF index of 18.15. Conclusions: GPT-4o provides moderate-to-high quality information related to DHC for MMCAI, with strengths in accuracy, clarity, and relevance. However, limitations in completeness, usefulness, and sourcing may impact its effectiveness in patient or relatives’ education.
Language: English
Inferring Drug–Gene Relationships in Cancer Using Literature-Augmented Large Language Models
Cancer Research Communications, Journal Year: 2025, Volume and Issue: 5(4), P. 706 - 718
Published: April 1, 2025
Abstract
Understanding drug–gene relationships is essential for advancing targeted cancer therapies and drug repurposing strategies. However, the vast volume of biomedical literature poses significant challenges in efficiently extracting relevant insights. In this study, we developed an automated pipeline that leverages retrieval-augmented large language models (LLM) to infer drug–gene interactions using the most up-to-date literature. By integrating PubMed literature with state-of-the-art LLMs, our pipeline generates accurate, evidence-based inferences while addressing the limitations of static models, such as outdated knowledge and the risk of producing misleading results. We systematically validated the pipeline’s performance using curated databases and demonstrated its ability to accurately identify both well-established and emerging drug targets. Using this pipeline, we constructed a pan-cancer drug–gene interaction network among hundreds of FDA-approved drugs and key oncogenes. In a case study on liver cancer, we identified an association between CTNNB1 mutations and enhanced sensitivity to sorafenib, highlighting a potential therapeutic strategy for this challenging mutation. To facilitate broad accessibility, we developed GeneRxGPT, a user-friendly web application that enables researchers to utilize the pipeline without programming expertise or extensive computational resources. It provides intuitive modules for inference and visualization, streamlining the exploration and interpretation of drug–gene relationships. We anticipate that GeneRxGPT will empower researchers and accelerate drug discovery and development, making it a valuable resource for the cancer research community. Significance: This study presents a novel approach that integrates LLMs with real-time biomedical literature to uncover drug–gene relationships, transforming how researchers identify targets, repurpose drugs, and interpret complex molecular interactions. The web tool, GeneRxGPT, enables researchers to leverage this approach without requiring programming expertise.
Language: English
Feasibility of real-time compression frequency and compression depth assessment in CPR using a “machine-learning” artificial intelligence tool
Hannes Ecker, Niels-Benjamin Adams, Michael Schmitz, et al.
Resuscitation Plus, Journal Year: 2024, Volume and Issue: 20, P. 100825 - 100825
Published: Nov. 5, 2024
Language: English
AI-Powered clinical assessments: GPT-4o’s role in standardizing CPR skill evaluations
Resuscitation, Journal Year: 2024, Volume and Issue: 204, P. 110411 - 110411
Published: Oct. 10, 2024
Language: English
Assessing the ability of GPT-4o to visually recognize medications and provide patient education
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)
Published: Nov. 5, 2024
Various studies have investigated the ability of ChatGPT (OpenAI) to provide medication information; however, a new promising feature has now been added, which allows visual input and is yet to be evaluated. Here, we aimed to qualitatively assess its ability to visually recognize medications, through medication picture input, and provide patient education via written and visual output. The responses were evaluated by accuracy, precision and clarity using a 4-point Likert-like scale. In regards to handling visual input and providing written responses, GPT-4o was able to recognize all 20 tested medications from packaging pictures, even with blurring, retrieve their active ingredients, identify formulations and dosage forms, and provide detailed, yet concise enough, patient education in an almost completely accurate, precise and clear manner, with a score of 3.55 ± 0.605 (85%). In contrast, the visual output through GPT-4o generated images illustrating usage instructions contained many errors that would either hinder the effectiveness of the medication or cause direct harm to the patient, with a poor score of 1.5 ± 0.577 (16.7%). In conclusion, GPT-4o is capable of identifying medications from pictures and exhibits contrasting patient education performance between written and visual output, with very impressive and poor scores, respectively.
Language: English
A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis
Junxiu Zhang, Yao Ma, Rong Zhang, et al.
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)
Published: Dec. 5, 2024
Artificial intelligence (AI), particularly large language models like GPT-4o, holds promise for enhancing diagnostic accuracy in healthcare. This study evaluates the diagnostic performance of GPT-4o compared to human ophthalmologists in glaucoma cases. A prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, including both primary and secondary types, were selected from publicly available databases and institutional records. The cases were analyzed by GPT-4o and three ophthalmologists with varying levels of experience. The accuracy and completeness of the primary and differential diagnoses were assessed using 10-point and 6-point Likert scales, respectively. Statistical analyses were performed using nonparametric methods, including Kruskal–Wallis and Mann–Whitney U tests. GPT-4o was significantly less accurate in primary diagnosis compared to the ophthalmologists. Specifically, GPT-4o achieved a mean score of 5.500 (p < 0.001) compared to Doctor C, who had the highest score of 8.038 (p < 0.001). Completeness scores for GPT-4o (3.077) were also lower than those of Doctor B, who had the lowest score of 3.615 among the ophthalmologists. However, in differential diagnosis, GPT-4o (7.577) showed comparable accuracy to Doctor A (7.615) and Doctor C (7.673) (p < 0.0001) while achieving the highest completeness score (4.096), outperforming the three ophthalmologists (3.846, 2.923, and, for Doctor B, 2.808) (p < 0.0001). AI, particularly GPT-4o, is currently not an acceptable standalone method for diagnosing glaucoma due to its lower accuracy compared to clinicians. These findings suggest that GPT-4o could serve as a valuable adjunct in clinical practice, particularly in complex cases, but should not replace human expertise, especially in initial diagnoses. Future improvements in AI models could enhance their utility in ophthalmology.
Language: English