Voice EHR: introducing multimodal audio data for health
James Anibal,
Hannah Huth,
Ming Li
et al.
Frontiers in Digital Health,
Journal Year:
2025,
Volume and Issue:
6
Published: Jan. 28, 2025
Introduction
Artificial intelligence (AI) models trained on audio data may have the potential to rapidly perform clinical tasks, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries, which challenges deployment in resource-constrained, high-volume settings where audio AI could have a profound impact on health equity.
Methods
This report introduces a novel protocol for audio data collection and a corresponding application that captures health information through guided questions.
Results
To demonstrate the potential of Voice EHR as a biomarker of health, initial experiments on data quality and multiple case studies are presented in this report. Large language models (LLMs) were used to compare transcribed audio samples (from the same patients) with data collected through conventional techniques like multiple-choice questionnaires. Information contained in the audio samples was consistently rated equally or more relevant during evaluation.
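The LLM-based comparison described above could, for example, be framed as a per-patient relevance judgment whose verdicts are then tallied. A minimal sketch of the prompt construction and tallying steps, assuming a simple A/B/EQUAL rating scheme (the prompt wording, rating scale, and function names are illustrative assumptions, not the report's actual protocol):

```python
def build_comparison_prompt(transcript: str, questionnaire: str) -> str:
    """Assemble a prompt asking an LLM to judge which information source
    is more clinically relevant for the same patient (hypothetical wording)."""
    return (
        "You are evaluating two sources of information about one patient.\n"
        f"Source A (voice transcript): {transcript}\n"
        f"Source B (multiple-choice questionnaire): {questionnaire}\n"
        "Answer with one word: A, B, or EQUAL, indicating which source "
        "contains more clinically relevant information."
    )

def tally_ratings(ratings: list[str]) -> float:
    """Fraction of patients for whom the transcript was rated equally or
    more relevant than the questionnaire."""
    favorable = sum(1 for r in ratings if r in ("A", "EQUAL"))
    return favorable / len(ratings)

# The per-patient verdicts would come from an LLM API call; tallied here:
print(tally_ratings(["A", "EQUAL", "B", "A"]))  # 0.75
```

The actual LLM call is omitted because the report does not specify the model or interface used; only the aggregation over per-patient verdicts is shown.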
Discussion
The HEAR application facilitates the collection of an electronic health record ("Voice EHR") that may contain complex biomarkers of health from voice/respiratory features, speech patterns, spoken language with semantic meaning, and longitudinal context, potentially compensating for the typical limitations of unimodal datasets.
Language: English
Vowel segmentation impact on machine learning classification for chronic obstructive pulmonary disease
Alper Idrisoglu,
Ana Luiza Dallora Moraes,
Abbas Cheddad
et al.
Scientific Reports,
Journal Year:
2025,
Volume and Issue:
15(1)
Published: March 22, 2025
Abstract
Vowel-based voice analysis is gaining attention as a potential non-invasive tool for COPD classification, offering insights into phonatory function. The growing need for data has necessitated the adoption of various techniques, including segmentation, to augment existing datasets for training comprehensive Machine Learning (ML) models. This study aims to investigate the possible effects of segmenting the utterance of the vowel "a" on the performance of the ML classifiers CatBoost (CB), Random Forest (RF), and Support Vector Machine (SVM). The research involves training individual models using three distinct dataset constructions: full-sequence, segment-wise, and group-wise, derived from a dataset consisting of 1058 recordings belonging to 48 participants. This approach comprehensively analyzes how each categorization impacts the models' results.
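The segment-wise construction described above amounts to splitting each vowel recording into equal-length pieces before feature extraction. A minimal sketch, assuming the signal is held as a NumPy array (the four-segment split comes from the abstract; the function name and the drop-trailing-samples convention are assumptions):

```python
import numpy as np

def segment_recording(samples: np.ndarray, n_segments: int = 4) -> list[np.ndarray]:
    """Split a 1-D audio signal into n_segments equal-length chunks.

    Trailing samples that do not fill a whole segment are dropped so
    every segment has the same length (one possible convention).
    """
    seg_len = len(samples) // n_segments
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

# Example: a 2-second sustained vowel "a" sampled at 16 kHz
signal = np.random.randn(32000)
segments = segment_recording(signal, n_segments=4)
print(len(segments), len(segments[0]))  # 4 8000
```

Under this scheme, the "second segment" evaluated in the study would correspond to `segments[1]`, i.e. the second quarter of the sustained phonation.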
A nested cross-validation (nCV) approach was implemented with grid search for hyperparameter optimization. This rigorous methodology was employed to minimize overfitting risks and maximize model performance.
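Nested cross-validation of this kind can be sketched with scikit-learn: a grid search runs in an inner loop on each training fold, and an outer loop scores the tuned model on held-out folds. This is a minimal sketch with synthetic data and an SVM; the fold counts, parameter grid, and feature dimensions are illustrative assumptions, not the study's actual settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Inner loop: grid search selects hyperparameters on each training fold.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid,
    cv=inner_cv,
)

# Outer loop: scores the tuned model on held-out folds, so the reported
# estimate is not biased by the hyperparameter selection itself.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(search, X, y, cv=outer_cv)
print(scores.mean())
```

One practical caveat for this dataset: with 1058 recordings from only 48 participants, folds would need to be grouped by speaker (e.g. with `GroupKFold`) so that recordings from the same participant never appear in both training and test folds.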
Compared to the full-sequence dataset, the findings indicate that the second segment yielded higher results within the four-segment category. Specifically, CB achieved superior accuracy, attaining 97.8% and 84.6% on the validation and test sets, respectively. The same category also demonstrated the best balance between the true positive rate (TPR) and true negative rate (TNR), making it the most clinically effective choice. These results suggest that time-sensitive properties in vowel production are important for COPD classification and that segmentation can aid in capturing these properties. Despite the promising results, the sample size and demographic homogeneity limit generalizability, highlighting areas for future research.
Trial registration
The study is registered on clinicaltrials.gov, ID: NCT06160674.
Language: English
The Bridge2AI-voice application: initial feasibility study of voice data acquisition through mobile health
Elijah Moothedan,
Micah Boyer,
Stephanie Watts
et al.
Frontiers in Digital Health,
Journal Year:
2025,
Volume and Issue:
7
Published: April 15, 2025
Introduction
Bridge2AI-Voice, a collaborative multi-institutional consortium, aims to generate a large-scale, ethically sourced voice, speech, and cough database linked to health metadata in order to support AI-driven research. A novel smartphone application, the Bridge2AI-Voice app, was created to collect standardized recordings of acoustic tasks, validated patient questionnaires, and reported outcomes. Before broad data collection, a feasibility study was undertaken to assess the viability of the app in a clinical setting through task performance metrics and participant feedback.
Materials & methods
Participants were recruited from a tertiary academic voice center and instructed to complete a series of tasks using the application on an iPad. The Plan-Do-Study-Act model for quality improvement was implemented. Data collected included demographics as well as time to completion, successful task/recording completion, and need for assistance. Participant feedback was measured by a qualitative interview adapted from the Mobile App Rating Scale.
Results
Forty-seven participants were enrolled (61% female, 92% with English as their primary language, mean age 58.3 years). All owned smart devices, with 49% using mobile health apps. The overall completion rate was 68%, with tasks successfully recorded in 41% of cases. Many participants requested assistance to complete tasks; challenges were mainly related to design and instruction understandability. Interview responses reflected a favorable perception of voice-screening apps and their features.
Conclusion
Findings suggest that the application is a promising tool for voice data acquisition in a clinical setting. However, development of an improved User Interface/User Experience and broader, more diverse studies are needed to create a widely usable tool.
Level of evidence: 3.
Language: English