Exploring the latent space of transcriptomic data with topic modeling
NAR Genomics and Bioinformatics, Journal Year: 2025, Volume and Issue: 7(2)
Published: March 29, 2025
Abstract
The availability of high-dimensional transcriptomic datasets is increasing at a tremendous pace, together with the need for suitable computational tools. Clustering and dimensionality reduction methods are popular go-to methods to identify basic structures in these datasets. At the same time, different topic modeling techniques have been developed to organize the deluge of available data in natural language using their latent topical structure. This paper leverages the statistical analogies between text and transcriptomic datasets to compare different topic modeling methods when applied to gene expression data. Specifically, we test their accuracy in the specific task of discovering and reconstructing the tissue structure of the human transcriptome and distinguishing healthy from cancerous tissues. We examine the properties of the latent space recovered by different methods, highlight their differences, and their pros and cons across different tasks. We focus in particular on how different statistical priors can affect the results and their interpretability. Finally, we show that the latent topic space can be a useful low-dimensional embedding space, where a basic neural network classifier can annotate transcriptomic profiles with high accuracy.
Language: English
Exploring the latent space of transcriptomic data with topic modeling
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 3, 2024
Abstract
The availability of high-dimensional transcriptomic datasets is increasing at a tremendous pace, together with the need for suitable computational tools. Clustering and dimensionality reduction methods are popular go-to methods to identify basic structures in these datasets. At the same time, different topic modeling techniques have been developed to organize the deluge of available data in natural language using their latent topical structure. This paper leverages the statistical analogies between text and transcriptomic datasets to compare different topic modeling methods when applied to gene expression data. Specifically, we test their accuracy in the specific task of discovering and reconstructing the tissue structure of the human transcriptome and distinguishing healthy from cancerous tissues. We examine the properties of the latent space recovered by different methods, highlight their differences, and their pros and cons across different tasks. Finally, we show that the latent topic space can be a useful embedding space, where a basic neural network classifier can annotate transcriptomic profiles with high accuracy.
Language: English