Simple controls exceed best deep learning algorithms and reveal foundation model effectiveness for predicting genetic perturbations
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2025,
Volume and Issue:
unknown
Published: Jan. 8, 2025
Abstract
Modeling genetic perturbations and their effect on the transcriptome is a key area of pharmaceutical research. Due to the complexity of the transcriptome, there has been much excitement about, and development in, deep learning (DL) because of its ability to model complex relationships. In particular, the transformer-based foundation model paradigm emerged as the gold standard for predicting post-perturbation responses. However, an understanding of these increasingly complex models and an evaluation of their practical utility are lacking, along with simple but appropriate benchmarks to compare predictive methods. Here, we present a simple baseline method that outperforms both state-of-the-art (SOTA) DL models and other proposed simpler neural architectures, setting a necessary benchmark for evaluating methods in the field of post-perturbation prediction. We also elucidate the utility of foundation models for the task of post-perturbation prediction via generalizable fine-tuning experiments that can be translated to different applications of foundation models and downstream tasks of interest. Furthermore, we provide a corrected version of a popular dataset used for benchmarking perturbation prediction models. Our hope is that this work will properly contextualize further development of DL models in the perturbation space with the necessary control procedures.
Language: English
Causal models and prediction in cell line perturbation experiments
BMC Bioinformatics,
Journal Year:
2025,
Volume and Issue:
26(1)
Published: Jan. 7, 2025
Abstract
In cell line perturbation experiments, a collection of cells is perturbed with external agents and responses such as protein expression are measured. Due to cost constraints, only a small fraction of all possible perturbations can be tested in vitro. This has led to the development of computational models that predict cellular responses to perturbations in silico. A central challenge for these models is to predict the effect of new, previously untested perturbations that were not used in the training data. Here we propose causal structural equations for modeling how perturbations affect cells. From this model, we derive two estimators for predicting responses: a Linear Regression (LR) estimator and a causal structure learning estimator that we term Causal Structure Regression (CSR). The CSR estimator requires more assumptions than LR, but it can predict the effects of drugs that were not applied in the training data. Next we present Cellbox, a recently proposed system of ordinary differential equations (ODEs) based model that obtained the best prediction performance on the Melanoma data set (Yuan et al. Cell Syst 12:128–140, 2021). We present analytic results that show a close connection between CSR and Cellbox, providing a new causal interpretation of the Cellbox model. We compare LR and CSR/Cellbox in simulations, highlighting the strengths and weaknesses of the two approaches. Finally, we compare the performance of LR and CSR/Cellbox on the benchmark Melanoma data set. We find that the LR model has comparable or slightly better performance than Cellbox.
Language: English
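The abstract names a Linear Regression (LR) estimator but does not give its form. Below is a minimal sketch that assumes responses are linear in the applied perturbations, y ≈ Wu + b, with W and b fitted by ordinary least squares. The function names `fit_lr_estimator` and `predict_response` and the toy data are hypothetical. This is not the authors' implementation. It only illustrates how such an estimator could extrapolate to an untested combination of drugs it has seen individually.

```python
import numpy as np

def fit_lr_estimator(U: np.ndarray, Y: np.ndarray):
    """Least-squares fit of the linear model Y ≈ U @ W + b.

    U: (n_experiments, n_perturbations) design, e.g. 0/1 drug indicators or doses.
    Y: (n_experiments, n_readouts) measured responses, e.g. protein expression.
    """
    # Append a column of ones so the unperturbed (baseline) response b is estimated too.
    X = np.hstack([U, np.ones((U.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef[:-1], coef[-1]  # W: (n_perturbations, n_readouts), b: (n_readouts,)

def predict_response(W: np.ndarray, b: np.ndarray, u_new: np.ndarray) -> np.ndarray:
    # Under this linear model, the effects of individual perturbations add up.
    return u_new @ W + b

# Toy example: 3 drugs, 2 readouts; train on control + single-drug experiments only.
W_true = np.array([[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]])
b_true = np.array([0.1, 0.2])
U_train = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
Y_train = U_train @ W_true + b_true

W_hat, b_hat = fit_lr_estimator(U_train, Y_train)
y_combo = predict_response(W_hat, b_hat, np.array([1.0, 1.0, 0.0]))  # untested drug pair
print(y_combo)  # approximately [1.3, 0.5]
```

In this sketch, a drug that never appears in `U_train` has no identifiable row in W. This illustrates why predicting the effects of entirely new drugs requires additional structural assumptions.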
AUC-PR is a More Informative Metric for Assessing the Biological Relevance of In Silico Cellular Perturbation Prediction Models
Hongxu Zhu,
Amir Asiaee,
Leila Azinfar
et al.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2025,
Volume and Issue:
unknown
Published: March 11, 2025
Abstract
In silico perturbation models, computational methods that can predict cellular responses to perturbations, present an opportunity to reduce the need for costly and time-intensive in vitro experiments. Many recently proposed models predict high-dimensional cellular responses, such as gene or protein expression, to perturbations such as gene knockouts or drugs. However, evaluating the performance of these models has largely relied on metrics such as R², which assess overall prediction accuracy but fail to capture biologically significant outcomes like the identification of differentially expressed (DE) genes. In this study, we present a novel evaluation framework that introduces the AUC-PR metric to assess the precision and recall of DE gene predictions. By applying this framework to both single-cell and pseudo-bulked datasets, we systematically benchmark simple and advanced computational models. Our results highlight a discrepancy between R² and AUC-PR: models achieving high R² values struggle to identify differentially expressed genes accurately, as reflected in their low AUC-PR values. This finding underscores the limitations of traditional metrics and the importance of biologically relevant assessments. Our framework provides a more comprehensive understanding of model capabilities, advancing the application of computational approaches in biological research.
Language: English
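The R² versus AUC-PR discrepancy described above can be illustrated on a toy example. The sketch below rests on several assumptions, none taken from the paper. Genes whose true change exceeds a hypothetical threshold (0.5) are labeled DE. The predicted absolute change is used as the ranking score. AUC-PR is computed as step-wise average precision. A prediction that stays close to control expression earns a high R² while ranking the two truly DE genes last. With 2 of 6 genes DE, a random ranking would be expected to score roughly 0.33. This is an illustration, not the authors' evaluation framework.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination computed over all genes."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

def average_precision(labels, scores):
    """Step-wise AUC-PR: mean precision at the rank of each true DE gene (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_de) in enumerate(ranked, start=1):
        if is_de:
            hits += 1
            total += hits / rank
    return total / sum(labels)

control = [10.0, 8.0, 6.0, 4.0, 2.0, 1.0]        # baseline expression of 6 genes
true_change = [0.0, 0.0, 0.0, 0.0, 1.5, -1.0]    # only the last two genes respond
pred_change = [0.6, -0.4, 0.3, 0.2, 0.1, 0.05]   # hypothetical model output

y_true = [c + d for c, d in zip(control, true_change)]
y_pred = [c + d for c, d in zip(control, pred_change)]
de_labels = [int(abs(d) > 0.5) for d in true_change]  # hypothetical DE threshold
de_scores = [abs(d) for d in pred_change]             # rank genes by predicted |change|

print(round(r_squared(y_true, y_pred), 2), round(average_precision(de_labels, de_scores), 2))
# 0.94 0.27
```

R² here is dominated by the spread in baseline expression across genes. AUC-PR only rewards putting the responding genes at the top of the ranking.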
New horizons at the interface of artificial intelligence and translational cancer research
Cancer Cell,
Journal Year:
2025,
Volume and Issue:
43(4), P. 708 - 727
Published: April 1, 2025
Language: English
GenePert: Leveraging GenePT Embeddings for Gene Perturbation Prediction
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Oct. 29, 2024
Abstract
Predicting how perturbation of a target gene affects the expression of other genes is a critical component of understanding cell biology. This is a challenging prediction problem, as the model must capture complex gene-gene relationships and the output is high-dimensional and sparse. To address this challenge, we present GenePert, a simple approach that leverages GenePT embeddings to predict gene expression changes due to perturbations via regularized regression models. These embeddings are derived using ChatGPT from text descriptions of individual genes. We benchmarked GenePert on eight CRISPR perturbation screen datasets across multiple cell types and five different pretrained gene embedding models. It consistently outperforms all the state-of-the-art prediction models, measured by both Pearson correlation and mean squared error. Even with limited training data, our model generalizes effectively, offering a scalable solution for predicting perturbation outcomes. These findings underscore the power of informative gene embeddings in predicting the outcomes of unseen genetic perturbation experiments in silico. GenePert is available at https://github.com/zou-group/GenePert
Language: English
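The GenePert abstract describes mapping a perturbed gene's embedding to an expression-change profile with regularized regression. The sketch below uses ridge regression, one common form of regularized regression, solved in closed form. The abstract does not say which regularizer or settings GenePert uses, so that choice is an assumption. The embeddings are random vectors standing in for GenePT embeddings, and the data are simulated. The names `fit_ridge` and `predict` and the value `alpha=0.1` are hypothetical. The sketch is not the GenePert code.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    # Center inputs and outputs so the intercept is not penalized.
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Closed-form ridge solution: B = (Xc^T Xc + alpha * I)^(-1) Xc^T Yc
    B = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return B, x_mean, y_mean

def predict(model, X_new):
    B, x_mean, y_mean = model
    return (X_new - x_mean) @ B + y_mean

rng = np.random.default_rng(0)
n_train, emb_dim, n_genes = 50, 16, 200
# Hypothetical stand-ins: random vectors instead of real GenePT embeddings.
emb_train = rng.normal(size=(n_train, emb_dim))
B_true = rng.normal(size=(emb_dim, n_genes))
# Simulated expression-change profiles (one row per perturbed gene) with small noise.
delta_train = emb_train @ B_true + 0.1 * rng.normal(size=(n_train, n_genes))

model = fit_ridge(emb_train, delta_train, alpha=0.1)

# Predict the response to perturbing a gene that was never seen in training.
emb_unseen = rng.normal(size=(1, emb_dim))
delta_unseen = emb_unseen @ B_true
delta_pred = predict(model, emb_unseen)
pearson_r = np.corrcoef(delta_pred[0], delta_unseen[0])[0, 1]
print(round(pearson_r, 3))
```

Because the prediction depends only on the embedding, any gene with an embedding can be scored, including genes absent from the training screens. This mirrors the "unseen perturbation" setting in the abstract.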
A systematic comparison of computational methods for expression forecasting
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2023,
Volume and Issue:
unknown
Published: July 31, 2023
Abstract
Expression forecasting methods use machine learning models to predict how a cell will alter its transcriptome upon perturbation. Such methods are enticing for two reasons. They promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering. They also offer a fast, cheap, and accessible complement to the corresponding experiments. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces with a wide variety of methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data. We find that performance strongly depends on the choice of metric. Especially for simple metrics like mean squared error, it is uncommon for expression forecasting methods to outperform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.
Language: English
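The abstract reports that, under mean squared error, expression forecasting methods rarely beat simple baselines, but it does not name those baselines. The sketch below assumes one common choice: predict the mean training response for every held-out perturbation. It then compares that baseline with a hypothetical method's output on toy numbers. The functions `mean_baseline` and `mse` and all values are illustrative, not taken from the benchmark.

```python
import numpy as np

def mean_baseline(delta_train: np.ndarray) -> np.ndarray:
    """Predict one profile for every held-out perturbation: the mean training response."""
    return delta_train.mean(axis=0)

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))

# Toy data: expression changes of 2 genes under 2 training perturbations.
delta_train = np.array([[1.0, 0.0], [3.0, 2.0]])
delta_test = np.array([2.0, 2.0])        # held-out perturbation
model_pred = np.array([0.5, 2.5])        # hypothetical output of some forecasting method

baseline_pred = mean_baseline(delta_train)  # [2.0, 1.0]
print(mse(delta_test, baseline_pred), mse(delta_test, model_pred))  # 0.5 1.25
```

The baseline ignores which gene was perturbed, so any method that fails to beat it under a given metric has not demonstrated perturbation-specific predictive value on that metric.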
A cross-species foundation model for single cells
Cell Research,
Journal Year:
2024,
Volume and Issue:
unknown
Published: Oct. 31, 2024
Language: English
Integrative Computational Framework, Dyscovr, Links Mutated Driver Genes to Expression Dysregulation Across 19 Cancer Types
Sara Geraghty,
Jacob A. Boyer,
Mahya Fazel-Zarandi
et al.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Nov. 21, 2024
Though somatic mutations play a critical role in driving cancer initiation and progression, their systems-level functional impacts are not yet well understood, even for well-studied cancer driver genes. This includes, in particular, how these mutations alter expression across the genome and give rise to cancer hallmarks. To address this, we designed an integrative machine learning model, Dyscovr, that leverages mutation, gene expression, copy number alteration (CNA), methylation, and clinical data. Dyscovr uses these data to uncover putative relationships between nonsynonymous mutations in key cancer driver genes and transcriptional changes across the genome. We applied Dyscovr pan-cancer and within 19 individual cancer types. We found both broadly relevant and cancer type-specific links between driver genes and their targets, including a subset that we further identify as exhibiting negative genetic relationships. Our work newly implicates, and validates in cell lines, …
Language: English
Modeling and predicting single-cell multi-gene perturbation responses with scLAMBDA
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Dec. 8, 2024
Abstract
Understanding cellular responses to genetic perturbations is essential for understanding gene regulation and phenotype formation. High-throughput single-cell RNA-sequencing has facilitated detailed profiling of heterogeneous transcriptional responses to perturbations at the single-cell level. However, there remains a pressing need for computational models that can decode the mechanisms driving these responses and accurately predict outcomes, so that target genes can be prioritized for experimental design. Here, we present scLAMBDA, a deep generative learning framework designed to model and predict single-cell transcriptional responses to genetic perturbations, including single-gene and combinatorial multi-gene perturbations. By leveraging gene embeddings derived from large language models, scLAMBDA effectively integrates prior biological knowledge and disentangles basal cell states from perturbation-specific salient representations. In comprehensive evaluations on multiple single-cell CRISPR Perturb-seq datasets, scLAMBDA consistently outperformed state-of-the-art methods in predicting perturbation outcomes, achieving higher prediction accuracy. Notably, scLAMBDA demonstrated robust generalization to unseen target genes and perturbations. Its predictions captured both average expression changes and the heterogeneity of single-cell responses. Furthermore, its predictions enable diverse downstream analyses, including the identification of differentially expressed genes and the exploration of genetic interactions, demonstrating its utility and versatility.
Language: English