Genetics, Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 5, 2024
Abstract
Deep learning methods have been applied to enhance the prediction accuracy of traditional statistical methods in the field of plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has shown some limitations, since its conventional methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce the theoretical basic concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as strategies to fuse data from different modalities. We mention the computational resources available for the practical implementation of multimodal deep learning problems. We finally performed a review of applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the performance of multimodal deep learning methods to highlight how these tools can help address complex problems in plant breeding, and we discuss relevant considerations that researchers should keep in mind when applying these methods. Multimodal deep learning holds significant potential for various fields, including genomic selection. It displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, but it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply it in genomic selection, suitable architectures and fusion strategies must be chosen. Like other deep learning methods, it is a powerful tool but must be applied carefully. Given its predictive edge, multimodal deep learning is valuable for addressing the challenges of plant breeding and food security amid a growing global population.
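The review contrasts strategies for fusing data from different modalities. Two widely used strategies are early (feature-level) fusion and late (decision-level) fusion. The sketch below illustrates them with a toy linear scorer standing in for a neural network. It is not one of the architectures surveyed in the review.

Assumptions: the two modalities (for example, marker genotypes and environmental covariates), their values, and all weights are hypothetical.

```python
from statistics import mean
from typing import List


def linear_score(features: List[float], weights: List[float]) -> float:
    """Toy stand-in for a trained model: a weighted sum of its inputs."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def early_fusion(modalities: List[List[float]], weights: List[float]) -> float:
    """Early fusion: concatenate every modality into one vector, then apply a single model."""
    fused = [x for modality in modalities for x in modality]
    return linear_score(fused, weights)


def late_fusion(modalities: List[List[float]], per_modality_weights: List[List[float]]) -> float:
    """Late fusion: apply one model per modality, then average the per-modality predictions."""
    if len(modalities) != len(per_modality_weights):
        raise ValueError("need exactly one model per modality")
    predictions = [linear_score(m, w) for m, w in zip(modalities, per_modality_weights)]
    return mean(predictions)


genomic = [1.0, 0.0, 2.0]   # hypothetical marker codes
environment = [0.5, 1.5]    # hypothetical covariates
print(early_fusion([genomic, environment], [0.2, 0.1, 0.3, 1.0, -0.4]))
print(late_fusion([genomic, environment], [[0.2, 0.1, 0.3], [1.0, -0.4]]))
```

Early fusion lets a single model learn cross-modality interactions directly. Late fusion keeps the modalities separate until the decision step, which makes a missing modality easier to handle.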

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)
Published: June 6, 2024
Abstract
Cancer is a complex disease comprising systemic alterations at multiple scales. In this study, we develop the Tumor Multi-Omics pre-trained Network (TMO-Net), which integrates multi-omics pan-cancer datasets for model pre-training, facilitating cross-omics interactions and enabling joint representation learning and incomplete omics inference. This model enhances omics sample representation and empowers various downstream oncology tasks with incomplete multi-omics datasets. By employing interpretable learning, we characterize the contributions of distinct omics features to clinical outcomes. The TMO-Net model serves as a versatile framework for cross-modal multi-omics learning in oncology, paving the way for tumor omics-specific foundation models.
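The abstract describes joint representation learning that still works when some omics layers are missing. It does not say how TMO-Net combines its per-omics encodings. The sketch below uses one simple option that is easy to follow: average the latent vectors of whichever layers are observed.

Assumptions: the linear encoders, the layer names, and the mean-pooling rule are hypothetical. This is not TMO-Net's actual mechanism.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]


def encode(values: List[float], weights: Matrix) -> List[float]:
    """Hypothetical linear encoder projecting one omics vector into a shared latent space."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def joint_representation(sample: Dict[str, Optional[List[float]]],
                         encoders: Dict[str, Matrix]) -> List[float]:
    """Mean of the latent vectors of the observed omics layers; missing layers (None) are skipped."""
    latents = [encode(x, encoders[name]) for name, x in sample.items() if x is not None]
    if not latents:
        raise ValueError("sample has no observed omics layers")
    dim = len(latents[0])
    return [sum(z[i] for z in latents) / len(latents) for i in range(dim)]


encoders = {"rna": [[1.0, 0.0], [0.0, 1.0]], "methylation": [[0.5, 0.5], [1.0, -1.0]]}
print(joint_representation({"rna": [2.0, 4.0], "methylation": [2.0, 0.0]}, encoders))
print(joint_representation({"rna": [2.0, 4.0], "methylation": None}, encoders))
```

Because both samples land in the same latent space, a downstream classifier can be trained once and applied to samples with incomplete omics profiles.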

Genome Medicine, Journal Year: 2023, Volume and Issue: 15(1)
Published: Oct. 31, 2023
Abstract
Background
Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales. Because of the black-box nature of machine learning, however, integrating these modalities and interpreting the biological mechanisms can be challenging. Additionally, the partial availability of multimodal data presents a challenge in developing predictive models.
Method
To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype–phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. It also includes an auxiliary layer for cross-modal imputation, allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes.
Results
We applied DeepGAMI to several multimodal datasets, including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types and cellular and clinical phenotypes, even when using single modalities (e.g., an AUC score of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease).
Conclusion
We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets of complex brains and brain diseases. DeepGAMI also prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
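DeepGAMI prioritizes features with integrated gradients. For feature i, integrated gradients multiplies the input's distance from a baseline by the average gradient along the straight path from that baseline to the input.

The sketch below computes this quantity for a toy logistic model whose gradient is known in closed form. It uses a midpoint Riemann sum and checks the "completeness" property: the attributions should add up to F(x) − F(baseline).

Assumptions: the weights, the bias, the all-zero baseline, and the step count are hypothetical. DeepGAMI itself applies the method to a deep network.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: List[float], w: List[float], b: float) -> float:
    """Toy stand-in for a trained network: p = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def grad(x: List[float], w: List[float], b: float) -> List[float]:
    """Analytic input gradient of the toy model: p * (1 - p) * w."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x: List[float], baseline: List[float],
                         w: List[float], b: float, steps: int = 200) -> List[float]:
    """IG_i = (x_i - x'_i) * mean gradient along the path x' + alpha * (x - x'), alpha in [0, 1].
    The integral over alpha is approximated with a midpoint Riemann sum of `steps` points."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(grad(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]


w, b = [0.5, -1.0, 3.0], 0.1
x, baseline = [1.0, 2.0, 0.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
print(attributions, sum(attributions), model(x, w, b) - model(baseline, w, b))
```

A feature equal to its baseline value (the third one here) receives zero attribution, however large its weight.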

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 2, 2025
Abstract
Alzheimer’s Disease (AD) is the leading cause of dementia, imposing significant economic and social burdens. Although genome-wide association studies (GWAS) have identified approximately 70 risk loci, the functional mechanisms underlying AD remain unclear. In this study, we integrated GWAS summary statistics from Jiang et al. with gene expression data from the GTEx project using the S-PrediXcan method, encompassing 61 brain-related traits across 49 tissues. Comprehensive analysis of five traits, including family history of AD, highlighted key genes such as APOE, APOC1, and TOMM40, which play crucial roles in cholesterol metabolism, immune response, and neuroinflammation. Validation with the ROSMAP dataset confirmed the roles of these genes in these phenotypes. Furthermore, we developed AD-MIF, a novel deep multi-layer information fusion model that integrates multi-omics data, achieving a 10-20% improvement in AUC performance for predicting AD-related phenotypes compared to traditional models. Gene enrichment analysis emphasized the importance of pathways in cholesterol metabolism and immune response in the pathogenesis of AD. Additionally, drug repositioning identified candidate drugs, Dasatinib and Sirolimus, that may alleviate AD progression by reducing neuroinflammation and clearing senescent cells. Our findings advance the understanding of the genetic architecture of AD, improve predictive models, and propose potential therapeutic drugs.
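S-PrediXcan combines GWAS summary statistics with expression-prediction weights to produce a gene-level z-score. It uses the approximation

Z_g ≈ Σ_l w_lg · (σ_l / σ_g) · (β̂_l / se(β̂_l))

In this formula:

- w_lg is the weight of SNP l in gene g's expression model.
- σ_l is the SNP's standard deviation.
- σ_g = sqrt(wᵀ Γ w) is the standard deviation of predicted expression, where Γ is the SNP covariance (LD) matrix.

The sketch below is a direct transcription of that formula. The weights, covariances, and GWAS effects in the examples are hypothetical and are not taken from GTEx or from Jiang et al.

```python
import math
from typing import List


def predicted_expression_sd(weights: List[float], snp_cov: List[List[float]]) -> float:
    """sigma_g = sqrt(w^T Gamma w), where Gamma is the SNP covariance (LD) matrix."""
    n = len(weights)
    var = sum(weights[i] * weights[j] * snp_cov[i][j] for i in range(n) for j in range(n))
    if var <= 0:
        raise ValueError("predicted expression has zero variance")
    return math.sqrt(var)


def s_predixcan_z(weights: List[float], snp_cov: List[List[float]],
                  gwas_beta: List[float], gwas_se: List[float]) -> float:
    """Gene-level z-score: sum over model SNPs of w_l * (sigma_l / sigma_g) * (beta_l / se_l)."""
    sigma_g = predicted_expression_sd(weights, snp_cov)
    z = 0.0
    for l, w in enumerate(weights):
        sigma_l = math.sqrt(snp_cov[l][l])
        z += w * (sigma_l / sigma_g) * (gwas_beta[l] / gwas_se[l])
    return z


# One-SNP model: the gene z-score reduces to the SNP's GWAS z-score (sign set by the weight).
print(s_predixcan_z([0.5], [[0.25]], [0.2], [0.1]))
# Two independent SNPs with GWAS z = 1 each and equal weights: sqrt(2).
print(s_predixcan_z([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [1.0, 1.0]))
```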

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 14, 2025
Single-omics approaches often provide a limited view of complex biological systems, whereas multiomics integration offers a more comprehensive understanding by combining diverse data views. However, integrating heterogeneous data types and interpreting the intricate relationships between features, both within and across different views, remains a bottleneck. To address these challenges, we introduce COSIME (Cooperative Multi-view Integration and Scalable and Interpretable Model Explainer). COSIME uses backpropagation and Learnable Optimal Transport (LOT) with deep neural networks, enabling the learning of latent features from multiple views to predict disease phenotypes. In addition, COSIME incorporates Monte Carlo sampling to efficiently estimate Shapley values and Shapley-Taylor indices. These estimates enable the assessment of both feature importance and pairwise feature interactions, whether synergistic or antagonistic, in predicting disease phenotypes. We applied COSIME to simulated data and to real-world datasets, including single-cell transcriptomics, spatial transcriptomics, epigenomics, and metabolomics, specifically for Alzheimer's disease-related phenotypes. Our results demonstrate that COSIME significantly improves prediction performance while offering enhanced interpretability of feature relationships. For example, we identified synergistic interactions between microglia and astrocyte genes associated with AD that are likely to be active at the edges of the middle temporal gyrus, as indicated by their spatial locations. Finally, COSIME is open-source and available for general use.
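COSIME estimates Shapley values by Monte Carlo sampling. A standard way to do this is permutation sampling:

1. Draw random orderings of the features.
2. Add the features one at a time in that order.
3. Record each feature's marginal change in the value function.
4. Average those changes over the sampled orderings.

The sketch below implements this generic estimator. It does not reproduce COSIME's own estimator, and it does not compute Shapley-Taylor interaction indices.

Assumptions: the three features and the value function, which includes one hypothetical synergistic pair, are made up for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, List

ValueFn = Callable[[FrozenSet[str]], float]


def monte_carlo_shapley(players: List[str], value: ValueFn,
                        n_samples: int = 2000, seed: int = 0) -> Dict[str, float]:
    """Permutation-sampling estimate of Shapley values: the average marginal contribution
    of each player over uniformly random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = list(players)
        rng.shuffle(order)
        coalition: set = set()
        previous = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value(frozenset(coalition))
            totals[p] += current - previous
            previous = current
    return {p: t / n_samples for p, t in totals.items()}


base = {"a": 1.0, "b": 0.5, "c": 0.25}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Additive effects plus a hypothetical synergy worth 2.0 when 'a' and 'b' are both present."""
    return sum(base[p] for p in coalition) + (2.0 if {"a", "b"} <= coalition else 0.0)


print(monte_carlo_shapley(["a", "b", "c"], toy_value))  # synergy split roughly evenly between a and b
```

Because every sampled ordering telescopes from the empty set to the full set, the estimates always sum exactly to v(all features) − v(empty set). Only the split of the interaction term between "a" and "b" carries sampling noise.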

International Journal of Molecular Sciences, Journal Year: 2025, Volume and Issue: 26(5), P. 2085 - 2085
Published: Feb. 27, 2025
Complex diseases pose challenges in prediction due to their multifactorial and polygenic nature. This study employed machine learning (ML) to analyze genomic data from the UK Biobank, aiming to predict predisposition to complex diseases such as multiple sclerosis (MS) and Alzheimer's disease (AD). We tested logistic regression (LR), ensemble tree methods, and deep learning models for this purpose. LR displayed remarkable stability across various subsets of the data, outshining deep learning approaches, which showed greater variability in performance. Additionally, ML methods demonstrated an ability to maintain optimal performance despite correlated features due to linkage disequilibrium. When the polygenic risk score (PRS) was compared with ML methods, PRS consistently performed at an average level. By applying explainability tools to the ML models for MS, we found that the results confirmed the polygenicity of the disease. The highest-prioritized variants for MS were identified as expression or splicing quantitative trait loci located in non-coding regions within or near genes associated with the immune response, with a prevalence of human leukocyte antigen (HLA) gene annotations. Our findings shed light on the potential of ML to capture complex disease patterns, paving the way for improved predictive models.
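The study uses the polygenic risk score (PRS) as its baseline. In its simplest additive form, a PRS is the sum over scored variants of an individual's effect-allele dosage (0, 1, or 2) times that variant's per-allele effect weight.

The sketch below computes that sum. The variant IDs and weights are hypothetical. Skipping ungenotyped variants is a simplifying assumption; real pipelines typically impute missing genotypes or apply additional rules.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int], effect_weights: Dict[str, float]) -> float:
    """Additive PRS: sum of (effect-allele dosage x per-allele effect weight) over scored variants."""
    score = 0.0
    for variant_id, beta in effect_weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # not genotyped for this individual: skipped in this simplified sketch
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant_id} must be 0, 1 or 2")
        score += dosage * beta
    return score


weights = {"variant_1": 0.2, "variant_2": -0.1, "variant_3": 0.05}  # hypothetical effect sizes
print(polygenic_risk_score({"variant_1": 2, "variant_2": 1}, weights))  # approximately 0.3
```

A single weighted sum like this is cheap to compute and easy to interpret. ML models, by contrast, can also weight correlated variants jointly.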

Genomics & Informatics, Journal Year: 2025, Volume and Issue: 23(1)
Published: March 6, 2025
Large-scale national biobank projects utilizing whole-genome sequencing (WGS) have emerged as transformative resources for understanding human genetic variation and its relationship to health and disease. These initiatives, which include the UK Biobank, the All of Us Research Program, Singapore's PRECISE, Biobank Japan, and the National Project of Bio-Big Data of Korea, are generating unprecedented volumes of high-resolution genomic data integrated with comprehensive phenotypic, environmental, and clinical information. This review examines the methodologies, contributions, and challenges of major WGS-based national genome projects worldwide. We first discuss the landscape of these projects, highlighting their distinct approaches to data collection, participant recruitment, and phenotype characterization. We then introduce recent technological advances that enable efficient processing and analysis of large-scale WGS data, including improvements in variant calling algorithms, innovative methods for creating multi-sample VCFs, optimized storage formats, and cloud-based computing solutions. The review synthesizes key discoveries from these projects, particularly in identifying expression quantitative trait loci and rare variants associated with complex diseases. Our review also introduces the latest findings from the National Project of Bio-Big Data of Korea, which has advanced our understanding of population-specific variation and diseases in Korean and East Asian populations. Finally, we discuss future directions for maximizing the impact of national genome projects on precision medicine and global health equity. This examination demonstrates how these projects are revolutionizing genomic research and healthcare delivery, while highlighting the importance of continued investment in diverse, population-specific genomic resources.
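The review mentions methods for creating multi-sample VCFs. The core bookkeeping step merges per-sample variant calls into one site-by-sample genotype matrix. The sketch below illustrates that step on in-memory dictionaries rather than real VCF files.

Assumptions: the sample names and sites are hypothetical, and sites are sorted lexically, so "chr10" would sort before "chr2". The sketch also shows why this problem is hard. A site absent from one sample's calls could mean "homozygous reference" or "no coverage". Per-sample VCFs alone cannot tell the two apart, which is why large projects rely on gVCF-style joint genotyping.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def merge_samples(per_sample_calls: Dict[str, Dict[Variant, str]],
                  assume_hom_ref: bool = False) -> Tuple[List[str], List[Variant], List[List[str]]]:
    """Build a multi-sample genotype matrix (rows = sites, columns = samples).
    Sites missing from a sample are filled with './.' (no call) unless assume_hom_ref is True."""
    sites = sorted({site for calls in per_sample_calls.values() for site in calls})
    samples = sorted(per_sample_calls)
    fill = "0/0" if assume_hom_ref else "./."
    matrix = [[per_sample_calls[s].get(site, fill) for s in samples] for site in sites]
    return samples, sites, matrix


calls = {
    "S2": {("chr1", 100, "A", "G"): "0/1"},
    "S1": {("chr1", 100, "A", "G"): "1/1", ("chr1", 250, "C", "T"): "0/1"},
}
samples, sites, matrix = merge_samples(calls)
for site, row in zip(sites, matrix):
    print(site, dict(zip(samples, row)))
```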

Genome Medicine, Journal Year: 2024, Volume and Issue: 16(1)
Published: April 16, 2024
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting outcomes often lack robustness and interpretation. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing PheSeq in three case studies on Alzheimer’s disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer’s disease and 818 and 566 priority genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings show moderate positive rates, high recall, and improved interpretation in gene-disease association studies.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 2, 2024
Abstract
The complexity of Alzheimer’s disease (AD) manifests in diverse clinical phenotypes, including cognitive impairment and neuropsychiatric symptoms (NPSs). However, the etiology of these phenotypes remains elusive. To address this, the PsychAD project generated a population-level single-nucleus RNA-seq dataset comprising over 6 million nuclei from the prefrontal cortex of 1,494 individual brains. The dataset covers a variety of AD-related phenotypes that capture cognitive impairment, the severity of pathological lesions, and the presence of NPSs. Leveraging this dataset, we developed a deep learning framework called Phenotype Associated Single Cell encoder (PASCode) to score single-cell phenotype associations, and identified ∼1.5 million phenotype-associated cells (PACs). We compared PACs within 27 distinct brain cell subclasses and prioritized cell subpopulations and their expressed genes across various AD phenotypes. These included the upregulation of a reactive astrocyte subtype with neuroprotective function in resilient donors. Additionally, we link multiple phenotypes to a subpopulation of protoplasmic astrocytes that alters gene expression regulation in donors with depression. Uncovering the cellular and molecular mechanisms underlying AD phenotypes has the potential to provide valuable insights towards the identification of novel diagnostic markers and therapeutic targets. All PACs, along with cell type information, are summarized into an AD-phenotypic atlas for the research community.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 20, 2024
Abstract
Complex diseases pose challenges in disease prediction due to their multifactorial and polygenic nature. In this work, we explored the prediction of two complex diseases, multiple sclerosis (MS) and Alzheimer’s disease (AD), using machine learning (ML) methods and genomic data from the UK Biobank. Different ML methods were applied, including logistic regression (LR), gradient boosting decision trees (GB), extremely randomized trees (ET), random forest (RF), feedforward neural networks (FFN), and convolutional neural networks (CNN). The primary goal of this research was to investigate the variability of ML models in classifying diseases based on genomic risk. LR was the most robust method across folds, whereas deep learning methods (FFN and CNN) exhibited high variability. When the performance of polygenic risk scores (PRS) was compared with ML methods, PRS consistently performed at an average level. However, PRS still offers several practical advantages over ML methods. Implementing feature selection techniques to exclude non-informative and correlated predictors did not improve performance significantly. This underscores the ability of ML methods to achieve optimal performance even in the presence of correlated features due to linkage disequilibrium. Applying explainability tools to extract information about the features contributing to the classification task confirmed the polygenicity of MS. The prevalence of HLA gene annotations among the top features on chromosome 6 aligns with the significance of HLA in the context of MS. Overall, the highest-prioritized variants were identified as expression or splicing quantitative trait loci (eQTL or sQTL) located in non-coding regions within or near genes associated with the immune response. In summary, this research offers deeper insights into how ML methods discern patterns related to complex diseases.
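The study's central question is how much a model's performance varies across cross-validation folds. The sketch below illustrates that bookkeeping in three parts:

- a seeded k-fold partition of sample indices;
- a rank-based (Mann-Whitney) ROC AUC;
- the mean and standard deviation of per-fold scores.

Assumptions: the abstract does not name the metric, so AUC is assumed here. The labels, scores, and fold values in the examples are hypothetical.

```python
import random
from statistics import mean, stdev
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random case (label 1) outscores a random control (label 0); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def summarize_fold_scores(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold scores; a large SD signals unstable models."""
    return mean(scores), stdev(scores)


print(k_fold_indices(10, 3, seed=1))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(summarize_fold_scores([0.7, 0.8, 0.9]))
```

Comparing the per-fold standard deviation between models, for example LR versus FFN, is one simple way to quantify the robustness differences the abstract reports.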

Genomics, Journal Year: 2024, Volume and Issue: 116(5), P. 110910 - 110910
Published: Aug. 5, 2024
This article explores deep learning model design, drawing inspiration from the omnigenic and genetic heterogeneity concepts, to improve schizophrenia prediction using genotype data. It introduces an innovative three-step approach that leverages the capability of neural networks to efficiently handle genetic interactions. First, a locally connected network routes input data from variants to their corresponding genes. Second, an Encoder-Decoder network captures relationships among the identified genes. Third, the final step integrates the knowledge from the first two steps and incorporates a parallel component that considers the effects of additional variants. This expansion enhances prediction scores by considering a larger number of variants. Trained models achieved an average AUC of 0.83, surpassing other genotype-trained models and matching gene expression dataset-based approaches. Additionally, tests on held-out sets reported a sensitivity of 0.72 and an accuracy of 0.76, aligning with heritability-based predictions. Moreover, the study addresses prediction challenges in diverse population subsets.
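The first step of the approach is a locally connected network that routes variants to their corresponding genes. Functionally, this is a layer in which each gene node sees only the variants mapped to it, rather than every input. The sketch below shows such a masked layer with the activation function omitted.

Assumptions: the variant-to-gene mapping, gene names, and weights are hypothetical, and the paper's actual layer may differ in details such as bias terms and nonlinearities.

```python
from typing import Dict, List, Tuple


def variant_to_gene_layer(dosages: List[float], variant_ids: List[str],
                          gene_to_variants: Dict[str, List[str]],
                          weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Locally connected (masked) layer: each gene output is a weighted sum over only
    the variants mapped to that gene; all other variant-gene connections are absent."""
    position = {v: i for i, v in enumerate(variant_ids)}
    return {
        gene: sum(weights[(gene, v)] * dosages[position[v]] for v in variants)
        for gene, variants in gene_to_variants.items()
    }


variant_ids = ["v1", "v2", "v3"]
mapping = {"GENE_A": ["v1", "v2"], "GENE_B": ["v3"]}  # hypothetical variant-to-gene map
weights = {("GENE_A", "v1"): 0.5, ("GENE_A", "v2"): -1.0, ("GENE_B", "v3"): 0.25}
print(variant_to_gene_layer([0, 1, 2], variant_ids, mapping, weights))
```

Restricting connections this way keeps the parameter count proportional to the number of mapped variant-gene pairs instead of variants × genes. It also makes each gene node directly interpretable.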