medRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: April 1, 2024
Amyotrophic lateral sclerosis (ALS) is a fatal and incurable neurodegenerative disease caused by the selective and progressive death of motor neurons (MNs). Understanding the genetic and molecular factors influencing ALS survival is crucial for management and therapeutics. In this study, we introduce a deep learning-powered analysis framework to link rare noncoding variants to ALS survival. Using data from human induced pluripotent stem cell (iPSC)-derived MNs, the method prioritizes functional noncoding variants using deep learning, links cis-regulatory elements (CREs) to target genes using epigenomics data, and integrates these data through gene-level burden tests to identify survival-modifying variants, CREs, and genes. We apply this approach to analyze 6,715 ALS genomes and pinpoint four novel rare noncoding variants associated with survival, including chr7:76,009,472:C>T linked to CCDC146. CRISPR-Cas9 editing of this variant increases CCDC146 expression in iPSC-derived MNs and exacerbates ALS-specific phenotypes, including TDP-43 mislocalization. Suppressing CCDC146 with an antisense oligonucleotide (ASO), showing no toxicity, completely rescues ALS-associated defects in MNs derived from sporadic ALS patients and from carriers of the G4C2-repeat expansion within C9ORF72. ASO targeting of CCDC146 may be a broadly effective therapeutic approach for ALS. Our framework provides a generic and powerful approach for studying the noncoding genetics of complex diseases.

Cell Reports Methods
Journal Year: 2023
Volume and Issue: 3(1), P. 100384
Published: Jan. 1, 2023
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason, deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope this article will provide a rich and up-to-date resource and shed light on future research directions in this area.

bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2023
Volume and Issue: unknown
Published: March 20, 2023
Deep learning methods have recently become the state-of-the-art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired Whole Genome Sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learnt sequence motif grammar, and suggest new model training strategies to improve performance.
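
The cross-individual evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that, for each gene, a model's predicted expression and the measured expression are available for the same individuals, and it flags genes whose predictions are anti-correlated with the measurements (wrong direction of effect). The gene names and values below are hypothetical toy data, not results from the study, and the per-gene Pearson correlation is an assumed scoring choice, not necessarily the one the authors used.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical per-gene expression across four individuals (toy values).
observed = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 2.5, 3.5, 5.0],
}
predicted = {
    "gene_a": [0.9, 1.8, 3.2, 4.1],  # tracks the observed ordering
    "gene_b": [5.0, 4.0, 3.0, 1.0],  # reversed ordering: wrong direction
}

# One correlation per gene, computed across individuals.
per_gene_r = {g: pearson(predicted[g], observed[g]) for g in observed}

# Genes where the predicted direction of effect disagrees with the data.
wrong_direction = [g for g, r in per_gene_r.items() if r < 0]
print(per_gene_r, wrong_direction)
```

A negative per-gene correlation here means the model ranks individuals in the opposite order to the measurements, which is the kind of failure the abstract refers to.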

Nature Communications
Journal Year: 2024
Volume and Issue: 15(1)
Published: Feb. 22, 2024
Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffer from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods have been proposed to enhance scCAS data, there are still limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization which incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.
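
To make the factorization step concrete, here is a minimal sketch of plain non-negative matrix factorization with the standard Lee-Seung multiplicative updates, applied to a toy peak-by-cell matrix. This is not the scCASE algorithm: it omits the iteratively updated cell-to-cell similarity matrix, and the matrix `X`, the rank, and the iteration count are assumptions chosen for illustration. The low-rank product `W @ H` plays the role of a denoised ("enhanced") matrix.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative x as w @ h using multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h

def frobenius_error(x, w, h):
    """Frobenius norm of the residual x - w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb)) ** 0.5

# Hypothetical binary peak-by-cell matrix: two groups of cells with dropouts.
X = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
]
W, H = nmf(X, rank=2)
enhanced = matmul(W, H)  # dense low-rank reconstruction of X
print(round(frobenius_error(X, W, H), 3))
```

The factors stay non-negative because every update multiplies a non-negative entry by a non-negative ratio, which is what keeps the components interpretable as additive peak and cell loadings.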

Bioengineering
Journal Year: 2024
Volume and Issue: 11(3), P. 263
Published: March 8, 2024
As available genomic interval data increase in scale, we require fast systems to search them. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to compare data directly through genomic region overlap analysis, but this leads to challenges like sparsity, high dimensionality, and computational expense. We require novel methods to quickly and flexibly query large, messy genomic interval databases. Here, we develop a genomic interval search system using representation learning. We train numerical embeddings for a collection of region sets simultaneously with their metadata labels, capturing similarity between region sets and their metadata in a low-dimensional space. Using these learned co-embeddings, we develop a system that solves three related information retrieval tasks using embedding distance computations: retrieving region sets related to a user query string, suggesting new labels for database region sets, and retrieving database region sets similar to a query region set. We evaluate these use cases and show that jointly learned representations of region sets and metadata are a promising approach for fast, flexible, and accurate genomic region information retrieval.
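
The three retrieval tasks all reduce to nearest-neighbour lookups in a shared embedding space. The sketch below assumes that label embeddings and region-set embeddings have already been learned into the same space; the vectors, label names, and set names are hypothetical, and cosine similarity is an assumed choice of distance, not necessarily the one used by the authors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec, candidates, k=1):
    """Return the names of the k candidates most similar to query_vec."""
    ranked = sorted(candidates, key=lambda name: cosine(query_vec, candidates[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical co-embeddings: metadata labels and region sets share one space.
label_vecs = {
    "liver": [0.9, 0.1, 0.0],
    "brain": [0.0, 0.2, 0.9],
    "blood": [0.1, 0.9, 0.1],
}
set_vecs = {
    "set_A": [0.8, 0.2, 0.1],
    "set_B": [0.1, 0.1, 0.95],
    "set_C": [0.2, 0.85, 0.0],
}

# Task 1: retrieve region sets for a user query string (here the label "brain").
hits_for_query = nearest(label_vecs["brain"], set_vecs)

# Task 2: suggest a metadata label for a database region set.
suggested_label = nearest(set_vecs["set_C"], label_vecs)

# Task 3: retrieve region sets similar to a query region set (excluding itself).
others = {name: vec for name, vec in set_vecs.items() if name != "set_A"}
similar_sets = nearest(set_vecs["set_A"], others)

print(hits_for_query, suggested_label, similar_sets)
```

Because every task is a distance computation over fixed-length vectors, the same lookup routine serves all three; at database scale the exhaustive sort would typically be replaced by an approximate nearest-neighbour index.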

Nature Methods
Journal Year: 2024
Volume and Issue: 21(6), P. 1014 - 1022
Published: May 9, 2024
Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.
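
The idea of scoring transcription factor activity by proximity in a shared k-mer and cell space can be sketched as follows. This is not the CellSpace training procedure: the k-mer and cell vectors below are hypothetical, hand-written values standing in for learned embeddings, and averaging a motif's k-mer vectors is an assumed simplification for illustration.

```python
import math

def kmers(seq, k):
    """Overlapping k-mers of a DNA sequence, skipping windows that contain N."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in windows if "N" not in w]

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 2-D embeddings for a few 3-mers and two cells (not learned values).
kmer_vecs = {"GAT": [1.0, 0.0], "ATA": [0.8, 0.2], "TAA": [0.0, 1.0]}
cell_vecs = {"cell_1": [0.9, 0.1], "cell_2": [0.1, 0.9]}

# Embed a motif as the mean of its k-mer vectors, then score each cell by similarity.
motif = "GATA"
motif_vec = mean_vector([kmer_vecs[m] for m in kmers(motif, 3)])
scores = {cell: cosine(vec, motif_vec) for cell, vec in cell_vecs.items()}
print(scores)
```

Because cells are compared with sequence features and not with peak indices, a score of this kind does not depend on which peak atlas a dataset was processed against.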