bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown
Published: April 8, 2021
Abstract
The next phase of genome biology research requires understanding how DNA sequence encodes phenotypes, from the molecular to organismal levels. How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequence through the use of a new deep learning architecture called Enformer that is able to integrate long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Notably, Enformer outperformed the best team on the critical assessment of genome interpretation (CAGI5) challenge for noncoding variant interpretation with no additional training. Furthermore, Enformer learned to predict promoter-enhancer interactions directly from DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of growing human disease associations to cell-type-specific gene regulatory mechanisms and provide a framework to interpret cis-regulatory evolution. To foster these downstream applications, we have made the pre-trained Enformer model openly available, and provide pre-computed effect predictions for all common variants in the 1000 Genomes dataset.
One-sentence summary: Improved noncoding variant effect prediction and candidate enhancer prioritization from a more accurate sequence to expression model driven by extended long-range interaction modelling.
Nature Methods, Journal Year: 2021, Volume and Issue: 18(10), P. 1196 - 1203
Published: Oct. 1, 2021
Abstract
How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.
Genome Medicine, Journal Year: 2019, Volume and Issue: 11(1)
Published: Nov. 19, 2019
Abstract
Artificial intelligence (AI) is the development of computer systems that are able to perform tasks that normally require human intelligence. Advances in AI software and hardware, especially deep learning algorithms and the graphics processing units (GPUs) that power their training, have led to a recent and rapidly increasing interest in medical AI applications. In clinical diagnostics, AI-based computer vision approaches are poised to revolutionize image-based diagnostics, while other AI subtypes have begun to show similar promise in various diagnostic modalities. In some areas, such as clinical genomics, a specific type of AI algorithm known as deep learning is used to process large and complex genomic datasets. In this review, we first summarize the main classes of problems that AI systems are well suited to solve and describe the clinical diagnostic tasks that benefit from these solutions. Next, we focus on emerging methods for specific tasks in clinical genomics, including variant calling, genome annotation and variant classification, and phenotype-to-genotype correspondence. Finally, we end with a discussion on the future potential of AI in individualized medicine applications, especially for risk prediction in common complex diseases, and the challenges, limitations, and biases that must be carefully addressed for the successful deployment of AI in medical applications, particularly those utilizing human genetics and genomics data.
Science, Journal Year: 2020, Volume and Issue: 367(6476)
Published: Jan. 24, 2020
Organoids recapitulate brain development
Gene expression changes and their control by accessible chromatin in the human brain during development are of great interest but of limited accessibility. Trevino et al. avoided this problem by developing three-dimensional organoid models of human forebrain development and examining chromatin accessibility and gene expression at the single-cell level. From this analysis, they matched developmental profiles between organoid and fetal samples, identified transcription factor binding profiles, and predicted how transcription factors are linked to cortical development. The researchers were able to correlate expression of neurodevelopmental disease risk loci and genes with specific cell types during development.
Science, this issue p. eaay1645
Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 49(10), P. e60 - e60
Published: Feb. 25, 2021
Abstract
Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.
PLoS Computational Biology, Journal Year: 2020, Volume and Issue: 16(7), P. e1008050 - e1008050
Published: July 20, 2020
Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.
Current Opinion in Plant Biology, Journal Year: 2020, Volume and Issue: 54, P. 34 - 41
Published: Jan. 24, 2020
Our era has witnessed tremendous advances in plant genomics, characterized by an explosion of high-throughput techniques to identify multi-dimensional genome-wide molecular phenotypes at low costs. More importantly, genomics is not merely acquiring molecular phenotypes, but also leveraging powerful data mining tools to predict and explain them. In recent years, deep learning has been found extremely effective in these tasks. This review highlights two prominent questions at the intersection of genomics and deep learning: 1) how can the flow of information from genomic DNA sequences to molecular phenotypes be modeled; 2) how can we identify functional variants in natural populations using deep learning models? Additionally, we discuss the possibility of unleashing the power of deep learning in synthetic biology to create novel genomic elements with desirable functions. Taken together, we propose a central role of deep learning in future plant genomics research and crop genetic improvement.