Science, Journal Year: 2023, Volume and Issue: 380(6643)
Published: April 27, 2023
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to motor cortex and parvalbumin-positive interneuron enhancers for neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype using a large group of species with aligned genomes.
Nature Genetics, Journal Year: 2021, Volume and Issue: 53(3), P. 403 - 411
Published: Feb. 25, 2021
Abstract
The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells.
Bioinformatics, Journal Year: 2021, Volume and Issue: 37(15), P. 2112 - 2120
Published: Feb. 3, 2021
Abstract
Motivation
Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture, especially in data-scarce scenarios.
Results
To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferable understanding of genomic DNA sequences based on up and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that pre-trained DNABERT with human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analysis tasks.
Availability and implementation
The source code, pretrained and finetuned model for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT).
Supplementary information
Supplementary data are available at Bioinformatics online.
Genome Biology, Journal Year: 2021, Volume and Issue: 22(1)
Published: April 15, 2021
Abstract
Differential gene expression mechanisms ensure cellular differentiation and plasticity to shape ontogenetic and phylogenetic diversity of cell types. A key regulator of differential gene expression programs are the enhancers, the gene-distal cis-regulatory sequences that govern spatiotemporal and quantitative expression dynamics of target genes. Enhancers are widely believed to physically contact the target promoters to effect transcriptional activation. However, our understanding of the full complement of regulatory proteins and the definitive mechanics of enhancer action is incomplete. Here, we review recent findings to present some emerging concepts on enhancer action and also outline a set of outstanding questions.
Molecular Systems Biology, Journal Year: 2021, Volume and Issue: 17(3)
Published: March 1, 2021
Molecular knowledge of biological processes is a cornerstone in omics data analysis. Applied to single-cell data, such analyses provide mechanistic insights into individual cells and their interactions. However, knowledge of intercellular communication is scarce, scattered across resources, and not linked to intracellular processes. To address this gap, we combined over 100 resources covering interactions and roles of proteins in inter- and intracellular signaling, as well as transcriptional and post-transcriptional regulation. We added protein complex information and annotations on function, localization, and role in diseases for each protein. The resource is available for human, and via homology translation for mouse and rat. The data are accessible via OmniPath's web service (https://omnipathdb.org/), a Cytoscape plug-in, and packages in R/Bioconductor and Python, providing access options for computational and experimental scientists. We created workflows with tutorials to facilitate the analysis of cell-cell interactions and affected downstream intracellular signaling processes. OmniPath provides a single access point to knowledge spanning intra- and intercellular processes for data analysis, as we demonstrate in applications studying SARS-CoV-2 infection and ulcerative colitis.
The Plant Cell, Journal Year: 2021, Volume and Issue: 34(2), P. 718 - 741
Published: Nov. 18, 2021
The identification and characterization of cis-regulatory DNA sequences and how they function to coordinate responses to developmental and environmental cues is of paramount importance to plant biology. Key to these regulatory processes are cis-regulatory modules (CRMs), which include enhancers and silencers. Despite the extraordinary advances in high-quality sequence assemblies and genome annotations, the identification and understanding of CRMs, and how they regulate gene expression, lag significantly behind. This is especially true for their distinguishing characteristics and activity states. Here, we review the current knowledge on CRMs and breakthrough technologies enabling identification, characterization, and validation of CRMs; we compare the genomic distributions of CRMs with respect to their target genes between different plant species, and discuss the role of transposable elements harboring CRMs in the evolution of gene expression. This is an exciting time to study cis-regulomes in plants; however, significant existing challenges need to be overcome to fully understand and appreciate the role of CRMs in plant biology and in crop improvement.