Cis-regulatory elements (CREs) are crucial for regulating gene expression, and G-quadruplexes (G4s), as prototypal non-canonical DNA structures, may play a role in this regulation. However, the relationship between G4s and CREs, especially with non-promoter-like functional elements, requires further systematic investigation. We aimed to investigate the associations between G4s and human cCREs (candidate CREs) inferred from Encyclopedia of DNA Elements (ENCODE) data. We found that G4s are prominently enriched in most types of cCREs, especially those with promoter-like signatures (PLS). The co-occurrence of CTCF signals with H3K4me3 or H3K27ac signals strengthens the association between cCREs and G4s. Genetic variants in G4s, particularly within their G-runs, exhibit higher regulatory potential and deleterious effects compared to cCREs. G-runs near transcriptional start sites (TSSs) are more evolutionarily constrained, while those far from the TSS are relatively less conserved. The presence of G4s is often linked to a favorable local chromatin environment for the activation and execution of the regulatory function of cCREs, potentially attributable to the formation of G4 secondary structures. Finally, we discovered that G4-associated cCREs exhibit widespread activation in a variety of cancers. Our study suggests that G4s are integral components of human cis-regulatory elements, extending beyond promoters. The primary sequences of G4s are associated with the localization of CREs, while G4 structures are linked to the activation of these elements. Therefore, we propose defining G4s as pivotal regulatory elements in the human genome.
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Nucleic Acids Research, 2024, 52(D1), pp. D1143-D1154. Published: Jan. 5, 2024
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Nature Communications, 2023, 14(1). Published: Sep. 1, 2023
Transposable elements (TE) are repetitive genomic elements that harbor binding sites for human transcription factors (TF). A regulatory role for TEs has been suggested in embryonal development and diseases such as cancer, but systematic investigation of their functions has been limited by their widespread silencing in the genome. Here, we utilize unbiased massively parallel reporter assay data using a whole human genome library to identify TEs with functional enhancer activity in two cancer types of endodermal lineage, colorectal and liver cancers. We show that the identified TE enhancers are characterized by genomic features associated with active enhancers, such as epigenetic marks and TF binding. Importantly, we identify distinct TE subfamilies that function as tissue-specific enhancers, namely MER11- and LTR12-elements in colon and liver cancers, respectively. These elements are bound by distinct TFs in each cell type, and they have predicted associations to differentially expressed genes. In conclusion, these data demonstrate how different cancer types can utilize distinct TEs as tissue-specific enhancers, paving the way for a comprehensive understanding of the role of TEs as bona fide enhancers in cancer genomes.
bioRxiv (Cold Spring Harbor Laboratory), 2023, issue unknown. Published: July 5, 2023
Abstract
Peripheral sensory neurons in the dorsal root ganglion (DRG) and trigeminal ganglion (TG) are specialized to detect and transduce diverse environmental stimuli including touch, temperature, and pain to the central nervous system. Recent advances in single-cell RNA-sequencing (scRNA-seq) have provided new insights into the diversity of sensory ganglia cell types in rodents, non-human primates, and humans, but it remains difficult to compare transcriptomically defined cell types across studies and species. Here, we built cross-species harmonized atlases of DRG and TG cell types that describe 18 neuronal and 11 non-neuronal cell types across 6 species and 19 studies. We then demonstrate the utility of this harmonized reference atlas by using it to annotate newly profiled DRG nuclei/cells from both human and the highly regenerative axolotl. We observe that the transcriptomic profiles of sensory neuron subtypes are broadly similar across vertebrates, but the expression of functionally important neuropeptides and channels can vary notably. The resources and data presented here can guide future studies in comparative transcriptomics, simplify cell type nomenclature differences across studies, and help prioritize targets for future pain therapy development.
bioRxiv (Cold Spring Harbor Laboratory), 2023, issue unknown. Published: July 12, 2023
Abstract
Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genome generation tasks demonstrates DNAGPT’s superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.
EMBO Reports, 2024, 25(4), pp. 1721-1733. Published: Mar. 25, 2024
Abstract
Remnants of transposable elements (TEs) are widely expressed throughout mammalian embryo development. Originally infesting our genomes as selfish elements and acting as a source of genome instability, several of these elements have been co-opted as part of a complex system of genome regulation. Many TEs have lost transposition ability, and their transcriptional potential has been tempered as a result of interactions with the host throughout evolutionary time. It has been proposed that TEs have been ultimately repurposed to function as gene regulatory hubs scattered throughout our genomes. In the early embryo in particular, TEs find a perfect environment of naïve chromatin to escape transcriptional repression by the host. As a consequence, it is thought that hosts found ways to co-opt TE sequences to regulate large-scale changes in chromatin and transcription state of their genomes. In this review, we discuss several examples of TEs expressed during embryo development, their potential for co-option in genome regulation, and the evolutionary pressures on TEs and on our genomes.
bioRxiv (Cold Spring Harbor Laboratory), 2024, issue unknown. Published: Jan. 9, 2024
Abstract
Cis-regulatory elements (CREs) are critical in regulating gene expression, and yet understanding of CRE evolution remains challenging. Here, we constructed a comprehensive single-cell atlas of chromatin accessibility in Oryza sativa, integrating data from 103,911 nuclei representing 126 discrete cell states across nine distinct organs. We used comparative genomics to compare cell-type resolved chromatin accessibility between O. sativa and 57,552 nuclei from four additional grass species (Zea mays, Sorghum bicolor, Panicum miliaceum, and Urochloa fusca). Accessible chromatin regions (ACRs) had different levels of conservation depending on the degree of cell-type specificity. We found a complex relationship between ACRs with conserved noncoding sequences, cell-type specificity, conservation, and tissue-specific switching. Additionally, we found that epidermal ACRs were less conserved compared to other cell types, potentially indicating that more rapid regulatory evolution has occurred in the L1-derived epidermal layer of these species. Finally, we identified and characterized a conserved subset of ACRs that overlapped the repressive histone modification H3K27me3, implicating them as potentially silencer-like CREs maintained by evolution. Collectively, this comparative genomics approach highlights the dynamics of plant cell-type-specific CRE evolution.
Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.
Marsupials and placental mammals exhibit significant differences in reproductive and life history strategies. Marsupials are born highly underdeveloped after an extremely short period of gestation, leading to prioritization of the development of structures critical for post-birth survival in the pouch. Critically, they must undergo accelerated development of the oro-facial region compared to placentals. Previously we described the accelerated development of the oro-facial region in the carnivorous Australian marsupial, the fat-tailed dunnart Sminthopsis crassicaudata, that has one of the shortest gestations of any mammal. By combining genome comparisons of the mouse and dunnart with functional data for the enhancer-associated chromatin modifications, H3K4me3 and H3K27ac, we investigated divergence of craniofacial regulatory landscapes between these species. This is the first description of genome-wide face regulatory elements in a marsupial, with 60,626 putative enhancers and 12,295 putative promoters described. We also generated craniofacial RNA-seq data for the dunnart to investigate expression dynamics of genes near predicted active regulatory elements. While genes involved in regulating facial development were largely conserved in mouse and dunnart, the regulatory landscape varied significantly. Additionally, a subset of dunnart-specific enhancers were associated with genes highly expressed only in dunnart relating to cranial neural crest proliferation, embryonic myogenesis and epidermis development. Comparative RNA-seq analyses of facial tissue revealed dunnart-specific expression of genes involved in the development of the mechanosensory system. Accelerated development of the dunnart sensory system likely relates to the sensory cues received by the nasal-oral region during the postnatal journey to the pouch. Together these data suggest that accelerated face development in the dunnart may be driven by dunnart-specific enhancer activity. Our study highlights the power of marsupial-placental comparative genomics for understanding the role of enhancers in driving temporal shifts in development.