bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 20, 2024
Abstract
Binding of transcription factors (TFs) at gene regulatory elements controls cellular epigenetic state and gene expression. Current genome-wide chromatin profiling approaches have inherently limited resolution, complicating assessment of TF occupancy and co-occupancy, especially at individual alleles. In this work, we introduce Accessible Chromatin by Cytosine Editing Site Sequencing with ATAC-seq (ACCESS-ATAC), which harnesses a double-stranded DNA cytosine deaminase (Ddd) enzyme to stencil TF binding locations within accessible chromatin regions. We optimize bulk and single-cell ACCESS-ATAC protocols and develop computational methods to show that the increased resolution compared with ATAC-seq improves the accuracy of TF binding site prediction. We use ACCESS-ATAC to perform genome-wide allelic occupancy and co-occupancy imputation for 64 TFs each in HepG2 and K562, revealing that the propensity of a majority of TFs to co-occupy nearby motifs oscillates with a period approximating the helical turn of DNA. Altogether, ACCESS-ATAC expands the capabilities of epigenomic profiling.
Nature, Journal Year: 2025, Volume and Issue: 637(8047), P. 965 - 973
Published: Jan. 8, 2025
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types [3]. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models [4] in predicting lentivirus-based massively parallel reporter assay readout [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 25, 2024
Despite extensive mapping of cis-regulatory elements (cREs) across cellular contexts with chromatin accessibility assays, the sequence syntax and genetic variants that regulate transcription factor (TF) binding and chromatin accessibility at context-specific cREs remain elusive. We introduce ChromBPNet, a deep learning DNA sequence model of base-resolution accessibility profiles that detects, learns and deconvolves assay-specific enzyme biases from regulatory sequence determinants of accessibility, enabling robust discovery of compact TF motif lexicons, cooperative motif syntax and precision footprints across assays and sequencing depths. Extensive benchmarks show that ChromBPNet, despite its lightweight design, is competitive with much larger contemporary models at predicting variant effects on chromatin accessibility, pioneer TF binding and reporter activity across assays, cell contexts and ancestry, while providing interpretation of disrupted regulatory syntax. ChromBPNet also helps prioritize and interpret regulatory variants that influence complex traits and rare diseases, thereby providing a powerful lens to decode regulatory DNA and genetic variation.
Proceedings of the National Academy of Sciences, Journal Year: 2025, Volume and Issue: 122(5)
Published: Jan. 27, 2025
Postdoctoral training is a career stage often described as a demanding and anxiety-laden time when many promising PhDs see their academic dreams slip away due to circumstances beyond their control. We use a unique dataset of publishing ...
NAR Genomics and Bioinformatics, Journal Year: 2025, Volume and Issue: 7(2)
Published: March 29, 2025
The recent expansion of single-cell technologies has enabled simultaneous genome-wide measurements of multiple modalities in the same single cell. The potential to jointly profile modalities such as gene expression, chromatin accessibility, protein epitopes, or multiple histone modifications at single-cell resolution represents a compelling opportunity to study developmental processes at multiple layers of gene regulation. Here, we present Ocelli, a lightweight Python package implemented in Ray for scalable visualization and analysis of multimodal single-cell data. The core functionality of Ocelli focuses on diffusion-based modeling of biological processes involving cell state transitions. Ocelli addresses common tasks in single-cell data analysis, such as visualization of cells on a low-dimensional embedding that preserves the continuity of the developmental progression of cells, identification of rare and transient cell states, integration with trajectory inference algorithms, and imputation of undetected feature counts. Extensive benchmarking shows that Ocelli outperforms existing methods regarding computational time and quality of the reconstructed low-dimensional representation of developmental data.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 26, 2024
Abstract
Cell atlas projects curate representative datasets, cell types, and marker genes for tissues across an organism. Despite their ubiquity, atlas projects rely on duplicated and manual effort to annotate cell types. The size of atlases coupled with a lack of data-compatible tools make reprocessing and analysis of their data near-impossible. To overcome these challenges, we present a collection of data, algorithms, and tools to automate cataloging and analyzing cell types across tissues in an organism, and demonstrate its utility in building a human atlas.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 19, 2024
RNA abundance quantification has become routine and affordable thanks to high-throughput “short-read” technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. “Long-read” sequencing platforms now produce data-types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read datatypes, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.
Nature Methods, Journal Year: 2023, Volume and Issue: 21(1), P. 32 - 36
Published: Dec. 4, 2023
Existing approaches to scoring single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq) feature matrices from sequencing reads are inconsistent, affecting downstream analyses and displaying artifacts. We show that, even with sparse single-cell data, quantitative counts are informative for estimating the regulatory state of a cell, which calls for a consistent treatment. We propose Paired-Insertion Counting as a uniform method for snATAC-seq feature characterization and provide a probability model for inferring latent insertion dynamics from snATAC-seq count matrices.
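As a rough illustration of what counting paired insertions could look like, the sketch below treats each fragment as a pair of Tn5 insertions and adds one count to every peak that contains at least one of its two ends, so a fragment lying entirely inside a peak contributes 1 rather than 2. This counting rule, the function name `pic_counts`, the half-open coordinates, and the toy peaks and fragments are assumptions made for illustration; the abstract does not spell out the procedure, and the probability model it mentions is not attempted here.

```python
from bisect import bisect_right
from collections import Counter


def pic_counts(fragments, peaks):
    """Count fragments per (cell, peak) under an assumed paired-insertion rule.

    fragments: iterable of (cell_id, start, end), half-open, single chromosome.
    peaks: list of non-overlapping (start, end) intervals sorted by start.
    A fragment adds 1 to each peak that holds at least one of its two ends.
    """
    peak_starts = [start for start, _ in peaks]

    def peak_index(position):
        # Rightmost peak starting at or before the position, if it covers it.
        i = bisect_right(peak_starts, position) - 1
        if i >= 0 and position < peaks[i][1]:
            return i
        return None

    counts = Counter()
    for cell_id, start, end in fragments:
        # The two insertion sites are the first and last base of the fragment.
        hit_peaks = {peak_index(start), peak_index(end - 1)} - {None}
        for peak in hit_peaks:
            counts[(cell_id, peak)] += 1
    return counts


# Hypothetical toy input: two peaks and four fragments from two cells.
toy_peaks = [(100, 200), (300, 400)]
toy_fragments = [
    ("c1", 110, 190),  # both ends in peak 0
    ("c1", 150, 350),  # one end in peak 0, one in peak 1
    ("c1", 50, 120),   # only the right end falls in peak 0
    ("c2", 500, 600),  # outside every peak
]
toy_counts = pic_counts(toy_fragments, toy_peaks)
```

Deduplicating the two ends with a set is the step that separates this rule from counting raw insertions, where a fragment fully inside a peak would be counted twice.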