Nucleic Acids Research, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 6, 2024
In single-cell and single-nucleus RNA sequencing (RNA-seq), the coexistence of nascent (unprocessed) and mature (processed) messenger RNA (mRNA) poses challenges in accurate read mapping and the interpretation of count matrices. The traditional transcriptome reference, defining the "region of interest" in bulk RNA-seq, restricts its focus to mature mRNA transcripts. This restriction leads to two problems: reads originating outside of the "region of interest" are prone to mismapping within this region, and additionally, such external reads cannot be matched to specific transcript targets. Expanding the "region of interest" to encompass both nascent and mature mRNA transcript targets provides a more comprehensive framework for RNA-seq analysis. Here, we introduce the concept of distinguishing flanking k-mers (DFKs) to improve mapping of sequencing reads. We have developed an algorithm to identify DFKs, which serve as a sophisticated "background filter", enhancing the accuracy of mRNA quantification. This dual strategy of an expanded region of interest coupled with the use of DFKs enhances the precision in quantifying both mature and nascent mRNA molecules, as well as in delineating reads of ambiguous status.
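To make the "background filter" idea concrete, the following is a toy Python sketch of one way flanking k-mers could be collected and used. It is an illustration under simplifying assumptions, not the algorithm from the paper: the function names (`kmers`, `find_dfks`, `read_hits_dfk`) are hypothetical, reverse complements are ignored, and a DFK is assumed to be any background k-mer that is absent from the targets but directly adjacent to a k-mer shared with them.

```python
def kmers(seq, k):
    """Return the k-mers of seq in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def find_dfks(targets, background, k):
    """Collect background k-mers that flank k-mers shared with the targets.

    Assumption for this sketch: a DFK is a background k-mer that is not in
    any target but sits immediately next to a k-mer that is.
    """
    target_kmers = {km for t in targets for km in kmers(t, k)}
    dfks = set()
    for seq in background:
        kms = kmers(seq, k)
        shared = [km in target_kmers for km in kms]
        for i, km in enumerate(kms):
            if shared[i]:
                continue  # shared k-mers cannot distinguish anything
            left_shared = i > 0 and shared[i - 1]
            right_shared = i + 1 < len(kms) and shared[i + 1]
            if left_shared or right_shared:
                dfks.add(km)
    return dfks


def read_hits_dfk(read, dfks, k):
    """True if the read contains a DFK, i.e. it likely comes from background."""
    return any(km in dfks for km in kmers(read, k))


if __name__ == "__main__":
    targets = ["ACGTACGGA"]        # toy "region of interest"
    background = ["TTACGTACCC"]    # toy sequence overlapping the target
    dfks = find_dfks(targets, background, k=4)
    print(sorted(dfks))
    print(read_hits_dfk("CGTACC", dfks, 4), read_hits_dfk("CGTACG", dfks, 4))
```

In this toy setup a read that extends past the shared stretch into background sequence contains a DFK and can be set aside, while a read that lies entirely within the target does not.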
PLoS Computational Biology, Journal Year: 2022, Volume and Issue: 18(9), P. e1010492 - e1010492
Published: Sept. 12, 2022
We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.
Nature Communications, Journal Year: 2022, Volume and Issue: 13(1)
Published: Dec. 9, 2022
The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). This enables the identification of experiments which best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models involving a continuous stochastic transcription rate driving a discrete RNA transcription and splicing process, and compare and contrast two biologically plausible hypotheses about transcription rate variation. One assumes variation is due to DNA experiencing mechanical strain, while the other assumes it is due to regulator number fluctuations. We introduce a framework for numerically and analytically studying such models, and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.
PLoS ONE, Journal Year: 2023, Volume and Issue: 18(5), P. e0285674 - e0285674
Published: May 11, 2023
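Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of underlying biological communities from metabarcoding is critical for enhancing the utility of such approaches for health and conservation. Recent work has demonstrated that correcting for amplification biases in genetic metabarcoding data can yield quantitative estimates of template DNA concentrations. However, a major source of uncertainty in metabarcoding data stems from non-detections across technical PCR replicates where one replicate fails to detect a species observed in other replicates. Such non-detections are a special case of variability among technical replicates in metabarcoding data. While many sampling and amplification processes underlie observed variation in metabarcoding data, understanding the causes of non-detections is an important step in distinguishing signal from noise in metabarcoding studies. Here, we use both simulated and empirical data to 1) suggest how non-detections may arise in metabarcoding data, 2) outline steps to recognize uninformative data in practice, and 3) identify the conditions under which amplicon sequence data can reliably detect underlying biological signals. We show with both simulations and empirical data that, for a given species, the rate of non-detections among technical replicates is a function of both the template DNA concentration and species-specific amplification efficiency. Consequently, we conclude metabarcoding datasets are strongly affected by (1) deterministic amplification biases during PCR and (2) stochastic sampling of amplicons during sequencing, both of which we can model, but also by (3) stochastic sampling of rare molecules prior to PCR, which remains a frontier for quantitative metabarcoding. Our results highlight the importance of estimating species-specific amplification efficiencies and critically evaluating patterns of non-detection in metabarcoding datasets to better distinguish environmental signal from the noise inherent in molecular detections of rare targets.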
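The dependence of non-detection on template concentration and amplification efficiency can be illustrated with a small Python sketch. This is a deliberately simplified model assumed here for illustration, not the model fitted in the paper: each species is amplified deterministically by a factor of (1 + efficiency) per cycle, reads are then drawn independently in proportion to amplicon abundance, and the stochastic sampling of rare molecules before PCR is ignored. The function names and parameters are hypothetical.

```python
def expected_read_fractions(conc, eff, cycles):
    """Expected fraction of reads per species after deterministic PCR.

    conc:   starting template concentrations (arbitrary units)
    eff:    per-cycle amplification efficiencies, each between 0 and 1
    cycles: number of PCR cycles
    """
    amplified = [c * (1.0 + a) ** cycles for c, a in zip(conc, eff)]
    total = sum(amplified)
    return [x / total for x in amplified]


def nondetection_prob(conc, eff, cycles, depth):
    """Probability that a species receives zero reads in one replicate.

    Assumes `depth` reads are sampled independently with the expected
    read fractions, so P(zero reads) = (1 - p) ** depth.
    """
    fractions = expected_read_fractions(conc, eff, cycles)
    return [(1.0 - p) ** depth for p in fractions]


if __name__ == "__main__":
    # Two species at equal concentration; the second amplifies less efficiently.
    probs = nondetection_prob(conc=[1.0, 1.0], eff=[0.9, 0.6], cycles=30, depth=100)
    for name, p in zip(["species A", "species B"], probs):
        print(f"{name}: P(non-detection) = {p:.3g}")
```

Under these assumptions, lowering either the starting concentration or the amplification efficiency of a species raises its probability of going undetected in a replicate, even when nothing else about the sample changes.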
Genome Research, Journal Year: 2024, Volume and Issue: 34(2), P. 179 - 188
Published: Feb. 1, 2024
A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length–based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.
Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(18)
Published: April 26, 2024
RNA velocity estimation is a potentially powerful tool to reveal the directionality of transcriptional changes in single-cell RNA-sequencing data, but it lacks accuracy, absent advanced metabolic labeling techniques. We developed an approach,
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Jan. 14, 2023
We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrate bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.
Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 8, 2025
Background: Machine learning (ML) models can automate cell annotation and reduce human bias. However, it remains unclear which ML model best suits the characteristics of single-cell RNA sequencing data and whether a trained model can be applied to transcriptomes collected from nuclei rather than whole cells. This study evaluates the performance of eight selected ML models for cell annotation in single-cell (scRNA-seq) vs single-nucleus (snRNA-seq) datasets, focusing on their ability to generalize across datasets with varying cell populations and transcriptome isolation techniques.
Results: In the first part, we use two publicly available scRNA-seq datasets of Peripheral Blood Mononuclear Cells (PBMC3K and PBMC10K) to assess each model in cell type classification within and across datasets. XGBoost achieved high accuracy (95.4%-95.8%), precision, and F1-scores, outperforming simpler models like Logistic Regression and Naive Bayes. Ensemble methods like Random Forest demonstrated strong precision and recall. Elastic Net demonstrated nearly as good generalizability, achieving high accuracy (94.7%-95.1%). In the second part, we investigated the impact of transcriptome isolation techniques (single-cell vs. single-nucleus RNA-seq) using a cardiomyocyte differentiation dataset (GSE129096). Although performance was excellent on single-cell data (accuracy and F1-scores > 95%), it declined notably on single-nucleus data, suggesting that inherent transcriptomic differences limit the models' capacity to generalize. Notably, all models struggled with classifying intermediate-stage cells, highlighting the challenges in distinguishing transitional cell populations, such as cardiac progenitors that retain stem cell markers while showing expression of differentiated cell markers.
Conclusion: ML models can classify cells originating from both scRNA-seq and snRNA-seq. XGBoost, a tree-based model, and penalized elastic net regression demonstrated superior performance across diverse datasets, emphasizing the importance of model selection for robust cell annotation. These findings underscore the need for tailored computational approaches when working with heterogeneous transcriptomic data.
PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(1), P. e1012752 - e1012752
Published: Jan. 21, 2025
Single-cell transcriptomics experiments provide gene expression snapshots of heterogeneous cell populations across cell states. These snapshots have been used to infer trajectories and dynamic information even without intensive, time-series data by ordering cells according to gene expression similarity. However, while single-cell snapshots sometimes offer valuable insights into dynamic processes, current methods for ordering cells are limited by descriptive notions of “pseudotime” that lack intrinsic physical meaning. Instead of pseudotime, we propose inference of “process time” via a principled modeling approach to formulating trajectories and inferring latent variables corresponding to timing of cells subject to a biophysical process. Our implementation of this approach, called Chronocell, provides a biophysical formulation of trajectories built on cell state transitions. The Chronocell model is identifiable, making parameter inference meaningful. Furthermore, Chronocell can interpolate between trajectory inference, when cell states lie on a continuum, and clustering, when cells cluster into discrete states. By using a variety of datasets ranging from cluster-like to continuous, we show that Chronocell enables us to assess the suitability of datasets and reveals distinct cellular distributions along process time that are consistent with biological process times. We also compare our parameter estimates of degradation rates to those derived from metabolic labeling datasets, thereby showcasing the biophysical utility of Chronocell. Nevertheless, based on performance characterization on simulations, we find that process time inference can be challenging, highlighting the importance of dataset quality and careful model assessment.