bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown, Published: Aug. 24, 2023
Long-read RNA-seq has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile utility that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, SQANTI-SIM provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field. We demonstrate the effectiveness of SQANTI-SIM by benchmarking five transcriptome reconstruction pipelines using the simulated data.
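The exclusion idea described in this abstract can be illustrated with a minimal sketch. This is not SQANTI-SIM's implementation or interface: the function name `drop_transcripts`, the toy annotation, and the transcript IDs are all hypothetical, and the sketch assumes a GTF-style annotation in which each transcript-level record carries a `transcript_id "..."` attribute. Removing a transcript from the annotation handed to a reconstruction tool, while still simulating reads from it, makes that transcript look novel to the tool.

```python
import re

# Matches the transcript_id attribute in a GTF attribute column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def drop_transcripts(gtf_lines, excluded_ids):
    """Return the GTF lines whose transcript_id is not in excluded_ids.

    Comment lines and records without a transcript_id (for example gene
    records) are kept unchanged.
    """
    excluded = set(excluded_ids)
    kept = []
    for line in gtf_lines:
        match = _TRANSCRIPT_ID.search(line)
        if match and match.group(1) in excluded:
            continue  # hide this transcript from the reduced annotation
        kept.append(line)
    return kept


# Toy annotation: one gene with two transcripts (hypothetical IDs).
annotation = [
    '#!toy annotation',
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    'chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    'chr1\tsrc\texon\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]

# T2 is treated as the "novel" transcript and removed from the reference.
reduced = drop_transcripts(annotation, {"T2"})
```

In this sketch the full annotation would still drive read simulation, while `reduced` would be the reference given to the pipeline under evaluation, so any recovery of T2 counts as a correctly detected novel transcript.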
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: March 23, 2024
Abstract
Resolving the transcriptomes of higher eukaryotes is more tangible with the advent of long read sequencing, which greatly facilitates the identification of new transcripts and their splicing isoforms. However, computational analysis of long read RNA sequencing data remains challenging, as it is difficult to disentangle technical artifacts from bona fide biological information. To address this, we evaluated the performance of multiple leading transcriptome assembly algorithms on their ability to accurately reconstruct transcript isoforms. We specifically focused on deep nanopore sequencing of synthetic spike-in controls (Sequins™ and SIRVs) across different chemistries, including cDNA and direct RNA protocols. Our systematic comparative benchmarking exposes the strengths and limitations of the surveyed strategies. We also highlight conceptual challenges in annotation and in the formalization of quality metrics. Our results complement similar recent endeavors, helping forge a path towards a gold standard analytical pipeline for long read transcriptome assembly.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: April 16, 2024
Recently developed long-read RNA sequencing technologies promise to provide a more accurate and comprehensive view of transcriptomes compared to short-read sequencers, primarily due to their capability to achieve full-length sequencing of transcripts. However, realizing this potential requires computational tools tailored to process long reads, which exhibit a higher error rate than short reads. Existing methods for assembling and quantifying long-read data often disagree on expressed transcripts and their abundance levels, leading researchers to lack confidence in the transcriptomes produced using this data. One approach to address uncertainties in transcriptome assembly and quantification is by assigning reads to transcripts, enabling a more detailed characterization of transcript support at the read level. Here, we introduce TranSigner, a versatile tool that assigns long reads to any input transcriptome. TranSigner consists of three consecutive modules performing: alignment of reads to the given transcriptome, computation of compatibility scores based on alignment scores and positions, and execution of an expectation-maximization algorithm to probabilistically assign read fractions to transcripts while estimating transcript abundances. Using simulated and experimental datasets from three well studied organisms (Homo sapiens, Arabidopsis thaliana and Mus musculus), we demonstrate that TranSigner achieves higher accuracy in transcript abundance estimation and read assignment compared to existing tools.
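The third module named in this abstract, an expectation-maximization (EM) step that assigns read fractions to transcripts while estimating abundances, can be sketched generically as follows. This is a textbook EM for a mixture over transcripts, not TranSigner's code: the function name `em_assign`, the representation of compatibility scores as a non-negative reads-by-transcripts matrix, and the toy values are assumptions made for illustration.

```python
def em_assign(compat, n_iter=100, tol=1e-9):
    """Generic EM for fractional read-to-transcript assignment.

    compat[i][j] is an assumed non-negative compatibility score between
    read i and transcript j (0 means the read cannot come from it).
    Returns (fractions, abundance): fractions[i][j] is the share of read i
    assigned to transcript j, and abundance[j] is the estimated relative
    abundance of transcript j.
    """
    n_reads = len(compat)
    n_tx = len(compat[0])
    abundance = [1.0 / n_tx] * n_tx  # start from uniform abundances
    fractions = []
    for _ in range(n_iter):
        # E-step: split each read across transcripts in proportion to
        # (current abundance) x (compatibility score).
        fractions = []
        for row in compat:
            weights = [a * c for a, c in zip(abundance, row)]
            total = sum(weights)
            if total == 0:
                raise ValueError("read is compatible with no transcript")
            fractions.append([w / total for w in weights])
        # M-step: the new abundance is the mean assigned fraction.
        updated = [sum(f[j] for f in fractions) / n_reads for j in range(n_tx)]
        change = max(abs(u - a) for u, a in zip(updated, abundance))
        abundance = updated
        if change < tol:
            break  # abundances have stopped moving
    return fractions, abundance


# Toy input: reads 0 and 1 fit only transcript 0, read 2 fits only
# transcript 1, and read 3 is equally compatible with both.
compat = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
fractions, abundance = em_assign(compat)
```

On this toy input the ambiguous read is split in favor of the transcript with more uniquely assigned reads, and the abundances settle near 2/3 and 1/3, which shows how the abundance estimates and the read assignments inform each other.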
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: May 24, 2024
The human neural retina is a complex tissue with abundant alternative splicing, and more than 10% of genetic variants linked to inherited retinal diseases (IRDs) alter splicing. Traditional short-read RNA-sequencing methods have been used for understanding retina-specific splicing but have limitations in detailing transcript isoforms. To address this, we generated a proteogenomic atlas that combines PacBio long-read RNA-sequencing data with mass spectrometry and whole genome sequencing data of three healthy human neural retina samples. We identified nearly 60,000 transcript isoforms, of which approximately one-third are novel. Additionally, ten novel peptides confirmed novel transcript isoforms. For instance, we identified a novel IMPDH1 isoform with a novel combination of known exons that is supported by peptide evidence. Our research underscores the potential of in-depth tissue-specific transcriptomic analysis to enhance our grasp of tissue-specific alternative splicing. The underlying data are available via EGA with identifier EGAD50000000101 and via ProteomeXchange with identifier PXD045187, and are accessible through the UCSC genome browser.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Sept. 26, 2024
Genomic drivers of human-specific neurological traits remain largely undiscovered. Duplicated genes expanded uniquely in the human lineage likely contributed to brain evolution, including the increased complexity of synaptic connections between neurons and the dramatic expansion of the neocortex. Discovering duplicate genes is challenging because the similarity of paralogs makes them prone to sequence-assembly errors. To mitigate this issue, we analyzed a complete telomere-to-telomere human genome sequence (T2T-CHM13) and identified 213 duplicated gene families likely containing human-specific paralogs (>98% identity). Positing that genes important in universal human brain features should exist with at least one copy in all modern humans and exhibit expression in the brain, we narrowed in on 362 paralogs with at least one copy across thousands of ancestrally diverse genomes and present in human brain transcriptomes. Of these, 38 paralogs co-express in gene modules enriched for autism-associated genes and potentially contribute to human language and cognition. We narrowed in on 13 duplicate gene families with human-specific paralogs that are fixed among modern humans and show convincing brain expression patterns. Long-read DNA sequencing revealed hidden variation across 200 modern humans of diverse ancestries, uncovering signatures of selection not previously identified, including possible balancing selection.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Oct. 13, 2024
ABSTRACT
A key parameter in the experimental design of RNA-seq projects is the choice of sequencing depth. Considering a limited budget, one needs to find a tradeoff between the number of samples and the sensitivity of the analysis, particularly concerning lowly expressed genes. While previous studies have proposed a lower bound on the sequencing depth for comprehensive analysis of differential gene expression, for alternative splicing it has only been proposed for human adipose tissue. However, alternative splicing differs across tissues and conditions. We analyzed publicly available and newly generated deep-sequenced paired-end RNA-seq data (between 150 and >500 million reads, read length 50-150 bp) from buffy coat cells and diverse sets of human tissues, including gluteal subcutaneous fat, heart, and hypothalamus. Our results show that the sequencing depth typically used in published cohorts is not sufficient to comprehensively capture the landscape of alternative splicing. This motivates the use of deeper sequencing or of long-read technologies in future studies. Toward this goal, we offer guidelines for choosing a suitable sequencing depth.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: June 8, 2024
Abstract
Local protein synthesis in neurons is vital for synaptic plasticity, yet the regulatory mechanisms, particularly cytoplasmic polyadenylation, are not fully understood. This study employed nanopore sequencing to examine transcriptomic responses in rat hippocampi during in vivo long-term potentiation (LTP) and in synaptoneurosomes after in vitro stimulation. Our long-read transcriptomic dataset allows detailed analysis of mRNA 3′-ends, poly(A) tail lengths, and composition. We observed dynamic shifts in polyadenylation site preference post-LTP induction, with significant poly(A) tail lengthening restricted to transcriptionally induced mRNAs. Poly(A) tails of these genes showed increased non-adenosine abundance. In synaptoneurosomes, chemical stimulation led to shortening of poly(A) tails on preexisting mRNAs, indicating translation-induced deadenylation. Additionally, we discovered a group of neuronal transcripts with poly(A) tails abundant in non-adenosine residues. These semi-templated tails are derived from extremely adenosine-rich 3′UTRs. This study provides a comprehensive overview of mRNA 3′-end dynamics during LTP, offering insights into post-transcriptional regulation during neuronal activation.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: July 13, 2024
Abstract
Alternative splicing (AS) is a key layer of regulation in eukaryotic gene expression that is investigated in all areas of life sciences. Differences in AS between conditions can be quantified from transcriptome-wide short-read RNA sequencing (RNA-Seq) data with designated computational tools. However, not all RNA-Seq data are equally suited for AS analysis. Here, we perform an exemplary AS analysis to showcase the impact of library characteristics on the obtained results. Using three standard ENCODE datasets with widespread AS changes, we modulate read length, read depth and number of replicates and compare their influence on the detection, quantification and classification of AS events with the state-of-the-art algorithm MAJIQ. We find that longer reads and higher read depth are the most effective measures to improve the sensitivity and precision of AS analysis. From our results, we provide a recommendation on how to best choose RNA-Seq specifications for AS analysis.
Frontiers in Genetics, Journal Year: 2024, Volume and Issue: 15, Published: Sept. 19, 2024
The human neural retina is a complex tissue with abundant alternative splicing, and more than 10% of genetic variants linked to inherited retinal diseases (IRDs) alter splicing. Traditional short-read RNA-sequencing methods have been used for understanding retina-specific splicing but have limitations in detailing transcript isoforms. To address this, we generated a proteogenomic atlas that combines PacBio long-read RNA-sequencing data with mass spectrometry and whole genome sequencing data of three healthy human neural retina samples. We identified nearly 60,000 transcript isoforms, of which approximately one-third are novel. Additionally, ten novel peptides confirmed novel transcript isoforms. For instance, we identified a novel IMPDH1 isoform with a novel combination of known exons that is supported by peptide evidence. Our research underscores the potential of in-depth tissue-specific transcriptomic analysis to enhance our grasp of tissue-specific alternative splicing. The underlying data are available via EGA with identifier EGAD50000000101 and via ProteomeXchange with identifier PXD045187, and are accessible through the UCSC genome browser.