bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Issue: unknown
Published: Aug. 24, 2023
Long-read RNA-seq has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile utility that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field. We demonstrate the effectiveness of SQANTI-SIM by benchmarking five transcriptome reconstruction pipelines using the simulated data.

Nature Methods, Journal year: 2024, Issue: 21(5), pp. 793-797
Published: March 20, 2024
SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its framework, SQANTI3 calculates descriptors of transcript models, junctions and ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent iso-transcriptomics analyses.

Biomolecules, Journal year: 2024, Issue: 14(5), p. 568
Published: May 10, 2024
The understanding of the human genome has been greatly improved by the advent of next-generation sequencing technologies (NGS). Despite the undeniable advantages responsible for their widespread diffusion, these methods have some constraints, mainly related to short read length and the need for PCR amplification. As a consequence, long-read sequencers, called third-generation sequencing (TGS), have been developed, promising to overcome the constraints of NGS. Starting from the first prototype, TGS has progressively improved its chemistries, improving both read length and base-calling accuracy, as well as simultaneously reducing costs per base. Based on these premises, TGS is showing its potential in many fields, including the analysis of difficult-to-sequence genomic regions, structural variation detection, RNA expression profiling, DNA methylation study, and metagenomic analyses. Protocol standardization and the development of easy-to-use pipelines for data analysis will enhance the use of TGS, also opening the way to routine applications in diagnostic contexts.

Nature Biotechnology, Journal year: 2024, Issue: unknown
Published: May 22, 2024
Abstract
Determining whether the RNA isoforms from medically relevant genes have distinct functions could facilitate direct targeting of RNA isoforms for disease treatment. Here, as a step toward this goal for neurological diseases, we sequenced 12 postmortem, aged human frontal cortices (6 Alzheimer disease cases and 6 controls; 50% female) using one Oxford Nanopore PromethION flow cell per sample. We identified 1,917 medically relevant genes expressing multiple isoforms in the frontal cortex, where 1,018 had multiple isoforms with different protein-coding sequences. Of these 1,018 genes, 57 are implicated in brain-related diseases including major depression, schizophrenia, Parkinson’s disease and Alzheimer disease. Our study also uncovered 53 new RNA isoforms in medically relevant genes, including several where the new isoform was the most highly expressed for that gene. We also reported on five mitochondrially encoded, spliced RNA isoforms. We found 99 differentially expressed RNA isoforms between Alzheimer disease cases and controls.

Nature Communications, Journal year: 2024, Issue: 15(1)
Published: May 10, 2024
Abstract
The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms and were generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate that IsoQuant is a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.

Nature Communications, Journal year: 2024, Issue: 15(1)
Published: May 28, 2024
Abstract
Alternative splicing events are a major causal mechanism for complex traits, but they have been understudied due to the limitation of short-read sequencing. Here, we generate a full-length isoform annotation of human immune cells from an individual by long-read sequencing of 29 cell subsets. This annotation contains a number of unannotated transcripts and isoforms, such as a read-through transcript of TOMM40-APOE in the Alzheimer’s disease locus. We profile the characteristics of the isoforms and show that repetitive elements significantly explain the diversity of unannotated isoforms, providing insight into human genome evolution. In addition, some of the isoforms are expressed in a cell-type specific manner, whose alternative 3’-UTRs usage contributes to their specificity. Further, we identify disease-associated isoforms by isoform switch analysis and by integration of several quantitative trait loci analyses with genome-wide association study data. Our findings will promote the elucidation of the mechanism of complex diseases via alternative splicing.

Human Molecular Genetics, Journal year: 2022, Issue: 31(R1), pp. R123-R136
Published: Aug. 12, 2022
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, and the linkage of RNA isoforms to protein-level functions, and comment on future directions in the field. Based on recent progress, long-read sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: July 19, 2024
RNA abundance quantification has become routine and affordable thanks to high-throughput “short-read” technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. “Long-read” sequencing platforms now produce data-types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read datatypes, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: March 15, 2024
Abstract
Single-cell RNA sequencing is used in profiling gene expression differences between cells. Short-read sequencing platforms provide high throughput and high-quality information at the gene-level, but the technique is hindered by limited read length, failing to provide an understanding of cell heterogeneity at the isoform level. This gap has recently been addressed by long-read sequencing platforms that provide the opportunity to preserve full-length transcript information during sequencing. To objectively evaluate the information obtained from both methods, we sequenced four samples of patient-derived organoid cells of clear cell renal cell carcinoma and one healthy sample of kidney organoid cells on Illumina NovaSeq 6000 and PacBio Sequel IIe. For each sample, the cDNA was derived from the same 10x Genomics 3’ single-cell library. Here we present the technical characteristics of both datasets and compare cell metrics and gene-level information. We show that the two methods largely overlap in the results, but we also identify sources of variability which present a set of advantages and disadvantages to both methods.