Genes, Journal year: 2024, Issue: 15(12), p. 1547
Published: Nov. 29, 2024
Background/Objectives: Transcriptome assembly and functional annotation are essential in understanding gene expression and biological function. Nevertheless, many existing pipelines lack the flexibility to integrate both short- and long-read sequencing data or fail to provide a complete, customizable workflow for transcriptome analysis, particularly for non-model organisms.
Methods: We present TrAnnoScope, a transcriptome analysis pipeline designed to process Illumina short-read and PacBio long-read data. The pipeline provides a customizable workflow to generate high-quality, full-length (FL) transcripts with broad functional annotation. Its modular design allows users to adapt specific analysis steps for other sequencing platforms or data types. The pipeline encompasses steps from quality control to functional annotation, employing tools and established databases such as SwissProt, Pfam, Gene Ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and Eukaryotic Orthologous Groups (KOG). As a case study, TrAnnoScope was applied to RNA-Seq and Iso-Seq data from zebra finch brain, ovary, and testis tissue.
Results: The FL transcripts generated by TrAnnoScope from brain, ovary, and testis tissue demonstrated strong alignment with the reference genome (99.63%), and it was found that 93.95% of the matched protein sequences in the zebra finch proteome were captured as nearly complete. Functional annotation provided matches to known protein databases and assigned relevant functional terms to the majority of the transcripts.
Conclusions: TrAnnoScope successfully integrates short- and long-read sequencing technologies to generate transcriptomes with minimal user input. Its modularity and ease of use make it a valuable tool for researchers analyzing complex datasets, particularly for non-model organisms.

Nature Methods, Journal year: 2024, Issue: 21(7), pp. 1349-1363
Published: June 7, 2024
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

PyHMMER provides Python integration of the popular profile Hidden Markov Model software HMMER via Cython bindings. This allows the annotation of protein sequences with profile HMMs and building new ones directly with Python. PyHMMER increases flexibility of use, allowing creating queries directly from Python code, launching searches, and obtaining results without I/O, or accessing previously unavailable statistics like uncorrected P-values. A new parallelization model greatly improves performance when running multithreaded searches, while producing the exact same results as HMMER.
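To make the mention of uncorrected P-values concrete, the sketch below shows how such a value relates to the E-value that HMMER-style searches usually report. It assumes the common convention that the E-value is the per-comparison P-value multiplied by the number of target sequences searched. The function name and the numbers are hypothetical, and the code does not call PyHMMER itself.

```python
def evalue_from_pvalue(pvalue: float, n_targets: int) -> float:
    """Scale an uncorrected P-value by the size of the search space.

    Assumption: E-value = P-value * number of targets, i.e. the expected
    number of equally good hits arising by chance in the whole search.
    """
    if not 0.0 <= pvalue <= 1.0:
        raise ValueError("pvalue must lie in [0, 1]")
    if n_targets < 1:
        raise ValueError("n_targets must be a positive integer")
    return pvalue * n_targets


# Hypothetical hit: the same P-value looks less convincing as the
# target database grows.
small_db = evalue_from_pvalue(1e-6, 1_000)
large_db = evalue_from_pvalue(1e-6, 10_000_000)
print(small_db, large_db)
```

Having the uncorrected value available lets a caller apply a different multiple-testing correction than the one built into the search tool.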

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Issue: unknown
Published: July 27, 2023
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

Nature Communications, Journal year: 2024, Issue: 15(1)
Published: May 10, 2024
Abstract
The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.
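Benchmarks of isoform detection tools are commonly summarized as precision, recall and F1 against a known set of transcripts. The sketch below is one minimal way to compute those scores, assuming that a predicted isoform counts as correct only when its intron chain exactly matches a reference isoform. The abstract does not state the metric definitions the study used, so the matching rule, function name and toy coordinates are all illustrative assumptions.

```python
from typing import Iterable, Tuple

# An isoform is reduced to (chromosome, strand, ordered intron coordinates).
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def score_isoforms(
    predicted: Iterable[IntronChain], reference: Iterable[IntronChain]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact intron-chain matching."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: four simulated isoforms, two predictions, one of them correct.
reference = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (350, 400))),
    ("chr2", "-", ((500, 900),)),
    ("chr3", "+", ((10, 20), (30, 40), (50, 60))),
]
predicted = [
    ("chr1", "+", ((100, 200), (300, 400))),  # matches a reference isoform
    ("chr2", "-", ((500, 950),)),             # wrong splice site
]
print(score_isoforms(predicted, reference))
```

Exact matching is strict: a single shifted splice site turns a near-miss into both a false positive and a false negative, which is one reason benchmark results depend on the chosen matching rule.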

In this manuscript, we introduce and benchmark Mandalorion v4.1 for the identification and quantification of full-length transcriptome sequencing reads. It further improves upon the already strong performance of Mandalorion v3.6 used in the LRGASP consortium challenge. By processing real and simulated data, we show three main features of Mandalorion: first, Mandalorion-based isoform identification has very high precision and maintains high recall even in the absence of any genome annotation. Second, isoform read counts as quantified by Mandalorion show a high correlation with simulated read counts. Third, isoforms identified by Mandalorion closely reflect the full-length transcriptome sequencing data sets they are based on.
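The second feature, agreement between quantified and simulated read counts, can be checked with a correlation coefficient. The sketch below computes a plain Pearson correlation in pure Python. The abstract does not say which correlation measure was used, so Pearson is an assumption here, and the count vectors are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-isoform read counts: ground truth vs. tool output.
simulated = [10, 20, 30, 40]
quantified = [12, 22, 32, 42]
print(pearson_r(simulated, quantified))
```

A constant offset, as in this toy example, leaves the correlation perfect, so correlation alone does not show whether counts are systematically over- or under-estimated.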

Transcription, Journal year: 2023, Issue: 14(3-5), pp. 92-104
Published: June 14, 2023
The profiling of gene expression patterns to glean biological insights from single cells has become commonplace over the last few years. However, this approach overlooks the transcript contents that can differ between individual cells and cell populations. In this review, we describe early work in the field of single-cell short-read sequencing as well as full-length isoforms from single cells. We then describe recent work in single-cell long-read sequencing wherein some transcript elements have been observed to work in tandem. Based on earlier work in bulk tissue, we motivate the study of combination patterns of other RNA variables. Given that we are still blind to some aspects of isoform biology, we suggest possible future avenues such as CRISPR screens which can further illuminate the function of RNA variables in distinct cell populations.

Genome Research, Journal year: 2024, Issue: 34(11), pp. 1719-1734
Published: Nov. 1, 2024
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology through the comprehensive identification and quantification of full-length mRNA isoforms. Despite great promise, challenges remain in the widespread implementation of LRS technologies for RNA-based applications, including concerns about low coverage, high sequencing error, and robust computational pipelines. Although much focus has been placed on defining mRNA exon composition and structure with LRS data, less careful characterization has been done of the ability to assess the terminal ends of isoforms, specifically, transcription start and end sites. Such characterization is crucial for completely delineating full mRNA molecules and regulatory consequences. However, there are substantial inconsistencies in both start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. Here, we describe the specific challenges of identifying and quantifying mRNA terminal ends with LRS technologies and how these issues influence biological interpretations of LRS data. We then review recent experimental and computational advances designed to alleviate these problems, with ideal use cases for each approach. Finally, we outline anticipated developments and necessary improvements for the characterization of terminal ends from LRS data.
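The inconsistency between read ends and annotated terminal ends can be quantified by asking how many reads start and end close to the annotated transcription start site (TSS) and transcription end site (TES). The sketch below assumes plus-strand reads given as (start, end) coordinates and a fixed tolerance window. The 50 bp default, the function name and the coordinates are hypothetical choices, not values taken from the review.

```python
from typing import Dict, Iterable, Tuple


def classify_read_ends(
    reads: Iterable[Tuple[int, int]], tss: int, tes: int, tolerance: int = 50
) -> Dict[str, int]:
    """Count reads whose ends fall within `tolerance` bp of the TSS and TES.

    Assumes plus-strand reads, so a read's start is compared with the TSS
    and its end with the TES.
    """
    counts = {"both": 0, "start_only": 0, "end_only": 0, "neither": 0}
    for start, end in reads:
        start_ok = abs(start - tss) <= tolerance
        end_ok = abs(end - tes) <= tolerance
        if start_ok and end_ok:
            counts["both"] += 1
        elif start_ok:
            counts["start_only"] += 1
        elif end_ok:
            counts["end_only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy gene annotated from 1000 (TSS) to 5000 (TES).
reads = [(1000, 5000), (1010, 4200), (1800, 4990), (2000, 3000)]
print(classify_read_ends(reads, tss=1000, tes=5000))
```

Only reads in the "both" category support a full-length molecule under this rule. The other categories correspond to reads truncated at one or both ends.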

Journal of Molecular Neuroscience, Journal year: 2025, Issue: 75(1)
Published: March 6, 2025
Recent improvements in the accuracy of long-read sequencing (LRS) technologies have expanded the scope for novel transcriptional isoform discovery. Additionally, these advancements have improved the precision of transcript quantification, enabling a more accurate reconstruction of complex splicing patterns and transcriptomes. Thus, this project aims to take advantage of these analytical developments for the discovery and analysis of novel RNA isoforms in the human brain. A set of novel isoforms was compiled using three bioinformatic tools, quantifying their expression across eight replicates of the cerebellar hemisphere, five of the frontal cortex, and six of the putamen. By taking a subset of the novel isoforms consistent across all methods, a set of 170 highly confident novel RNA isoforms was curated for downstream analysis. This set consisted of 104 messenger RNA (mRNA) and 66 long non-coding RNA (lncRNA) isoforms. The detailed structure, expression, and potential encoded proteins of the novel mRNA isoform BambuTx321 have been further described as an exemplary representative. Additionally, the tissue-specific expression [mean counts per million (CPM) of 5.979] of a novel lncRNA, BambuTx1299, in the cerebellar hemisphere was observed. Overall, this project has identified and annotated several novel RNA isoforms across diverse tissues of the human brain, providing insights into their expression patterns and investigating their potential functional roles. Thus, this project has contributed to a more comprehensive understanding of the brain's transcriptomic landscape for applications in basic research.
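Two steps in this abstract lend themselves to a short sketch: keeping only isoforms reported by every tool, and expressing read counts as counts per million (CPM). The code below assumes isoforms can be compared across tools by a shared identifier and uses the standard CPM definition (count divided by library total, times one million). The tool names, identifiers and counts are hypothetical, and the study's own matching and normalization procedure may differ.

```python
from typing import Dict, Set


def consensus_isoforms(calls_by_tool: Dict[str, Set[str]]) -> Set[str]:
    """Isoform identifiers reported by every tool (set intersection)."""
    if not calls_by_tool:
        return set()
    return set.intersection(*calls_by_tool.values())


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts of one sample to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: value * 1_000_000 / total for name, value in counts.items()}


# Hypothetical calls from three tools; only tx2 and tx3 are shared by all.
calls = {
    "tool_a": {"tx1", "tx2", "tx3"},
    "tool_b": {"tx2", "tx3", "tx4"},
    "tool_c": {"tx2", "tx3"},
}
shared = consensus_isoforms(calls)
cpm = counts_per_million({"tx2": 1, "tx3": 3})
print(sorted(shared), cpm)
```

Requiring agreement across all tools trades sensitivity for confidence: isoforms that only one or two tools detect are dropped even if they are real.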