Simulation of nanopore sequencing signal data with tunable parameters
Genome Research, 2024, 34(5), pp. 778-783
Published: May 1, 2024
In silico simulation of high-throughput sequencing data is a technique used widely in the genomics field. However, there is currently a lack of effective tools for creating simulated data from nanopore sequencing devices, which measure DNA or RNA molecules in the form of time-series current signal data. Here, we introduce Squigulator, a fast and simple tool for simulation of realistic nanopore signal data. Squigulator takes a reference genome, a transcriptome, or read sequences, and generates corresponding raw nanopore signal data. This is compatible with basecalling software from Oxford Nanopore Technologies (ONT) and other third-party tools, thereby providing a useful substrate for development, testing, debugging, validation, and optimization at every stage of a nanopore analysis workflow. The user may generate data with preset parameters emulating specific ONT protocols or noise-free “ideal” data, or they may deterministically modify a range of experimental variables and/or noise parameters to shape the data to their needs. We present a brief example of Squigulator's use, creating simulated data to model the degree to which different parameters impact the accuracy of ONT basecalling and downstream variant detection. This analysis reveals new insights into the nature of ONT data and basecalling algorithms. We provide Squigulator as an open-source tool for the nanopore community.
Language: English
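The idea summarized in the abstract above (sequence in, time-series current signal out, with noise that can be switched off or tuned) can be illustrated with a toy sketch. This is not Squigulator's implementation or interface: the k-mer table `toy_pore_model`, the function `simulate_signal`, and the parameters `dwell`, `noise_sd`, and `seed` are all hypothetical names chosen for illustration, and the current levels are arbitrary numbers rather than measured pore-model values.

```python
import itertools
import random

K = 3  # toy k-mer size (assumption; real pore models use longer k-mers)


def toy_pore_model(k=K):
    """Hypothetical k-mer -> mean current table with arbitrary levels."""
    kmers = ("".join(p) for p in itertools.product("ACGT", repeat=k))
    return {kmer: 60.0 + i for i, kmer in enumerate(kmers)}


def simulate_signal(seq, model, k=K, dwell=10, noise_sd=0.0, seed=0):
    """Emit `dwell` samples per k-mer, optionally with Gaussian noise.

    noise_sd=0.0 gives noise-free "ideal" data; a fixed seed makes the
    noisy output reproducible.
    """
    rng = random.Random(seed)
    signal = []
    for i in range(len(seq) - k + 1):
        level = model[seq[i:i + k]]
        for _ in range(dwell):
            noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
            signal.append(level + noise)
    return signal


model = toy_pore_model()
ideal = simulate_signal("ACGTAC", model)                       # noise-free
noisy = simulate_signal("ACGTAC", model, noise_sd=2.0, seed=1)  # tunable noise
```

Seeding the random generator is one simple way to keep a noisy simulation deterministic, so that changing a single parameter changes only that aspect of the output.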
AsaruSim: a single-cell and spatial RNA-Seq Nanopore long-reads simulation workflow
bioRxiv (Cold Spring Harbor Laboratory), 2024, issue unknown
Published: Sep. 24, 2024
Abstract
Motivation: The combination of long-read sequencing technologies like Oxford Nanopore with single-cell RNA sequencing (scRNAseq) assays enables the detailed exploration of transcriptomic complexity, including isoform detection and quantification, by capturing full-length cDNAs. However, challenges remain, including the lack of advanced simulation tools that can effectively mimic the unique complexities of scRNAseq long-read datasets. Such tools are essential for the evaluation and optimization of isoform detection methods dedicated to single-cell long-read studies.
Results: We developed AsaruSim, a workflow that simulates synthetic single-cell long-read Nanopore datasets, closely mimicking real experimental data. AsaruSim employs a multi-step process that includes the creation of a synthetic UMI count matrix, generation of perfect reads, optional PCR amplification, introduction of sequencing errors, and comprehensive quality control reporting. Applied to a dataset of human peripheral blood mononuclear cells (PBMCs), AsaruSim accurately reproduced experimental read characteristics.
Availability and implementation: The source code and full documentation are available at: https://github.com/GenomiqueENS/AsaruSim.
Data availability: The 1,090 Human PBMCs count matrix and cell type annotation files are accessible on zenodo under DOI: 10.5281/zenodo.12731408.
Language: English
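The multi-step process named in the abstract above (UMI count matrix, perfect reads, optional PCR amplification, sequencing errors) can be sketched in miniature as follows. This is an illustrative toy, not AsaruSim's code: the function names, the read layout (cell barcode + UMI + cDNA), the per-cycle duplication model, and the substitution-only error model are all assumptions made for the example.

```python
import random


def random_dna(rng, n):
    """Random DNA string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def perfect_reads(counts, transcripts, rng, umi_len=12):
    """One error-free read per UMI.

    counts: {(cell_barcode, gene): number_of_umis} (a toy count matrix).
    Read layout barcode + UMI + cDNA is an assumption for illustration.
    """
    reads = []
    for (barcode, gene), n_umis in counts.items():
        for _ in range(n_umis):
            reads.append(barcode + random_dna(rng, umi_len) + transcripts[gene])
    return reads


def pcr_amplify(reads, rng, cycles=2, efficiency=0.5):
    """Each cycle copies every molecule with probability `efficiency`."""
    pool = list(reads)
    for _ in range(cycles):
        pool += [r for r in pool if rng.random() < efficiency]
    return pool


def add_errors(read, rng, sub_rate=0.05):
    """Substitution-only error model (real Nanopore errors include indels)."""
    out = []
    for base in read:
        if rng.random() < sub_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(42)
counts = {("AAAC", "G1"): 2, ("TTTG", "G2"): 1}          # toy UMI count matrix
transcripts = {"G1": "ACGTACGTAC", "G2": "GGGTTTCCCA"}    # toy cDNA sequences
reads = perfect_reads(counts, transcripts, rng)
amplified = pcr_amplify(reads, rng)
simulated = [add_errors(r, rng) for r in amplified]
```

Keeping each stage as a separate function mirrors the staged description in the abstract and makes the optional PCR step easy to skip.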