Research Square (Research Square),
Journal year: 2023,
Number: unknown
Published: Sep. 29, 2023
Abstract
Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA holds significant importance to investigate its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and potential therapeutic approaches. Conventionally, wet lab-based methods are utilized to identify leccDNA, which are hindered by the need for prior knowledge and resource-intensive processes, potentially limiting their broader applicability. To empower the process of leccDNA identification across multiple species, the paper in hand presents the very first computational predictor. The proposed iLEC-DNA predictor makes use of an SVM classifier along with sequence-derived nucleotide distribution patterns and physico-chemical properties-based features. In addition, the study introduces a set of 12 benchmark datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across the 12 benchmark datasets under different experimental settings using more than 140 baseline predictors. The proposed predictor outperforms the baseline predictors across diverse leccDNA datasets by producing average performance values of 80.699%, 61.45% and 80.7% in terms of ACC, MCC and AUC-ROC across all the datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA//.
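The abstract above does not say which encoders produce the "sequence-derived nucleotide distribution patterns". As a minimal sketch, assuming normalized k-mer frequencies as one common encoding of this kind, a fixed-length feature vector can be built as follows. The helper name `kmer_frequencies` and the choice `k=2` are hypothetical and are not taken from the iLEC-DNA source code.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: an illustrative nucleotide-distribution encoding,
    not the encoder used by the iLEC-DNA authors.
    """
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Toy sequence; the two trailing N bases are ignored by the encoder.
features = kmer_frequencies("ACGTACGTNN", k=2)
print(len(features))
```

A vector like `features` has 4**k entries regardless of sequence length, which is what makes it usable as input to a fixed-dimension classifier such as an SVM.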
Nature Communications,
Journal year: 2025,
Number: 16(1)
Published: Jan. 24, 2025
Orphan crops are important sources of nutrition in developing regions and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of resources available. There are orphan crop representatives across major crop types, and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops using machine learning can improve the accuracy and efficiency of improvement in orphan crops. Here, the authors review transferring knowledge from major crops to orphan crops using machine learning in crop breeding.
Journal of Translational Medicine,
Journal year: 2025,
Number: 23(1)
Published: Feb. 4, 2025
Abstract
The revolutionary CRISPR-Cas9 system leverages a programmable guide RNA (gRNA) and Cas9 proteins to precisely cleave problematic regions within DNA sequences. This groundbreaking technology holds immense potential for the development of targeted therapies for a wide range of diseases, including cancers, genetic disorders, and hereditary diseases. CRISPR-Cas9 based genome editing is a multi-step process that includes designing a precise gRNA, selecting an appropriate Cas protein, and thoroughly evaluating both on-target and off-target activity of the Cas9-gRNA complex. To ensure the accuracy and effectiveness of the CRISPR-Cas9 system, after cleavage, the process requires careful analysis of the resultant outcomes such as indels and deletions. Following the success of artificial intelligence (AI) in various fields, researchers are now leveraging AI algorithms to catalyze and optimize the multi-step process of the CRISPR-Cas9 system. To achieve this goal, AI-driven applications are being integrated into each step, but existing AI predictors have limited performance and many steps still rely on expensive and time-consuming wet-lab experiments. The primary reason behind the low performance of AI predictors is the gap between the CRISPR and AI fields. Effective integration of AI into the multi-step process of the CRISPR-Cas9 system demands comprehensive knowledge of both domains. This paper bridges the knowledge gap between AI and CRISPR-Cas9 research. It offers a unique platform for AI researchers to grasp a deep understanding of the biological foundations behind each step in the CRISPR-Cas9 multi-step process. Furthermore, it provides details of 80 available CRISPR-Cas9 system-related datasets that can be utilized to develop AI-driven applications. Within the landscape of AI predictors in the CRISPR-Cas9 multi-step process, it provides insights into representation learning methods, machine learning methods trends, and performance values of 50 existing predictive pipelines. In the context of representation learning methods and classifiers/regressors, a thorough analysis of existing predictive pipelines is used to provide recommendations for developing more robust predictive pipelines.
Frontiers in Medicine,
Journal year: 2025,
Number: 12
Published: April 8, 2025
Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, which can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's genetic blueprint and understanding the factors that can modify it. This analysis helps in the early detection of genetic diseases and the design of targeted therapies. Traditional wet-lab experimental DNA sequence analysis is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab experimental methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for a comprehensive literature that bridges the gap between both fields, the contributions of this paper are manifold: It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 distinct AI-paradigms, namely, classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis by consolidating 36 databases that can be used to develop benchmark datasets for different DNA sequence analysis tasks. To ensure performance comparisons between new and existing AI predictors, it provides insights into 140 related benchmark datasets. It covers word embeddings and language models across the development of predictors by providing a survey of 39 word embeddings and 67 language models based predictive pipeline performance values, as well as top performing encoding-based predictors and their performances.
Scientific Reports,
Journal year: 2024,
Number: 14(1)
Published: April 24, 2024
Abstract
Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA holds significant importance to investigate its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and potential therapeutic approaches. Conventionally, wet lab-based methods are utilized to identify leccDNA, which are hindered by the need for prior knowledge and resource-intensive processes, potentially limiting their broader applicability. To empower the process of leccDNA identification across multiple species, the paper in hand presents the very first computational predictor. The proposed iLEC-DNA predictor makes use of an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical properties-based features. In addition, the study introduces a set of 12 benchmark datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across the 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms the baseline predictors and encoder ensembles across diverse leccDNA datasets by producing average performance values of 81.09%, 62.2% and 81.08% in terms of ACC, MCC and AUC-ROC across all the datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
PeerJ,
Journal year: 2023,
Number: 11, pp. e16600 - e16600
Published: Dec. 8, 2023
DNA 5-methylcytosine (5mC) is widely present in multicellular eukaryotes, where it plays important roles in various developmental and physiological processes and in a wide range of human diseases. Thus, it is essential to accurately detect the 5mC sites. Although current sequencing technologies can map genome-wide 5mC sites, these experimental methods are both costly and time-consuming. To achieve a fast and accurate prediction of 5mC sites, we propose a new computational approach, BERT-5mC. First, we pre-trained a domain-specific BERT (bidirectional encoder representations from transformers) model by using promoter sequences as the language corpus. BERT is a deep two-way language representation model based on Transformer. Second, we fine-tuned the domain-specific BERT model on the 5mC training dataset to build the model. The cross-validation results show that our model achieves an AUROC of 0.966, which is higher than other state-of-the-art methods such as iPromoter-5mC, 5mC_Pred, and BiLSTM-5mC. Furthermore, our model was evaluated on the independent test set, which shows that our model is also superior to other state-of-the-art methods. Moreover, we analyzed the attention weights generated by BERT to identify a number of nucleotide distributions that are closely associated with 5mC modifications. To facilitate the use of our model, we built a webserver which can be freely accessed at: http://5mc-pred.zhulab.org.cn
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023,
Number: unknown
Published: Feb. 27, 2023
The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications. However, models of expression from short synthetic promoters (promoter-driven expression) are lacking for most cell types due to insufficient training data in those cell types. Although there are many large datasets of both endogenous expression and promoter-driven expression in other cell types, which provide information that could be used for transfer learning, transfer learning strategies remain largely unexplored for predicting promoter-driven expression. Here, we propose a variety of pretraining tasks, transfer strategies, and model architectures for modelling promoter-driven expression. To thoroughly evaluate various methods, we propose two benchmarks that reflect data-constrained and large dataset settings. In the data-constrained setting, we find that pretraining followed by transfer learning is highly effective, improving performance by 24–27%. In the large dataset setting, transfer learning leads to more modest gains, improving performance by up to 2%. We also propose the best architecture to model promoter-driven expression when training from scratch. The methods we identify are broadly applicable for modelling promoter-driven expression in understudied cell types, and our findings will guide the choice of models that are best suited to designing promoters for gene delivery applications using model-based optimization. Our code and data are available at https://github.com/anikethjr/promoter_models.
Frontiers in Genetics,
Journal year: 2024,
Number: 15
Published: May 30, 2024
N4-acetylcytidine (ac4C) is a chemical modification in mRNAs that alters the structure and function of mRNA by adding an acetyl group to the N4 position of cytosine. Researchers have shown that ac4C is closely associated with the occurrence and development of various cancers. Therefore, accurate prediction of ac4C sites on human mRNA is crucial for revealing its role in diseases and developing new diagnostic and therapeutic strategies. However, existing deep learning models still have limitations in prediction accuracy and generalization ability, which restrict their effectiveness in handling complex biological sequence data. This paper introduces a deep learning-based model, STM-ac4C, for predicting ac4C sites on human mRNA. The model combines the advantages of selective kernel convolution, temporal convolutional networks, and multi-head self-attention mechanisms to effectively extract and integrate multi-level features of RNA sequences, thereby achieving high-precision prediction of ac4C sites. On the independent test dataset, STM-ac4C showed improvements of 1.81%, 3.5%, and 0.37% in accuracy, Matthews correlation coefficient, and area under the curve, respectively, compared to the existing state-of-the-art technologies. Moreover, its performance on additional balanced and imbalanced datasets also confirmed the model's robustness and generalization ability. Various experimental results indicate that STM-ac4C outperforms existing methods in predictive performance. In summary, STM-ac4C excels in predicting ac4C sites on human mRNA, providing a powerful tool for a deeper understanding of the biological significance of mRNA modifications and cancer treatment. Additionally, the model reveals key sequence features that influence the prediction of ac4C sites through sequence region impact analysis, offering new perspectives for future research. The source code and experimental data are available at https://github.com/ymy12341/STM-ac4C.
Mathematical Biosciences & Engineering,
Journal year: 2023,
Number: 21(1), pp. 253 - 271
Published: Jan. 1, 2023
The epigenetic modification of DNA N4-methylcytosine (4mC) is vital for controlling DNA replication and expression. It is crucial to pinpoint 4mC's location to comprehend its role in physiological and pathological processes. However, accurate 4mC detection is difficult to achieve due to technical constraints. In this paper, we propose a deep learning-based approach, 4mCPred-GSIMP, for predicting 4mC sites in the mouse genome. The approach encodes DNA sequences using four feature encoding methods and combines multi-scale convolution and improved selective kernel convolution to adaptively extract and fuse features from different scales, thereby improving feature representation and optimization effect. In addition, we also use convolutional residual connections, global response normalization and pointwise convolution techniques to optimize the model. On the independent test dataset, 4mCPred-GSIMP shows high sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the curve, which are 0.7812, 0.9312, 0.8562, 0.7207 and 0.9233, respectively. Various experiments demonstrate that 4mCPred-GSIMP outperforms existing prediction tools.
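The metrics reported for 4mCPred-GSIMP can be checked against each other with a short calculation. The sketch below assumes a balanced independent test set (equal numbers of positive and negative samples), which the abstract does not state; the assumption is suggested by the reported accuracy being the mean of sensitivity and specificity. The function name `acc_and_mcc` is hypothetical.

```python
import math


def acc_and_mcc(sensitivity, specificity, n_pos=1.0, n_neg=1.0):
    """Derive accuracy and MCC from sensitivity and specificity.

    n_pos and n_neg are class sizes; the defaults encode the ASSUMPTION
    of a balanced test set, which is not stated in the abstract.
    """
    tp = sensitivity * n_pos   # true positives
    fn = n_pos - tp            # false negatives
    tn = specificity * n_neg   # true negatives
    fp = n_neg - tn            # false positives
    acc = (tp + tn) / (n_pos + n_neg)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Sensitivity and specificity as reported in the abstract.
acc, mcc = acc_and_mcc(0.7812, 0.9312)
print(round(acc, 4), round(mcc, 4))
```

Under this assumption the derived accuracy matches the reported 0.8562, and the derived MCC comes out at about 0.7206, close to the reported 0.7207. The small difference is plausibly due to rounding of the reported sensitivity and specificity.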