GigaScience, Journal Year: 2022, Volume and Issue: 12
Published: Dec. 28, 2022
Abstract
Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning–based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is based on 5 popular transformer-based language models. The framework identifies methylation sites for 3 different types of DNA methylation: N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the “pretrain and fine-tune” paradigm. Pretraining is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA methylation status of each type. The 5 models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source, and we provide a web server that implements the approach.
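The abstract states that the five fine-tuned models collectively predict methylation status but does not say how their outputs are combined. The following is a minimal sketch that assumes simple averaging of per-model probabilities followed by a threshold; this rule is an assumption and may not be what MuLan-Methyl does. The function name, the model names, and all scores are hypothetical.

```python
from statistics import mean

def ensemble_predict(model_probs, threshold=0.5):
    """Average per-model probabilities for each site and threshold the mean.

    model_probs maps a model name to a list of probabilities, one per
    candidate site. Averaging is an assumed combination rule.
    """
    lengths = {len(p) for p in model_probs.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same number of sites")
    n_sites = lengths.pop()
    averaged = [mean(p[i] for p in model_probs.values()) for i in range(n_sites)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels

# Made-up probabilities for three candidate sites from five hypothetical models.
scores = {
    "model_a": [0.9, 0.2, 0.6],
    "model_b": [0.8, 0.1, 0.5],
    "model_c": [0.7, 0.3, 0.7],
    "model_d": [0.9, 0.2, 0.6],
    "model_e": [0.7, 0.2, 0.6],
}
averaged, labels = ensemble_predict(scores)
print(labels)
```

Averaging lets one confident model be outvoted by the other four; a weighted average or a majority vote would be equally plausible choices under the same interface.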
Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 51(7), P. 3017 - 3029
Published: Feb. 17, 2023
Abstract
Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline. DeepBIO provides a comprehensive result visualization analysis for predictive models covering several aspects, such as model interpretability, feature analysis and functional sequential region discovery. Additionally, DeepBIO supports nine base-level functional annotation tasks using deep-learning architectures, with comprehensive interpretations and graphical visualizations to validate the reliability of annotated sites. Empowered by high-performance computers, DeepBIO allows ultra-fast prediction with up to million-scale sequence data in a few hours, demonstrating its usability in real application scenarios. Case study results show that DeepBIO provides an accurate, robust and interpretable prediction, demonstrating the power of deep learning in biological sequence functional analysis. Overall, we expect DeepBIO to ensure the reproducibility of deep-learning biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone. DeepBIO is publicly available at https://inner.wei-group.net/DeepBIO.
Genome Biology, Journal Year: 2022, Volume and Issue: 23(1)
Published: Oct. 17, 2022
Abstract
In this study, we propose iDNA-ABF, a multi-scale deep biological language learning model that enables the interpretable prediction of DNA methylations based on genomic sequences only. Benchmarking comparisons show that our iDNA-ABF outperforms state-of-the-art methods for different methylation predictions. Importantly, we show the power of deep language learning in capturing both sequential and functional semantics information from background genomes. Moreover, by integrating the interpretable analysis mechanism, we well explain what the model learns, helping us build the mapping from the discovery of important sequential determinants to the in-depth analysis of their biological functions.
BMC Biology, Journal Year: 2024, Volume and Issue: 22(1)
Published: Jan. 2, 2024
Abstract
Intrinsically disordered proteins and regions (IDPs/IDRs) are functionally important proteins and regions that exist as highly dynamic conformations under natural physiological conditions. IDPs/IDRs exhibit a broad range of molecular functions, and their functions involve binding interactions with partners and remaining native structural flexibility. The rapid increase in the number of proteins in sequence databases and the diversity of disordered functions challenge existing computational methods for predicting protein intrinsic disorder and disordered functions. A disordered region interacts with different partners to perform multiple functions, and these functions exhibit different dependencies and correlations. In this study, we introduce DisoFLAG, a computational method that leverages a graph-based interaction protein language model (GiPLM) for jointly predicting disorder and its multiple potential functions. GiPLM integrates protein semantic information based on pre-trained protein language models into graph-based interaction units to enhance the correlation of the semantic representation of multiple disordered functions. The DisoFLAG predictor takes amino acid sequences as the only inputs and provides predictions of intrinsic disorder and six disordered functions for proteins, including protein-binding, DNA-binding, RNA-binding, ion-binding, lipid-binding, and flexible linker. We evaluated the predictive performance of DisoFLAG following the Critical Assessment of protein Intrinsic Disorder (CAID) experiments, and the results demonstrated that DisoFLAG offers accurate and comprehensive predictions of disordered functions, extending the current coverage of computationally predicted disordered function categories. The standalone package and web server of DisoFLAG have been established to provide accurate prediction tools for intrinsic disorders and their associated functions.
Frontiers in Genetics, Journal Year: 2024, Volume and Issue: 15
Published: April 16, 2024
Introduction: DNA methylation is a critical epigenetic modification involving the addition of a methyl group to the DNA molecule, playing a key role in regulating gene expression without changing the DNA sequence. The main difficulty in identifying DNA methylation sites lies in the subtle and complex nature of methylation patterns, which may vary across different tissues, developmental stages, and environmental conditions. Traditional methods for methylation site identification, such as bisulfite sequencing, are typically labor-intensive, costly, and require large amounts of DNA, hindering high-throughput analysis. Moreover, these methods may not always provide the resolution needed to detect methylation at specific sites, especially in genomic regions that are rich in repetitive sequences or have low levels of methylation. Furthermore, current deep learning approaches generally lack sufficient accuracy.
Methods: This study introduces the iDNA-OpenPrompt model, leveraging the novel OpenPrompt framework. The model combines a prompt template, verbalizer, and Pre-trained Language Model (PLM) to construct the prompt-learning framework for DNA methylation sequences. Moreover, a DNA vocabulary library, BERT tokenizer, and label words are also introduced into the model to enable accurate identification of DNA methylation sites.
Results and Discussion: An extensive analysis is conducted to evaluate the predictive, reliability, and consistency capabilities of the iDNA-OpenPrompt model. The experimental outcomes, covering 17 benchmark datasets that include various species and three DNA methylation modifications (4mC, 5hmC, 6mA), consistently indicate that our model surpasses the outstanding performance and robustness of current approaches.
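The Methods paragraph names three prompt-learning components: a prompt template, a verbalizer, and label words. The toy sketch below illustrates how such pieces usually fit together, in plain Python. It does not use the OpenPrompt library or any language model, and the template wording, the overlapping k-mer tokenization, the label words, and the scores are all assumptions made up for illustration.

```python
MASK = "[MASK]"

def apply_template(dna_sequence, k=3):
    """Wrap a DNA sequence in a cloze-style prompt with one mask slot.

    The sequence is split into overlapping k-mers (an assumed tokenization);
    the template wording is hypothetical.
    """
    kmers = [dna_sequence[i:i + k] for i in range(len(dna_sequence) - k + 1)]
    return f"{' '.join(kmers)} . The site is {MASK} ."

# Verbalizer: class label -> label words that may fill the mask slot.
VERBALIZER = {
    "methylated": ["methylated", "modified"],
    "unmethylated": ["unmethylated", "plain"],
}

def verbalize(mask_word_scores, verbalizer=VERBALIZER):
    """Turn word scores at the mask position into a class prediction.

    Each class score is the mean score of its label words; words the
    model did not score count as 0.0.
    """
    class_scores = {
        label: sum(mask_word_scores.get(w, 0.0) for w in words) / len(words)
        for label, words in verbalizer.items()
    }
    best = max(class_scores, key=class_scores.get)
    return best, class_scores

prompt = apply_template("ACGTAC")
# Made-up scores standing in for a language model's output at the mask slot.
fake_scores = {"methylated": 0.5, "modified": 0.25, "unmethylated": 0.125}
prediction, class_scores = verbalize(fake_scores)
print(prompt)
print(prediction)
```

In a real pipeline the scores at the mask position would come from the pre-trained language model, and the verbalizer would be the only place where vocabulary words are tied to class labels.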
BMC Biology, Journal Year: 2024, Volume and Issue: 22(1)
Published: April 19, 2024
Abstract
Background: The blood–brain barrier serves as a critical interface between the bloodstream and brain tissue, mainly composed of pericytes, neurons, endothelial cells, and tightly connected basal membranes. It plays a pivotal role in safeguarding the brain from harmful substances, thus protecting the integrity of the nervous system and preserving overall brain homeostasis. However, this remarkable selective transmission also poses a formidable challenge in the realm of central nervous system diseases treatment, hindering the delivery of large-molecule drugs into the brain. In response to this challenge, many researchers have devoted themselves to developing drug delivery systems capable of breaching the blood–brain barrier. Among these, blood–brain barrier penetrating peptides have emerged as promising candidates. These peptides had the advantages of high biosafety, ease of synthesis, and exceptional penetration efficiency, making them an effective drug delivery solution. While previous studies have developed a few prediction models for blood–brain barrier penetrating peptides, their performance has often been hampered by the issue of limited positive data.
Results: In this study, we present Augur, a novel prediction model using borderline-SMOTE-based data augmentation and machine learning. We extract highly interpretable physicochemical properties of blood–brain barrier penetrating peptides while solving the issues of small sample size and imbalance of positive and negative samples. Experimental results demonstrate the superior prediction performance of Augur with an AUC value of 0.932 on the training set and 0.931 on the independent test set.
Conclusions: This newly developed Augur demonstrates superior performance in predicting blood–brain barrier penetrating peptides, offering valuable insights for drug development targeting neurological disorders. This breakthrough may enhance the efficiency of peptide-based drug discovery and pave the way for innovative treatment strategies for central nervous system diseases.
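The abstract names borderline-SMOTE as the augmentation step but gives no detail. The sketch below is a simplified, pure-Python version of the general Borderline-SMOTE idea (first variant): only minority samples near the class border are used as seeds, and new samples are interpolated between a seed and one of its nearest minority neighbours. It is not Augur's pipeline; the function names, parameters, and the 2-D toy points are hypothetical.

```python
import math
import random

def danger_samples(minority, majority, m=5):
    """Minority samples whose m nearest neighbours are mostly, but not all, majority."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for p in minority:
        others = [(q, y) for q, y in labelled if q is not p]
        neighbours = sorted(others, key=lambda t: math.dist(p, t[0]))[:m]
        n_major = sum(1 for _, y in neighbours if y == 0)
        if m / 2 <= n_major < m:  # all-majority neighbourhoods are treated as noise
            danger.append(p)
    return danger

def borderline_smote(minority, majority, n_new, k=3, m=5, seed=0):
    """Create n_new synthetic minority samples seeded from borderline points."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    danger = danger_samples(minority, majority, m)
    synthetic = []
    if not danger:
        return synthetic
    for _ in range(n_new):
        p = rng.choice(danger)
        # k nearest minority neighbours of the seed point
        mates = sorted((q for q in minority if q is not p),
                       key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(mates)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic

# Made-up 2-D feature vectors: 5 positive (minority) and 8 negative (majority).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (1.1, 0.9)]
majority = [(1.2, 1.0), (1.0, 1.2), (1.3, 1.1), (1.4, 1.3),
            (1.5, 1.0), (1.2, 1.4), (2.0, 2.0), (2.1, 1.8)]
synthetic = borderline_smote(minority, majority, n_new=4)
print(danger_samples(minority, majority))
print(len(synthetic))
```

In practice a maintained implementation such as `BorderlineSMOTE` from the imbalanced-learn package would normally be preferred over hand-written code like this.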
BMC Biology, Journal Year: 2024, Volume and Issue: 22(1)
Published: Jan. 29, 2024
Abstract
Background: Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, based on the graph Markov neural network algorithm (GMNN) constructed in our previous work GMNN2CD, we further considered the multisource biological data that affects the association between circRNA and disease, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify the prediction results of CircDA.
Results: CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, reasonably abstracts multiomics data, and models them as a heterogeneous biomolecular association network, which can reflect the complex relationship between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases, and using the quantitative real-time PCR (RT-qPCR) experiment of HCC in human tissue samples, it was found that five circRNAs were significantly differentially expressed, which proved that CircDA can predict diseases related to new circRNAs.
Conclusions: This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server (http://server.malab.cn/CircDA) is provided, and the code is open-sourced (https://github.com/nmt315320/CircDA.git) for the convenience of algorithm improvement.
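The abstract describes biomolecules as nodes and their interactions as edges in a heterogeneous network. The sketch below shows one simple way to hold such a network in memory and to list circRNA–disease pairs that have no edge yet, which are the pairs a predictor would score. It is not CircDA's code; the class, the node types, and the placeholder node names are hypothetical.

```python
from collections import defaultdict

class HeteroGraph:
    """Undirected graph whose nodes carry a type (e.g. circRNA, disease)."""

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(set)

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_edge(self, u, v):
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("add both nodes before connecting them")
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbours(self, name, kind=None):
        """Neighbours of a node, optionally restricted to one node type."""
        return sorted(n for n in self.adj[name]
                      if kind is None or self.node_type[n] == kind)

    def unlinked_pairs(self, kind_a, kind_b):
        """Pairs of the two node types with no edge between them."""
        a_nodes = sorted(n for n, k in self.node_type.items() if k == kind_a)
        b_nodes = sorted(n for n, k in self.node_type.items() if k == kind_b)
        return [(a, b) for a in a_nodes for b in b_nodes if b not in self.adj[a]]

# Placeholder nodes and interactions, made up for illustration only.
g = HeteroGraph()
g.add_node("circ_1", "circRNA")
g.add_node("circ_2", "circRNA")
g.add_node("mol_1", "other_molecule")
g.add_node("disease_x", "disease")
g.add_edge("circ_1", "mol_1")
g.add_edge("mol_1", "disease_x")
g.add_edge("circ_1", "disease_x")

print(g.neighbours("circ_1"))
print(g.unlinked_pairs("circRNA", "disease"))
```

Keeping the node type separate from the adjacency sets means new data sources can be added as further node types without changing the edge storage.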