Scientific Reports, Year: 2024, Issue: 14(1)
Published: Nov. 7, 2024
Viral oncoproteins play crucial roles in transforming normal cells into cancer cells, representing a significant factor in the etiology of various cancers. Traditionally, identifying these oncoproteins is both time-consuming and costly. With advancements in computational biology, bioinformatics tools based on machine learning have emerged as effective methods for predicting biological activities. Here, for the first time, we propose an innovative approach that combines Generative Adversarial Networks (GANs) with supervised learning to enhance the accuracy and generalizability of viral oncoprotein prediction. Our methodology evaluated multiple models, including Random Forest, Multilayer Perceptron, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Support Vector Machine. In ten-fold cross-validation on our training dataset, the GAN-enhanced Random Forest model demonstrated superior performance metrics: accuracy (0.976), F1 score, precision (0.977), sensitivity, and AUC (1.0). During independent testing, this model achieved 0.982. These results establish our new tool, VirOncoTarget, accessible via a web application. We anticipate that VirOncoTarget will be a valuable resource for researchers, enabling rapid and reliable prediction of viral oncoproteins and advancing understanding of their role in cancer biology.
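The metrics quoted in this abstract (accuracy, precision, sensitivity, F1 score) all derive from the four counts of a binary confusion matrix. The following is a minimal sketch of those definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity (recall) and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

In a ten-fold cross-validation, these values would typically be computed per fold and then averaged.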
Abstract
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss advancements in transformer-based models for proteome analysis and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of these models in research. Overall, this review provides valuable insights into the potential of these models to revolutionize proteomics.
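The self-attention mechanism mentioned in this abstract can be sketched in a few lines. This is a simplified illustration, assuming queries, keys, and values are all the raw token vectors (no learned projection matrices, no multiple heads); the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    tokens: list of equal-length vectors, one per sequence position.
    Returns (outputs, weights); works for any sequence length.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights
```

Because every position attends to every other position directly, a dependency between distant residues costs no more to represent than one between neighbours.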
bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Issue: unknown
Published: Aug. 14, 2023
Abstract
Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy. Initially, we used BLAST for alignment-based toxicity prediction, yet coverage was limited. We then used a motif-based approach with MERCI software to identify unique toxic patterns. Despite specificity gains, sensitivity was compromised. We developed alignment-free methods using machine/deep learning, achieving a balance in prediction. A deep learning model (ANN-LSTM with fixed sequence length) using one-hot encoding attained 0.93 AUROC and 0.71 MCC on independent data. The machine learning model (extra tree) using compositional features achieved 0.95 AUROC and 0.78 MCC. Lastly, we developed hybrid or ensemble methods combining two or more models to enhance performance. Hybrid approaches reached 0.98 AUROC and 0.81 MCC. Evaluation on independent data demonstrated the method's superiority. To cater to the needs of the community, we have developed standalone software, a pip package, and a web-based server, ToxinPred3 (https://github.com/raghavagps/toxinpred3, https://webs.iiitd.edu.in/raghava/toxinpred3/).
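The two feature types named in this abstract, one-hot encoding at a fixed sequence length and compositional features, can be sketched as follows. This is an illustrative assumption rather than ToxinPred3's actual code: the function names, the zero-padding scheme, and the handling of non-standard residues are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Amino acid composition: fraction of each standard residue in the peptide."""
    seq = seq.upper()
    n = sum(1 for c in seq if c in AMINO_ACIDS)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def one_hot_fixed(seq, length):
    """One-hot encode a peptide to a fixed length.

    Longer peptides are truncated; shorter ones are padded with all-zero rows
    (an assumed padding scheme). Non-standard residues also map to zero rows.
    """
    seq = seq.upper()[:length]
    rows = []
    for i in range(length):
        row = [0] * len(AMINO_ACIDS)
        if i < len(seq) and seq[i] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(seq[i])] = 1
        rows.append(row)
    return rows
```

Composition yields a 20-value vector regardless of peptide length, which suits tree-based models; the fixed-length one-hot matrix preserves residue order, which sequence models such as an LSTM can exploit.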
Author's Biography
Anand Singh Rathore is currently pursuing a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Akanksha Arora
Shubham Choudhury
Purava Tijare, Project Fellow
Gajendra P. S. Raghava is working as Professor and Head
Highlights
Implementation of alignment- and similarity-based techniques for peptides.
Discovery of toxicity-associated patterns and identification of regions.
Development of learning-based methods.
Ensemble approaches that combine methods.
Web server for screening peptides/proteins.
NAR Genomics and Bioinformatics, Year: 2025, Issue: 7(1)
Published: Jan. 7, 2025
Abstract
The spatial conformation of chromosomes and genomes of single cells is relevant to cellular function and useful for elucidating the mechanism underlying gene expression and genome methylation. The chromosomal contacts (i.e. regions in proximity) entailing the three-dimensional (3D) structure of a cell can be obtained by single-cell chromosome capture techniques, such as single-cell Hi-C (ScHi-C). However, due to the sparsity of ScHi-C data, it is still challenging for traditional 3D optimization methods to reconstruct structures from ScHi-C data. Here, we present a machine learning-based method based on a novel SO(3)-equivariant graph neural network (HiCEGNN). HiCEGNN consistently outperforms both the traditional optimization methods and the only other deep learning method across diverse cells, different structural resolutions, and noise levels. Moreover, HiCEGNN is robust against noise.
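To illustrate what SO(3)-equivariance means for a 3D structure model, the following sketch applies one coordinate update in which each point moves along its relative position vectors, weighted by a function of pairwise distance. This is not HiCEGNN's architecture: the update rule, the fixed weight function (a real network would learn it), and all names are assumptions for illustration. Because distances do not change under rotation, rotating the input and then updating gives the same result as updating and then rotating.

```python
import math


def equivariant_update(coords, step=0.1):
    """One rotation-equivariant coordinate update over a fully connected graph."""
    updated = []
    for i, xi in enumerate(coords):
        delta = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            dist_sq = sum(c * c for c in diff)
            # Hypothetical fixed weight depending only on the invariant distance.
            weight = 1.0 / (1.0 + dist_sq)
            for k in range(3):
                delta[k] += weight * diff[k]
        updated.append([xi[k] + step * delta[k] for k in range(3)])
    return updated


def rotate_z(coords, angle):
    """Rotate all points about the z-axis by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]


points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]
rotated_then_updated = equivariant_update(rotate_z(points, 0.7))
updated_then_rotated = rotate_z(equivariant_update(points), 0.7)
```

The two results agree up to floating-point error, which is the defining property of an equivariant layer.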
Abstract
This comprehensive review aims to clarify the growing impact of Transformer-based models in the fields of neuroscience, neurology, and psychiatry. Originally developed as a solution for analyzing sequential data, the Transformer architecture has evolved to effectively capture complex spatiotemporal relationships and long-range dependencies that are common in biomedical data. Its adaptability and effectiveness in deciphering intricate patterns within medical studies have established it as a key tool in advancing our understanding of neural functions and disorders, representing a significant departure from traditional computational methods. The review begins by introducing the structure and principles of Transformer architectures. It then explores their applicability, ranging from disease diagnosis and prognosis to the evaluation of cognitive processes and decoding. The specific design modifications tailored for these applications and their subsequent impact on performance are also discussed. We conclude by providing an assessment of recent advancements, prevailing challenges, and future directions, highlighting the shift in neuroscientific research and clinical practice towards an artificial intelligence-centric paradigm, particularly given the prominence of Transformer architectures in the most successful large pre-trained models. This review serves as an informative reference for researchers, clinicians, and professionals who are interested in harnessing the transformative potential of these models.
Genes, Year: 2025, Issue: 16(4), p. 411
Published: March 31, 2025
Background/Objectives: Genomic prediction is a powerful approach that predicts phenotypic traits from genotypic information, enabling the acceleration of trait improvement in plant breeding. Traditional genomic prediction methods have primarily relied on linear mixed models, such as Genomic Best Linear Unbiased Prediction (GBLUP), and conventional machine learning methods like Support Vector Regression (SVR). These methods are limited in handling high-dimensional data and nonlinear relationships. Thus, deep learning methods have also been applied to genomic prediction in recent years.
Methods: We proposed iADEP, an Integrated Additive, Dominant, and Epistatic model based on deep learning. Specifically, single nucleotide polymorphism (SNP) data integrating latent genetic interactions, genome-wide association study results, and biological prior knowledge are fused in an SNP embedding block, which is then input to a local encoder. The local encoder is combined with an omic-data-incorporated global decoder through a multi-head attention mechanism, followed by multilayer perceptrons.
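The additive, dominant, and epistatic components in the model's name can be illustrated with a textbook genotype coding. The abstract does not state how iADEP encodes SNPs, so the sketch below is an assumption: additive is the count of the alternate allele (0, 1, or 2), dominant flags heterozygotes, and epistatic terms are pairwise products of additive codes. All function names are hypothetical.

```python
from itertools import combinations


def encode_snp(genotype, alt):
    """Return (additive, dominant) codes for one diploid genotype such as 'AG'."""
    additive = sum(1 for allele in genotype if allele == alt)  # 0, 1 or 2
    dominant = 1 if additive == 1 else 0  # 1 only for heterozygotes
    return additive, dominant


def encode_individual(genotypes, alts):
    """Encode all SNPs of one individual.

    genotypes: list of two-letter genotype strings, one per SNP.
    alts: list of alternate alleles, one per SNP.
    Returns additive codes, dominant codes, and additive-by-additive
    epistatic terms for every SNP pair (assumed coding).
    """
    additive, dominant = [], []
    for genotype, alt in zip(genotypes, alts):
        a, d = encode_snp(genotype, alt)
        additive.append(a)
        dominant.append(d)
    epistatic = [additive[i] * additive[j]
                 for i, j in combinations(range(len(additive)), 2)]
    return additive, dominant, epistatic


add_codes, dom_codes, epi_codes = encode_individual(["AG", "GG", "CC"], ["G", "G", "T"])
```

The number of pairwise epistatic terms grows quadratically with the number of SNPs, which is one reason models learn latent interactions rather than enumerating them.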
Results: Firstly, we demonstrated through experiments on four datasets that iADEP outperforms existing methods in genotype-to-phenotype prediction. Secondly, we validated its effectiveness through ablation experiments. Third, we provided an available module for combining other omics data and propose a novel method for fusing them. Fourthly, we explored the impact of feature selection on performance and conclude that utilizing the full set of SNPs generally provides optimal results. Finally, by altering the partition of training and testing sets, we investigated the differences between transductive and inductive learning.
Conclusions: iADEP is a new AI model for breeding, a promising integrated approach that enables combination with other omics data.
Briefings in Bioinformatics, Year: 2025, Issue: 26(2)
Published: March 1, 2025
Abstract
Non-coding RNAs (ncRNAs) play crucial roles in drug resistance and sensitivity, making them important biomarkers and therapeutic targets. However, predicting ncRNA-drug associations is challenging due to issues such as dataset imbalance and sparsity, limiting the identification of robust biomarkers. Existing models often fall short in capturing local and global sequence information, limiting the reliability of predictions. This study introduces DMGAT (diffusion map and heterogeneous graph attention network), a novel deep learning model designed to predict ncRNA-drug associations. DMGAT integrates diffusion maps for embedding, convolutional networks for feature extraction, and GAT for information fusion. To address dataset imbalance, the model incorporates sensitivity associations and employs a random forest classifier to select reliable negative samples. DMGAT embeds ncRNA sequences and drug SMILES using the word2vec technique, capturing local and global sequence information. The model constructs a heterogeneous network by combining sequence similarity and Gaussian Interaction Profile kernel similarity, providing a comprehensive representation of ncRNA-drug interactions. Evaluated through five-fold cross-validation on a dataset curated from NoncoRNA and ncDR, DMGAT outperforms seven state-of-the-art methods, achieving the highest area under the receiver operating characteristic curve (0.8964), area under the precision-recall curve (0.8984), recall (0.9576), and F1-score (0.8285). The raw data are released on Zenodo with the identifier 13929676. The source code is available at https://github.com/liutingyu0616/DMGAT/tree/main.
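The Gaussian Interaction Profile (GIP) kernel similarity named in this abstract can be sketched as below. The formula is the commonly used one, in which the bandwidth is normalized by the mean squared norm of the interaction profiles; whether DMGAT uses exactly this normalization is an assumption, and the function name and `gamma_prime` parameter are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel between rows of a 0/1 association matrix.

    profiles[i] is the interaction profile of entity i (for example, one ncRNA's
    known associations across all drugs).
    K[i][j] = exp(-gamma * ||profiles[i] - profiles[j]||^2), where
    gamma = gamma_prime / mean_i ||profiles[i]||^2.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    # With no known associations at all, fall back to gamma = 0 (all similarities 1).
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            row.append(math.exp(-gamma * dist_sq))
        kernel.append(row)
    return kernel
```

Entities with identical profiles get similarity 1, and similarity decays as profiles diverge. This lets the network use known associations as a similarity signal alongside sequence similarity.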