Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 26(1), Published: Nov. 22, 2024
Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated ones. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose a new protein function annotation tool for phages by leveraging the modular genomic structure of phage genomes. By employing embeddings from the latest protein foundation models and a Transformer to capture contextual information between proteins in phage genomes, GOPhage surpasses state-of-the-art methods in annotating diverged proteins and proteins with uncommon functions by 6.78% and 13.05% improvement, respectively. GOPhage can annotate proteins lacking homology search results, which is critical for characterizing the rapidly accumulating phage genomes. We demonstrate the utility of GOPhage by identifying 688 potential holins in phages, which exhibit high structural conservation with known holins. The results show the potential of GOPhage to extend our understanding of newly discovered phages.
International Journal of Machine Learning and Cybernetics, Journal Year: 2025, Volume and Issue: unknown, Published: Jan. 15, 2025
The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these advancements, we introduce CataLM (Catalytic Language Model), a model tailored to the domain of electrocatalytic materials. Our findings demonstrate that CataLM exhibits remarkable potential for facilitating human-AI collaboration in catalyst knowledge exploration and design. To the best of our knowledge, CataLM stands as the pioneering LLM dedicated to the catalyst domain, offering novel avenues for catalyst discovery and development.
Frontiers in Bioengineering and Biotechnology, Journal Year: 2025, Volume and Issue: 13, Published: Jan. 21, 2025
Protein function prediction is crucial in several key areas such as bioinformatics and drug design. With the rapid progress of deep learning technology, applying protein language models has become a research focus. These models utilize the increasing amount of large-scale protein sequence data to deeply mine its intrinsic semantic information, which can effectively improve the accuracy of protein function prediction. This review comprehensively combines the current status of applying the latest protein language models in protein function prediction. It provides an exhaustive performance comparison with traditional methods. Through the in-depth analysis of experimental results, the significant advantages of protein language models in enhancing the accuracy and depth of protein function prediction tasks are fully demonstrated.
Journal of Medicinal Chemistry, Journal Year: 2025, Volume and Issue: unknown, Published: Jan. 2, 2025
Target identification is a critical stage in the drug discovery pipeline. Various computational methodologies have been dedicated to enhancing the classification performance of compound-target interactions, yet significant room remains for improving the recommendation performance. To address this challenge, we developed TarIKGC, a tool for target prioritization that leverages semantics enhanced knowledge graph (KG) completion. This method harnesses knowledge representation learning within a heterogeneous compound-target-disease network. Specifically, TarIKGC combines an attention-based aggregation graph neural network with a multimodal feature extractor network to simultaneously learn internal semantic features from biomedical entities and topological features from the KG. Furthermore, a KG embedding model is employed to identify missing relationships among compounds and targets. In silico evaluations highlighted the superior performance of TarIKGC in drug repositioning tasks. In addition, TarIKGC successfully identified two potential cyclin-dependent kinase 2 (CDK2) inhibitors with novel scaffolds through reverse target fishing. Both compounds exhibited antiproliferative activities across multiple therapeutic indications targeting CDK2.
Computers in Biology and Medicine, Journal Year: 2025, Volume and Issue: 190, P. 110064, Published: April 5, 2025
The rapidly advancing field of artificial intelligence (AI) has transformed numerous scientific domains, including biology, where a vast and complex volume of data is available for analysis. This paper provides a comprehensive overview of the current state of AI-driven methodologies in genomics, proteomics, and systems biology. We discuss how machine learning algorithms, particularly deep learning models, have enhanced the accuracy and efficiency of embedding sequences, motif discovery, and the prediction of gene expression and protein structure. Additionally, we explore the integration of AI in the analysis of biological networks, including protein-protein interaction networks and multi-layered networks. By leveraging large-scale biological data, AI techniques have enabled unprecedented insights into complex biological processes and disease mechanisms. This work underlines the potential of applying AI to complex biological data, highlighting current applications and suggesting directions for future research to further this evolving field.
Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 25(4), Published: May 23, 2024
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND, one of the most popular tools for function prediction, under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO prediction from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homolog-based function prediction and should have an important contribution to the development of future protein function prediction algorithms.
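To make the idea of deriving GO predictions from homologous hits concrete, the following is a minimal sketch of one commonly used baseline: each GO term is scored by the share of total bit score contributed by hits annotated with that term. It is not the new scoring function described in the abstract above, whose form is not given here. The function name `score_go_terms`, the input layout, and the example hits are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def score_go_terms(hits):
    """Score GO terms for one query protein from its homologous hits.

    hits: list of (bit_score, set_of_go_terms) pairs, one per database hit.
    Returns a dict mapping each GO term to a confidence in [0, 1], computed
    as the bit-score-weighted fraction of hits annotated with that term.
    This weighting is an illustrative assumption, not the paper's method.
    """
    total = sum(bit_score for bit_score, _ in hits)
    if total <= 0:
        return {}
    weighted = defaultdict(float)
    for bit_score, terms in hits:
        for term in terms:
            weighted[term] += bit_score
    return {term: value / total for term, value in weighted.items()}


# Hypothetical hits for a single query (bit score, GO annotations of the hit).
example_hits = [
    (200.0, {"GO:0003677", "GO:0005634"}),
    (100.0, {"GO:0003677"}),
    (100.0, {"GO:0016787"}),
]
scores = score_go_terms(example_hits)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(term, round(score, 2))
```

In this toy case the term shared by the two strongest hits receives the highest score. Changing the search tool or its parameters changes the set of hits and their bit scores, which is why the choice of search settings matters for this kind of predictor.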
NAR Genomics and Bioinformatics, Journal Year: 2024, Volume and Issue: 6(3), Published: July 2, 2024
Previous protein function predictors primarily make predictions from amino acid sequences instead of tertiary structures because of the limited number of experimentally determined structures and the unsatisfying qualities of predicted structures. AlphaFold recently achieved promising performances when predicting protein tertiary structures, and the AlphaFold protein structure database (AlphaFold DB) is fast-expanding. Therefore, we aimed to develop a deep-learning tool that is specifically trained with AlphaFold models and predicts GO terms from AlphaFold models. We developed an advanced learning architecture by combining geometric vector perceptron graph neural networks and variant transformer decoder layers for multi-label classification. PANDA-3D predicts gene ontology (GO) terms from the predicted structures of AlphaFold and the embeddings of amino acid sequences based on a large language model. Our method significantly outperformed a state-of-the-art deep-learning method that was trained with experimentally determined tertiary structures, and either outperformed or was comparable with several other language-model-based state-of-the-art methods with amino acid sequences as input. PANDA-3D is tailored to AlphaFold models, and the AlphaFold DB currently contains over 200 million predicted protein structures (as of May 1st, 2023), making PANDA-3D a useful tool that can accurately annotate the functions of a large number of proteins. PANDA-3D can be freely accessed as a web server from http://dna.cs.miami.edu/PANDA-3D/ and as a repository from https://github.com/zwang-bioinformatics/PANDA-3D.
Medicine Advances, Journal Year: 2024, Volume and Issue: unknown, Published: Dec. 16, 2024
Knowledge-enhanced machine learning can be conceptualized as a fusion of clinical knowledge and domain expertise extracted from traditional decision making methods alongside powerful machine learning architectures. Knowledge-enhanced machine learning significantly improves current methods in terms of interpretability, generalizability, accuracy, and equity.
Clinical decision making (CDM) is a process that healthcare professionals undertake when making assessments about patients' conditions and decisions about the care to provide [1, 2]. Traditional CDM is founded on either unconscious intuition or conscious inference frameworks with well-defined logic [3]. Intuition, defined as understanding without rationale, integrates tacit knowledge and pertinent experience developed over years of practice to automate cognitive processing devoid of formalized rules [4, 5]. However, its nature obscures precise identification of the initiating cues and logic, limiting its application [6]. Distinct from intuition, inference frameworks possess well-defined execution steps, predominantly encompassing the hypothetico-deductive model (HDM) [7] and the pattern recognition model (PRM) [8].
The HDM involves four indispensable steps: cue acquisition, hypothesis generation, cue interpretation, and hypothesis evaluation [7]. Initially, cue acquisition systematically collects patient medical information as required by clinicians. Subsequently, multiple preliminary hypotheses are derived from the retrieved information, enabling clinicians to use an established theory based on the gathered information to propose hypotheses according to their experience [9]. This is followed by cue interpretation, in which clinicians discern the cues pertinent to the initial hypotheses and accordingly refine these hypotheses [10]. The process culminates in hypothesis evaluation, whereby hypotheses are corroborated or refuted by the amassed evidence. Should all hypotheses be rejected, another round of the HDM will commence. Ng et al. present a real-world HDM task [11]. When diagnosing patients with acute chest pain, clinicians gather information related to cardiovascular risk factors, smoking history, recent viral infections, and other relevant information. Based on this information, they generate diagnostic hypotheses, such as acute coronary syndrome, myocarditis, pericarditis, and pneumonia. They interpret and evaluate these hypotheses, refining or ruling out possibilities through further investigations, such as electrocardiograms, complete blood counts, and chest radiographs.
In contrast to the analytical HDM, the PRM employs nonanalytical matching of new cases with similar patterns stored in memory, specifically for cases encountered previously or documented within clinical guidelines [8, 10]. In routine encounters, the PRM outpaces the HDM at an exceptional rate. Notably, in instances of ambiguity, the analytical HDM approach retains superiority as a more effective solution.
Although traditional CDM methods are widely implemented and some of them, including the HDM, have been recognized as gold standards [12], machine learning (ML), as demonstrated in Figure 1, has been increasingly adopted to handle the unprecedented volume of information generated by advanced instruments and electronically recorded in healthcare systems [13]. The abundance of data poses challenges to traditional methods relying on manual effort, but ML techniques hold promise because they enable computers to automatically learn projection functions between raw data and targets of interest without explicit instructions from human experts [14]. For example, support vector machine, random forest, and k-nearest neighbor have been used to diagnose Alzheimer's disease [15], breast cancer [16], and Parkinson's disease [17], respectively. In addition to conventional ML techniques, deep learning, a specialized subset of ML focusing on the design and training strategies of artificial neural networks, has emerged as the state-of-the-art in various tasks owing to its extensive parameterization and intricate learning capability.
Figure 1. Comparison of the traditional CDM method (hypothetico-deductive model) versus the contemporary ML method (random forest) in scholarly publications over the last 25 years. Specific numbers were retrieved through a systematic inquiry of Google Scholar employing the search terms "clinical decision making" in conjunction with "hypothetico-deductive model" or "random forest" on August 21, 2024.
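Though purely data-driven ML has demonstrated superior accuracy [18], it exhibits drawbacks in interpretability owing to its complex architectures [19]. Interpretability stands as a pivotal characteristic to rectify potential erroneous decisions that endanger patients' lives. To address this challenge and augment the capabilities of ML, researchers and clinicians collaborate to incorporate clinical knowledge into ML methodologies [20]. As depicted in Figure 2, this integration, termed knowledge-enhanced machine learning (KEML) [21], brings clinical knowledge into ML architectures, covering both conventional models and deep learning approaches [22].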
Figure 2. Schematic plot depicting classic CDM and ML leading toward the future of clinical decision making.
The foremost advantage of KEML is its ability to improve interpretability [23]. Although explainable artificial intelligence has been proposed to supplement ML, its explanations frequently suffer from logical inconsistencies stemming from noise in datasets and limited applicability to particular cohorts [24]. For instance, a previous study showed that classifiers for pneumothorax often rely on irrelevant regions beyond the lesion area for diagnosis, resulting in overfitting to specific data sources. Integrating knowledge of lesion occurrence can enhance the generalization of such classifiers. Another illustrative example is the knowledge-guided interpretable disease prediction method, which showcases the use of knowledge graphs in modeling personalized patient information and improving interpretability by extracting crucial graph paths as prompts for ChatGPT to generate clinician-comprehensible natural language explanations. In addition, KEML leverages external knowledge to improve accuracy further. The dynamic gated recurrent neural network exemplifies the enrichment of the representation of a medical event with additional knowledge of adjacent events [25]. KEML also alleviates biases in ML through the involvement of clinical expertise. Chen et al. summarized how to mitigate disparity and inequity at each stage of the ML life cycle [26]. Hence, the integration of clinical knowledge not only amplifies the capabilities of ML but also mitigates biases, ultimately advancing the deployment of ML-driven solutions [27].
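In this commentary, we summarized the evolution of CDM and elucidated the underlying rationale driving the paradigm shift, emphasizing the imperative of adapting to the era of big data. While embracing ML represents an advancement, the knowledge embodied in traditional CDM should not be disregarded. Therefore, we advocate for KEML, a novel paradigm capitalizing on the strengths of both methodologies, to propel CDM to a new level of interpretability, accuracy, and fairness [28].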
Han Yuan: Conceptualization (lead); data curation (lead); formal analysis (lead); investigation (lead); methodology (lead); visualization (lead); writing, original draft (lead); writing, review & editing (lead).
None.
The author declares no conflicts of interest.
Not applicable.
Data sharing is not applicable to this article as no data was generated or analyzed during the current study.