bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 24, 2024
Several peptide-based drugs fail in clinical trials due to their toxicity or hemolytic activity against red blood cells (RBCs). Existing methods predict hemolytic peptides but not the concentration (HC50) required to lyse 50% of RBCs. In this study, we developed a classification model and a regression model to identify hemolytic peptides and quantify their hemolytic activity. Our models were trained and validated on 1924 peptides with experimentally determined HC50 against mammalian RBCs. Analysis indicates that hydrophobic and positively charged residues were associated with higher hemolytic activity. Our classification models achieved a maximum AUC of 0.909 using a hybrid of ESM-2 and a motif-based approach. Regression models using compositional features achieved a maximum R of 0.739 with an R² of 0.543. Our models outperform existing methods and are implemented in the web-based platform HemoPI2 and in standalone software for designing hemolytic peptides with desired HC50 values (http://webs.iiitd.edu.in/raghava/hemopi2/).
Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e59505 - e59505
Published: Aug. 20, 2024
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data–driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.
ACS Nano, Journal Year: 2024, Volume and Issue: 18(28), P. 18101 - 18117
Published: July 1, 2024
Raman spectroscopy has made significant progress in biosensing and clinical research. Here, we describe how surface-enhanced Raman spectroscopy (SERS) assisted with machine learning (ML) can expand its capabilities to enable interpretable insights into the transcriptome, proteome, and metabolome at the single-cell level. We first review how advances in nanophotonics, including plasmonics, metamaterials, and metasurfaces, enhance Raman scattering for rapid, strong label-free spectroscopy. We then discuss ML approaches for precise and interpretable spectral analysis, including neural networks, perturbation and gradient algorithms, and transfer learning. We provide illustrative examples of single-cell Raman phenotyping using nanophotonics and ML, including bacterial antibiotic susceptibility predictions, stem cell expression profiles, cancer diagnostics, and immunotherapy efficacy and toxicity predictions. Lastly, we discuss exciting prospects for the future of single-cell Raman spectroscopy, including Raman instrumentation, self-driving laboratories, Raman data banks, and machine learning for uncovering biological insights.
Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)
Published: March 1, 2025
In recent years, inspired by the success of large language models (LLMs) for DNA and proteins, several LLMs for RNA have also been developed. These models take massive RNA datasets as inputs and learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. This is done under the hypothesis that obtaining high-quality RNA representations can enhance data-costly downstream tasks, such as the fundamental RNA secondary structure prediction problem. However, existing RNA-LLMs have not been evaluated for this task in a unified experimental setup. Since they are pretrained models, assessment of their generalization capabilities on new structures is a crucial aspect. Nonetheless, this has been just partially addressed in the literature. In this work we present a comprehensive experimental and comparative analysis of pretrained RNA-LLMs that have been recently proposed. We evaluate the use of these representations for the secondary structure prediction task with a common deep learning architecture. The RNA-LLMs were assessed with increasing generalization difficulty on benchmark datasets. Results showed that two LLMs clearly outperform the other models, and revealed significant challenges for generalization in low-homology scenarios. Moreover, in this study we provide curated benchmark datasets of increasing complexity and a unified experimental setup for this scientific endeavor. Source code and curated benchmark datasets are available in the repository: https://github.com/sinc-lab/rna-llm-folding/.
Environmental Evidence, Journal Year: 2025, Volume and Issue: 14(1)
Published: April 15, 2025
Abstract
Systematic reviews (SRs) in environmental science are challenging due to diverse methodologies, terminologies, and study designs across disciplines. A major limitation is that inconsistent application of eligibility criteria in evidence-screening affects the reproducibility and transparency of SRs. To explore the potential role of Artificial Intelligence (AI) in applying eligibility criteria, we developed and evaluated an AI-assisted evidence-screening framework using a case study SR on the relationship between stream fecal coliform concentrations and land use and land cover (LULC). The SR incorporates publications from hydrology, ecology, public health, landscape, and urban planning, reflecting the interdisciplinary nature of environmental research. We fine-tuned the ChatGPT-3.5 Turbo model with expert-reviewed training data for title, abstract, and full-text screening of 120 articles. The AI model demonstrated substantial agreement at title/abstract review and moderate agreement at full-text review with expert reviewers and maintained internal consistency, suggesting its potential for structured screening assistance. The findings provide a structured framework for applying eligibility criteria consistently, improving evidence screening efficiency, reducing labor costs, and informing large language models (LLMs) integration in environmental SRs. Combining AI with domain knowledge provides an exploratory step to evaluate the feasibility of AI-assisted evidence screening, especially for diverse, large volume, and interdisciplinary studies. Additionally, AI-assisted screening has the potential to provide a structured approach for managing disagreement among researchers with diverse domain knowledge, though further validation is needed.
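
The abstract above reports "substantial" and "moderate" agreement between the AI model and expert reviewers but does not name the statistic. A minimal sketch, assuming agreement is measured with Cohen's kappa on include/exclude decisions, is given below. The function name `cohens_kappa` and the ten screening decisions are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten articles (not data from the study).
expert = ["include", "include", "exclude", "exclude", "include",
          "exclude", "exclude", "exclude", "include", "exclude"]
ai = ["include", "include", "exclude", "exclude", "exclude",
      "exclude", "exclude", "include", "include", "exclude"]

kappa = cohens_kappa(expert, ai)
print(f"kappa = {kappa:.3f}")
```

Under the widely used Landis and Koch convention, kappa values of 0.41 to 0.60 are described as moderate and 0.61 to 0.80 as substantial agreement. Whether the study applied these exact bands is an assumption here.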
Communications Biology, Journal Year: 2025, Volume and Issue: 8(1)
Published: Feb. 4, 2025
Peptide-based drugs often fail in clinical trials due to their toxicity or hemolytic activity against red blood cells (RBCs). Existing methods predict hemolytic peptides but not the concentration (HC50) required to lyse 50% of RBCs. This study develops classification and regression models to identify hemolytic peptides and quantify their hemolytic activity. These models are trained on 1926 peptides with experimentally determined HC50 against mammalian RBCs. Analysis indicates that hydrophobic and positively charged residues were associated with higher hemolytic activity. Among classification models, including machine learning (ML), quantum ML, and protein language models, a hybrid model combining random forest (RF) and a motif-based approach achieves the highest area under the receiver operating characteristic curve (AUROC) of 0.921. Regression models achieve a Pearson correlation coefficient (R) of 0.739 and a coefficient of determination (R²) of 0.543. These models outperform existing methods and are implemented in HemoPI2, a web-based platform and standalone software for designing peptides with desired HC50 values (http://webs.iiitd.edu.in/raghava/hemopi2/).
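
The two regression metrics quoted in the abstract above can be computed as in the following minimal sketch. It is not the HemoPI2 implementation: the function names `pearson_r` and `r_squared` and the sample values are hypothetical. The example also illustrates that R², when computed from residuals, need not equal the square of the Pearson R.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted activity values (not data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 4.0, 4.0]

print(f"R  = {pearson_r(measured, predicted):.3f}")
print(f"R2 = {r_squared(measured, predicted):.3f}")
```

With these sample values the predictions track the measurements closely but carry a constant upward bias, so the correlation stays high while the residual-based R² is lower.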
Advanced Science, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 27, 2025
Abstract
Integrating single‐cell datasets from multiple studies provides a cost‐effective way to build comprehensive cell atlases, granting deeper insights into cellular characteristics across diverse biological systems. However, current data integration methods struggle with interference in partially overlapping datasets and varying annotation granularities. Here, a multiselective adversarial network is introduced for the first time to present UniMap, which functions as a “discerner” to identify and exclude interfering cells from various data sources during dataset integration. Compared with other state‐of‐the‐art methods, UniMap emphasizes type‐level integration and proves to be the best model for preserving biological variability, achieving noticeably higher accuracy in automated annotation under various circumstances. Additionally, it enhances interpretability by revealing shared and domain‐specific cell types and providing prediction confidence. The efficacy of UniMap is demonstrated in terms of identifying new cell types, creating high‐resolution cell atlases, annotating cells along developmental trajectories, and performing cross‐species analysis, underscoring its potential as a robust tool for single‐cell research.
PLoS ONE, Journal Year: 2025, Volume and Issue: 20(5), P. e0322978 - e0322978
Published: May 7, 2025
Background
Predicting protein-DNA binding sites in vivo is a challenging but urgent task in many fields such as drug design and development. Most promoters contain many transcription factor (TF) binding sites, yet only a few have been identified through time-consuming biochemical experiments. To address this challenge, numerous computational approaches have been proposed to predict TF binding sites from DNA sequences. However, current deep learning methods often face issues such as gradient vanishing as the model depth increases, leading to suboptimal feature extraction.
Results
We propose a model called CBR-KAN (where C represents Convolutional Neural Network (CNN), B represents Bidirectional Long Short Term Memory (BiLSTM), and R represents Residual Mechanism) to predict TF binding sites. Specifically, we designed a multi-scale convolution module (ConvBlock1, 2, 3) combined with a BiLSTM network, introduced a KAN network to replace the traditional multilayer perceptron, and promoted model optimization through residual connections. Testing on 50 common ChIP-seq benchmark datasets shows that CBR-KAN outperforms other state-of-the-art methods such as DeepBind, DanQ, DeepD2V, and DeepSEA in predicting TF binding sites.
Conclusions
The CBR-KAN model significantly improves prediction accuracy for TF binding sites by effectively integrating multiple neural network architectures and mechanisms. This approach not only enhances feature extraction but also stabilizes training and boosts generalization capabilities. The promising results on key performance indicators demonstrate the potential of CBR-KAN in bioinformatics applications.
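
The abstract above does not say how DNA sequences are presented to the network. Sequence models of this kind commonly take a one-hot matrix as input, so the sketch below assumes that encoding purely for illustration. The function name `one_hot_encode`, the A/C/G/T column order, and the all-zero treatment of ambiguous bases are hypothetical choices, not details confirmed for CBR-KAN.

```python
ALPHABET = "ACGT"  # assumed column order; other orders are equally valid


def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1.0
        encoded.append(row)
    return encoded


# Hypothetical five-base fragment, including one ambiguous base.
matrix = one_hot_encode("acgtN")
for base, row in zip("ACGTN", matrix):
    print(base, row)
```

A matrix of this shape (sequence length by 4) is the usual input to the first convolutional layer, which then scans it with filters of different widths in a multi-scale design.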