Computers in Biology and Medicine. Journal Year: 2025. Volume and Issue: 190, P. 110064. Published: April 5, 2025
The rapidly advancing field of artificial intelligence (AI) has transformed numerous scientific domains, including biology, where a vast and complex volume of data is available for analysis. This paper provides a comprehensive overview of the current state of AI-driven methodologies in genomics, proteomics, and systems biology. We discuss how machine learning algorithms, particularly deep learning models, have enhanced the accuracy and efficiency of embedding sequences, motif discovery, and the prediction of gene expression and protein structure. Additionally, we explore the integration of AI into the analysis of biological networks, including protein-protein interaction networks and multi-layered networks. By leveraging large-scale biological data, AI techniques have enabled unprecedented insights into biological processes and disease mechanisms. This work underlines the potential of applying AI to biology, highlighting current applications and suggesting directions for future research to further explore AI in this evolving field.
Cell. Journal Year: 2024. Volume and Issue: 187(25), P. 7045-7063. Published: Dec. 1, 2024
Cells are essential to understanding health and disease, yet traditional models fall short of modeling and simulating their function and behavior. Advances in AI and omics offer groundbreaking opportunities to create an AI virtual cell (AIVC), a multi-scale, multi-modal large-neural-network-based model that can represent and simulate the behavior of molecules, cells, and tissues across diverse states. This Perspective provides a vision of their design and of how collaborative efforts to build AIVCs will transform biological research by allowing high-fidelity simulations, accelerating discoveries, and guiding experimental studies, offering new opportunities for understanding cellular functions and fostering interdisciplinary collaborations in open science.
Nucleic Acids Research. Journal Year: 2025. Volume and Issue: 53(2). Published: Jan. 11, 2025
Recent advancements in genomics, propelled by artificial intelligence, have unlocked unprecedented capabilities in interpreting genomic sequences, mitigating the need for exhaustive experimental analysis of the complex, intertwined molecular processes inherent in DNA function. A significant challenge, however, resides in accurately decoding genomic sequences, which inherently involves comprehending rich contextual information dispersed across thousands of nucleotides. To address this need, we introduce the GENA language model (GENA-LM), a suite of transformer-based foundational DNA language models capable of handling input lengths of up to 36 000 base pairs. Notably, integrating the newly developed recurrent memory mechanism allows these models to process even larger DNA segments. We provide pre-trained versions of GENA-LM, including multispecies and taxon-specific models, demonstrating their capability for fine-tuning and for addressing a spectrum of complex biological tasks with modest computational demands. While language models have already achieved breakthroughs in protein biology, GENA-LM showcases a similarly promising potential for reshaping the landscape of genomics and multi-omics data analysis. All models are publicly available on GitHub (https://github.com/AIRI-Institute/GENA_LM) and HuggingFace (https://huggingface.co/AIRI-Institute). In addition, we provide a web service (https://dnalm.airi.net/) allowing user-friendly DNA annotation with GENA-LM models.
Frontiers in Genetics. Journal Year: 2025. Volume and Issue: 15. Published: Jan. 7, 2025
Recent advancements in deep learning, particularly large language models (LLMs), have made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data. We focus on problem formulations, necessary datasets, and the integration of language modeling techniques. We provide an extensive overview of protein/genomic language models and their contributions to microbiome studies. We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.
bioRxiv (Cold Spring Harbor Laboratory). Journal Year: 2024. Published: March 4, 2024
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested after fine-tuning their weights for each downstream task, whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues with conventional pre-training strategies for the non-coding genome.
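For readers unfamiliar with the one-hot baseline, each nucleotide becomes a 4-element indicator vector, so a sequence of length L becomes an L x 4 matrix. The Python below is a minimal sketch under stated assumptions, not the study's code: the A, C, G, T column order, the handling of ambiguous bases such as N as all-zero rows, and the function name `one_hot_encode` are illustrative choices.

```python
from typing import List

# Assumed column order (a common convention, not taken from the study).
ALPHABET = "ACGT"
BASE_TO_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot_encode(seq: str) -> List[List[int]]:
    """Hypothetical helper: encode a DNA string as a len(seq) x 4 grid of 0/1 rows.

    Bases outside ACGT (e.g. the ambiguity code N) become all-zero rows;
    this is an illustrative choice made for this sketch.
    """
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        idx = BASE_TO_INDEX.get(base)
        if idx is not None:
            row[idx] = 1  # mark the column for this nucleotide
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print each base next to its indicator row.
    for base, row in zip("ACGTN", one_hot_encode("acgtn")):
        print(base, row)
```

Encoding N as all zeros keeps the width fixed at four columns; some pipelines instead assign 0.25 to every column for ambiguous positions.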
Current Opinion in Structural Biology. Journal Year: 2025. Volume and Issue: 90, P. 102979. Published: Jan. 7, 2025
The mRNA splicing machinery has been estimated to generate 100,000 known protein-coding transcripts for 20,000 human genes (Ensembl, Sept. 2024). However, this set is expanding with the massive and rapidly growing data coming from high-throughput technologies, particularly single-cell and long-read sequencing. Yet, the implications of this complexity at the protein level remain largely uncharted. In this review, we describe current advances toward systematically assessing the contribution of alternative splicing to proteome function diversification. We discuss the potential and challenges of using artificial intelligence-based techniques in identifying proteoforms and characterising their structures, interactions, and functions.
bioRxiv (Cold Spring Harbor Laboratory). Journal Year: 2025. Published: Jan. 8, 2025
Modeling long-range DNA dependencies is crucial for understanding genome structure and function across a wide range of biological contexts. However, effectively capturing these extensive dependencies, which may span millions of base pairs in tasks such as three-dimensional (3D) chromatin folding prediction, remains a significant challenge. Furthermore, a comprehensive benchmark suite for evaluating tasks that rely on long-range dependencies is notably absent. To address this gap, we introduce DNALongBench, a benchmark dataset encompassing five important genomics tasks that consider long-range dependencies of up to 1 million base pairs: enhancer-target gene interaction, expression quantitative trait loci, 3D genome organization, regulatory sequence activity, and transcription initiation signals. To comprehensively assess DNALongBench, we evaluate the performance of five methods: a task-specific expert model, a convolutional neural network (CNN)-based model, and three fine-tuned DNA foundation models (HyenaDNA, Caduceus-Ph, and Caduceus-PS). We envision DNALongBench as a standardized resource with the potential to facilitate comprehensive comparisons and rigorous evaluations of emerging DNA sequence-based deep learning models that account for long-range dependencies.
Nature Communications. Journal Year: 2025. Volume and Issue: 16(1). Published: Jan. 24, 2025
Orphan crops are important sources of nutrition in developing regions, and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of available resources. There are orphan crop representatives across major crop types, and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops to orphan crops using machine learning can improve the accuracy and efficiency of crop improvement in orphan crops. Here, the authors review ML approaches for transferring knowledge from major crops to orphan crops in crop breeding.