bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2023
Volume and Issue: unknown
Published: Sept. 28, 2023
Abstract
Gene mining, particularly from small sample sizes such as in plants, remains a challenge in the life sciences. Traditional methods often omit significant genes, while deep learning techniques are hindered by sample constraints and lack specialized gene mining approaches. This paper presents TransGeneSelector, the first deep learning method tailored for mining key genes from small transcriptomic datasets, ingeniously integrating data augmentation, filtering, and a Transformer-based classifier. Tested on Arabidopsis thaliana seeds' germination classification using just 79 samples, it not only achieves performance on par with, if not superior to, Random Forest and SVM, but also excels in identifying upstream regulatory genes that these methods might miss, and these pinpointed genes more accurately reflect the metabolic processes inherent to seed germination. TransGeneSelector's ability to mine vital genes from limited datasets signifies its potential in current state-of-the-art gene mining scenarios, providing an efficient and versatile solution in this critical research area.
PROTEOMICS
Journal Year: 2023
Volume and Issue: 23(23-24)
Published: June 29, 2023
Abstract
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
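The review credits transformers with processing variable-length sequences in parallel through self-attention. As a minimal sketch of that mechanism (not code from the review), the Python below computes scaled dot-product self-attention for one padded sequence and masks the padding positions so they receive zero weight. For brevity it assumes identity query/key/value projections and a single head; real models learn these projections and run several heads over batched tensors. The names `softmax`, `self_attention`, and `tokens` are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability; exp(-inf) is 0.0, so masked
    # positions get zero weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, length: int) -> Matrix:
    """Scaled dot-product self-attention over one padded sequence.

    x holds max_len token vectors of dimension d; rows at index >= length
    are padding. Assumption: identity projections (Q = K = V = x), one head.
    """
    d = len(x[0])
    out: Matrix = []
    for i in range(len(x)):
        scores = []
        for j in range(len(x)):
            if j < length:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
            else:
                scores.append(float("-inf"))  # mask padding keys
        weights = softmax(scores)
        # Each output row is an attention-weighted mix of the value vectors.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out


# Two real tokens plus one padding row that must not influence the result.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
attended = self_attention(tokens, length=2)
```

Every query row scores all unpadded positions in one step, with no recurrence from token to token, which is why the whole sequence can be processed in parallel regardless of its length.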
Cell Systems
Journal Year: 2024
Volume and Issue: 15(6), P. 488 - 496
Published: May 28, 2024
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene function without considering context. We contend that this problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics are automatically learned from diverse contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a semantic space and fuels current advances in transformer-based models such as large generative pre-trained transformers. A similar shift in thinking of gene function as distributions over cellular contexts may enable a breakthrough in data-driven learning from large biological datasets to inform gene function.
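To make the distributional analogy concrete, here is a hedged sketch that represents each gene by its activity across a few cellular contexts and compares genes by cosine similarity, the measure commonly used for word vectors. The gene names and counts are invented placeholders, and treating nearness in this space as a proxy for shared function is the analogy the authors argue for, not a method they specify.

```python
import math
from typing import Dict, List


def normalize(profile: List[float]) -> List[float]:
    # Rescale raw activity to a distribution over contexts (sums to 1).
    total = sum(profile)
    return [v / total for v in profile]


def cosine(u: List[float], v: List[float]) -> float:
    # Cosine similarity; scale-invariant, so normalizing first does not change it.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(query: str, genes: Dict[str, List[float]]) -> str:
    # Nearest neighbour of `query` in the context-distribution space.
    q = normalize(genes[query])
    scores = {g: cosine(q, normalize(p)) for g, p in genes.items() if g != query}
    return max(scores, key=scores.get)


# Hypothetical activity of three placeholder genes across four cellular
# contexts (e.g. cell types or conditions); values are invented.
genes = {
    "geneA": [90.0, 5.0, 3.0, 2.0],
    "geneB": [80.0, 10.0, 5.0, 5.0],
    "geneC": [2.0, 3.0, 45.0, 50.0],
}
print(most_similar("geneA", genes))  # geneB
```

A gene active in several contexts simply spreads its mass across several coordinates, which is how this representation can hold pleiotropy that a single ontology label cannot.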
Heliyon
Journal Year: 2025
Volume and Issue: 11(2), P. e41488 - e41488
Published: Jan. 1, 2025
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequence, such as dysregulation and mutations, can drive a spectrum of diseases, including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and therapies. To gain insights into biological functions, to detect diseases at early stages, and to develop potent therapeutics, researchers are performing diverse types of RNA sequence analysis tasks. RNA sequence analysis with conventional wet-lab methods is expensive, time-consuming, and error prone. To enable large-scale RNA sequence analysis, empowering experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both the DNA and AI fields.
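While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack basic biological foundations. Considering the absence of literature that bridges this research gap and promotes AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets related to these tasks by facilitating the cruxes of 64 different databases. It presents word embeddings and language models across RNA sequence analysis tasks. It streamlines the development of new predictors by providing a survey of 58 word embedding-based and 70 language model-based predictive pipelines with their performance values, as well as the top performing encoding methods and their performances.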
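A common way to apply word-embedding ideas to RNA is to treat overlapping k-mers as the "words" of a sequence. The sketch below shows one simple encoding of that kind, a k-mer frequency vector; it is an illustrative assumption rather than any specific pipeline from the survey, and `kmers` and `kmer_frequencies` are hypothetical helper names.

```python
from itertools import product
from typing import Dict, List


def kmers(seq: str, k: int = 3) -> List[str]:
    # Overlapping k-mers (the sequence's "words"), stride 1; DNA-style T -> U.
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequencies(seq: str, k: int = 3) -> Dict[str, float]:
    # Fixed-length vector of relative frequencies over all 4**k k-mers.
    vocab = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    words = kmers(seq, k)
    for w in words:
        if w in counts:  # k-mers containing ambiguous bases (e.g. N) are skipped
            counts[w] += 1
    n = max(len(words), 1)  # avoid division by zero for sequences shorter than k
    return {w: c / n for w, c in counts.items()}


print(kmers("AUGGCU"))  # ['AUG', 'UGG', 'GGC', 'GCU']
```

The same k-mer tokens can instead be mapped to learned dense embeddings (word2vec-style or a language model's token embeddings); the frequency vector is just the simplest fixed-length input a predictor can consume.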
BMC Genomics
Journal Year: 2025
Volume and Issue: 26(1)
Published: March 17, 2025
Gene mining is crucial for understanding the regulatory mechanisms underlying complex biological processes, particularly in plants responding to environmental conditions. Traditional machine learning methods, while useful, often overlook important gene relationships due to their reliance on manual feature selection and their limited ability to capture inter-gene dynamics. Deep learning approaches, while powerful, are unsuitable for small sample sizes. This study introduces TransGeneSelector, the first deep learning framework specifically designed for mining key genes from small transcriptomic datasets. By integrating a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) for sample generation and a Transformer-based network for classification, TransGeneSelector efficiently addresses the challenges of small-sample data, capturing both global gene interactions and specific biological processes.
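For reference, the WGAN-GP component named here is usually trained with the standard gradient-penalty critic objective below, where D is the critic, P_r and P_g are the distributions of real and generated samples, and the penalty is evaluated at points interpolated between them. The abstract does not state TransGeneSelector's penalty weight or network sizes; λ = 10 is the default used in the original WGAN-GP paper.

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
               - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
               + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],\\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, \tilde{x}, \qquad \epsilon \sim U[0, 1],\\
\mathcal{L}_G &= -\,\mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big].
\end{aligned}
```

The penalty pushes the critic's gradient norm toward 1, a soft version of the 1-Lipschitz constraint, which tends to make adversarial training more stable than weight clipping when real samples are scarce.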
Evaluated on Arabidopsis thaliana, the model achieved high classification accuracy in predicting seed germination and heat stress conditions, outperforming traditional methods like Random Forest and Support Vector Machines (SVM). Moreover, Shapley Additive Explanations (SHAP) analysis and network construction revealed that TransGeneSelector effectively identified genes that appear to have upstream regulatory functions based on our analyses, with these genes enriched in multiple pathways critical for germination and heat stress responses. RT-qPCR validation further confirmed the model's accuracy, demonstrating consistent expression patterns across varying conditions. The findings underscore the potential of TransGeneSelector as a robust tool for gene mining, offering deeper insights into gene regulation and organism adaptation under diverse environmental conditions. This work provides a framework that leverages deep learning for key gene identification.
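SHAP attributes a prediction to its input features using Shapley values. The sketch below computes them exactly, by enumerating every feature coalition, for a toy three-feature scoring function standing in for a classifier over gene-expression inputs. The function `toy_score`, the inputs, and the choice of replacing absent features with a fixed baseline are all assumptions for illustration; practical SHAP explainers use faster algorithms or approximations, because exact enumeration needs O(2^n) model evaluations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every feature coalition.

    Assumption: a feature outside the coalition takes its baseline value.
    Cost is O(2^n) model calls, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z: Sequence[float]) -> float:
    # Hypothetical model: one additive gene effect plus one two-gene interaction.
    return 2.0 * z[0] + z[1] * z[2]


phi = shapley_values(toy_score, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

Here the attributions come out as 2, 3, and 3: they sum to toy_score(x) - toy_score(baseline) = 8, and the interaction term is split equally between the two features that form it, which is the property that lets SHAP credit genes that act jointly.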
Mathematics
Journal Year: 2025
Volume and Issue: 13(6), P. 975 - 975
Published: March 15, 2025
Hot metal temperature is a key factor affecting the quality and energy consumption of iron and steel smelting. Accurate prediction of the temperature drop in the hot metal ladle is very important for optimizing transport, improving efficiency, and reducing energy consumption. Most existing studies focus on molten iron in torpedo tanks, but there is a significant research gap in hot metal ladle temperature drop, especially as the hot metal ladle is increasingly used to replace the torpedo tank in the transportation process, and this has not been fully addressed in the literature. This paper proposes an interpretable hybrid deep learning model combining Bi-LSTM and Transformer to solve the complexity of temperature drop prediction. By leveraging Catboost-RFECV, the most influential variables are selected, and the model captures both local features with Bi-LSTM and global dependencies with Transformer. Hyperparameters are optimized automatically using Optuna, enhancing model performance. Furthermore, SHAP analysis provides valuable insights into the factors influencing hot metal temperature drops, enabling more accurate prediction of hot metal temperature. The experimental results demonstrate that the proposed model outperforms each individual model and ensemble models in terms of R², RMSE, MAE, and other evaluation metrics. Additionally, SHAP analysis identifies the key factors contributing to the hot metal temperature drop.
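The evaluation metrics named in the abstract have standard definitions. The sketch below computes them for invented observed and predicted temperature drops (illustrative numbers, not the paper's data); `mae`, `rmse`, and `r2` are hypothetical helper names. R² compares the residual sum of squares with the spread of the observations around their mean, so 1.0 indicates a perfect fit.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    # Mean absolute error.
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    # Root mean squared error; penalizes large misses more than MAE does.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Invented observed vs. predicted temperature drops, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

Reporting MAE and RMSE together is informative: when RMSE is much larger than MAE, the model's error is dominated by a few large misses rather than uniform small ones.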