NAR Genomics and Bioinformatics,
Journal year: 2024,
Issue: 6(4)
Published: Sep. 28, 2024
Abstract
Taxonomic classification of viruses is essential for understanding their evolution. Genomic classification of viruses at higher taxonomic ranks, such as order or phylum, is typically based on alignment and comparison of amino acid sequence motifs in conserved genes. Classification at lower taxonomic ranks, such as genus or species, is usually based on nucleotide sequence identities between genomic sequences. Building on our whole-genome analytical classification framework, we here describe Genome Relationships Applied to Viral Taxonomy Version 2 (GRAViTy-V2), which encompasses a greatly expanded range of features and numerous optimisations, packaged as an application that may be used as a general-purpose virus classification tool. Using 28 datasets derived from the ICTV 2022 taxonomy proposals, GRAViTy-V2 output was compared against human expert-curated classifications used for taxonomic assignments in the 2023 round of ICTV taxonomy changes. GRAViTy-V2 produced taxonomies equivalent to manually-curated versions down to the family level and, in almost all cases, to genus and species levels. The majority of discrepant results arose from errors in coding sequence annotations in INSDC records, or from inclusion of incomplete genome sequences in the analysis. Analysis times ranged from 1-506 min (median 3.59) on datasets with 17-1004 genomes and a mean genome length of 3000–1 000 000 bases.
Molecular Nutrition & Food Research,
Journal year: 2025,
Issue: unknown
Published: Mar. 26, 2025
Probiotics are microorganisms that offer health benefits to the host. Traditional methods for identifying these organisms are time-consuming and resource-intensive. This study addresses the need for a more efficient and accurate approach to probiotic identification using machine learning (ML) techniques. The present study introduces ProbML, an ML-based approach for probiotic identification from whole genome sequences of prokaryotes. Among the five ML algorithms tested, XGBoost models demonstrated superior performance, achieving a maximum accuracy of 100% on the test data and 95.45% on an independent test dataset. ProbML surpasses existing tools, which achieved 97.77% and 66.28% on the same datasets, respectively. ProbML models were used to analyze 4728 genomes in the Unified Human Gastrointestinal Genome database, classifying 650 as probiotics, with many previously unreported. A versatile GUI platform was also developed that employs ProbML for probiotic classification or can be used to generate custom classifiers based on user-specific needs (https://github.com/sysbio-iitmandi/MLG_Dashboard). This study emphasizes the power of genomic data and advanced ML techniques in accelerating probiotic discovery.
Agriculture,
Journal year: 2024,
Issue: 14(12), p. 2299
Published: Dec. 14, 2024
Artificial intelligence (AI) can revolutionize agriculture by enhancing genomic research and promoting sustainable crop improvement. AI systems integrate machine learning (ML) and deep learning (DL) with big data to identify complex patterns and relationships by analyzing vast genomic, phenotypic, and environmental datasets. This capability accelerates breeding cycles, improves predictive accuracy, and supports the development of climate-resilient, high-yielding crop varieties. Applications such as precision agriculture, automated phenotyping, predictive analytics, and early pest and disease detection demonstrate AI’s ability to optimize agricultural practices while promoting sustainability. Despite these advancements, challenges remain, including fragmented data sources, variability in phenotyping protocols, and data ownership concerns. Addressing these issues through standardized data integration frameworks, advanced analytical tools, and ethical AI practices will be critical for realizing AI’s full potential. This review provides a comprehensive overview of AI-powered genomic research, highlights the role of big data in training robust AI models, and explores ethical and technological considerations for sustainable agricultural practices.

Whole Genome and Proteome Alignments, represented by the Multiple Alignment File (MAF) format, have become a standard approach in comparative genomics and proteomics. These often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. We present MAFin, a novel tool that enables efficient motif detection and conservation analysis in MAF files to address this gap, streamlining genomic and proteomic research. We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin enables the multithreaded search of conserved motifs using three approaches: 1) with user-specified k-mers to search the sequences, 2) with regular expressions, in which case one or more patterns are searched, and 3) with predefined Position Weight Matrices. Once a motif has been found, MAFin detects the motif instances and calculates the conservation across the aligned sequences. MAFin also calculates a conservation percentage, which provides information about the conservation levels of each motif across the aligned sequences, based on the number of matches relative to the length of the motif. A set of statistics enables the interpretation of each motif's conservation level, and the detected motifs are exported in JSON and CSV files for downstream analyses. MAFin is offered as a Python package under the GPL license and as a multi-platform application, available at: https://github.com/Georgakopoulos-Soares-lab/MAFin. Supplementary data are available at Bioinformatics online.
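The conservation percentage described in this abstract (matches relative to motif length, per aligned sequence) can be illustrated with a minimal sketch. This is not MAFin's code or API: the function name `motif_conservation`, the dictionary-based alignment block, and the toy sequences are all assumptions made for illustration, and only the k-mer search mode is shown.

```python
def motif_conservation(block, ref_name, motif):
    """Find a k-mer in the reference row of one alignment block and score
    how well each other aligned row matches it, column by column.

    block: dict mapping sequence name -> aligned string (equal lengths,
           '-' for gaps). This layout is a hypothetical stand-in for a
           parsed MAF block, not MAFin's internal representation.
    Returns a list of (start_column, {name: percent_conserved}).
    """
    ref = block[ref_name].upper()
    motif = motif.upper()
    k = len(motif)
    hits = []
    start = ref.find(motif)
    while start != -1:
        per_seq = {}
        for name, seq in block.items():
            if name == ref_name:
                continue
            window = seq[start:start + k].upper()
            # Matches relative to the motif length, as a percentage.
            matches = sum(a == b for a, b in zip(window, motif))
            per_seq[name] = 100.0 * matches / k
        hits.append((start, per_seq))
        start = ref.find(motif, start + 1)
    return hits


# Toy alignment block (invented data).
block = {
    "ref": "ACGTTGACA",
    "sp1": "ACGTTGACA",
    "sp2": "ACGATGTCA",
    "sp3": "AC--TGACA",
}
hits = motif_conservation(block, "ref", "TTGA")
print(hits)
```

In this toy block the motif occurs once in the reference, and a gap in an aligned row simply counts as a mismatch; a real tool would also need to handle motif hits that span gaps in the reference row.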
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2025,
Issue: unknown
Published: Feb. 8, 2025
Abstract
The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development and have widespread applications in pathogen diagnostics, human healthcare, ecology and biomes. Here, we investigated the existence of peptide k-mer sequences that are exclusively present in a specific taxonomy and absent in every other taxonomic level, termed taxonomic quasi-primes. By analyzing proteomes across 24,073 species, we identified quasi-prime peptides specific to superkingdoms, kingdoms, and phyla, uncovering their taxonomic distributions and functional relevance. These peptides exhibit remarkable sequence uniqueness at six- and seven-amino-acid lengths, offering insights into evolutionary divergence and lineage-specific adaptations. Moreover, we show that human quasi-prime loci are more prone to harboring pathogenic variants, underscoring their functional significance. This study introduces taxonomic quasi-primes and offers insights into their contributions to proteomic diversity, evolutionary pathways, and functional adaptations across the tree of life, while emphasizing their potential impact on human health and disease.
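The notion of a peptide k-mer that is present in one taxon and absent from every other can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' pipeline: the function names, the dictionary input, and the example sequences are invented, and a real analysis across thousands of proteomes would need a far more memory-efficient index.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def taxon_exclusive_kmers(proteomes, k):
    """proteomes: dict mapping taxon name -> list of protein sequences
    (hypothetical input layout). Returns taxon -> set of k-mers that occur
    in that taxon and in no other taxon in the input."""
    owners = defaultdict(set)
    for taxon, proteins in proteomes.items():
        for protein in proteins:
            for kmer in kmers(protein, k):
                owners[kmer].add(taxon)
    exclusive = defaultdict(set)
    for kmer, taxa in owners.items():
        if len(taxa) == 1:  # seen in exactly one taxon
            exclusive[next(iter(taxa))].add(kmer)
    return dict(exclusive)


# Toy proteomes (invented sequences).
toy = {"taxon_A": ["MKTAY", "KTAYW"], "taxon_B": ["MKTAC"]}
result = taxon_exclusive_kmers(toy, 3)
print(result)
```

Exclusivity here is relative to the taxa supplied, so adding more proteomes can only shrink each set.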
ACS Omega,
Journal year: 2025,
Issue: 10(7), pp. 6794-6800
Published: Feb. 11, 2025
Analysis of observed protein sequences across all species within the UniProtKB/Swiss-Prot data set reveals CQWW as the shortest absent stretch of amino acids. While DNA can be found encoding the CQWW sequence, it has never been observed to be translated or included in manually curated sets of proteins, existing only in predicted, tentative protein sequences and in a single mature antibody sequence. We have synthesized this "nullomer" peptide, along with 13 derivatives, including reversed, truncated, stereoisomers, and alanine-scanning peptides, conjugated to polyarginine stretches to increase cellular uptake. We tested their impact against a healthy neuronal cell line and six patient-derived glioblastoma cell lines spanning three clinical subtypes. Results reveal IC50 values averaging 4.9 μM for inhibition of survival across the tested oncogenic cell lines. High-content phenotypic analysis of cellular features and reverse-phase protein arrays failed to discern a clear mode of action for the nullomer peptide but suggests mitochondrial impairment through GSK3 isoforms, supported by observations of reduced mitochondrial stain intensities. With recent interest in nullomer peptides, we see the results of this study as a starting point for further investigation into this potentially therapeutic peptide class.
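Finding the shortest peptide that never occurs in a protein set amounts to enumerating all k-mers for increasing k and reporting the first length at which some are missing. The sketch below shows that idea on a toy two-letter alphabet; the function name, parameters, and data are assumptions for illustration and do not reproduce the authors' analysis of UniProtKB/Swiss-Prot.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def shortest_absent_kmers(proteins, alphabet=AMINO_ACIDS, max_k=4):
    """Return (k, absent_kmers) for the smallest k <= max_k at which at
    least one k-mer over `alphabet` is missing from every sequence in
    `proteins`, or None if all k-mers up to max_k are observed."""
    for k in range(1, max_k + 1):
        observed = set()
        for protein in proteins:
            for i in range(len(protein) - k + 1):
                observed.add(protein[i:i + k])
        absent = ["".join(t) for t in product(alphabet, repeat=k)
                  if "".join(t) not in observed]
        if absent:
            return k, absent
    return None


# Toy example over a two-letter alphabet so the output stays readable.
k, absent = shortest_absent_kmers(["AACCA"], alphabet="AC", max_k=4)
print(k, absent)
```

With the full 20-letter alphabet there are 160,000 possible 4-mers, so brute-force enumeration remains cheap at the length reported in the abstract.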
BMC Bioinformatics,
Journal year: 2025,
Issue: 26(1)
Published: Mar. 5, 2025
The Mapper algorithm is an essential tool for exploring the data shape in topological data analysis. With a dataset as input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a Reeb graph of the dataset. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of a dataset, especially when the underlying structure is complex. In this work, we introduce a distribution-guided Mapper algorithm named D-Mapper, which utilizes the property of the probability model and data intrinsic characteristics to generate density-guided covers and provide enhanced topological features. Moreover, we introduce a metric accounting for both the quality of overlap clustering and extended persistent homology to measure the performance of Mapper-type algorithms. Our numerical experiments indicate that the D-Mapper outperforms the classic Mapper algorithm in various scenarios. We also apply the D-Mapper to a SARS-COV-2 coronavirus RNA sequence dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both the vertical and horizontal evolutionary processes of the viruses. Our code is available at https://github.com/ShufeiGe/D-Mapper. The D-Mapper generates covers from data based on a probability model. This work demonstrates the power of fusing probabilistic models with Mapper algorithms.
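The fixed-interval cover that the classic Mapper algorithm relies on can be sketched as follows. This shows only the cover-construction step with uniform interval length and a single overlap ratio (no clustering or graph building), and it is not taken from the D-Mapper repository; the function name and parameters are assumptions for illustration.

```python
def uniform_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals equal-length intervals in which
    consecutive intervals share a fraction `overlap` of their length.

    Solving n*L - (n-1)*overlap*L = hi - lo gives the interval length L.
    """
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length)
            for i in range(n_intervals)]


def pull_back(values, cover):
    """Assign each filter value to every interval that contains it."""
    return [[v for v in values if a <= v <= b] for a, b in cover]


cover = uniform_cover(0.0, 10.0, 4, 0.5)
members = pull_back([0.5, 3.0, 5.0, 9.5], cover)
print(cover)
print(members)
```

Because interval length and overlap are the same everywhere, dense and sparse regions of the filter range are treated identically, which is the limitation the abstract says density-guided covers aim to address.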
Briefings in Bioinformatics,
Journal year: 2025,
Issue: 26(2)
Published: Mar. 1, 2025
Abstract
Predicting long non-coding RNA (lncRNA)-protein interactions is essential for understanding biological processes and discovering new therapeutic targets. In this study, we propose a novel model based on inter-view contrastive learning and miRNA fusion for lncRNA-protein interaction (LPI) prediction, called ICMF-LPI, which utilizes a heterogeneous information network to enhance LPI prediction. The model integrates miRNA as a mediator, constructing an lncRNA-miRNA-protein heterogeneous network, and employs metapath to extract diverse relationships from heterogeneous graphs. By fusing miRNA-related information and leveraging contrastive learning across inter-views, ICMF-LPI effectively captures potential interactions. Experimental results, including five-fold cross-validation, demonstrate the model’s superior performance compared to several state-of-the-art methods, with significant improvements in area under the receiver operating characteristic curve and area under the precision-recall curve metrics. Notably, even when direct lncRNA-protein connections are excluded, ICMF-LPI still achieves competitive predictive accuracy, performing comparably or better than some existing models. This demonstrates that the proposed model is effective in scenarios where direct interaction data is unavailable. This approach offers a promising direction for developing predictive models in bioinformatics, particularly in challenging data conditions.
Frontiers in Zoology,
Journal year: 2025,
Issue: 22(1)
Published: Apr. 17, 2025
Abstract
Reference genome assemblies are the basis for comprehensive genomic analyses and comparisons. Due to declining sequencing costs and growing computational power, genome projects are now feasible in smaller labs. De novo genome sequencing for non-model or emerging model organisms requires knowledge about genome size and techniques for extracting high molecular weight DNA. Next to quality, the amount of DNA obtained from single individuals is crucial, especially when dealing with small organisms. While long-read sequencing technologies are the methods of choice for creating high quality genome assemblies, pure short-read assemblies might bear most of the coding parts of a genome but are usually much more fragmented and do not well resolve repeat elements or structural variants. Several genome initiatives produce more and more non-model organism genomes and provide rules for standards in genome sequencing and assembly. However, sometimes the organism of choice is not part of such an initiative or does not meet its standards. Therefore, if the scientific question can be answered with a genome of low contiguity in intergenic parts, missing the high standards of chromosome scale assembly should not prevent publication. This review describes how to set up an animal genome sequencing project in the lab, how to estimate costs and resources, and how to deal with suboptimal conditions. Thus, we aim to suggest optimal strategies for genome sequencing that fulfil the needs according to specific research questions, e.g. “How are species related to each other based on whole genomes?” (phylogenomics), “How do genomes of populations within a species differ?” (population genomics), “Are differences between populations relevant for conservation?” (conservation genomics), “Which selection pressure is acting on certain genes?” (identification of genes under selection), “Did repeats expand or contract recently?” (repeat dynamics).
Academia molecular biology and genomics.,
Journal year: 2025,
Issue: 2(2)
Published: Apr. 18, 2025
The simplest building blocks of the genome, k-mers, show two properties that are widely observed. Their frequency distribution is scale-free (a variant of the Zipfian distribution), and inverse symmetry of k-mers is observable on the same strand. These phenomena are linked; Watson–Crick base pairing generates inverse symmetry (IS) under the condition that the same k-mer distribution is present on both strands of the genome. A stable equilibrium k-mer distribution in all genomes is predicted by a purely probabilistic theory, the Conservation of Hartley–Shannon Information (CoHSI). This does not replace the diverse mechanism-based explanations of IS that have been advanced, but in principle, it aggregates all operative mechanisms. CoHSI predicts that IS follows from the equilibrium distribution and should decay gradually and stochastically as genome size decreases and k-mer length increases. These predictions were tested on 178 genomes across the three domains of life and viruses. The precision of IS decayed progressively as genome size decreased and k-mer length increased, regardless of the structure of the genome; DNA or RNA, nuclear or plastid, double- or single-stranded. No clear partition into IS-compliant and non-compliant genomes could be inferred. These results suggest that k-mer distributions and IS are linked and emerge probabilistically in a mechanism-agnostic manner across the three domains of life.
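Inverse symmetry on a single strand means that a k-mer and its reverse complement occur with similar frequency in the same sequence. A minimal sketch of how one might tabulate this is shown below; the function names and the toy sequence are assumptions for illustration, and the sketch says nothing about how the cited study measured the precision of IS.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Reverse complement of a DNA k-mer (A<->T, C<->G, then reversed)."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count every overlapping k-mer on the given strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def inverse_symmetry_pairs(seq, k):
    """Map each k-mer / reverse-complement pair (keyed by the
    lexicographically smaller member) to its two single-strand counts.
    Under exact inverse symmetry the two counts in each pair are equal."""
    counts = kmer_counts(seq, k)
    pairs = {}
    for kmer in counts:
        key = min(kmer, reverse_complement(kmer))
        # Counter returns 0 for k-mers that never occur.
        pairs[key] = (counts[key], counts[reverse_complement(key)])
    return pairs


pairs = inverse_symmetry_pairs("AACGTT", 2)
print(pairs)
```

The toy sequence is its own reverse complement, so every pair is perfectly balanced; on real genomes the two counts in a pair would be compared statistically rather than tested for equality.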