bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Dec. 21, 2024
Abstract
Genomes are fundamental to understanding microbial ecology and evolution. The emergence of high-throughput, long-read DNA sequencing has enabled the recovery of genomes from environmental samples at scale. However, expanding the genome catalogue of soils and sediments has been challenging due to the enormous complexity of these environments. Here, we performed deep, long-read Nanopore sequencing of 154 soil and sediment samples collected across Denmark and, through an optimised bioinformatics pipeline, recovered genomes of 15,314 novel species, including 4,757 high-quality genomes. The genomes span 1,086 genera and provide the first reference genomes for 612 previously known genera, expanding the phylogenetic diversity of the prokaryotic tree of life by 8%. The assemblies also recovered thousands of complete rRNA operons, biosynthetic gene clusters and CRISPR-Cas systems, all of which were underrepresented and highly fragmented in previous terrestrial catalogues. Furthermore, incorporation of the MAGs into public databases significantly improved species-level classification rates for metagenomic datasets, thereby enhancing microbiome characterization. With this study, we demonstrate that long-read sequencing and optimised bioinformatics allow cost-effective recovery of genomes from complex ecosystems, which remain the largest untapped source of biodiversity for filling the gaps in the tree of life.
Science Advances, Journal Year: 2025, Volume and Issue: 11(3), Published: Jan. 17, 2025
Following 30 years of sequencing, we assessed the phylogenetic diversity (PD) of >1.5 million microbial genomes in public databases, including metagenome-assembled genomes (MAGs) of uncultivated microbes. As compared to the vast diversity uncovered by metagenomic sequences, cultivated taxa account for a modest portion of the overall diversity, 9.73% in bacteria and 6.55% in archaea, while MAGs contribute 48.54% and 57.05%, respectively. Therefore, a substantial fraction of bacterial (41.73%) and archaeal PD (36.39%) still lacks any genomic representation. This unrepresented diversity manifests primarily at lower taxonomic ranks, exemplified by 134,966 species identified in 18,087 metagenomic samples. Our study exposes diversity hotspots in freshwater, marine subsurface, sediment, soil, and other environments, whereas human samples yielded minimal novelty within the context of existing datasets. These results offer a roadmap for future genome recovery efforts, delineating uncaptured taxa in underexplored environments and underscoring the necessity for renewed isolation and sequencing.
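The unrepresented fractions quoted in this abstract follow from subtracting the cultivated and MAG shares from 100%. The short check below is only an illustrative sketch: the dictionary layout and the function name `unrepresented_share` are assumptions made here, and only the percentages come from the abstract. The archaeal residual differs from the reported 36.39% by 0.01, which is consistent with rounding of the published figures.

```python
# Percentages as reported in the abstract; the data structure is an assumption.
reported = {
    "bacteria": {"cultivated": 9.73, "mags": 48.54, "unrepresented": 41.73},
    "archaea": {"cultivated": 6.55, "mags": 57.05, "unrepresented": 36.39},
}


def unrepresented_share(cultivated, mags):
    """Share of phylogenetic diversity covered by neither cultivated taxa nor MAGs."""
    return round(100.0 - cultivated - mags, 2)


# Residual share per domain, to compare against the reported values.
residuals = {
    domain: unrepresented_share(values["cultivated"], values["mags"])
    for domain, values in reported.items()
}

for domain, residual in residuals.items():
    gap = abs(residual - reported[domain]["unrepresented"])
    print(f"{domain}: computed {residual}%, reported "
          f"{reported[domain]['unrepresented']}%, difference {gap:.2f}")
```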
Environmental Microbiome, Journal Year: 2024, Volume and Issue: 19(1), Published: Aug. 2, 2024
Soil microbiomes are heterogeneous, complex microbial communities. Metagenomic analysis is generating vast amounts of data, creating immense challenges in sequence assembly and analysis. Although advances in technology have resulted in the ability to easily collect large amounts of sequence data, soil samples containing thousands of unique taxa are often poorly characterized. These challenges reduce the usefulness of genome-resolved metagenomic (GRM) analysis seen in other fields of microbiology, such as the creation of high quality metagenome-assembled genomes and the adoption of genome scale modeling approaches. The absence of these resources restricts future research, limiting hypothesis generation and predictive modeling. Creating publicly available databases of soil MAGs, similar to those produced for other microbiomes, has the potential to transform scientific insights about soil microbiomes without requiring the computational and domain expertise for assembly and binning.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown, Published: Feb. 5, 2025
ABSTRACT
Genome foundation models hold transformative potential for precision medicine, drug discovery, and understanding complex biological systems. However, existing models are often inefficient, constrained by suboptimal tokenization and architectural design, and biased toward reference genomes, limiting their representation of low-abundance, uncultured microbes in the rare biosphere. To address these challenges, we developed GenomeOcean, a 4-billion-parameter generative genome foundation model trained on over 600 Gbp of high-quality contigs derived from 220 TB of metagenomic datasets collected from diverse habitats across Earth’s ecosystems. A key innovation of GenomeOcean is training directly on large-scale co-assemblies of metagenomic samples, enabling enhanced representation of rare microbial species and improving generalizability beyond genome-centric approaches. We implemented a byte-pair encoding (BPE) tokenization strategy for genome sequence generation, alongside architectural optimizations, achieving up to 150× faster sequence generation while maintaining high fidelity. GenomeOcean excels in representing microbial species and generating protein-coding genes constrained by evolutionary principles. Additionally, its fine-tuned model demonstrates the ability to discover novel biosynthetic gene clusters (BGCs) in natural genomes and perform zero-shot synthesis of biochemically plausible, complete BGCs. GenomeOcean sets a new benchmark for metagenomic research, natural product discovery, and synthetic biology, offering a robust foundation for advancing these fields.
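As a rough illustration of what byte-pair encoding (BPE) does to nucleotide sequences, the sketch below learns merges on a toy corpus and applies them to a new sequence. It is a generic, textbook-style BPE written for this note, not GenomeOcean's tokenizer: the function names, the tie-breaking rule, and the example sequences are all assumptions.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe_merges(sequences, num_merges):
    """Learn an ordered list of merges, starting from single nucleotides."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties resolved alphabetically (an assumption).
        best = max(sorted(pair_counts), key=pair_counts.get)
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def encode(sequence, merges):
    """Tokenize a sequence by replaying the learned merges in order."""
    tokens = list(sequence)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


# Toy corpus (hypothetical sequences, chosen only to make the merges easy to follow).
merges = learn_bpe_merges(["ATGATGATG", "ATGCCC"], num_merges=2)
tokens = encode("ATGATGCC", merges)
print(merges, tokens)
```

Because frequent substrings collapse into single tokens, the same sequence is represented by fewer tokens than with one token per nucleotide, which is one plausible reason a BPE vocabulary can shorten generation.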
Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2), Published: March 1, 2025
One of the main goals of metagenomic studies is to describe the taxonomic diversity of microbial communities. A crucial step in metagenomic analysis is binning, which involves the (supervised) classification or (unsupervised) clustering of metagenomic sequences. Various machine learning models have been applied to address this task. In this review, the contributions of artificial neural networks (ANN) in the context of metagenomic binning are detailed, addressing supervised, unsupervised, and semi-supervised approaches. 34 ANN-based binning tools are systematically compared, detailing their architectures, input features, datasets, advantages, disadvantages, and other relevant aspects. The findings reveal that deep learning approaches, such as convolutional neural networks and autoencoders, achieve higher accuracy and scalability than traditional methods. Gaps in benchmarking practices are highlighted, and future directions are proposed, including standardized datasets and optimization for third-generation sequencing. This review provides support to researchers in identifying trends and selecting suitable tools for the metagenomic binning problem.
Scientific Data, Journal Year: 2025, Volume and Issue: 12(1), Published: March 28, 2025
Soil contains a diverse community of organisms; these can include archaea, fungi, viruses, and bacteria. In situ identification of soil microorganisms is challenging. The use of genome-centric metagenomics enables the assembly of microbial populations, allowing the categorization and exploration of the potential functions of microorganisms living in the complex soil environment. However, the heterogeneity of soil-inhabiting microbes poses a tremendous challenge, with their functions left unknown and the microbes difficult to culture in lab settings. In this study, using genome assembling strategies from both field core samples and enriched monolith samples, we assembled 679 highly complete metagenome-assembled genomes (MAGs). The ability to identify these MAGs across a precipitation gradient in the state of Kansas (USA) provided insights into the impact of precipitation levels on soil microbial populations. Metabolite modeling of the MAGs revealed that more than 80% of the microbial populations possessed carbohydrate-active enzymes, capable of breaking down chitin and starch.
INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY, Journal Year: 2024, Volume and Issue: 74(9), Published: Sept. 9, 2024
Here, I review the dynamic history of prokaryotic phyla. Following leads set by Darwin, Haeckel and Woese, the concept of phylum has evolved from a group sharing common phenotypes to a group of organisms sharing common ancestry, with modern taxonomy based on phylogenetic classifications drawn from macromolecular sequences. Phyla came as surprising latecomers to the formalities of prokaryotic nomenclature in 2021. Since then names have been validly published for 46 prokaryotic phyla, replacing some established names with neologisms, prompting criticism and debate within the scientific community. Molecular barcoding enabled phylogenetic analysis of microbial ecosystems without cultivation, leading to the identification of candidate divisions (or phyla) from diverse environments. The introduction of metagenome-assembled genomes marked a significant advance in identifying and classifying uncultured microbial phyla. The lumper–splitter dichotomy has led to disagreements, with experts cautioning against the pressure to create a profusion of new phyla and prominent databases adopting a conservative stance. The Candidatus designation has been widely used to provide provisional status to uncultured prokaryotic taxa, with phyla named under this convention now clearly surpassing those with validly published names. The Genome Taxonomy Database (GTDB) has offered a stable, standardized prokaryotic taxonomy with normalized taxonomic ranks, which has led to both lumping and splitting of pre-existing phyla. The GTDB framework introduced unwieldy alphanumeric placeholder labels, prompting the recent publication of over 100 user-friendly Latinate names for unnamed prokaryotic phyla. Most candidate phyla remain ‘known unknowns’, with limited knowledge of their genomic diversity, ecological roles, or environments. Whether phyla still reflect significant evolutionary partitions across prokaryotic life remains an area of active debate. However, phyla remain of practical importance for microbiome analyses, particularly in clinical research. Despite potential diminishing returns in the discovery of biodiversity, prokaryotic phyla offer extensive research opportunities for microbiologists for the foreseeable future.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Feb. 8, 2024
Abstract
Metagenomic binning, the process of grouping DNA sequences into taxonomic units, is critical for understanding the functions, interactions, and evolutionary dynamics of microbial communities. We propose a deep learning approach to binning using two neural networks, one based on composition and another on environmental abundance, dynamically weighting the contribution of each based on characteristics of the input data. Trained on over 43,000 prokaryotic genomes, our network for composition-based binning is inspired by metric learning techniques used for facial recognition. Using a task-specific, multi-GPU accelerated algorithm to cluster the embeddings produced by the network, our binner leverages marker genes observed to be universally present in nearly all taxa to grade and select optimal clusters from a hierarchy of candidates. We evaluate our approach on four simulated datasets with known ground truth. Our linear time integration of marker genes recovers more near complete genomes than state of the art but computationally infeasible solutions using them, while being an order of magnitude faster. Finally, we demonstrate the scalability and acuity of our approach by testing it on three of the largest metagenome assemblies ever performed. Compared to other binners, we produced 47%-183% more near complete genomes. From these datasets, we find over 3000 new candidate species which have never been previously cataloged, representing a potential 4% expansion of the bacterial tree of life.
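The idea of grading candidate clusters with universal marker genes can be sketched as follows. This is not the authors' algorithm: the completeness and contamination definitions, the `completeness - 5 * contamination` score, and all names and example data below are assumptions based on common practice in bin quality assessment.

```python
from collections import Counter


def grade_cluster(marker_hits, marker_set):
    """Return (completeness, contamination) for one candidate cluster.

    marker_hits: marker-gene names found on the cluster's contigs; a name
                 repeats when the gene is found more than once.
    marker_set:  single-copy marker genes expected once in every genome.
    """
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    completeness = len(counts) / len(marker_set)
    # Extra copies of a single-copy marker hint at contigs from another genome.
    extra_copies = sum(n - 1 for n in counts.values())
    contamination = extra_copies / len(marker_set)
    return completeness, contamination


def select_best(candidates, marker_set, contamination_weight=5.0):
    """Pick the candidate maximizing completeness minus weighted contamination."""
    def score(name):
        completeness, contamination = grade_cluster(candidates[name], marker_set)
        return completeness - contamination_weight * contamination
    return max(sorted(candidates), key=score)


# Hypothetical marker set and a three-node slice of a cluster hierarchy.
markers = {"rpoB", "recA", "gyrB", "rpsC"}
candidates = {
    "parent": ["rpoB", "recA", "gyrB", "rpsC", "rpoB", "recA"],
    "child_a": ["rpoB", "recA", "gyrB", "rpsC"],
    "child_b": ["rpoB", "recA"],
}
best = select_best(candidates, markers)
print(best, grade_cluster(candidates[best], markers))
```

In this toy hierarchy the parent cluster is complete but carries duplicated markers, so the cleaner child is preferred. Each cluster is graded in a single pass over its marker hits, which is in the spirit of the linear-time integration the abstract mentions.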
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: June 4, 2024
Coastal marine sediments function as large-scale natural biocatalytic filters, remineralizing and transforming organic matter. Benthic microbiomes exhibit remarkable temporal stability, contrasting with the dynamic, substrate-driven successions of bacterioplankton. Nonetheless, understanding their role in carbon cycling and the interactions between these microbial groups is limited due to the complexity of benthic microbiomes. Here, we used a seasonally resolved, deep short- and long-read metagenomic approach to examine distinctive genomic features recovered from the sediment, the overlaying water column, and particle-attached bacteria and archaea in the North Sea. We recovered 115 metagenome-assembled genomes (MAGs) that belonged to Woeseiales, Rhizobiales, Planctomycetia, Gemmatimonadota and Desulfobacterota species. While Proteobacteria and Actinobacteriota were characteristic phyla in sediments, Acidimicrobiia and Desulfocapsaceae species were shared between fractions, indicative of significant bentho-pelagic coupling. Predominant members of the family Woeseiaceae carried polysaccharide utilization loci (PULs) predicted to target laminarin, alginate, and α-glucan in sediments. In contrast, water column MAGs lacked PULs and encoded a significantly higher fraction of sulfatases and peptidases, indicating the degradation of protein-rich and sulfated organic matter. Our findings disentangle family-level adaptations and niche differentiation of globally significant populations involved in organic matter storage.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Sept. 14, 2024
Abstract
Background: Wet tropical forest soils store a vast amount of organic carbon and cycle over a third of terrestrial net primary production. The microbiomes of these soils have a global impact on greenhouse gases and tolerate a remarkably dynamic redox environment, driven by high availability of reductant, high soil moisture, and fine-textured soils that limit oxygen diffusion. Yet these microbiomes, and particularly virus-host interactions, remain poorly characterized, and we have little understanding of how they will shape future carbon cycling as high-intensity drought and precipitation events make soil redox conditions less predictable.
Results: To investigate the effects of shifting soil redox conditions on active viral communities and virus-microbe interactions, we conducted a 44-day redox manipulation experiment using soils from the Luquillo Experimental Forest, Puerto Rico, amended with 13C-enriched plant biomass. We sequenced 10 bulk metagenomes and 85 stable isotope probing targeted metagenomes generated by extracting whole community DNA, performing density fractionation, and conducting shotgun sequencing. Viral and microbial genomes were assembled, resulting in 5,420 viral populations (vOTUs) and 927 medium-to-high-quality metagenome-assembled genomes across 25 bacterial phyla. Notably, over half (54%) of the vOTUs were 13C-enriched, highlighting their active role in the degradation of plant litter. These active viruses primarily infected the bacterial phyla Pseudomonadota, Acidobacteriota and Actinomycetota, and 57% were unique to a particular redox treatment. The anoxic samples exhibited the most distinct viral communities, with an increased potential for modulating host metabolism by carrying redox-specific glycoside hydrolases. However, some vOTUs were present in all redox conditions, suggesting selection for cosmopolitan viruses occurs naturally in soils that experience dynamic redox conditions.
Conclusions: Our study demonstrates virus-host interactions in these soils. By applying different DNA assembly methods and incubating soils under various redox regimes, we identified and observed significant variations in viral community composition and function. Our findings highlight the specialized roles of viruses in diverse environmental conditions, providing important insights into their contributions to carbon cycling and broader implications for climate change.
Computing in Science & Engineering, Journal Year: 2024, Volume and Issue: 26(2), P. 8 - 15, Published: April 1, 2024
The Exabiome project seeks to improve the understanding of microbiomes through the development of methods for accelerating metagenomic science using exascale computing. This article gives an overview of the scientific impact of the three components of the project: metagenome assembly, protein family detection and comparative analysis of metagenomes. The project developed MetaHipMer, the only metagenome assembler capable of scaling to full exascale systems. MetaHipMer has enabled ground-breaking assemblies on the Frontier exascale supercomputer, with many scientific benefits, such as the discovery of rare species and viral genomes. To investigate protein families, the project developed two tools, PASTIS and HipMCL. Together, these can utilize exascale resources to understand the functional diversity of billions of "dark matter" proteins and novel families. For comparative analysis, the project developed kmerprof, a tool that can be used to compare huge metagenomes for different purposes, for example, grouping human metagenomes according to body location.
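The abstract does not describe how kmerprof works internally. As a loose illustration of the general idea of comparing samples by k-mer content, the sketch below builds k-mer count profiles and compares them with cosine similarity. Everything in it is an assumption made for illustration, including the choice of k, the similarity measure, the function names, and the toy reads.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k=4):
    """Count every k-mer made only of A, C, G, T across a sample's reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity of two k-mer count profiles (0.0 if either is empty)."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[kmer] * profile_b[kmer] for kmer in shared)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy samples; real metagenomes would hold millions of reads.
sample_a = kmer_profile(["ACGTACGT", "ACGTTTGA"])
sample_b = kmer_profile(["ACGTACGT", "ACGTTTGA"])
sample_c = kmer_profile(["GGGGGGGG"])

print(cosine_similarity(sample_a, sample_b))
print(cosine_similarity(sample_a, sample_c))
```

Pairwise similarities like these could then feed a standard clustering step to group samples, for example by body site, though the grouping method used by the project is not stated in the abstract.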