Scientific Data, Journal Year: 2024, Volume and Issue: 11(1)
Published: Nov. 21, 2024
This study showcases 121 new genomes of spore-forming Bacillales from strains collected globally from a variety of habitats, assembled using Oxford Nanopore long-read and MGI short-read sequences. Bacilli are renowned for their capacity to produce diverse secondary metabolites with use in agriculture, biotechnology, and medicine. These secondary metabolites are encoded within biosynthetic gene clusters (smBGCs). smBGCs have significant research interest due to their potential as sources of bioactive compounds. Our dataset includes 62 complete genomes, 2 at chromosome level, and 57 at contig level, covering a genomic size range from 3.50 Mb to 7.15 Mb. Phylotaxonomic analysis revealed that these genomes span 16 genera, with 69 of them belonging to Bacillus. A total of 1,176 predicted BGCs were identified by in silico genome mining. We anticipate that the open-access data presented here will expand the reported genomic information of spore-forming Bacillales and facilitate a deeper understanding of the genetic basis of Bacillales' potential for secondary metabolite production.
Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(13), P. 7487 - 7503
Published: June 22, 2024
Abstract
Filamentous Actinobacteria, recently renamed Actinomycetia, are the most prolific source of microbial bioactive natural products. Studies on biosynthetic gene clusters benefit from or require chromosome-level assemblies. Here, we provide DNA sequences from >1000 isolates: 881 complete genomes and 153 near-complete genomes, representing 28 genera and 389 species, including 244 likely novel species. All isolates are filamentous isolates of the class Actinomycetia from the NBC culture collection. The largest genus is Streptomyces with 886 genomes, including 742 complete assemblies. We use this data to show that analysis of complete genomes can bring biological understanding not previously derived from more fragmented sequences or less systematic datasets. We document the central and structured location of core genes and the distal location of specialized metabolite biosynthetic gene clusters and duplicate core genes on the linear Streptomyces chromosome, and analyze the content and length of the terminal inverted repeats, which are characteristic for Streptomyces. We then analyze the diversity of trans-AT polyketide synthase biosynthetic gene clusters, which encode the machinery of a biotechnologically highly interesting compound class. These insights have both ecological and biotechnological implications in understanding the importance of high quality genomic resources and the complex role synteny plays in Actinomycetia biology.
Genome Biology, Journal Year: 2025, Volume and Issue: 26(1)
Published: Jan. 14, 2025
Background
Streptomyces is a highly diverse genus known for the production of secondary or specialized metabolites with a wide range of applications in the medical and agricultural industries. Several thousand complete or nearly complete Streptomyces genome sequences are now available, affording the opportunity to deeply investigate the biosynthetic potential within these organisms and to advance natural product discovery initiatives.
Results
We perform pangenome analysis on 2371 Streptomyces genomes, including approximately 1200 complete assemblies. Employing a data-driven approach based on genome similarities, the Streptomyces genus was classified into 7 primary and 42 secondary Mash-clusters, forming the basis for comprehensive pangenome mining. A refined workflow for grouping biosynthetic gene clusters (BGCs) redefines their diversity across different Mash-clusters. This workflow also reassigns 2729 known BGC families to only 440 families, a reduction caused by inaccuracies in BGC boundary detections. When the genomic location of BGCs is included in the analysis, a conserved genomic structure, or synteny, among BGCs becomes apparent within species and Mash-clusters. This synteny suggests that vertical inheritance is a major factor in the diversification of BGCs.
Conclusions
Our analysis of a genomic dataset at a scale of thousands of genomes refines predictions of BGC diversity using Mash-clusters as a basis for pangenome analysis. The observed conservation in the order of BGCs' genomic locations shows that the BGCs are vertically inherited. The presented workflow and the in-depth analysis pave the way for large-scale pangenome investigations and enhance our understanding of the biosynthetic potential of the Streptomyces genus.
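The grouping of genomes by sequence similarity described in this abstract can be illustrated with a small sketch. It is not the authors' workflow: it computes exact k-mer Jaccard similarity (the Mash tool estimates this from MinHash sketches), converts it to the Mash distance D = -(1/k) ln(2j / (1 + j)), and groups genomes by single-linkage at a threshold. The toy sequences, k = 4, and the 0.05 threshold are assumptions chosen only for illustration.

```python
import math
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j: float, k: int) -> float:
    """Mash distance from a Jaccard index j and k-mer size k, clamped to [0, 1]."""
    if j <= 0.0:
        return 1.0
    d = -math.log(2.0 * j / (1.0 + j)) / k
    return min(1.0, max(0.0, d))


def single_linkage(names: list[str], dist: dict, threshold: float) -> list[list[str]]:
    """Group names whose pairwise distance is <= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[(a, b)] <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy "genomes" (hypothetical sequences, far shorter than real data).
K = 4
GENOMES = {
    "A": "ACGTACGTGGCTAGCTAGGATCCA",
    "B": "ACGTACGTGGCTAGCTAGGATCCT",
    "C": "TTTTGGGGCCCCAAAATTTTGGGG",
}
NAMES = sorted(GENOMES)
SETS = {n: kmers(s, K) for n, s in GENOMES.items()}
DIST = {
    (a, b): mash_distance(jaccard(SETS[a], SETS[b]), K)
    for a, b in combinations(NAMES, 2)
}
CLUSTERS = single_linkage(NAMES, DIST, threshold=0.05)
print(CLUSTERS)
```

In this toy case A and B differ in a single base and fall into one group, while C shares no 4-mers with either and stays apart. Real analyses would use much larger k, sketch-based estimation, and a clustering method chosen for the dataset.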
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 28, 2025
Gene clusters, groups of physically adjacent genes that work collectively, are pivotal to bacterial fitness and valuable in biotechnology and medicine. While various genome mining tools can identify and characterize gene clusters, they often overlook their evolutionary diversity, a crucial factor in revealing novel cluster functions and applications. To address this gap, we developed GATOR-GC, a targeted genome mining tool that enables comprehensive and flexible exploration of gene clusters in a single execution. We show that GATOR-GC identified a diversity of over 4 million gene clusters similar to experimentally validated biosynthetic gene clusters (BGCs) that other tools fail to detect. To highlight the utility of GATOR-GC, we identified previously uncharacterized co-occurring conserved genes potentially involved in mycosporine-like amino acid biosynthesis and mapped the taxonomic and evolutionary patterns of genomic islands that modify DNA with 7-deazapurines. Additionally, with its proximity-weighted similarity scoring, GATOR-GC successfully differentiated BGCs of FK-family metabolites (e.g., rapamycin, FK506/520) according to their chemistries. We anticipate GATOR-GC will be a valuable tool to assess gene cluster diversity for targeted, exploratory, and flexible genome mining. GATOR-GC is available at https://github.com/chevrettelab/gator-gc.
mSystems, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 24, 2025
ABSTRACT
Encoded within many microbial genomes, biosynthetic gene clusters (BGCs) underlie the synthesis of various secondary metabolites that often mediate ecologically important functions. Several studies and bioinformatics methods developed over the past decade have advanced our understanding of both microbial pangenomes and BGC evolution. In this minireview, we first highlight challenges in broad evolutionary analysis of BGCs, including delineation of BGC boundaries and clustering of BGCs across genomes. We further summarize key findings from microbial comparative genomics studies on BGC conservation across taxa and habitats and discuss the potential fitness effects of BGCs in different settings. Afterward, recent research showing the importance of genomic context on the production of secondary metabolites and the evolution of BGCs is highlighted. These studies draw parallels to recent, broader investigations on gene-to-gene associations within microbial pangenomes. Finally, we describe mechanisms by which microbial pangenomes and BGCs evolve, ranging from the acquisition or origination of entire BGCs to micro-evolutionary trends of individual biosynthetic genes. An outlook on how expansions in the biosynthetic capabilities of some taxa might support theories that open pangenomes are the result of adaptive evolution is also discussed. We conclude with remarks about how future work leveraging longitudinal metagenomics across diverse ecosystems is likely to significantly improve our understanding of the evolution of microbial genomes and BGCs.
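The pangenome terms used in this abstract (core versus accessory content, and "open" pangenomes that keep growing as genomes are added) can be made concrete with a minimal sketch. It assumes gene families have already been assigned per genome. The strain names and family labels below are hypothetical toy data, not results from the minireview.

```python
from collections import Counter


def partition_pangenome(genomes: dict[str, set[str]]):
    """Split gene families into core (in all genomes), unique (in exactly one),
    and accessory (everything in between). Assumes at least two genomes."""
    counts = Counter(fam for fams in genomes.values() for fam in fams)
    n = len(genomes)
    core = {f for f, c in counts.items() if c == n}
    unique = {f for f, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


def accumulation_curve(genomes: dict[str, set[str]], order: list[str]) -> list[int]:
    """Pangenome size after adding each genome in the given order.
    A curve that keeps rising is the usual signature of an open pangenome."""
    seen: set[str] = set()
    sizes = []
    for name in order:
        seen |= genomes[name]
        sizes.append(len(seen))
    return sizes


# Hypothetical presence/absence data: two housekeeping families plus BGC families.
TOY = {
    "strain1": {"dnaA", "gyrB", "bgc_fam1"},
    "strain2": {"dnaA", "gyrB", "bgc_fam1", "bgc_fam2"},
    "strain3": {"dnaA", "gyrB", "bgc_fam3"},
}
CORE, ACCESSORY, UNIQUE = partition_pangenome(TOY)
CURVE = accumulation_curve(TOY, ["strain1", "strain2", "strain3"])
print(sorted(CORE), sorted(ACCESSORY), sorted(UNIQUE), CURVE)
```

In practice the curve is averaged over many random genome orderings before judging openness; a single ordering, as here, only shows the idea.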
Nucleic Acids Research, Journal Year: 2025, Volume and Issue: unknown
Published: April 25, 2025
Abstract
Microorganisms synthesize small bioactive compounds through their secondary or specialized metabolism. Those compounds play an important role in microbial interactions and soil health, but are also crucial for the development of pharmaceuticals or agrochemicals. Over the past decades, advancements in genome sequencing have enabled the identification of large numbers of biosynthetic gene clusters directly from microbial genomes. Since its inception in 2011, antiSMASH (https://antismash.secondarymetabolites.org/) has become the leading tool for detecting and characterizing these biosynthetic gene clusters in bacteria and fungi. This paper introduces version 8 of antiSMASH, which has increased the number of detectable cluster types from 81 to 101, and has improved analysis support for terpenoids and tailoring enzymes, as well as improvements in the analysis of modular enzymes like polyketide synthases and nonribosomal peptide synthetases. These modifications keep antiSMASH up-to-date with developments in the field and extend its overall predictive capabilities for natural product genome mining.
Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 53(D1), P. D806 - D818
Published: Nov. 22, 2024
Abstract
The exponential growth of microbial genome data presents unprecedented opportunities for unlocking the potential of microorganisms. The burgeoning field of pangenomics offers a framework for extracting insights from this big biological data. Recent advances in microbial pangenomic research have generated substantial data and literature, yielding valuable knowledge across diverse microbial species. PanKB (pankb.org), a knowledgebase designed for microbial pangenomics research and biotechnological applications, was built to capitalize on this wealth of information. PanKB currently includes 51 pangenomes from 8 industrially relevant microbial families, comprising 8402 genomes, over 500 000 genes and over 7M mutations. To describe this data, PanKB implements four main components: (1) Interactive pangenomic analytics to facilitate exploration, intuition, and potential discoveries; (2) Alleleomic analytics, a pangenomic-scale analysis of variants, providing insights into intra-species sequence variation and potential mutations for applications; (3) A global search function enabling broad and deep investigations across pangenomes to power research and bioengineering workflows; (4) A bibliome of 833 open-access pangenomic papers and an interface with an LLM that can answer in-depth questions using its knowledge. PanKB empowers researchers and bioengineers to harness the potential of microbial pangenomics and serves as a valuable resource bridging the gap between pangenomic data and practical applications.
mSystems, Journal Year: 2024, Volume and Issue: 9(11)
Published: Oct. 4, 2024
ABSTRACT
Bacillus subtilis is an important industrial and environmental microorganism known to occupy many niches and produce many compounds of interest. Although it is one of the best-studied organisms, much of this focus, including the reconstruction of genome-scale metabolic models, has been placed on a few key laboratory strains. Here, we substantially expand these prior models to pan-genome-scale, representing 481 genomes of B. subtilis with 2,315 orthologous gene clusters, 1,874 metabolites, and 2,239 reactions. Furthermore, we incorporate data from carbon utilization experiments for eight strains to refine and validate its metabolic predictions. This comprehensive pan-genome model enables the assessment of strain-to-strain differences related to nutrient utilization, fermentation outputs, robustness, and other metabolic aspects. Using the model and phenotypic predictions, we divide B. subtilis strains into five groups with distinct patterns of behavior that correlate across these features. The pan-genome model offers deep insights into B. subtilis' metabolism as it varies across environments and provides an understanding of how different strains have adapted to dynamic habitats.
IMPORTANCE
As the volume of genomic data and computational power have increased, so has the number of genome-scale metabolic models. These models encapsulate the totality of metabolic functions for a given organism. Bacillus subtilis strain 168 is one of the first bacteria for which a metabolic network was reconstructed. Since then, several updated reconstructions have been generated for this model microorganism. Here, we expand the metabolic model for a single strain into a pan-genome-scale model, which consists of individual models for 481 B. subtilis strains. By evaluating differences between these strains, we identified five distinct groups of strains, allowing for the rapid classification of any particular strain. Furthermore, this classification into five groups aids the rapid identification of suitable strains for any application.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 19, 2024
Abstract
The exponential growth of microbial genome data presents unprecedented opportunities for mining the potential of microorganisms. The burgeoning field of pangenomics offers a framework for extracting insights from this big biological data. Recent advances in microbial pangenomic research have generated substantial data and literature, yielding valuable knowledge across diverse microbial species. PanKB (pankb.org), a knowledgebase designed for microbial pangenomics research and biotechnological applications, was built to capitalize on this wealth of information. PanKB currently includes 51 pangenomes from 8 industrially relevant microbial families, comprising 8,402 genomes, over 500,000 genes, and over 7M mutations. To describe this data, PanKB implements four main components: 1) Interactive pangenomic analytics to facilitate exploration, intuition, and potential discoveries; 2) Alleleomic analytics, a pangenomic-scale analysis of variants, providing insights into intra-species sequence variation and potential mutations for applications; 3) A global search function enabling broad and deep investigations across pangenomes to power research and bioengineering workflows; 4) A bibliome of 833 open-access pangenomic papers and an interface with an LLM that can answer in-depth questions using their knowledge. PanKB empowers researchers and bioengineers to harness the full potential of microbial pangenomics and serves as a valuable resource bridging the gap between pangenomic data and practical applications.
BMC Genomics, Journal Year: 2024, Volume and Issue: 25(1)
Published: Nov. 16, 2024
Microbes produce diverse bioactive natural products with applications in fields such as medicine and agriculture. In their genomes, these natural products are encoded by physically clustered genes known as biosynthetic gene clusters (BGCs). Genome and metagenome sequencing advances have enabled high-throughput identification of BGCs as a promising avenue for natural product discovery. BGC mining from (meta)genomes using in silico tools has allowed access to a vast diversity of potentially novel natural products. However, a fundamental limitation has been the ability to assemble complete BGCs, especially from complex metagenomes. With their fragmented assemblies, short-read technologies struggle to recover complete BGCs, such as the long and repetitive nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS). Recent advances in long-read sequencing, such as the High Fidelity (HiFi) technology from PacBio, have reduced this limitation and can help to retrieve both accurate and complete BGCs from metagenomes, warranting an improvement in the existing BGC identification approach for better utilization of HiFi data. Here, we present HiFiBGC, a command-line-based workflow to identify BGCs in PacBio HiFi metagenomes. HiFiBGC leverages an ensemble of assemblies from three HiFi-tailored metagenome assemblers and the reads not represented in these assemblies. Based on our analyses of four HiFi metagenomic datasets from different environments, we show that HiFiBGC identifies, on average, 78% more BGCs than the top-performing single-assembler-based method. This increase is due to HiFiBGC's ensemble assembly approach, which improves recovery by 25%, as well as the inclusion of mostly fragmented BGCs identified in the unmapped reads. HiFiBGC is a computational workflow for identifying BGCs in HiFi metagenomes, implemented mainly in the Python programming language and the workflow manager Snakemake. HiFiBGC is available on GitHub at https://github.com/ay-amityadav/HiFiBGC under the MIT license. The code related to figures presented in the manuscript is available at https://github.com/ay-amityadav/HiFiBGC_analyses.