The presence of feathers is a vital characteristic among birds, yet most modern birds have no feathers on their feet. The discoveries of feathers on the hind limbs of basal birds and dinosaurs have sparked an interest in the evolutionary origin and genetic mechanism of feathered feet. However, the majority of studies investigating the genes associated with this trait focused on domestic populations. Understanding of the genetic mechanism underpinning feathered-foot development in wild birds is still in its infancy. Here, we assembled a chromosome-level genome of the Asian house martin (Delichon dasypus) using the long-read High Fidelity sequencing approach to initiate the search for genes associated with its feathered feet. We employed whole-genome alignment of D. dasypus with other swallow species to identify high-SNP regions and chromosomal inversions in the D. dasypus genome. After filtering out variations unrelated to D. dasypus evolution, we found six genes related to feather development near the high-SNP regions. We also detected three feather development genes in chromosomal inversions between the Asian house martin and the barn swallow genomes. We discussed their association with the wingless/integrated (WNT), bone morphogenetic protein, and fibroblast growth factor pathways and their potential roles in feathered-foot development. Future studies are encouraged to utilize the D. dasypus genome to explore the evolutionary process of feathered-foot development in avian species. This endeavor will shed light on the evolutionary path of feathers in birds.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: June 12, 2023
Abstract
Gene prediction has remained an active area of bioinformatics research for a long time. Still, gene prediction in large eukaryotic genomes presents a challenge that must be addressed by new algorithms. The amount and significance of the evidence available from transcriptomes and proteomes vary across genomes, between genes and even along a single gene. User-friendly and accurate annotation pipelines that can cope with such data heterogeneity are needed. The previously developed annotation pipelines BRAKER1 and BRAKER2 use RNA-seq or protein data, respectively, but not both. A further significant performance improvement was made by the recently released GeneMark-ETP integrating all three data types. We here present the BRAKER3 pipeline that builds on GeneMark-ETP and AUGUSTUS and further improves accuracy using the TSEBRA combiner. BRAKER3 annotates protein-coding genes in eukaryotic genomes using both short-read RNA-seq and a large protein database, along with statistical models learned iteratively and specifically for the target genome. We benchmarked the new pipeline on 11 species under an assumed level of relatedness of the target species proteome to available proteomes. BRAKER3 outperformed BRAKER1 and BRAKER2. The transcript-level F1-score increased by ∼20 percentage points on average, while the difference was most pronounced for species with large and complex genomes. BRAKER3 also outperformed other existing tools, MAKER2, Funannotate and FINDER. The code of BRAKER3 is available on GitHub and as a ready-to-run Docker container for execution with Docker or Singularity. Overall, BRAKER3 is an accurate, easy-to-use tool for eukaryotic genome annotation.
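The transcript-level F1-score quoted above is the harmonic mean of sensitivity and precision over predicted transcripts. The sketch below illustrates the arithmetic only. It assumes that a predicted transcript counts as correct when its full structure matches a reference transcript exactly, which is a common convention but is not stated in the abstract. The function name `transcript_f1` and the tuple encoding of transcripts are hypothetical.

```python
def transcript_f1(predicted, reference):
    """Return the F1-score for two sets of transcript structures.

    Each transcript is any hashable description of its structure, here a
    tuple (seqid, strand, exon coordinates). Exact-match scoring is an
    assumption made for this illustration.
    """
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    if sensitivity + precision == 0:
        return 0.0
    # Harmonic mean of sensitivity and precision.
    return 2 * sensitivity * precision / (sensitivity + precision)


# Toy data: four reference transcripts, three predictions, two exact matches.
reference = {
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (350, 400))),
    ("chr1", "-", ((900, 1000),)),
    ("chr2", "+", ((10, 50), (80, 120))),
}
predicted = {
    ("chr1", "+", ((100, 200), (300, 400))),  # matches the reference
    ("chr1", "-", ((900, 1000),)),            # matches the reference
    ("chr2", "+", ((10, 50), (90, 120))),     # wrong exon boundary
}

f1 = transcript_f1(predicted, reference)
print(f"{f1:.4f}")
```

With these toy sets, precision is 2/3 and sensitivity is 2/4, so the F1-score is 4/7. An improvement of "20 percentage points" means an absolute difference between two such scores expressed in percent, not a relative change.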
BMC Bioinformatics, Journal Year: 2023, Volume and Issue: 24(1)
Published: Aug. 31, 2023
The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 17, 2024
Abstract
Biological language model performance depends heavily on pretraining data quality, diversity, and size. While metagenomic datasets feature enormous biological diversity, their utilization as pretraining data has been limited due to challenges in data accessibility, quality filtering and deduplication. Here, we present the Open MetaGenomic (OMG) corpus, a genomic pretraining dataset totalling 3.1T base pairs and 3.3B protein coding sequences, obtained by combining the two largest metagenomic dataset repositories (JGI’s IMG and EMBL’s MGnify). We first document the composition of the dataset and describe the quality filtering steps taken to remove poor quality data. We make the OMG corpus available as a mixed-modality genomic sequence dataset that represents multi-gene encoding genomic sequences with translated amino acids for protein coding sequences, and nucleic acids for intergenic sequences. We train the first mixed-modality genomic language model (gLM2) that leverages genomic context information to learn robust functional representations, as well as coevolutionary signals in protein-protein interfaces and genomic regulatory syntax. Furthermore, we show that deduplication in embedding space can be used to balance the corpus, demonstrating improved performance on downstream tasks. The OMG dataset is publicly hosted on the Hugging Face Hub at https://huggingface.co/datasets/tattabio/OMG and gLM2 is available at https://huggingface.co/tattabio/gLM2_650M.
Scientific Data, Journal Year: 2024, Volume and Issue: 11(1)
Published: Jan. 6, 2024
Abstract
The red palm weevil (RPW) is a highly destructive pest that mainly affects palms, particularly date palms (Phoenix dactylifera), in the Arabian Gulf region. In this study, we present a near-chromosomal-level genome assembly of the RPW using a combination of PacBio HiFi and Dovetail Omni-C reads. The final genome assembly is around 779 Mb in size, with an N50 of ~43 Mb, consistent with our previous flow cytometry estimates. The completeness of the genome assembly was confirmed through BUSCO analysis, which indicates the presence of 99.5% of BUSCO single copy orthologous genes. The genome annotation identified a total of 29,666 protein-coding, 1,091 tRNA and 543 rRNA genes. Overall, the proposed genome assembly is significantly superior to existing assemblies in terms of contiguity, integrity, and genome completeness.
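The N50 values reported in these assembly abstracts summarise contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below is a minimal illustration of that standard definition. The function name `n50` and the toy lengths are hypothetical and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths in Mb, total 290 Mb: 80 + 70 = 150 Mb already covers half.
toy_lengths = [80, 70, 50, 40, 30, 20]
result = n50(toy_lengths)
print(result)
```

For the toy lengths the result is 70. Applied to scaffolds the same calculation gives the scaffold N50, and applied to contigs it gives the contig N50, which is why the two can differ widely for one assembly.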
Scientific Data, Journal Year: 2025, Volume and Issue: 12(1)
Published: Jan. 21, 2025
Meteorus pulchricornis Wesmael (Hymenoptera: Braconidae) is an important parasitoid of lepidopteran insects. So far, only three scaffold-level genomes have been published for the genus Meteorus. In this study, we present a high-quality, chromosome-level genome assembly of M. pulchricornis, characterized by high accuracy and contiguity. This was achieved using Oxford Nanopore Technologies long-read, MGI-SEQ short-read, and Hi-C sequencing methods. The final genome assembly is 158.5 Mb in size, with 153.8 Mb (97.03%) assigned to ten pseudochromosomes. The scaffold N50 length reached 17.51 Mb, and the complete Benchmarking Universal Single-Copy Orthologs (BUSCO) score was 99.3%. The genome contains 28.29 Mb of repetitive elements, accounting for 18.39% of the total size. We identified 12,342 protein-coding genes, of which 12,308 genes were annotated functionally. Our investigation into gene family evolution showed that 563 gene families expanded, 1,739 gene families contracted, and 58 gene families underwent rapid evolution. The high-quality genome assembly we report here is advantageous for further research on parasitoid wasps and provides a foundational data resource for natural enemy studies.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 31, 2025
Abstract
To cope with extreme environmental conditions, diverse marine species have developed mechanisms that allow them to permanently or temporarily attach to substrates. In the intertidal zone of marine habitats, where tidal ranges and currents may drift organisms away from their habitat, temporary adhesive systems such as the one inherent to the arrow worm Spadella cephaloptera (Chaetognatha) constitute an essential trait for the survival of this taxon. The underlying molecular mechanism of the adhesive system has not been described yet, and existing morphological information is limited to adults. Furthermore, a relationship between the nervous system and attachment in S. cephaloptera remains to be demonstrated. In this study, single-nuclei sequencing of S. cephaloptera hatchlings was performed, using as reference a newly sequenced and assembled genome, to identify transcriptomic profiles of cells mediating attachment, neuronal populations, and the main cell types of chaetognath hatchlings. Our findings, supported by previous studies, suggest that the adhesive systems evolved convergently to those of other metazoans. Moreover, neuronal populations were identified in the ventral nerve center, and multiple ciliated cell types previously described by anatomical observations were validated. Ongoing in-depth investigation of these data, together with datasets of other developmental stages, will provide further insights into the evolutionary origins of the unique chaetognath body plan.
Scientific Data, Journal Year: 2025, Volume and Issue: 12(1)
Published: Feb. 8, 2025
Leptopilina wasps are crucial for biological pest control, particularly against the globally emerging pest Drosophila suzukii. Despite their ecological significance, the genomic basis of host selection and parasitism in this genus remains underexplored. In this study, we assembled a high-quality, chromosome-level genome of Leptopilina myrica, a species collected in Taizhou, Zhejiang Province, China. We employed a combination of PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C technology to produce a genome assembly of approximately 462.30 Mb, with a scaffold N50 of 47.32 Mb and a contig N50 of 4.07 Mb. By comparing the protein-coding genes of L. myrica with those of other Hymenoptera species, we gained insights into the evolutionary history of parasitoid wasps. This high-quality genome assembly will provide a foundation for future research on the genetic and functional traits of Leptopilina wasps, shedding light on the dynamics of host-parasite interactions. The genome provides a valuable resource for studies of host-parasite interactions and wasp biology.
Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 10, 2025
Abstract
Background
Sex chromosomes often evolve exceptionally fast and degenerate after recombination arrest. However, the underlying evolutionary processes are under persistent debate, particularly whether or not recombination arrest evolves in a stepwise manner and how switches in sex determination genes contribute to sex chromosome evolution. Here, we study the dioecious plant genus Salix with a high turnover of sex chromosomes.
Results
We identified Z and W sex-linked regions (~8 Mb) on chromosome 15 of the dwarf willow Salix herbacea using a new haplotype-resolved assembly. The sex-linked region harboured a large (5 Mb) embedded inversion. Analyses of synteny with other Salix species, sequence divergence between the Z and W sex-linked regions, and degeneration suggest that the inversion recently incorporated pseudoautosomal sequences into the sex-linked region, extending its length nearly three-fold. The W-hemizygous region exclusively contained seven pairs of inverted partial repeats of the male essential floral identity gene PISTILLATA, suggesting a possible PISTILLATA suppression mechanism by interfering RNA in females. Such pseudogenes were also found in other Salix species with ZW sex determination but not in those with XY sex determination.
Conclusions
Our study provides rare and compelling direct support for the long-standing theory of recombination reduction mediated by inversions and suggests that the turnover of sex chromosomes in the Salicaceae family is associated with a switch of the sex determination gene.
Frontiers in Ecology and Evolution, Journal Year: 2025, Volume and Issue: 13
Published: Feb. 19, 2025
Sedimentary ancient DNA (sedaDNA) provides valuable insights into past ecosystems, yet its functional diversity has remained unexplored due to potential limitations in gene annotation for short-read data. Eukaryotes, especially, are typically underrepresented and have low coverage in complex metagenomic datasets from sediments. In this study, we evaluate the gene annotation of eukaryotic sedimentary ancient DNA time-series data covering the last 23,000 years. We compared four gene annotation pipelines (GAPs) that apply Prodigal (ProkGAP) and MetaEuk (EukGAP) with and without taxonomic pre-classification. We identify ProkGAP as the pipeline which recovers the largest gene catalog (6,568,483 genes) and the highest number of unique KEGG orthologs (5,895). Our findings show that ProkGAP, originally invented for prokaryotic gene prediction, yields the highest share of eukaryotic genes among all GAPs tested. At the same time, it allows the analysis of prokaryotic and eukaryotic functions in parallel and predicts the most functional diversity. Interestingly, our gene catalog size has an increasing trend towards recent times, indicating a more complex community during the Holocene. However, the annotation is limited by incomplete reference databases, which hamper the link between taxonomic-functional relationships when considering lower taxonomic levels. Future research on gene prediction for short read sedaDNA should focus on expanding reference databases and sequencing depth to explore the functional composition of past ecosystems and their environmental change.