A chromosome-level genome assembly of the Asian house martin implies potential genes associated with the feathered-foot trait DOI Creative Commons
Yuan-Fu Chan, Chia‐Wei Lu, Hao‐Chih Kuo

et al.

G3 Genes Genomes Genetics, Journal Year: 2024, Volume and Issue: 14(6)

Published: April 12, 2024

The presence of feathers is a vital characteristic among birds, yet most modern birds have no feathers on their feet. The discoveries of feathers on the hind limbs of basal birds and dinosaurs have sparked an interest in the evolutionary origin and genetic mechanism of feathered feet. However, the majority of studies investigating genes associated with this trait have focused on domestic populations. Understanding the genetic mechanism that underpins feathered-foot development in wild birds is still in its infancy. Here, we assembled a chromosome-level genome of the Asian house martin (Delichon dasypus) using a long-read High Fidelity sequencing approach to initiate the search for genes associated with its feathered feet. We employed whole-genome alignment of D. dasypus with other swallow species to identify high-SNP regions and chromosomal inversions in the D. dasypus genome. After filtering out variations unrelated to D. dasypus evolution, we found six genes related to feather development near the high-SNP regions. We also detected three feather development genes in chromosomal inversions between the Asian house martin and barn swallow genomes. We discussed their association with the wingless/integrated (WNT), bone morphogenetic protein, and fibroblast growth factor pathways and their potential roles in feathered-foot development. Future studies are encouraged to utilize the D. dasypus genome to explore the evolutionary process of the feathered-foot trait in avian species. This endeavor will shed light on the evolutionary path of feathers in birds.

Language: English

BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA DOI Creative Commons
Lars Gabriel, Tomáš Brůna, Katharina J. Hoff

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: June 12, 2023

Abstract: Gene prediction has remained an active area of bioinformatics research for a long time. Still, gene prediction in large eukaryotic genomes presents a challenge that must be addressed by new algorithms. The amount and significance of the evidence available from transcriptomes and proteomes vary across genomes, between genes, and even along a single gene. User-friendly and accurate genome annotation pipelines that can cope with such data heterogeneity are needed. The previously developed annotation pipelines BRAKER1 and BRAKER2 use RNA-seq or protein data, respectively, but not both. A further significant performance improvement was made by the recently released GeneMark-ETP, integrating all three data types. We here present the BRAKER3 pipeline, which builds on GeneMark-ETP and AUGUSTUS and further improves accuracy using the TSEBRA combiner. BRAKER3 annotates protein-coding genes in eukaryotic genomes using both short-read RNA-seq and a large protein database, along with statistical models learned iteratively and specifically for the target genome. We benchmarked the new pipeline on 11 species under an assumed level of relatedness of the target species proteome to the available proteomes. BRAKER3 outperformed BRAKER1 and BRAKER2. The average transcript-level F1-score increased by ∼20 percentage points on average, while the difference was most pronounced for species with large and complex genomes. BRAKER3 also outperformed other existing tools, MAKER2, Funannotate and FINDER. The code of BRAKER3 is available on GitHub and as a ready-to-run Docker container for execution with Docker or Singularity. Overall, BRAKER3 is an accurate, easy-to-use tool for eukaryotic genome annotation.

Language: English

Citations

126

Galba: genome annotation with miniprot and AUGUSTUS DOI Creative Commons
Tomáš Brůna, Heng Li, Joseph Guhlin

et al.

BMC Bioinformatics, Journal Year: 2023, Volume and Issue: 24(1)

Published: Aug. 31, 2023

The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes.

Language: English

Citations

34

Near chromosome-level and highly repetitive genome assembly of the snake pipefish Entelurus aequoreus (Syngnathiformes: Syngnathidae) DOI Creative Commons
Magnus Wolf, Bruno Lopes da Silva Ferrette, Raphael T. F. Coimbra

et al.

Gigabyte, Journal Year: 2024, Volume and Issue: 2024, P. 1 - 13

Published: Jan. 11, 2024

The snake pipefish,

Language: English

Citations

7

The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling DOI Creative Commons

Andre Cornman, Jacob West-Roberts, Antônio Pedro Camargo

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 17, 2024

Abstract: Biological language model performance depends heavily on pretraining data quality, diversity, and size. While metagenomic datasets feature enormous biological diversity, their utilization as pretraining data has been limited due to challenges in data accessibility, quality filtering, and deduplication. Here, we present the Open MetaGenomic (OMG) corpus, a genomic pretraining dataset totalling 3.1T base pairs and 3.3B protein coding sequences, obtained by combining the two largest metagenomic dataset repositories (JGI’s IMG and EMBL’s MGnify). We first document the composition of the dataset and describe the quality filtering steps taken to remove poor quality data. We make the OMG corpus available as a mixed-modality genomic sequence dataset that represents multi-gene encoding genomic sequences with translated amino acids for protein coding sequences and nucleic acids for intergenic sequences. We train the first mixed-modality genomic language model (gLM2) that leverages genomic context information to learn robust functional representations, as well as coevolutionary signals in protein-protein interfaces and genomic regulatory syntax. Furthermore, we show that deduplication in embedding space can be used to balance the corpus, demonstrating improved performance on downstream tasks. The OMG dataset is publicly hosted on the Hugging Face Hub at https://huggingface.co/datasets/tattabio/OMG and gLM2 is available at https://huggingface.co/tattabio/gLM2_650M .

Language: English

Citations

6

Near-chromosomal-level genome of the red palm weevil (Rhynchophorus ferrugineus), a potential resource for genome-based pest control DOI Creative Commons

Naganeeswaran Sudalaimuthuasari, Biduth Kundu, Khaled M. Hazzouri

et al.

Scientific Data, Journal Year: 2024, Volume and Issue: 11(1)

Published: Jan. 6, 2024

Abstract: The red palm weevil (RPW) is a highly destructive pest that mainly affects palms, particularly date palms (Phoenix dactylifera), in the Arabian Gulf region. In this study, we present a near-chromosomal-level genome assembly of the RPW using a combination of PacBio HiFi and Dovetail Omni-C reads. The final genome assembly is around 779 Mb in size, with an N50 of ~43 Mb, consistent with our previous flow cytometry estimates. The completeness of the genome was confirmed through BUSCO analysis, which indicates the presence of 99.5% of single copy orthologous genes. The genome annotation identified a total of 29,666 protein-coding, 1,091 tRNA and 543 rRNA genes. Overall, the proposed genome assembly is significantly superior to existing assemblies in terms of contiguity, integrity, and completeness.

Language: English

Citations

4

A chromosome-level genome assembly of Meteorus pulchricornis Wesmael (Hymenoptera: Braconidae) DOI Creative Commons
Shiji Tian, Ruizhong Yuan, Xingzhou Ma

et al.

Scientific Data, Journal Year: 2025, Volume and Issue: 12(1)

Published: Jan. 21, 2025

Meteorus pulchricornis Wesmael (Hymenoptera: Braconidae) is an important parasitoid of lepidopteran insects. So far, only three scaffold-level genomes have been published for the genus Meteorus. In this study, we present a high-quality, chromosome-level genome assembly of M. pulchricornis, characterized by high accuracy and contiguity. This was achieved using Oxford Nanopore Technologies long-read, MGI-SEQ short-read, and Hi-C sequencing methods. The final genome assembly is 158.5 Mb in size, with 153.8 Mb (97.03%) assigned to ten pseudochromosomes. The scaffold N50 length reached 17.51 Mb, and the complete Benchmarking Universal Single-Copy Orthologs (BUSCO) score is 99.3%. The genome contains 28.29 Mb of repetitive elements, accounting for 18.39% of the total genome size. We identified 12,342 protein-coding genes, of which 12,308 genes were annotated functionally. Our investigation into gene family evolution showed that 563 gene families expanded, 1,739 gene families contracted, and 58 gene families underwent rapid evolution. The high-quality genome assembly we report here is advantageous for further research on parasitoid wasps and provides a foundational data resource for natural enemy studies.

Language: English

Citations

0

Insights into adhesive and neuronal cell populations of the chaetognath Spadella cephaloptera using a single-nuclei transcriptomic atlas and genomic resources DOI Open Access

Cristian Camilo Barrera Grijalba, June F. Ordoñez, Juan D. Montenegro

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 31, 2025

Abstract: To cope with extreme environmental conditions, diverse marine species have developed mechanisms that allow them to permanently or temporarily attach to substrates. In the intertidal zone of marine habitats, where tidal ranges and currents may drift organisms away from their habitat, temporary adhesive systems such as the one inherent to the arrow worm Spadella cephaloptera (Chaetognatha) constitute an essential trait for the survival of this taxon. The underlying molecular mechanism of this adhesive system has not been described yet, and existing morphological information is limited to adults. Furthermore, a relationship between the nervous system and attachment in S. cephaloptera remains to be demonstrated. In this study, single-nuclei sequencing of S. cephaloptera hatchlings was performed, using as reference a newly sequenced and assembled genome, to identify the transcriptomic profiles of cells mediating attachment, neuronal cell populations, and the main cell types of chaetognath hatchlings. Our findings, supported by previous morphological studies, suggest that the adhesive system evolved convergently to those of other metazoans. Moreover, neuronal cell populations were identified in the ventral nerve center, and multiple ciliated cell types previously described from anatomical observations were validated. Ongoing in-depth investigation of these data, together with datasets of other developmental stages, will provide further insights into the evolutionary origins of the unique chaetognath body plan.

Language: English

Citations

0

Chromosome-level genome sequencing and assembly of the parasitoid wasp Leptopilina myrica DOI Creative Commons
Zhi Dong, Zixuan Xu, Junwei Zhang

et al.

Scientific Data, Journal Year: 2025, Volume and Issue: 12(1)

Published: Feb. 8, 2025

Leptopilina wasps are crucial for biological pest control, particularly against the globally emerging pest Drosophila suzukii. Despite their ecological significance, the genomic basis of host selection and parasitism in this genus remains underexplored. In this study, we assembled a high-quality, chromosome-level genome of Leptopilina myrica, a species collected in Taizhou, Zhejiang Province, China. We employed a combination of PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C technology to produce a genome assembly of approximately 462.30 Mb, with a scaffold N50 of 47.32 Mb and a contig N50 of 4.07 Mb. By comparing the protein-coding genes of L. myrica with those of other Hymenoptera species, we gained insights into the evolutionary history of parasitoid wasps. This high-quality genome assembly will provide a foundation for future research on the genetic basis of functional traits in parasitoid wasps, shedding light on the dynamics of host-parasite interactions. The genome provides a valuable resource for studies of host-parasite interactions and parasitoid wasp biology.

Language: English

Citations

0

Dynamic evolution of a sex-linked region DOI Creative Commons
Xiaomeng Mao, Nima Rafati, Christian Tellgren‐Roth

et al.

Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 10, 2025

Abstract. Background: Sex chromosomes often evolve exceptionally fast and degenerate after recombination arrest. However, the underlying evolutionary processes are under persistent debate, particularly whether or not recombination arrest evolves in a stepwise manner and how switches in sex determination genes contribute to sex chromosome evolution. Here, we study the dioecious plant genus Salix, which has a high turnover of sex chromosomes. Results: We identified Z and W sex-linked regions (~8 Mb) on chromosome 15 of the dwarf willow Salix herbacea using a new haplotype-resolved assembly. The W sex-linked region harboured a large (5 Mb) embedded inversion. Analyses of synteny with other Salix species, sequence divergence between Z and W, and degeneration suggest that the inversion recently incorporated pseudoautosomal sequences into the sex-linked region, extending its length nearly three-fold. The W-hemizygous region exclusively contained seven pairs of inverted partial repeats of the male essential floral identity gene PISTILLATA, suggesting a possible PISTILLATA suppression mechanism by interfering RNA in females. Such PISTILLATA pseudogenes were also found in other Salix species with ZW sex determination but not in those with XY sex determination. Conclusions: Our study provides rare and compelling direct support for the long-standing theory of recombination reduction mediated by inversions and suggests that the turnover of sex chromosomes in the Salicaceae family is associated with a switch in the sex determination gene.

Language: English

Citations

0

Functional annotation of eukaryotic genes from sedimentary ancient DNA DOI Creative Commons
Uğur Çabuk, Ulrike Herzschuh, Lars Harms

et al.

Frontiers in Ecology and Evolution, Journal Year: 2025, Volume and Issue: 13

Published: Feb. 19, 2025

Sedimentary ancient DNA (sedaDNA) provides valuable insights into past ecosystems, yet its functional diversity has remained unexplored due to potential limitations in gene annotation for short-read data. Eukaryotes, especially, are typically underrepresented and have low coverage in complex metagenomic datasets from sediments. In this study, we evaluate the potential of eukaryotic gene annotation in sedimentary ancient DNA time-series data covering the last 23,000 years. We compared four gene annotation pipelines (GAPs) that apply Prodigal (ProkGAP) and MetaEuk (EukGAP) with and without taxonomic pre-classification. We identify ProkGAP as the pipeline which recovers the largest gene catalog, 6,568,483 genes, and the highest number of functions (5,895 unique KEGG orthologs). Our findings show that ProkGAP, originally invented for prokaryotic gene prediction, yields the highest share of eukaryotic genes among all GAPs tested. At the same time, it allows analysis of prokaryotic functions in parallel and predicts the most functional diversity. Interestingly, our data show that gene catalog size has an increasing trend towards recent times, indicating a more complex community during the Holocene. However, our analysis is limited by incomplete reference databases, which hamper the link between taxonomic-functional relationships when considering lower taxonomic levels. Future research on gene prediction in short-read sedaDNA data should focus on expanding reference databases and sequencing depth to explore the functional composition of past ecosystems and their response to environmental change.

Language: English

Citations

0