LucaOne: Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language DOI Creative Commons
Yong He, Pan Fang, Yongtao Shan

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 14, 2024

In recent years, significant advancements have been observed in the domain of Natural Language Processing (NLP) with the introduction of pre-trained foundational models, paving the way for utilizing similar AI technologies to interpret the language of biology. In this research, we introduce “LucaOne”, a novel pre-trained foundational model designed to integratively learn from the genetic and proteomic languages, encapsulating data from 169,861 species encompassing DNA, RNA, and proteins. This work illuminates the potential for creating a biological language model aimed at universal bioinformatics application. Remarkably, through few-shot learning, this model efficiently learns the central dogma of molecular biology and demonstrably outperforms competing models. Furthermore, in tasks requiring inputs of DNA, RNA, proteins, or a combination thereof, LucaOne exceeds the state-of-the-art performance using a streamlined downstream architecture, thereby providing empirical evidence and innovative perspectives on the potential of foundational models to comprehend complex biological systems.

Language: English

Citations

4

ProteInfer: deep networks for protein functional inference DOI Creative Commons
Theo Sanderson, Maxwell L. Bileschi, David Belanger

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown

Published: Sept. 23, 2021

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we instead employ deep convolutional neural networks to directly predict a variety of protein functions – EC numbers and GO terms – from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user’s personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/

Language: English

Citations

27

Disease diagnostics using machine learning of immune receptors DOI Open Access
Maxim Zaslavsky, Erin Craig, Jackson Michuda

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: April 28, 2022

Clinical diagnosis typically incorporates physical examination, patient history, and various laboratory tests and imaging studies, but makes limited use of the human immune system's own record of antigen exposures encoded by receptors on B cells and T cells. We analyzed immune receptor datasets from 593 individuals to develop MAchine Learning for Immunological Diagnosis (Mal-ID), an interpretive framework to screen for multiple illnesses simultaneously or precisely test for one condition. This approach detects specific infections, autoimmune disorders, vaccine responses, and disease severity differences. Human-interpretable features of the model recapitulate known immune responses to SARS-CoV-2, Influenza, and HIV, highlight antigen-specific receptors, and reveal distinct characteristics of Systemic Lupus Erythematosus and Type-1 Diabetes autoreactivity. This analysis framework has broad potential for scientific and clinical interpretation of human immune responses.

Language: English

Citations

17

CryoPPP: A Large Expert-Labelled Cryo-EM Image Dataset for Machine Learning Protein Particle Picking DOI Creative Commons
Ashwin Dhakal, Rajan Gyawali, Liguo Wang

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Feb. 22, 2023

Cryo-electron microscopy (cryo-EM) is currently the most powerful technique for determining the structures of large protein complexes and assemblies. Picking single-protein particles from cryo-EM micrographs (images) is a key step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though emerging machine learning-based particle picking can potentially automate the process, its development is severely hindered by the lack of large, high-quality, manually labelled training data. Here, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for single protein particle picking and analysis to address this bottleneck. It consists of 32 non-redundant, representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). It includes 9,089 high-resolution micrographs (∼300 cryo-EM images per EMPIAR dataset) in which the coordinates of protein particles were labelled by human experts. The labelling process was rigorously validated by both 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of machine learning and artificial intelligence methods for automated cryo-EM protein particle picking. The dataset and data processing scripts are available at https://github.com/BioinfoMachineLearning/cryoppp.

Language: English

Citations

10

Strategies to improve selection compared to selection based on estimated breeding values DOI Open Access
Torsten Pook, Azadeh Hassanpour, Tobias Niehoff

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 13, 2025

Abstract. Background: Selection of individuals based on their estimated breeding values aims to maximize the response to selection in the next generation in an additive model. However, when the aim is not only about short-term population-wide genetic gain but also about gain over multiple generations, an optimal strategy is not as clear-cut, and the maintenance of diversity may become an important factor. This study provides an extended comparison of existing selection strategies in a unifying testing pipeline using the simulation software MoBPS. Results: Applying a weighting factor to SNP effects based on the frequency of the beneficial allele resulted in an increase in long-term genetic gain of 1.6% after 50 generations while reducing inbreeding rates by 16.2% compared to truncation selection based on estimated breeding values. However, this resulted in short-term losses of 1.2%, with the break-even point reached after 25 generations. In contrast, the inclusion of the average kinship of an individual to the top individuals of the population as an additional trait in the selection index, with an index weight of 17.5%, resulted in no short-term losses and increased long-term gains by 4.3% while reducing inbreeding rates by 15.8%, achieving very similar efficiency to the use of optimum contribution selection. Combining the diversity management strategies, with weights for each strategy optimized by an evolutionary algorithm, resulted in a breeding scheme with 5.1% increased long-term gains and 37.3% reduced inbreeding rates. The proposed scheme included optimum contribution selection, weighting by allele frequency, the kinship index, avoiding matings between related individuals, and lowering the proportion of selected individuals. Conclusions: The combination of strategies was shown to be far superior to any singular method tested in this study. As the use of efficient selection methods does not necessarily lead to short-term losses and comes at no extra costs, it is critical for breeding companies to implement such strategies for long-term success.

Language: English

Citations

0
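
The kinship-based index summarized in the abstract of Pook et al. above can be illustrated with a minimal sketch. It is not the authors' MoBPS implementation: the function names, the standardization of both traits, and the linear combination are assumptions made for illustration. Only the idea of adding average kinship to the top individuals as an index trait, and the 17.5% weight, come from the abstract.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to mean 0 and standard deviation 1 (all zeros if constant)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def truncation_selection(ebv, n_select):
    """Return the indices of the n_select candidates with the highest EBV."""
    order = sorted(range(len(ebv)), key=lambda i: ebv[i], reverse=True)
    return order[:n_select]


def kinship_index_selection(ebv, kinship, n_select, n_top, kinship_weight=0.175):
    """Select on an index that rewards EBV and penalizes kinship to the top group.

    Hypothetical formulation: index = (1 - w) * z(EBV) - w * z(average kinship).
    """
    if n_top < 2:
        raise ValueError("n_top must be at least 2")
    top = truncation_selection(ebv, n_top)
    # Average kinship of each candidate to the top candidates, excluding itself.
    avg_kin = [
        mean(kinship[i][j] for j in top if j != i) for i in range(len(ebv))
    ]
    z_ebv, z_kin = standardize(ebv), standardize(avg_kin)
    index = [
        (1 - kinship_weight) * e - kinship_weight * k
        for e, k in zip(z_ebv, z_kin)
    ]
    order = sorted(range(len(index)), key=lambda i: index[i], reverse=True)
    return order[:n_select]


# Toy data (made up): candidates 0 and 1 are closely related, 2 and 3 are not.
ebv = [10.0, 9.9, 9.8, 5.0]
kinship = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]
by_ebv = truncation_selection(ebv, n_select=2)
by_index = kinship_index_selection(ebv, kinship, n_select=2, n_top=3)
```

In this toy example, truncation selection picks the two related candidates, while the index replaces one of them with the nearly as good, unrelated third candidate.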