Towards the next generation of species delimitation methods: an overview of Machine Learning applications

Matheus Salles, Fabrícius Maia Chaves Bicalho Domingos

Published: Dec. 7, 2023

Species delimitation is the process of distinguishing between populations of the same species and distinct species of a particular group of organisms. Various methods exist for inferring species limits, with most of them being rooted in Coalescent Theory. Their primary goal is to identify independently evolving lineages that should represent separate species. Coalescent models have improved species delimitation by enabling the explicit testing of hypotheses regarding evolutionary independence among lineages. However, they have some limitations, especially in complex scenarios, with large datasets, and with varying genetic data types. In this context, machine learning (ML) can be considered a promising analytical tool, and it clearly provides an effective way to explore dataset structures when species-level divergences are hypothesised. In this review, we examine the use of ML in species delimitation and provide an overview and critical appraisal of existing workflows. We also provide simple explanations of how the main types of ML approaches operate, which should help researchers and students interested in the field. While current ML methods designed to infer species limits are analytically powerful, they also present specific limitations and should not be considered definitive alternatives to traditional coalescent methods for species delimitation. For instance, there are clear limitations in the utilisation of simulated data, in supervised and deep learning approaches, and in the type of data representation used in each approach. We then discuss the strengths and weaknesses of ML pipelines, propose best practices for ML in species delimitation, and offer insights into potential future applications. Generative adversarial networks and domain adaptation techniques, for instance, could partially address the misspecification issue related to simulating data. Besides, integrating ML into the hypothesis testing process, alongside available coalescent-based methods, could enable a more comprehensive exploration of parameters, improving the accuracy and biological interpretability of analyses. Additionally, we suggest guidelines for enhancing the accessibility, effectiveness, and objectivity of ML in species delimitation processes, aiming to provide a transformative perspective on the subject.

Language: English

Recent Advances in Generative Adversarial Networks for Gene Expression Data: A Comprehensive Review
Minhyeok Lee

Mathematics, Year: 2023, No. 11(14), p. 3055

Published: July 10, 2023

The evolving field of generative artificial intelligence (GenAI), particularly generative deep learning, is revolutionizing a host of scientific and technological sectors. One of the pivotal innovations within this domain is the emergence of generative adversarial networks (GANs). These unique models have shown remarkable capabilities in crafting synthetic data, closely emulating real-world distributions. Notably, their application to gene expression data systems is a fascinating and rapidly growing focus area. Restrictions related to ethical and logistical issues often limit the size, diversity, and data-gathering speed of gene expression data. Herein lies the potential of GANs, as they are capable of producing synthetic gene expression data, offering a potential solution to these limitations. This review provides a thorough analysis of the most recent advancements at this innovative crossroads of GANs and gene expression data, specifically during the period from 2019 to 2023. In the context of the fast-paced progress in deep learning technologies, accurate and inclusive reviews of current practices are critical to guiding subsequent research efforts, sharing knowledge, and catalyzing the continual growth of the discipline. This review, through highlighting recent studies and seminal works, serves as a key resource for academics and professionals alike, aiding their journey in this compelling confluence of GANs and gene expression data systems.

Language: English

Cited by: 28

Harnessing deep learning for population genetic inference
Xin Huang, Aigerim Rymbekova, Olga Dolgova, et al.

Nature Reviews Genetics, Year: 2023, No. 25(1), pp. 61-78

Published: Sept. 4, 2023

Language: English

Cited by: 27

Phylogenetic inference using generative adversarial networks
Megan L. Smith, Matthew W. Hahn

Bioinformatics, Year: 2023, No. 39(9)

Published: Sept. 1, 2023

The application of machine learning approaches in phylogenetics has been impeded by the vast model space associated with inference. Supervised machine learning approaches require data from across this space to train models. Because of this, previous approaches have typically been limited to inferring relationships among unrooted quartets of taxa, where there are only three possible topologies. Here, we explore the potential of generative adversarial networks (GANs) to address this limitation. GANs consist of a generator and a discriminator: at each step, the generator aims to create data that is similar to real data, while the discriminator attempts to distinguish generated data from real data. By using an evolutionary model as the generator, we can use GANs to make evolutionary inferences. Since a new model can be considered at each iteration, heuristic searches of complex model spaces are possible. Thus, GANs offer a potential solution to the challenges of applying machine learning in phylogenetics.
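
For readers unfamiliar with the generator and discriminator game summarised in this abstract, the standard GAN objective from the general GAN literature can be written as below. The notation is an illustrative assumption and is not taken from the paper: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real data, and $p_{z}$ the distribution of the generator's inputs. In the phylogenetic setting the generator would be an evolutionary model rather than a neural network, so this is only a sketch of the general idea.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator is rewarded for assigning high $D(x)$ to real data and low $D(G(z))$ to generated data, while the generator is rewarded when its output is mistaken for real data.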

Language: English

Cited by: 21

Interpreting generative adversarial networks to infer natural selection from genetic data

Rebecca Riley, Iain Mathieson, Sara Mathieson, et al.

Genetics, Year: 2024, No. 226(4)

Published: Feb. 22, 2024

Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model), and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being “real” do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.
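
The candidate-selection idea in this abstract can be sketched in a few lines. This is a minimal illustration only, assuming the discriminator has already produced one "real" probability per genomic region. The function name `candidate_regions`, the cutoff of 0.9, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def candidate_regions(real_probs: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Return indices of regions whose discriminator "real" probability
    is at or above `threshold` (a hypothetical cutoff).

    Rationale from the abstract: the generator only simulates neutral data,
    so regions the discriminator confidently calls "real" do not fit the
    neutral model and are candidates for targets of selection.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [i for i, p in enumerate(real_probs) if p >= threshold]


# Made-up discriminator outputs for five genomic regions.
probs = [0.12, 0.95, 0.50, 0.91, 0.30]
flagged = candidate_regions(probs)
print(flagged)
```

In practice the cutoff would have to be calibrated, for example against neutral simulations; the fixed value here is only a placeholder.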

Language: English

Cited by: 6

Tree sequences as a general-purpose tool for population genetic inference
Logan S. Whitehouse, Dylan D. Ray, Daniel R. Schrider, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, No. unknown

Published: Feb. 21, 2024

As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetic inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetic inference moving forward.

Language: English

Cited by: 4

Deep convolutional and conditional neural networks for large-scale genomic data generation
Burak Yelmen, Aurélien Decelle, Leila Lea Boulos, et al.

PLoS Computational Biology, Year: 2023, No. 19(10), e1011584

Published: Oct. 30, 2023

Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel high-quality artificial genomes (AGs) which can preserve the complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability, since the large feature space of genome-wide data increases computational complexity vastly. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with high SNP number. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of these generated haplotypes and the amount of possible privacy leakage from the training data. As the importance of genetic privacy becomes more prevalent, the need for effective privacy protection measures for genomic data increases. We used generative neural networks to create large artificial genome segments which possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.

Language: English

Cited by: 7

Latent generative modeling of long genetic sequences with GANs

Antoine Szatkownik, Cyril Furtlehner, Guillaume Charpiat, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, No. unknown

Published: Aug. 7, 2024

Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained with raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), and a Generative Adversarial Network (GAN) learning in this reduced space. Using this framework, we generated proxy datasets for diverse human populations around the world. We compared the quality of AGs generated by our approach with AGs generated by the established models and report improvements in capturing population structure, linkage disequilibrium, and metrics related to privacy leakage. Furthermore, we developed a frugal model with orders of magnitude fewer parameters and comparable performance to larger models. For quality assessment, we also implemented a new evaluation metric based on information theory to measure local haplotypic diversity, showing that generated AGs yield higher diversity than real genomes. In addition, we addressed the shrinkage issue associated with PCA and generative modeling, examined its relation to the nearest neighbor resemblance metric, and proposed a resolution. Finally, we evaluated the effect of binarization methods on the output AGs.

Language: English

Cited by: 2

Interpreting Generative Adversarial Networks to Infer Natural Selection from Genetic Data

Rebecca Riley, Iain Mathieson, Sara Mathieson, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, No. unknown

Published: March 8, 2023

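Understanding natural selection in humans and other species is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulation of selection typically requires slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Mismatches between simulated training data and real test data can lead to incorrect inference. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection that requires relatively few selection simulations during training. We use a Generative Adversarial Network (GAN) trained to simulate realistic neutral data. The resulting GAN consists of a generator (fitted demographic model) and a discriminator (convolutional neural network). For a genomic region, the discriminator predicts whether it is "real" or "fake" in the sense that it could have been simulated by the generator. As the "real" training data includes regions that experienced selection and the generator cannot produce such regions, regions with a high probability of being real are likely to have experienced selection. To further incentivize this behavior, we "fine-tune" the discriminator with a small number of selection simulations. We show that this approach has high power to detect selection in simulations, and that it finds regions under selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics. In summary, our approach is a novel, efficient, and powerful way to use machine learning to detect natural selection.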

Language: English

Cited by: 4

Training Generative Models From Privatized Data via Entropic Optimal Transport
Daria Reshetova, Weining Chen, Ayfer Özgür, et al.

IEEE Journal on Selected Areas in Information Theory, Year: 2024, No. 5, pp. 221-235

Published: Jan. 1, 2024

Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport, a popular regularization method in the literature that has often been leveraged for its computational benefits, enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples. We prove that at the same time this leads to fast statistical convergence at the parametric rate. This shows that entropic regularization of optimal transport uniquely enables the mitigation of both the effects of privatization noise and the curse of dimensionality in statistical convergence. We provide experimental evidence to support the efficacy of our framework in practice.
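
To make the setting concrete, the sketch below shows one common way data can be privatized locally before collection: each user adds Laplace noise to a bounded value. The abstract does not say which privatization mechanism is used, so the Laplace mechanism, the function names `laplace_noise` and `privatize`, and all parameter values are assumptions for illustration. The sketch does not implement entropic optimal transport or GAN training; it only shows that zero-mean privatization noise obscures individual values while leaving aggregate information (here, the mean) approximately recoverable.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values: List[float], epsilon: float, rng: random.Random) -> List[float]:
    """Locally privatize values assumed to lie in [0, 1].

    With sensitivity 1, Laplace noise of scale 1/epsilon gives
    epsilon-local differential privacy per value (assumed mechanism).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # Clamp to [0, 1] first so the sensitivity bound actually holds.
    return [min(1.0, max(0.0, v)) + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
raw = [rng.random() for _ in range(20000)]  # hypothetical raw data in [0, 1]
private = privatize(raw, epsilon=1.0, rng=rng)

raw_mean = sum(raw) / len(raw)
private_mean = sum(private) / len(private)
print(f"raw mean {raw_mean:.3f}, privatized mean {private_mean:.3f}")
```

Individual privatized values can fall well outside [0, 1], yet the two means stay close because the noise has zero mean and averages out over many samples.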

Language: English

Cited by: 1

Training Generative Models from Privatized Data via Entropic Optimal Transport
Daria Reshetova, Weining Chen, Ayfer Özgür, et al.

2022 IEEE International Symposium on Information Theory (ISIT), Year: 2024, No. unknown, pp. 605-610

Published: July 7, 2024

Language: English

Cited by: 1