ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis (Open Access)
Can Fırtına, Kamlesh Pillai, Gurpreet S. Kalsi, et al.

ACM Transactions on Architecture and Code Optimization, Journal Year: 2023, Volume and Issue: 21(1), P. 1 - 29

Published: Dec. 28, 2023

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. Accurate computation of these probabilities is essential for the correct identification of sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. When we analyze state-of-the-art works, we identify an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by (1) designing flexible hardware to accommodate various pHMM designs, (2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, (3) rapidly filtering out unnecessary computations using a hardware-based filter, and (4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55×–260.03×, 1.83×–5.34×, and 27.97× when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: (1) error correction, (2) protein family search, and (3) multiple sequence alignment, by 1.29×–59.94×, 1.03×–1.75×, and 1.03×–1.95×, respectively, while improving their energy efficiency by 64.24×–115.46×, 1.75×, and 1.96×.

Language: English

A Hitchhiker's Guide to long-read genomic analysis
Medhat Mahmoud, Daniel Paiva Agustinho, Fritz J. Sedlazeck, et al.

Genome Research, Journal Year: 2025, Volume and Issue: 35(4), P. 545 - 558

Published: April 1, 2025

Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.

Language: English

Citations

1

FPGA-based accelerator for adaptive banded event alignment in nanopore sequencing data analysis (Creative Commons)
Yilin Feng, Zheyu Li, Gulsum Gudukbay Akbulut, et al.

BMC Bioinformatics, Journal Year: 2025, Volume and Issue: 26(1)

Published: March 17, 2025

Adaptive Banded Event Alignment (ABEA) stands as a critical algorithmic component in sequence polishing and DNA methylation detection, employing dynamic programming to align the raw Nanopore signal with reference reads. Motivated by the observation that, compared to CPUs and GPUs, cutting-edge FPGAs demonstrate (in certain cases) superior performance at reduced cost and energy consumption, this paper presents an efficient FPGA-based accelerator for ABEA, leveraging the inherent high parallelism and sequential access pattern within ABEA. Our proposed ABEA accelerator significantly enhances the original CPU-based implementation in Nanopolish as well as the state-of-art acceleration on GPU and FPGA platforms. Specifically, targeting the Xilinx VU9P, our accelerator achieves an average throughput speedup of 10.05× over the CPU-only implementation, 1.81× over the GPU accelerator with only 7.2% of the energy, and 10.11× over the existing FPGA accelerator. Our work demonstrates that intensive genome analysis can benefit from FPGAs, offering improvements in both performance and energy consumption.

Language: English

Citations

0

TargetCall: eliminating the wasted computation in basecalling via pre-basecalling filtering (Creative Commons)
Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, et al.

Frontiers in Genetics, Journal Year: 2024, Volume and Issue: 15

Published: Oct. 28, 2024

Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, that is, reads. State-of-the-art basecallers use complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, most reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads, and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. Our thorough experimental evaluations show that TargetCall 1) improves the end-to-end basecalling runtime performance of the state-of-the-art basecaller by 3.31× while maintaining high (98.88%) recall in keeping on-target reads, 2) maintains high accuracy in downstream analysis, and 3) achieves better runtime performance, throughput, recall, precision, and generality than prior works. TargetCall is available at https://github.com/CMU-SAFARI/TargetCall.

Language: English

Citations

2

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis (Open Access)
Can Fırtına, Kamlesh Pillai, Gurpreet S. Kalsi, et al.

ACM Transactions on Architecture and Code Optimization, Journal Year: 2023, Volume and Issue: 21(1), P. 1 - 29

Published: Dec. 28, 2023

Citations

4