
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 3, 2024
Abstract: Long-read sequencing techniques can sequence transcripts from end to end, greatly improving our ability to study the transcription process and enabling more detailed analysis of diseases such as cancer. While several well-established tools exist for long-read transcriptome analysis, most are reference-based and, therefore, limited by the reference genome. This prevents organisms without high-quality reference genomes, and samples or genes with high variability (e.g., cancer samples or some gene families), from being analyzed to their full potential. In such settings, using a reference-free method is favorable. The computational problem of clustering long reads by region of common origin is well-established for reference-free transcriptome analysis pipelines. Such clustering enables large datasets to be split up roughly by gene family and enables an independent analysis of each cluster. There exist tools for this. However, none of those tools can efficiently process the large amount of reads that is now generated by long-read sequencing technologies. We present isONclust3, an improved algorithm over isONclust and isONclust2, to cluster massive long-read transcriptome datasets at the gene family level. Like isONclust, isONclust3 represents each cluster with a set of minimizers. However, unlike other approaches, isONclust3 dynamically updates the cluster representation during clustering by adding high-confidence minimizers from new reads assigned to the cluster. We show that isONclust3 yields results of higher or comparable quality to state-of-the-art algorithms but is 10-100 times faster on large datasets. Also, using a 256 GB computing node, isONclust3 was the only tool that could cluster 37 million PacBio reads, which is a typical throughput of the recent PacBio Revio sequencing machine.
Language: English
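
The abstract describes representing each cluster by a set of minimizers and growing that set as reads are assigned. The Python sketch below illustrates that general idea only; it is not the isONclust3 implementation. The function names (`minimizers`, `cluster_reads`), the parameters `k`, `w`, and `min_shared`, and the lexicographic k-mer ordering are all assumptions made for illustration. As a simplification, the sketch adds every minimizer of an assigned read to the cluster, whereas the abstract states that only high-confidence minimizers are added.

```python
from typing import List, Set


def minimizers(seq: str, k: int = 5, w: int = 4) -> Set[str]:
    """Return the set of (k, w) minimizers of seq.

    A minimizer here is the lexicographically smallest k-mer in each
    window of w consecutive k-mers (an illustrative ordering choice).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    if len(kmers) <= w:
        # Sequence too short for a full window: use its single smallest k-mer.
        return {min(kmers)}
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


def cluster_reads(reads: List[str], k: int = 5, w: int = 4,
                  min_shared: float = 0.5) -> List[List[int]]:
    """Greedy reference-free clustering of reads by shared minimizers.

    Each read joins the cluster whose representation shares the largest
    fraction of the read's minimizers, provided that fraction reaches
    min_shared (a hypothetical threshold); otherwise it opens a new cluster.
    """
    representations: List[Set[str]] = []  # one minimizer set per cluster
    clusters: List[List[int]] = []        # read indices per cluster
    for idx, read in enumerate(reads):
        mins = minimizers(read, k, w)
        best, best_frac = -1, 0.0
        for cid, rep in enumerate(representations):
            frac = len(mins & rep) / len(mins) if mins else 0.0
            if frac > best_frac:
                best, best_frac = cid, frac
        if best >= 0 and best_frac >= min_shared:
            clusters[best].append(idx)
            # Dynamic update: grow the cluster representation with the
            # new read's minimizers (no confidence filter in this sketch).
            representations[best] |= mins
        else:
            clusters.append([idx])
            representations.append(set(mins))
    return clusters
```

For example, `cluster_reads(["ACGTACGGTCATGCATTGCA", "ACGTACGGTCATGCATTGCA", "TTTTTTTTTTTTTTTTTTTT"])` returns `[[0, 1], [2]]`: the two identical reads share all of their minimizers, while the third read shares none and starts its own cluster.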