Accelerating minimap2 for long-read sequencing applications on modern CPUs

  • Chaisson, M. J. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1–16 (2019).

  • Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 1–19 (2016).

  • Beyter, D. et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat. Genet. 53, 779–786 (2021).

  • Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).

  • De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).

  • PromethION Brochure (Oxford Nanopore Technologies, 2021); https://nanoporetech.com/sites/default/files/s3/literature/PromethION-brochure.pdf

  • Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  • Guo, L., Lau, J., Ruan, Z., Wei, P. & Cong, J. Hardware acceleration of long read pairwise overlapping in genome sequencing: a race between FPGA and GPU. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines 127–135 (IEEE, 2019).

  • Zeni, A. et al. LOGAN: high-performance GPU-based X-drop long-read alignment. In 2020 IEEE International Parallel and Distributed Processing Symposium 462–471 (IEEE, 2020).

  • Feng, Z., Qiu, S., Wang, L. & Luo, Q. Accelerating long read alignment on three processors. In Proc. 48th International Conference on Parallel Processing 1–10 (ACM, 2019).

  • Roberts, M., Hayes, W., Hunt, B. R., Mount, S. M. & Yorke, J. A. Reducing storage requirements for biological sequence comparison. Bioinformatics 20, 3363–3369 (2004).

  • Abouelhoda, M. I. & Ohlebusch, E. Chaining algorithms for multiple genome comparison. J. Discrete Algorithms 3, 321–341 (2005).

  • Jain, C., Gibney, D. & Thankachan, S. V. Co-linear chaining with overlaps and gap costs. Preprint at https://www.biorxiv.org/content/10.1101/2021.02.03.429492v2 (2021).

  • Ho, D. et al. LISA: learned indexes for DNA sequence analysis. Preprint at https://arxiv.org/abs/1910.04728 (2020).

  • Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assembly demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).

  • Nurk, S., Koren, S., Rhie, A., Rautiainen, M. et al. The complete sequence of a human genome. Preprint at https://doi.org/10.1101/2021.05.26.445798 (2021).

  • Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  • Payne, A. et al. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat. Biotechnol. 39, 442–450 (2021).

  • Kovaka, S., Fan, Y., Ni, B., Timp, W. & Schatz, M. C. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat. Biotechnol. 39, 431–441 (2021).

  • Zhang, H. et al. Real-time mapping of nanopore raw signals. Bioinformatics https://doi.org/10.1093/bioinformatics/btab264 (2021).

  • Jain, C., Rhie, A., Hansen, N., Koren, S. & Phillippy, A. M. A long read mapping method for highly repetitive reference sequences. Preprint at https://www.biorxiv.org/content/10.1101/2020.11.01.363887v1.full (2020).

  • Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

  • Ren, J. & Chaisson, M. lra: the long read aligner for sequences and contigs. Preprint at https://doi.org/10.1371/journal.pcbi.1009078 (2020).

  • Kraska, T., Beutel, A., Chi, E. H., Dean, J. & Polyzotis, N. The case for learned index structures. In ACM International Conference on Management of Data 489–504 (ACM, 2018).

  • Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R. & Kraska, T. FITing-Tree: a data-aware index structure. In SIGMOD ’19: Proceedings of the 2019 International Conference on Management of Data 1189–1206 (ACM, 2019); https://doi.org/10.1145/3299869.3319860

  • Ferragina, P. & Vinciguerra, G. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB 13, 1162–1175 (2020).

  • Ding, J. et al. ALEX: an updatable adaptive learned index. In SIGMOD ’20: Proceedings of the 2020 International Conference on Management of Data 969–984 (ACM, 2020); https://doi.org/10.1145/3318464.3389711

  • Wu, Y., Yu, J., Tian, Y., Sidle, R. & Barber, R. Designing succinct secondary indexing mechanism by exploiting column correlations. In SIGMOD ’19: Proceedings of the 2019 International Conference on Management of Data 1223–1240 (ACM, 2019); https://doi.org/10.1145/3299869.3319861

  • Kirsche, M., Das, A. & Schatz, M. C. Sapling: accelerating suffix array queries with learned data models. Bioinformatics 37, 744–749 (2021).

  • Marcus, R. et al. Benchmarking learned indexes. In PVLDB Vol. 14, 1–13 (2021).

  • Marcus, R., Zhang, E. & Kraska, T. CDFShop: exploring and optimizing learned index structures. In SIGMOD ’20: Proc. 2020 ACM SIGMOD International Conference on Management of Data 2789–2792 (ACM, 2020); https://doi.org/10.1145/3318464.3384706

  • Suzuki, H. & Kasahara, M. Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC Bioinformatics 19, 33–47 (2018).

  • Cheng, H., Concepcion, G., Feng, X., Zhang, H. & Li, H. Human Assemblies Evaluated in the Hifiasm Paper (Zenodo, 2020); https://doi.org/10.5281/zenodo.4393631

  • Kalikar, S., Jain, C., Md, V. & Misra, S. mm2-fast Source Code Used in the Manuscript—Accelerating Minimap2 for Long-Read Sequencing Applications on Modern CPUs (Zenodo, 2022); https://doi.org/10.5281/zenodo.5888171

  • Kalikar, S., Jain, C., Md, V. & Misra, S. Scripts Used for the Experiments in the Manuscript—Accelerating Minimap2 for Long-Read Sequencing Applications on Modern CPUs (Zenodo, 2022); https://doi.org/10.5281/zenodo.5884451
