New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads

dc.contributor.author Laura Natalia González-García
dc.contributor.author David Guevara-Barrientos
dc.contributor.author Daniela Lozano‐Arce
dc.contributor.author Juanita Gil
dc.contributor.author Jorge Díaz-Riaño
dc.contributor.author Erick Duarte
dc.contributor.author Germán I. Andrade
dc.contributor.author Juan Camilo Bojacá
dc.contributor.author Maria Camila Hoyos-Sanchez
dc.contributor.author Christian Chavarro
dc.coverage.spatial Bolivia
dc.date.accessioned 2026-03-22T14:41:28Z
dc.date.available 2026-03-22T14:41:28Z
dc.date.issued 2023
dc.description Citations: 7
dc.description.abstract Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
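The abstract states that minimizers are selected with a hash function derived from the k-mer distribution, but it does not give that function. The Python sketch below illustrates the general idea under stated assumptions: k-mers are counted across all reads, and within each window of `w` consecutive k-mers the one with the lowest count is chosen, so rare k-mers are preferred over repetitive ones. The count-based ordering, the CRC32 tie-break, and the names `kmer_counts`, `rank` and `minimizers` are hypothetical illustrations, not the authors' implementation, and only the forward strand is considered.

```python
import zlib
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count k-mer occurrences across all reads (the k-mer distribution)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def rank(kmer: str, counts: Counter) -> tuple[int, int]:
    """Hypothetical frequency-aware order: rarer k-mers sort first.
    Ties are broken by a deterministic CRC32 hash of the k-mer
    (Python's built-in hash() is randomized per process for strings)."""
    return (counts[kmer], zlib.crc32(kmer.encode()))


def minimizers(seq: str, k: int, w: int, counts: Counter) -> list[tuple[int, str]]:
    """(position, k-mer) of the lowest-ranked k-mer in each window of
    w consecutive k-mers. Returns [] if seq has fewer than w k-mers."""
    kms = kmers(seq, k)
    selected: list[tuple[int, str]] = []
    for start in range(len(kms) - w + 1):
        window = range(start, start + w)
        # min() returns the first minimal index, so ties favor the leftmost k-mer
        best = min(window, key=lambda i: rank(kms[i], counts))
        # Adjacent windows often share their minimizer; record it only once
        if not selected or selected[-1][0] != best:
            selected.append((best, kms[best]))
    return selected
```

For example, `minimizers(read, 15, 10, kmer_counts(reads, 15))` would return (position, k-mer) pairs that could act as shared seeds between overlapping reads. A production assembler would typically use canonical k-mers (the smaller of a k-mer and its reverse complement) and a deque-based sliding-window minimum instead of this O(n·w) loop, which is kept simple for readability.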
dc.identifier.doi 10.26508/lsa.202201719
dc.identifier.uri https://doi.org/10.26508/lsa.202201719
dc.identifier.uri https://andeanlibrary.org/handle/123456789/47982
dc.language.iso en
dc.relation.ispartof Life Science Alliance
dc.source Universidad de Los Andes
dc.subject Sequence assembly
dc.subject Hash function
dc.subject Algorithm
dc.subject Genome
dc.subject Ploidy
dc.subject Computer science
dc.subject Hybrid genome assembly
dc.subject DNA sequencing
dc.subject Graph
dc.subject Software
dc.title New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads
dc.type article
