Machine Learning Models for Accurate Prioritization of Variants of Uncertain Significance

dc.contributor.author: Daniel Mahecha
dc.contributor.author: Haydemar Núñez
dc.contributor.author: María Claudia Lattig
dc.contributor.author: Jorge Duitama
dc.coverage.spatial: Bolivia
dc.date.accessioned: 2026-03-22T20:42:33Z
dc.date.available: 2026-03-22T20:42:33Z
dc.date.issued: 2020
dc.description: Citations: 1
dc.description.abstract: The growing use of next-generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of Variants of Uncertain Significance (VUS). In this manuscript we compare three machine learning methods for classifying VUS as pathogenic or non-pathogenic: a Random Forest (RF), a Support Vector Machine (SVM), and a Multilayer Perceptron (MLP). To train the models, we extracted 82,463 high-quality variants from ClinVar, using 9 conservation scores, the loss-of-function tool, and allele frequencies as features. For the RF and SVM models, hyperparameters were tuned by grid search with cross-validation. The three models were tested on a set of 5,537 variants that had been classified as VUS at some point during the previous three years but had been reclassified in August 2020. All three models were more accurate on this set than the benchmarked tools. The RF-based model performed best across different variant types and was used to create VusPrize, an open-source software tool for prioritizing variants of uncertain significance. We believe that our model can improve the process of genetic diagnosis in research and clinical settings.
dc.identifier.doi: 10.22541/au.160629133.32270917/v1
dc.identifier.uri: https://doi.org/10.22541/au.160629133.32270917/v1
dc.identifier.uri: https://andeanlibrary.org/handle/123456789/83608
dc.language.iso: en
dc.source: Fundación Santa Fe de Bogotá
dc.subject: Hyperparameter optimization
dc.subject: Hyperparameter
dc.subject: Support vector machine
dc.subject: Machine learning
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Random forest
dc.subject: Perceptron
dc.subject: Prioritization
dc.subject: Set (abstract data type)
dc.title: Machine Learning Models for Accurate Prioritization of Variants of Uncertain Significance
dc.type: preprint
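
Illustrative note: the abstract states that the RF and SVM hyperparameters were tuned by grid search with cross-validation. The sketch below shows that general procedure only and is not the authors' code. It assumes a k-nearest-neighbour classifier as a simple stand-in for the RF and SVM models, uses synthetic two-feature data rather than ClinVar variants, and all function names (`k_fold_indices`, `knn_predict`, `cv_accuracy`, `grid_search`) and grid values are hypothetical.

```python
import itertools
import random
from statistics import mean


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, x, k, p):
    """Stand-in classifier: majority vote among the k nearest training points,
    ranked by Minkowski distance of order p (the root is skipped because it
    does not change the ranking)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, params, folds):
    """Mean held-out accuracy over the folds for one hyperparameter setting."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], **params) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return mean(scores)


def grid_search(X, y, grid, n_folds=5, seed=0):
    """Evaluate every combination in the grid and keep the best-scoring one."""
    folds = k_fold_indices(len(X), n_folds, seed)
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params, folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic toy data (an assumption for illustration, not ClinVar):
# two Gaussian clusters labelled 0 and 1.
rng = random.Random(42)
X = [[rng.gauss(label, 0.5), rng.gauss(label, 0.5)]
     for label in (0, 1) for _ in range(30)]
y = [label for label in (0, 1) for _ in range(30)]

# Odd values of k avoid tied votes between the two classes.
best_params, best_score = grid_search(X, y, {"k": [1, 3, 5], "p": [1, 2]})
print(best_params, round(best_score, 3))
```

The same folds are reused for every grid point so that the settings are compared on identical splits. A final evaluation would still use a separate held-out set, as the abstract describes for the 5,537 reclassified variants.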
