Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants

Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations a...

Bibliographic Details
Main Authors: Gagliano, Sarah A.; Ravji, Reena; Barnes, Michael R.; Weale, Michael E.; Knight, Jo
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642511/
https://www.ncbi.nlm.nih.gov/pubmed/26300220
http://dx.doi.org/10.1038/srep13373
author Gagliano, Sarah A.
Ravji, Reena
Barnes, Michael R.
Weale, Michael E.
Knight, Jo
collection PubMed
description Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64–0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.
format Online
Article
Text
id pubmed-4642511
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4642511 2015-11-20 Sci Rep Article
Nature Publishing Group 2015-08-24 /pmc/articles/PMC4642511/ /pubmed/26300220 http://dx.doi.org/10.1038/srep13373 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642511/
https://www.ncbi.nlm.nih.gov/pubmed/26300220
http://dx.doi.org/10.1038/srep13373