
Reference genome assessment from a population scale perspective: an accurate profile of variability and noise



Bibliographic Details
Main authors: Carbonell-Caballero, José; Amadoz, Alicia; Alonso, Roberto; Hidalgo, Marta R; Çubuk, Cankut; Conesa, David; López-Quílez, Antonio; Dopazo, Joaquín
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870781/
https://www.ncbi.nlm.nih.gov/pubmed/28961772
http://dx.doi.org/10.1093/bioinformatics/btx482
Collection: PubMed
Description: MOTIVATION: Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Although quality control scores are included within genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples in order to detect incongruent patterns at every genomic region. Our model is inherently robust, since common artifact signals are expected to be shared between independent samples over misassembled regions of the genome. RESULTS: The reliability of our protocol has been extensively tested in different experiments and organisms, with accurate results that improve on state-of-the-art methods. Our analysis demonstrates synergistic relations between quality control scores and allelic variability estimators that improve the detection of misassembled regions, and it finds strong artifact signals even within the human reference assembly. Furthermore, we demonstrate how our model can be trained to properly rank the confidence of a set of candidate variants obtained from new independent samples. AVAILABILITY AND IMPLEMENTATION: This tool is freely available at http://gitlab.com/carbonell/ces. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Record ID: pubmed-5870781
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics
Publication dates: 2017-11-15 (issue); 2017-07-29 (online)
Rights: © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Topic: Original Papers