
Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data


Bibliographic Details
Main authors: Chiara, Matteo, Pavesi, Giulio
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5500642/
https://www.ncbi.nlm.nih.gov/pubmed/28736571
http://dx.doi.org/10.3389/fgene.2017.00094
author Chiara, Matteo
Pavesi, Giulio
collection PubMed
description Large-scale initiatives aiming to recover the complete sequence of thousands of human genomes are currently being undertaken worldwide, contributing to the generation of a comprehensive catalog of human genetic variation. The ultimate and most ambitious goal of human population-scale genomics is the characterization of the so-called human “variome,” through the identification of causal mutations or haplotypes. Several research institutions worldwide currently use genotyping assays based on Next-Generation Sequencing (NGS) for diagnostics and clinical screenings, and the widespread application of such technologies promises major revolutions in medical science. Bioinformatic analysis of human resequencing data is one of the main factors limiting the effectiveness and general applicability of NGS for clinical studies. The need to combine multiple tools in dedicated protocols to accommodate different types of data (gene panels, exomes, or whole genomes), together with the high variability of the data, makes it difficult to establish a definitive strategy of general use. While several studies already compare the sensitivity and accuracy of bioinformatic pipelines for the identification of single nucleotide variants from resequencing data, little is known about the impact of quality assessment and read pre-processing strategies. In this work we discuss the major strengths and limitations of the various genome resequencing protocols that are currently used in molecular diagnostics and for the discovery of novel disease-causing mutations. By taking advantage of publicly available data, we devise and suggest a series of best practices for the pre-processing of the data that consistently improve the outcome of genotyping with minimal impact on computational costs.
format Online
Article
Text
id pubmed-5500642
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5500642 2017-07-21 Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data Chiara, Matteo Pavesi, Giulio Front Genet Genetics Frontiers Media S.A. 2017-07-07 /pmc/articles/PMC5500642/ /pubmed/28736571 http://dx.doi.org/10.3389/fgene.2017.00094 Text en Copyright © 2017 Chiara and Pavesi. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data
topic Genetics