Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data
Main authors: | Chiara, Matteo; Pavesi, Giulio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2017 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5500642/ https://www.ncbi.nlm.nih.gov/pubmed/28736571 http://dx.doi.org/10.3389/fgene.2017.00094 |
author | Chiara, Matteo; Pavesi, Giulio |
collection | PubMed |
description | Large-scale initiatives aiming to recover the complete sequence of thousands of human genomes are currently underway worldwide, contributing to a comprehensive catalog of human genetic variation. The ultimate and most ambitious goal of population-scale human genomics is the characterization of the so-called human “variome” through the identification of causal mutations or haplotypes. Several research institutions worldwide currently use genotyping assays based on Next-Generation Sequencing (NGS) for diagnostics and clinical screening, and the widespread application of such technologies promises major revolutions in medical science. Bioinformatic analysis of human resequencing data is one of the main factors limiting the effectiveness and general applicability of NGS in clinical studies. The need to combine multiple tools into dedicated protocols that accommodate different types of data (gene panels, exomes, or whole genomes), together with the high variability of the data, makes it difficult to establish a single strategy of general use. While several studies have already compared the sensitivity and accuracy of bioinformatic pipelines for identifying single nucleotide variants from resequencing data, little is known about the impact of quality assessment and read pre-processing strategies. In this work we discuss the major strengths and limitations of the various genome resequencing protocols currently used in molecular diagnostics and in the discovery of novel disease-causing mutations. Using publicly available data, we devise and suggest a series of best practices for data pre-processing that consistently improve the outcome of genotyping with minimal impact on computational costs. |
format | Online Article Text |
id | pubmed-5500642 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Genet, 2017-07-07 |
rights | Copyright © 2017 Chiara and Pavesi. License: http://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5500642/ https://www.ncbi.nlm.nih.gov/pubmed/28736571 http://dx.doi.org/10.3389/fgene.2017.00094 |