FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

BACKGROUND: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great...

Bibliographic Details
Main authors: Pope, Bernard J, Nguyen-Dumont, Tú, Odefrey, Fabrice, Hammet, Fleur, Bell, Russell, Tao, Kayoko, Tavtigian, Sean V, Goldgar, David E, Lonie, Andrew, Southey, Melissa C, Park, Daniel J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3599469/
https://www.ncbi.nlm.nih.gov/pubmed/23441864
http://dx.doi.org/10.1186/1471-2105-14-65
_version_ 1782262970307313664
author Pope, Bernard J
Nguyen-Dumont, Tú
Odefrey, Fabrice
Hammet, Fleur
Bell, Russell
Tao, Kayoko
Tavtigian, Sean V
Goldgar, David E
Lonie, Andrew
Southey, Melissa C
Park, Daniel J
author_facet Pope, Bernard J
Nguyen-Dumont, Tú
Odefrey, Fabrice
Hammet, Fleur
Bell, Russell
Tao, Kayoko
Tavtigian, Sean V
Goldgar, David E
Lonie, Andrew
Southey, Melissa C
Park, Daniel J
author_sort Pope, Bernard J
collection PubMed
description BACKGROUND: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. RESULTS: FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs.
The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools. CONCLUSIONS: FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes.
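The comparator-based filtering described in the abstract can be illustrated with a minimal sketch. This is not the FAVR implementation: the published tools work on sequence alignment files, whereas here each comparator sample is assumed to be already reduced to a set of observed `(chrom, pos, alt)` tuples. The names `Variant`, `filter_rare`, and the `max_samples` threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A candidate single nucleotide variant (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_rare(variants, comparator_alleles, max_samples=1):
    """Keep variants whose alternate allele appears in few comparator samples.

    comparator_alleles holds one set per comparator sample; each set contains
    the (chrom, pos, alt) tuples observed in that sample. This stands in for
    evidence that would really be read from alignment files.

    Returns (variant, count) pairs, where count is the number of comparators
    showing the same allele. The count doubles as a simple annotation of
    possible co-occurrence between individuals.
    """
    kept = []
    for v in variants:
        key = (v.chrom, v.pos, v.alt)
        # Count comparator samples in which the same allele was observed.
        count = sum(1 for observed in comparator_alleles if key in observed)
        # An allele seen in many comparators is likely a common variant or a
        # platform/mapping artefact, so it is dropped from the shortlist.
        if count <= max_samples:
            kept.append((v, count))
    return kept
```

Under these assumptions, a variant seen in two comparators is removed when `max_samples=1`, while a variant seen in one comparator is kept and annotated with a count of 1. The appropriate threshold would depend on the number and relatedness of the comparator samples.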
format Online
Article
Text
id pubmed-3599469
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35994692013-03-17 FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets Pope, Bernard J Nguyen-Dumont, Tú Odefrey, Fabrice Hammet, Fleur Bell, Russell Tao, Kayoko Tavtigian, Sean V Goldgar, David E Lonie, Andrew Southey, Melissa C Park, Daniel J BMC Bioinformatics Methodology Article BACKGROUND: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. RESULTS: FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity.
This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools. CONCLUSIONS: FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes. BioMed Central 2013-02-25 /pmc/articles/PMC3599469/ /pubmed/23441864 http://dx.doi.org/10.1186/1471-2105-14-65 Text en Copyright ©2013 Pope et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Pope, Bernard J
Nguyen-Dumont, Tú
Odefrey, Fabrice
Hammet, Fleur
Bell, Russell
Tao, Kayoko
Tavtigian, Sean V
Goldgar, David E
Lonie, Andrew
Southey, Melissa C
Park, Daniel J
FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
title FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
title_full FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
title_fullStr FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
title_full_unstemmed FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
title_short FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
title_sort favr (filtering and annotation of variants that are rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3599469/
https://www.ncbi.nlm.nih.gov/pubmed/23441864
http://dx.doi.org/10.1186/1471-2105-14-65