
Comprehensive variation discovery in single human genomes


Bibliographic Details
Main authors: Weisenfeld, Neil I., Yin, Shuangye, Sharpe, Ted, Lau, Bayo, Hegarty, Ryan, Holmes, Laurie, Sogoloff, Brian, Tabbaa, Diana, Williams, Louise, Russ, Carsten, Nusbaum, Chad, Lander, Eric S., MacCallum, Iain, Jaffe, David B.
Format: Online Article Text
Language: English
Published: 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4244235/
https://www.ncbi.nlm.nih.gov/pubmed/25326702
http://dx.doi.org/10.1038/ng.3121
author Weisenfeld, Neil I.
Yin, Shuangye
Sharpe, Ted
Lau, Bayo
Hegarty, Ryan
Holmes, Laurie
Sogoloff, Brian
Tabbaa, Diana
Williams, Louise
Russ, Carsten
Nusbaum, Chad
Lander, Eric S.
MacCallum, Iain
Jaffe, David B.
collection PubMed
description Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from finished sequence of 103 randomly chosen Fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity several-fold, with the greatest impact in challenging regions of the human genome.
format Online
Article
Text
id pubmed-4244235
institution National Center for Biotechnology Information
language English
publishDate 2014
record_format MEDLINE/PubMed
spelling pubmed-4244235 2015-06-01
Comprehensive variation discovery in single human genomes
Nat Genet
Article
2014-10-19
2014-12
/pmc/articles/PMC4244235/
/pubmed/25326702
http://dx.doi.org/10.1038/ng.3121
Text
en
http://www.nature.com/authors/editorial_policies/license.html#terms
Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title Comprehensive variation discovery in single human genomes
topic Article