Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities

BACKGROUND: Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, exc...

Bibliographic Details
Main Authors: Bainbridge, Matthew N; Wang, Min; Wu, Yuanqing; Newsham, Irene; Muzny, Donna M; Jefferies, John L; Albert, Thomas J; Burgess, Daniel L; Gibbs, Richard A
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218830/
https://www.ncbi.nlm.nih.gov/pubmed/21787409
http://dx.doi.org/10.1186/gb-2011-12-7-r68
_version_ 1782216737389805568
author Bainbridge, Matthew N
Wang, Min
Wu, Yuanqing
Newsham, Irene
Muzny, Donna M
Jefferies, John L
Albert, Thomas J
Burgess, Daniel L
Gibbs, Richard A
author_sort Bainbridge, Matthew N
collection PubMed
description BACKGROUND: Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) and our ability to interrogate regions outside of the CCDS regions is consequently less well understood. RESULTS: We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS. CONCLUSIONS: We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5-times higher than that observed in the CCDS.
format Online
Article
Text
id pubmed-3218830
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-32188302012-07-25 Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities Bainbridge, Matthew N Wang, Min Wu, Yuanqing Newsham, Irene Muzny, Donna M Jefferies, John L Albert, Thomas J Burgess, Daniel L Gibbs, Richard A Genome Biol Research BACKGROUND: Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) and our ability to interrogate regions outside of the CCDS regions is consequently less well understood. RESULTS: We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS. CONCLUSIONS: We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5-times higher than that observed in the CCDS. 
BioMed Central 2011 2011-07-25 /pmc/articles/PMC3218830/ /pubmed/21787409 http://dx.doi.org/10.1186/gb-2011-12-7-r68 Text en Copyright ©2011 Bainbridge et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited
title Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218830/
https://www.ncbi.nlm.nih.gov/pubmed/21787409
http://dx.doi.org/10.1186/gb-2011-12-7-r68