
Accurate and exact CNV identification from targeted high-throughput sequence data

BACKGROUND: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We pre...


Bibliographic Details
Main authors: Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom
Format: Text
Language: English
Published: BioMed Central 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088570/
https://www.ncbi.nlm.nih.gov/pubmed/21486468
http://dx.doi.org/10.1186/1471-2164-12-184
author Nord, Alex S
Lee, Ming
King, Mary-Claire
Walsh, Tom
collection PubMed
description BACKGROUND: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. RESULTS: Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. CONCLUSIONS: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
format Text
id pubmed-3088570
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3088570 2011-05-06 Accurate and exact CNV identification from targeted high-throughput sequence data. Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom. BMC Genomics. Methodology Article. BioMed Central 2011-04-12. /pmc/articles/PMC3088570/ /pubmed/21486468 http://dx.doi.org/10.1186/1471-2164-12-184 Text en. Copyright ©2011 Nord et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accurate and exact CNV identification from targeted high-throughput sequence data
topic Methodology Article
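The abstract describes a first pass that scans for gains and losses by comparing normalized coverage between samples. The following is a minimal sketch of that general idea only, not the authors' implementation: the function names, the per-window depth lists, the use of the cross-sample median as the reference, and the 0.75 and 1.25 thresholds are all assumptions made for illustration. The breakpoint-confirmation step mentioned in the abstract is not covered here.

```python
from statistics import median


def normalize(depths):
    """Scale one sample's per-window read depths by that sample's total depth."""
    total = sum(depths)
    return [d / total for d in depths]


def coverage_ratios(samples):
    """Divide each sample's normalized depth by the cross-sample median per window.

    `samples` maps a sample name to a list of read depths, one per window.
    All lists are assumed to have the same length and window order.
    """
    norm = {name: normalize(depths) for name, depths in samples.items()}
    n_windows = len(next(iter(norm.values())))
    medians = [median(v[i] for v in norm.values()) for i in range(n_windows)]
    return {
        name: [
            v[i] / medians[i] if medians[i] > 0 else float("nan")
            for i in range(n_windows)
        ]
        for name, v in norm.items()
    }


def flag_windows(ratios, loss=0.75, gain=1.25):
    """Return (sample, window_index, call) for ratios outside the thresholds.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    calls = []
    for name, values in ratios.items():
        for i, r in enumerate(values):
            if r < loss:
                calls.append((name, i, "loss"))
            elif r > gain:
                calls.append((name, i, "gain"))
    return calls


# Hypothetical read depths for three samples across four windows.
samples = {
    "s1": [100, 100, 100, 100],
    "s2": [200, 200, 200, 200],
    "s3": [100, 50, 100, 100],  # reduced depth in window 1
}
calls = flag_windows(coverage_ratios(samples))
print(calls)
```

Dividing by each sample's total depth removes differences in overall sequencing yield (s2 has twice the reads of s1 but the same profile), so only a window whose share of reads departs from the other samples is flagged.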