Identification of recurrent regions of copy-number variants across multiple individuals

Bibliographic Details
Main Authors: Mei, Teo Shu; Salim, Agus; Calza, Stefano; Seng, Ku Chee; Seng, Chia Kee; Pawitan, Yudi
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851607/
https://www.ncbi.nlm.nih.gov/pubmed/20307285
http://dx.doi.org/10.1186/1471-2105-11-147
author Mei, Teo Shu
Salim, Agus
Calza, Stefano
Seng, Ku Chee
Seng, Chia Kee
Pawitan, Yudi
collection PubMed
description BACKGROUND: Algorithms and software for CNV detection have been developed, but they detect the CNV regions sample-by-sample with individual-specific breakpoints, while common CNV regions are likely to occur at the same genomic locations across different individuals in a homogeneous population. Current algorithms to detect common CNV regions do not account for the varying reliability of the individual CNVs, typically reported as confidence scores by SNP-based CNV detection algorithms. General methodologies for identifying these recurrent regions, especially those directed at SNP arrays, are still needed. RESULTS: In this paper, we describe two new approaches for identifying common CNV regions based on (i) the frequency of occurrence of reliable CNVs, where reliability is determined by high confidence scores, and (ii) a weighted frequency of occurrence of CNVs, where the weights are determined by the confidence scores. In addition, motivated by the fact that we often observe partially overlapping CNV regions as a mixture of two or more distinct subregions, regions identified using the two approaches can be fine-tuned to smaller subregions using a clustering algorithm. We compared the performance of the methods with sequencing-based results in terms of discordance rates, rates of departure from Hardy-Weinberg equilibrium (HWE), and average frequency and size of the identified regions. The discordance rates as well as the rates of departure from HWE decrease when we select CNVs with higher confidence scores. We also performed comparisons with two previously published methods, STAC and GISTIC, and showed that the methods we consider are better at identifying low-frequency but high-confidence CNV regions. CONCLUSIONS: The proposed methods for identifying common CNV regions in multiple individuals perform well compared to existing methods. The identified common regions can be used for downstream analyses such as group comparisons in association studies.
format Text
id pubmed-2851607
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2851607 2010-04-09 Identification of recurrent regions of copy-number variants across multiple individuals Mei, Teo Shu; Salim, Agus; Calza, Stefano; Seng, Ku Chee; Seng, Chia Kee; Pawitan, Yudi BMC Bioinformatics Research article BioMed Central 2010-03-22 /pmc/articles/PMC2851607/ /pubmed/20307285 http://dx.doi.org/10.1186/1471-2105-11-147 Text en Copyright ©2010 Mei et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of recurrent regions of copy-number variants across multiple individuals
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851607/
https://www.ncbi.nlm.nih.gov/pubmed/20307285
http://dx.doi.org/10.1186/1471-2105-11-147
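
The abstract in this record names two ways of scoring how often a CNV recurs across individuals: (i) counting only calls whose confidence score is high, and (ii) weighting every call by its confidence score. The sketch below illustrates that idea only; it is not the authors' implementation. The names CNVCall, probe_frequencies and common_regions are hypothetical, confidence scores are assumed to be scaled to [0, 1], each sample is assumed to have non-overlapping calls, and turning per-probe frequencies into regions by taking runs above a cutoff is an assumption made for illustration. The clustering step that refines regions into subregions is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """One CNV call for one sample (hypothetical structure)."""
    sample: str
    start: int         # index of the first probe covered (inclusive)
    end: int           # index of the last probe covered (inclusive)
    confidence: float  # assumed to be scaled to [0, 1]


def probe_frequencies(calls, n_probes, n_samples, min_confidence=None):
    """Per-probe CNV frequency across samples.

    With min_confidence set, only calls at or above that score are
    counted, each with weight 1 (approach i). Without it, every call
    contributes its confidence score as a weight (approach ii).
    """
    totals = [0.0] * n_probes
    for call in calls:
        if min_confidence is not None:
            weight = 1.0 if call.confidence >= min_confidence else 0.0
        else:
            weight = call.confidence
        for probe in range(call.start, call.end + 1):
            totals[probe] += weight
    return [total / n_samples for total in totals]


def common_regions(freqs, min_freq):
    """Maximal runs of consecutive probes with frequency >= min_freq."""
    regions = []
    run_start = None
    for i, freq in enumerate(freqs):
        if freq >= min_freq:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            regions.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(freqs) - 1))
    return regions


# Toy data: four samples, ten probes (values invented for illustration).
calls = [
    CNVCall("s1", 2, 5, 0.9),
    CNVCall("s2", 3, 6, 0.8),
    CNVCall("s3", 3, 4, 0.2),
    CNVCall("s4", 8, 9, 0.95),
]

reliable = probe_frequencies(calls, n_probes=10, n_samples=4, min_confidence=0.5)
weighted = probe_frequencies(calls, n_probes=10, n_samples=4)

print(common_regions(reliable, min_freq=0.5))
print(common_regions(weighted, min_freq=0.4))
```

In this toy data the low-confidence call from s3 is ignored by the thresholded count and contributes only a small weight to the weighted frequency, which is the kind of behaviour the abstract describes. The cutoffs 0.5 and 0.4 are arbitrary choices for the example.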