Human copy number variants are enriched in regions of low mappability

Bibliographic Details
Main authors: Monlong, Jean; Cossette, Patrick; Meloche, Caroline; Rouleau, Guy; Girard, Simon L; Bourque, Guillaume
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101599/
https://www.ncbi.nlm.nih.gov/pubmed/30137632
http://dx.doi.org/10.1093/nar/gky538
author Monlong, Jean
Cossette, Patrick
Meloche, Caroline
Rouleau, Guy
Girard, Simon L
Bourque, Guillaume
collection PubMed
description Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplication and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.
format Online
Article
Text
id pubmed-6101599
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6101599 2018-08-27 Human copy number variants are enriched in regions of low mappability Monlong, Jean Cossette, Patrick Meloche, Caroline Rouleau, Guy Girard, Simon L Bourque, Guillaume Nucleic Acids Res Genomics Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplication and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease. Oxford University Press 2018-08-21 2018-06-21 /pmc/articles/PMC6101599/ /pubmed/30137632 http://dx.doi.org/10.1093/nar/gky538 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Human copy number variants are enriched in regions of low mappability
topic Genomics