Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to di...
Main authors: | Shen, Feichen; Kidd, Jeffrey M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073954/ https://www.ncbi.nlm.nih.gov/pubmed/32013076 http://dx.doi.org/10.3390/genes11020141 |
_version_ | 1783506728119173120 |
---|---|
author | Shen, Feichen Kidd, Jeffrey M. |
author_facet | Shen, Feichen Kidd, Jeffrey M. |
author_sort | Shen, Feichen |
collection | PubMed |
description | Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus. |
format | Online Article Text |
id | pubmed-7073954 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-70739542020-03-19 Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 Shen, Feichen Kidd, Jeffrey M. Genes (Basel) Article Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus. MDPI 2020-01-29 /pmc/articles/PMC7073954/ /pubmed/32013076 http://dx.doi.org/10.3390/genes11020141 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Shen, Feichen Kidd, Jeffrey M. Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 |
title | Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 |
title_full | Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 |
title_fullStr | Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 |
title_full_unstemmed | Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 |
title_short | Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 |
title_sort | rapid, paralog-sensitive cnv analysis of 2457 human genomes using quick-mer2 |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073954/ https://www.ncbi.nlm.nih.gov/pubmed/32013076 http://dx.doi.org/10.3390/genes11020141 |
work_keys_str_mv | AT shenfeichen rapidparalogsensitivecnvanalysisof2457humangenomesusingquickmer2 AT kiddjeffreym rapidparalogsensitivecnvanalysisof2457humangenomesusingquickmer2 |