
Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to di...


Bibliographic Details
Main authors: Shen, Feichen; Kidd, Jeffrey M.
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073954/
https://www.ncbi.nlm.nih.gov/pubmed/32013076
http://dx.doi.org/10.3390/genes11020141
_version_ 1783506728119173120
author Shen, Feichen
Kidd, Jeffrey M.
author_facet Shen, Feichen
Kidd, Jeffrey M.
author_sort Shen, Feichen
collection PubMed
description Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
format Online
Article
Text
id pubmed-7073954
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-70739542020-03-19 Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2 Shen, Feichen Kidd, Jeffrey M. Genes (Basel) Article Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy-number variation affecting multiple genes at the APOBEC3 locus. MDPI 2020-01-29 /pmc/articles/PMC7073954/ /pubmed/32013076 http://dx.doi.org/10.3390/genes11020141 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Shen, Feichen
Kidd, Jeffrey M.
Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
title Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
title_full Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
title_fullStr Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
title_full_unstemmed Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
title_short Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2
title_sort rapid, paralog-sensitive cnv analysis of 2457 human genomes using quick-mer2
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073954/
https://www.ncbi.nlm.nih.gov/pubmed/32013076
http://dx.doi.org/10.3390/genes11020141
work_keys_str_mv AT shenfeichen rapidparalogsensitivecnvanalysisof2457humangenomesusingquickmer2
AT kiddjeffreym rapidparalogsensitivecnvanalysisof2457humangenomesusingquickmer2