Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing

Bibliographic Details
Main authors: Prodanov, Timofey; Bansal, Vikas
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184528/
https://www.ncbi.nlm.nih.gov/pubmed/35680869
http://dx.doi.org/10.1038/s41467-022-30930-3
collection PubMed
description The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.
id pubmed-9184528
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Nat Commun (Article), published 2022-06-09; record pubmed-9184528 (2022-06-11)
rights © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
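The abstract distinguishes aggregate copy number (total copies summed over all repeat copies) from paralog-specific copy number (how those copies split between paralogs, resolved through paralogous sequence variants, PSVs). The toy sketch below only illustrates that distinction and is not Parascopy's algorithm. It assumes, hypothetically, exactly two paralogs, a known read depth expected from a single copy (for example, half the depth of a nearby non-duplicated diploid region), read depth pooled over both repeat copies, and per-PSV read counts supporting each paralog's allele. It then picks the integer split whose expected allele fraction best explains those counts under a binomial model. All function and parameter names are illustrative. The multi-sample bias correction and PSV reliability filtering described in the abstract are deliberately omitted.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log-probability of k successes in n Binomial(n, p) trials."""
    # Clamp p away from 0 and 1 so that log() is always defined.
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def aggregate_cn(pooled_depth: float, per_copy_depth: float) -> int:
    """Hypothetical aggregate copy number: depth pooled over all repeat copies
    divided by the depth expected from a single copy, rounded to an integer."""
    return round(pooled_depth / per_copy_depth)


def paralog_split(agg_cn: int, psv_counts: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick (cn_a, cn_b) with cn_a + cn_b == agg_cn that maximizes the binomial
    likelihood of the paralog-A allele counts observed at each PSV.

    psv_counts holds (reads supporting paralog A's allele,
                      reads supporting paralog B's allele) for each PSV.
    """
    if agg_cn <= 0:
        raise ValueError("aggregate copy number must be positive")
    best_cn_a, best_ll = 0, -math.inf
    for cn_a in range(agg_cn + 1):
        expected_frac_a = cn_a / agg_cn  # share of all copies that are paralog A
        ll = sum(binom_logpmf(a, a + b, expected_frac_a) for a, b in psv_counts)
        if ll > best_ll:
            best_cn_a, best_ll = cn_a, ll
    return best_cn_a, agg_cn - best_cn_a


if __name__ == "__main__":
    agg = aggregate_cn(pooled_depth=90, per_copy_depth=30)  # 3 copies in total
    print(agg, paralog_split(agg, [(10, 20), (11, 19)]))    # → 3 (1, 2)
```

In this example, roughly one third of PSV reads carry paralog A's allele, so three total copies are split as one copy of paralog A and two of paralog B. With real data, sequencing bias and unreliable PSVs would distort both steps. According to the abstract, Parascopy mitigates these problems by analyzing multiple samples jointly.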