
Landscape and variation of novel retroduplications in 26 human populations

Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integra...


Bibliographic Details
Main authors: Zhang, Yan, Li, Shantao, Abyzov, Alexej, Gerstein, Mark B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5510864/
https://www.ncbi.nlm.nih.gov/pubmed/28662076
http://dx.doi.org/10.1371/journal.pcbi.1005567
_version_ 1783250241744535552
author Zhang, Yan
Li, Shantao
Abyzov, Alexej
Gerstein, Mark B.
author_facet Zhang, Yan
Li, Shantao
Abyzov, Alexej
Gerstein, Mark B.
author_sort Zhang, Yan
collection PubMed
description Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
format Online
Article
Text
id pubmed-5510864
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5510864 2017-08-07 Landscape and variation of novel retroduplications in 26 human populations Zhang, Yan Li, Shantao Abyzov, Alexej Gerstein, Mark B. PLoS Comput Biol Research Article Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling. Public Library of Science 2017-06-29 /pmc/articles/PMC5510864/ /pubmed/28662076 http://dx.doi.org/10.1371/journal.pcbi.1005567 Text en © 2017 Zhang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Landscape and variation of novel retroduplications in 26 human populations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5510864/
https://www.ncbi.nlm.nih.gov/pubmed/28662076
http://dx.doi.org/10.1371/journal.pcbi.1005567