A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K In human populations

Bibliographic Details
Main authors: Li, Weiling; Lin, Lin; Malhotra, Raunaq; Yang, Lei; Acharya, Raj; Poss, Mary
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456218/
https://www.ncbi.nlm.nih.gov/pubmed/30921327
http://dx.doi.org/10.1371/journal.pcbi.1006564
collection PubMed
description Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; not all individuals have a retrovirus at a specific genomic location. It is possible that HERV-Ks contribute to human disease because people differ in both the number and the genomic location of these retroviruses. Indeed, viral transcripts, proteins, and antibodies against HERV-K have been detected in cancers and in autoimmune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because data on the population prevalence of HERV-K provirus at each polymorphic site are lacking and because closely related elements such as HERV-K are difficult to identify from short-read sequence data. We present an integrated and computationally robust approach that uses whole-genome short-read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed-length genomic sequences (k-mers) from whole-genome sequence data that match a reference set of k-mers unique to each HERV-K locus, and applies mixture model-based clustering to these values to account for low-depth sequence data. Our analysis of 1000 Genomes Project (KGP) data reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool that depicts the proportion of the KGP populations carrying any combination of polymorphic HERV-K proviruses. Further, because HERV-K is insertionally polymorphic, the genome burden of known polymorphic HERV-K varies among humans; this burden is lowest in East Asian (EAS) individuals. Our study identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources will advance research on HERV-K contributions to human diseases.
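The two computational steps named in the description (a per-locus proportion of matching unique k-mers, followed by mixture model-based clustering of those proportions) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes locus sequences and reads are plain strings, ignores reverse complements and sequencing errors, and treats a k-mer as "unique" when it occurs in no other listed locus (the published reference set may be defined differently). It also substitutes a plain two-component 1-D Gaussian mixture fitted by EM for whatever mixture model the authors actually used. The function names `kmers`, `unique_locus_kmers`, `kmer_proportion`, and `two_component_gmm` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (forward strand only, uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_locus_kmers(loci: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Hypothetical helper: keep, per locus, only k-mers found in no other listed locus."""
    per_locus = {name: kmers(seq, k) for name, seq in loci.items()}
    unique = {}
    for name, ks in per_locus.items():
        others = set().union(*(v for n, v in per_locus.items() if n != name))
        unique[name] = ks - others
    return unique


def kmer_proportion(reads: Iterable[str], ref: Set[str], k: int) -> float:
    """Fraction of a locus's unique k-mers seen at least once in the reads."""
    if not ref:
        return 0.0
    seen: Set[str] = set()
    for read in reads:
        seen |= kmers(read, k) & ref
    return len(seen) / len(ref)


def _log_normal(x: float, mu: float, var: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def two_component_gmm(x: List[float], iters: int = 200, min_var: float = 1e-4) -> List[int]:
    """Assumed stand-in clustering: 1-D two-component Gaussian mixture fitted by EM.
    Returns 1 for values assigned to the higher-mean component (read here as
    'provirus present') and 0 otherwise."""
    n = len(x)
    mean_all = sum(x) / n
    var_all = max(min_var, sum((v - mean_all) ** 2 for v in x) / n)
    mu, var, w = [min(x), max(x)], [var_all, var_all], [0.5, 0.5]
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior responsibilities, computed in log space for stability.
        resp = []
        for v in x:
            lp = [math.log(w[j]) + _log_normal(v, mu[j], var[j]) for j in range(2)]
            top = max(lp)
            e = [math.exp(p - top) for p in lp]
            s = sum(e)
            resp.append([p / s for p in e])
        # M-step: re-estimate mixing weights, means and floored variances.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            mu[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var[j] = max(min_var, sum(r[j] * (v - mu[j]) ** 2 for r, v in zip(resp, x)) / nj)
    hi = 0 if mu[0] > mu[1] else 1
    return [1 if r[hi] > 0.5 else 0 for r in resp]
```

In use, each individual would contribute one proportion per locus, computed from that individual's reads. The clustering then separates carriers from non-carriers. A model-based split is shown instead of a fixed cutoff because, as the description notes, low sequencing depth lowers the proportion observed even in carriers.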
id pubmed-6456218
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6456218 2019-05-03 PLoS Comput Biol, Research Article. Public Library of Science, 2019-03-28. © 2019 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article