
Reconstruction of full-length LINE-1 progenitors from ancestral genomes

Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molec...


Bibliographic Details
Main authors: Campitelli, Laura F, Yellan, Isaac, Albu, Mihai, Barazandeh, Marjan, Patel, Zain M, Blanchette, Mathieu, Hughes, Timothy R
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252281/
https://www.ncbi.nlm.nih.gov/pubmed/35552404
http://dx.doi.org/10.1093/genetics/iyac074
collection PubMed
description Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient-reconstructed progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.
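The statement that most reconstructed progenitors "encode uninterrupted ORFs" is, at its simplest, a check for a long ATG-to-stop stretch in one reading frame. The following Python sketch illustrates that check under stated assumptions: `longest_forward_orf` is a hypothetical helper, not the authors' pipeline; it assumes the standard genetic code's stop codons (TAA, TAG, TGA), scans only the forward strand, and accepts only ATG as a start codon.

```python
# Hypothetical illustration only: not the method used in the paper.
# Assumes the standard genetic code, forward strand only, ATG-only starts.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    `end` is exclusive and includes the stop codon. Returns (-1, -1) when no
    complete ORF (an ATG followed in frame by a stop codon) is present.
    """
    seq = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        # Step through full codons only, so seq[i:i + 3] is always 3 bases.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the longest ORF seen so far across all three frames.
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


if __name__ == "__main__":
    start, end = longest_forward_orf("CCATGAAATTTTAGCC")
    print(start, end, (end - start) // 3 - 1)  # ORF coordinates and codons before the stop
```

A real analysis of full-length L1 models would also need to handle both strands and compare ORF lengths against the expected ORF1 and ORF2 products; this sketch only shows the core frame-by-frame scan.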
id pubmed-9252281
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Genetics (Investigation)
Publication date: 2022-05-12
© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Investigation