Discovery of non-reference processed pseudogenes in the Swedish population
The vast majority of the human genome is non-coding. There is a diversity of non-coding features, some of which have functional importance. Although the non-coding regions constitute the majority of the genome, they remain understudied, and for a long time, these regions have been referred to as jun...
Main authors: | Ten Berk de Boer, Esmee; Bilgrav Saether, Kristine; Eisfeldt, Jesper |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10267823/ https://www.ncbi.nlm.nih.gov/pubmed/37323659 http://dx.doi.org/10.3389/fgene.2023.1176626 |
_version_ | 1785059008101482496 |
---|---|
author | Ten Berk de Boer, Esmee Bilgrav Saether, Kristine Eisfeldt, Jesper |
author_facet | Ten Berk de Boer, Esmee Bilgrav Saether, Kristine Eisfeldt, Jesper |
author_sort | Ten Berk de Boer, Esmee |
collection | PubMed |
description | The vast majority of the human genome is non-coding. There is a diversity of non-coding features, some of which have functional importance. Although the non-coding regions constitute the majority of the genome, they remain understudied, and for a long time these regions have been referred to as junk DNA. Pseudogenes are one of these features. A pseudogene is a non-functional copy of a protein-coding gene. Pseudogenes may arise through a variety of genetic mechanisms. Processed pseudogenes are formed through reverse transcription of mRNA by LINE elements, after which the cDNA is integrated into the genome. Processed pseudogenes are known to vary across populations; however, the extent of this variability and its distribution remain unknown. Herein, we apply a custom-designed processed pseudogene pipeline to the whole genome sequencing data of 3,500 individuals: 2,500 individuals from the 1000 Genomes dataset and 1,000 Swedish individuals. Through these analyses, we discover over 3,000 pseudogenes missing from the GRCh38 reference. Utilising our pipeline, we position 74% of the detected processed pseudogenes, allowing for analyses of their formation. Notably, we find that common structural variant callers, such as Delly, classify the processed pseudogenes as deletion events, which are later predicted to be truncating variants. By compiling lists of non-reference processed pseudogenes and their frequencies, we find great variability among pseudogenes, indicating that non-reference processed pseudogenes may be useful for DNA testing and as population-specific markers. In summary, our findings highlight the great diversity of processed pseudogenes, show that processed pseudogenes are actively formed in the human genome, and suggest that our pipeline may be used to reduce false-positive structural variant calls caused by the misalignment and subsequent misclassification of non-reference processed pseudogenes. |
format | Online Article Text |
id | pubmed-10267823 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10267823 2023-06-15 Discovery of non-reference processed pseudogenes in the Swedish population Ten Berk de Boer, Esmee Bilgrav Saether, Kristine Eisfeldt, Jesper Front Genet Genetics Frontiers Media S.A. 2023-05-30 /pmc/articles/PMC10267823/ /pubmed/37323659 http://dx.doi.org/10.3389/fgene.2023.1176626 Text en Copyright © 2023 Ten Berk de Boer, Bilgrav Saether and Eisfeldt. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Ten Berk de Boer, Esmee Bilgrav Saether, Kristine Eisfeldt, Jesper Discovery of non-reference processed pseudogenes in the Swedish population |
title | Discovery of non-reference processed pseudogenes in the Swedish population |
title_sort | discovery of non-reference processed pseudogenes in the swedish population |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10267823/ https://www.ncbi.nlm.nih.gov/pubmed/37323659 http://dx.doi.org/10.3389/fgene.2023.1176626 |