Measuring the impact of gene prediction on gene loss estimates in Eukaryotes by quantifying falsely inferred absences

In recent years it became clear that in eukaryotic genome evolution gene loss is prevalent over gene gain. However, the absence of genes in an annotated genome is not always equivalent to the loss of genes. Due to sequencing issues, or incorrect gene prediction, genes can be falsely inferred as abse...

Bibliographic Details
Main Authors: Deutekom, Eva S., Vosseberg, Julian, van Dam, Teunis J. P., Snel, Berend
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736253/
https://www.ncbi.nlm.nih.gov/pubmed/31461468
http://dx.doi.org/10.1371/journal.pcbi.1007301
author Deutekom, Eva S.
Vosseberg, Julian
van Dam, Teunis J. P.
Snel, Berend
collection PubMed
description In recent years it became clear that in eukaryotic genome evolution gene loss is prevalent over gene gain. However, the absence of genes in an annotated genome is not always equivalent to the loss of genes. Due to sequencing issues, or incorrect gene prediction, genes can be falsely inferred as absent. This implies that loss estimates are overestimated and, more generally, that falsely inferred absences impact genomic comparative studies. However, reliable estimates of how prevalent this issue is are lacking. Here we quantified the impact of gene prediction on gene loss estimates in eukaryotes by analysing 209 phylogenetically diverse eukaryotic organisms and comparing their predicted proteomes to that of their respective six-frame translated genomes. We observe that 4.61% of domains per species were falsely inferred to be absent for Pfam domains predicted to have been present in the last eukaryotic common ancestor. Between phylogenetically different categories this estimate varies substantially: for clade-specific loss (ancestral loss) we found 1.30% and for species-specific loss 16.88% to be falsely inferred as absent. For BUSCO 1-to-1 orthologous families, 18.30% were falsely inferred to be absent. Finally, we showed that falsely inferred absences indeed impact loss estimates, with the number of losses decreasing by 11.78%. Our work strengthens the increasing number of studies showing that gene loss is an important factor in eukaryotic genome evolution. However, while we demonstrate that on average inferring gene absences from predicted proteomes is reliable, caution is warranted when inferring species-specific absences.
format Online
Article
Text
id pubmed-6736253
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6736253 2019-09-20 PLoS Comput Biol Research Article
Public Library of Science 2019-08-28 Text en © 2019 Deutekom et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Measuring the impact of gene prediction on gene loss estimates in Eukaryotes by quantifying falsely inferred absences
topic Research Article