The Impact of Gene Duplication, Insertion, Deletion, Lateral Gene Transfer and Sequencing Error on Orthology Inference: A Simulation Study

The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically...

Bibliographic Details
Main authors: Dalquen, Daniel A., Altenhoff, Adrian M., Gonnet, Gaston H., Dessimoz, Christophe
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3581572/
https://www.ncbi.nlm.nih.gov/pubmed/23451112
http://dx.doi.org/10.1371/journal.pone.0056925
author Dalquen, Daniel A.
Altenhoff, Adrian M.
Gonnet, Gaston H.
Dessimoz, Christophe
collection PubMed
description The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
format Online
Article
Text
id pubmed-3581572
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3581572 2013-02-28 PLoS One Research Article
Public Library of Science 2013-02-25 /pmc/articles/PMC3581572/ /pubmed/23451112 http://dx.doi.org/10.1371/journal.pone.0056925 Text en © 2013 Dalquen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title The Impact of Gene Duplication, Insertion, Deletion, Lateral Gene Transfer and Sequencing Error on Orthology Inference: A Simulation Study
topic Research Article