A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships

Bibliographic Details
Main authors: Lemaitre, Claire; Barré, Aurélien; Citti, Christine; Tardy, Florence; Thiaucourt, François; Sirand-Pugnet, Pascal; Thébault, Patricia
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3248887/
https://www.ncbi.nlm.nih.gov/pubmed/22115330
http://dx.doi.org/10.1186/1471-2105-12-457
Collection: PubMed

Abstract:
BACKGROUND: Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary considerably between species and groups of species, and classical matrices such as those in the BLOSUM series fail to accurately estimate alignment scores and statistical significance for sequences sharing marked compositional biases.
RESULTS: We present a general and simple methodology to build matrices that are specifically fitted to the compositional bias of proteins. Our approach is inspired by the one used to build the BLOSUM matrices and is based on learning substitution and amino acid frequencies from real sequences with the corresponding compositional bias. We applied it to the large-scale comparison of AT-rich Mollicute genomes. The new matrix, MOLLI60, was used to predict pairwise orthology relationships, as well as homolog families, among 24 Mollicute genomes. We show that, compared with the classical matrix BLOSUM62, this new matrix better discriminates between true and false orthologs and improves the clustering of homologous proteins.
CONCLUSIONS: We show in this paper that well-fitted matrices can improve the prediction of orthologous and homologous relationships among proteins with a similar compositional bias. With the ever-increasing number of sequenced genomes, our approach could prove valuable in numerous comparative studies focusing on atypical genomes.
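The abstract says the method follows the BLOSUM construction, learning substitution and amino acid frequencies from real sequences that share the target compositional bias. As a rough illustration only, the sketch below shows the textbook BLOSUM log-odds step in Python, assuming ungapped aligned blocks as input. The function names `count_pairs` and `log_odds_matrix` and the input format are hypothetical, not taken from the paper, and the sketch omits the sequence-weighting (clustering) step that the original BLOSUM procedure applies, so it should not be read as the authors' MOLLI60 pipeline.

```python
import math
from collections import Counter
from itertools import combinations


def count_pairs(blocks):
    """Count unordered residue pairs within each column of ungapped aligned blocks.

    Each block is a list of equal-length sequences (hypothetical input format).
    """
    counts = Counter()
    for block in blocks:
        for col in range(len(block[0])):
            column = [seq[col] for seq in block]
            # Every unordered pair of sequences contributes one residue pair.
            for a, b in combinations(column, 2):
                counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_matrix(counts, scale=2.0):
    """Turn pair counts into BLOSUM-style integer log-odds scores.

    scale=2.0 expresses scores in half-bit units, as in BLOSUM62.
    Only pairs observed in the data receive a score in this sketch.
    """
    total = sum(counts.values())
    # Observed pair frequencies q_ij.
    q = {pair: n / total for pair, n in counts.items()}
    # Background residue frequencies: p_i = q_ii + sum_{j != i} q_ij / 2.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency under independence: p_i^2, or 2 p_i p_j for i != j.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        score = round(scale * math.log2(freq / expected))
        scores[(a, b)] = scores[(b, a)] = score
    return scores


# Toy example: one block of two aligned sequences, for illustration only.
blocks = [["KK", "KR"]]
matrix = log_odds_matrix(count_pairs(blocks))
print(matrix[("K", "R")], matrix[("K", "K")])  # 1 0
```

Because the background frequencies p_i are estimated from the same data as the pair frequencies, a matrix built this way from compositionally biased proteins reflects that bias in both the numerator and the denominator of each log-odds ratio.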

Record ID: pubmed-3248887
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research Article), BioMed Central, published 2011-11-24.
Copyright ©2011 Lemaitre et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.