Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture
Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, sequence databases selection represents a major challeng...
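Main authors: | Tanca, Alessandro; Palomba, Antonio; Deligios, Massimo; Cubeddu, Tiziana; Fraumene, Cristina; Biosa, Grazia; Pagnozzi, Daniela; Addis, Maria Filippa; Uzzau, Sergio |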
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3857319/ https://www.ncbi.nlm.nih.gov/pubmed/24349410 http://dx.doi.org/10.1371/journal.pone.0082981 |
_version_ | 1782295150753480704 |
---|---|
author | Tanca, Alessandro Palomba, Antonio Deligios, Massimo Cubeddu, Tiziana Fraumene, Cristina Biosa, Grazia Pagnozzi, Daniela Addis, Maria Filippa Uzzau, Sergio |
author_facet | Tanca, Alessandro Palomba, Antonio Deligios, Massimo Cubeddu, Tiziana Fraumene, Cristina Biosa, Grazia Pagnozzi, Daniela Addis, Maria Filippa Uzzau, Sergio |
author_sort | Tanca, Alessandro |
collection | PubMed |
description | Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, sequence databases selection represents a major challenge. This work assessed the impact of different databases in metaproteomic investigations by using a mock microbial mixture including nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Then, both the microbial mixture and the single microorganisms were subjected to next generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely, NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, a quantitative comparison in terms of number and overlap of peptide identifications was carried out among all databases. As a result, only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications compared to databases with generic taxonomy, while the metagenomic database enabled a slight increment in respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to the counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives. This study confirms that database selection has a significant impact in metaproteomics, and provides critical indications for improving depth and reliability of metaproteomic results. Specifically, the use of iterative searches and of suitable filters for taxonomic assignments is proposed with the aim of increasing coverage and trustworthiness of metaproteomic data. |
format | Online Article Text |
id | pubmed-3857319 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-38573192013-12-13 Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture Tanca, Alessandro Palomba, Antonio Deligios, Massimo Cubeddu, Tiziana Fraumene, Cristina Biosa, Grazia Pagnozzi, Daniela Addis, Maria Filippa Uzzau, Sergio PLoS One Research Article Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, sequence databases selection represents a major challenge. This work assessed the impact of different databases in metaproteomic investigations by using a mock microbial mixture including nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Then, both the microbial mixture and the single microorganisms were subjected to next generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely, NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, a quantitative comparison in terms of number and overlap of peptide identifications was carried out among all databases. As a result, only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications compared to databases with generic taxonomy, while the metagenomic database enabled a slight increment in respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to the counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives. This study confirms that database selection has a significant impact in metaproteomics, and provides critical indications for improving depth and reliability of metaproteomic results. Specifically, the use of iterative searches and of suitable filters for taxonomic assignments is proposed with the aim of increasing coverage and trustworthiness of metaproteomic data. Public Library of Science 2013-12-09 /pmc/articles/PMC3857319/ /pubmed/24349410 http://dx.doi.org/10.1371/journal.pone.0082981 Text en © 2013 Tanca et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Tanca, Alessandro Palomba, Antonio Deligios, Massimo Cubeddu, Tiziana Fraumene, Cristina Biosa, Grazia Pagnozzi, Daniela Addis, Maria Filippa Uzzau, Sergio Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture |
title | Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture |
title_full | Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture |
title_fullStr | Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture |
title_full_unstemmed | Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture |
title_short | Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture |
title_sort | evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3857319/ https://www.ncbi.nlm.nih.gov/pubmed/24349410 http://dx.doi.org/10.1371/journal.pone.0082981 |
work_keys_str_mv | AT tancaalessandro evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT palombaantonio evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT deligiosmassimo evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT cubeddutiziana evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT fraumenecristina evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT biosagrazia evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT pagnozzidaniela evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT addismariafilippa evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture AT uzzausergio evaluatingtheimpactofdifferentsequencedatabasesonmetaproteomeanalysisinsightsfromalabassembledmicrobialmixture |