
Peptide markers of aminoacyl tRNA synthetases facilitate taxa counting in metagenomic data

Bibliographic Details
Main authors: Persi, Erez; Weingart, Uri; Freilich, Shiri; Horn, David
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3319421/
https://www.ncbi.nlm.nih.gov/pubmed/22325056
http://dx.doi.org/10.1186/1471-2164-13-65
Full description

BACKGROUND: Taxa counting is a major problem in the analysis of metagenomic data. The most popular method relies on analysis of 16S rRNA sequences, but some studies also employ protein-based analyses. It would be advantageous to have a method that applies directly to short sequences of the kind extracted from samples in modern metagenomic research. The technique proposed here achieves this.

RESULTS: We employ specific peptides, deduced from aminoacyl tRNA synthetases (aaRS), as markers for the occurrence of single genes in the data. Sequences carrying these markers are aligned and compared with each other to provide a lower limit on taxa counts in metagenomic data. The method is compared with 16S rRNA searches on a set of known genomes. The taxa counting problem is analyzed mathematically, and a heuristic algorithm is proposed. When applied to genomic contigs from a recent human gut microbiome study, the taxa counting method provides information on the numbers of different species and strains. We then apply our method to short-read data and demonstrate how it can be calibrated to cope with errors. Comparison with known databases leads to estimates of the percentage of novelties and the types of phyla involved.

CONCLUSIONS: A major advantage of our method is its simplicity: it relies on searching sequences for the occurrence of just 4000 specific peptides belonging to the S61 subgroup of aaRS enzymes. Compared with other methods, it provides additional insight into the taxonomic content of metagenomic data. Furthermore, it can be applied directly to short-read data, avoiding the need for genomic contig reconstruction and taking into account short reads that are otherwise discarded as singletons. Hence it is well suited to fast analysis of next-generation sequencing data.
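The marker-based lower bound described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes the reads have already been translated into protein fragments, that each genome carries a single copy of the marked gene, and that reads are error-free, and it replaces the alignment step with an exact comparison of fixed-length flanking windows around each marker. The function names `flanking_variants` and `taxa_lower_bound`, the `flank` parameter, and the example peptides are all hypothetical.

```python
from collections import defaultdict


def flanking_variants(reads, markers, flank=5):
    """Map each marker peptide to the set of distinct windows
    (the marker plus `flank` residues on each side) found in the reads."""
    variants = defaultdict(set)
    for read in reads:
        for marker in markers:
            start = read.find(marker)
            while start != -1:
                lo, hi = start - flank, start + len(marker) + flank
                # Keep only windows fully covered by the read; partial
                # windows cannot be compared fairly with full ones.
                if lo >= 0 and hi <= len(read):
                    variants[marker].add(read[lo:hi])
                start = read.find(marker, start + 1)
    return variants


def taxa_lower_bound(reads, markers_by_gene, flank=5):
    """Toy lower bound on the number of taxa: the largest number of
    distinct flanking windows seen at any single marker of any gene.
    Assumes one gene copy per genome and error-free reads."""
    best = 0
    for markers in markers_by_gene.values():
        variants = flanking_variants(reads, markers, flank)
        for marker in markers:
            best = max(best, len(variants.get(marker, ())))
    return best


if __name__ == "__main__":
    # Hypothetical translated reads and a hypothetical 7-residue marker.
    reads = [
        "AAAAAGHIKLMNPPPPP",
        "CCCCCGHIKLMNPPPPP",
        "AAAAAGHIKLMNPPPPP",  # same variant as the first read
        "QQGHIK",             # does not contain the full marker
    ]
    print(taxa_lower_bound(reads, {"hypothetical_aaRS": ["GHIKLMN"]}))  # → 2
```

Under these assumptions, two reads that share a marker but differ in its flanking residues must come from at least two distinct gene copies, which is why the result is a lower limit rather than an estimate of the true count. The calibration for sequencing errors mentioned in the abstract is not modeled in this sketch.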
Journal: BMC Genomics
Article type: Methodology Article
Publication date: 2012-02-10
Copyright: ©2012 Persi et al; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.