Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets
Automatic sequence annotation is an essential component of modern ‘omics’ studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can b...
Main authors: | Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M. Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4131829/ https://www.ncbi.nlm.nih.gov/pubmed/24501397 http://dx.doi.org/10.1093/dnares/dsu001 |
_version_ | 1782330526630150144 |
---|---|
author | Muñoz-Mérida, Antonio Viguera, Enrique Claros, M. Gonzalo Trelles, Oswaldo Pérez-Pulido, Antonio J. |
author_facet | Muñoz-Mérida, Antonio Viguera, Enrique Claros, M. Gonzalo Trelles, Oswaldo Pérez-Pulido, Antonio J. |
author_sort | Muñoz-Mérida, Antonio |
collection | PubMed |
description | Automatic sequence annotation is an essential component of modern ‘omics’ studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ∼85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. |
format | Online Article Text |
id | pubmed-4131829 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-41318292014-08-18 Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets Muñoz-Mérida, Antonio Viguera, Enrique Claros, M. Gonzalo Trelles, Oswaldo Pérez-Pulido, Antonio J. DNA Res Full Papers Automatic sequence annotation is an essential component of modern ‘omics’ studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ∼85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. Oxford University Press 2014-08 2014-02-05 /pmc/articles/PMC4131829/ /pubmed/24501397 http://dx.doi.org/10.1093/dnares/dsu001 Text en © The Author 2014. 
Published by Oxford University Press on behalf of Kazusa DNA Research Institute. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. |
spellingShingle | Full Papers Muñoz-Mérida, Antonio Viguera, Enrique Claros, M. Gonzalo Trelles, Oswaldo Pérez-Pulido, Antonio J. Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets |
title | Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets |
title_full | Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets |
title_fullStr | Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets |
title_full_unstemmed | Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets |
title_short | Sma3s: A Three-Step Modular Annotator for Large Sequence Datasets |
title_sort | sma3s: a three-step modular annotator for large sequence datasets |
topic | Full Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4131829/ https://www.ncbi.nlm.nih.gov/pubmed/24501397 http://dx.doi.org/10.1093/dnares/dsu001 |
work_keys_str_mv | AT munozmeridaantonio sma3sathreestepmodularannotatorforlargesequencedatasets AT vigueraenrique sma3sathreestepmodularannotatorforlargesequencedatasets AT clarosmgonzalo sma3sathreestepmodularannotatorforlargesequencedatasets AT trellesoswaldo sma3sathreestepmodularannotatorforlargesequencedatasets AT perezpulidoantonioj sma3sathreestepmodularannotatorforlargesequencedatasets |