An analysis of the Sargasso Sea resource and the consequences for database composition

BACKGROUND: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species th...

Bibliographic Details
Main authors: Tress, Michael L; Cozzetto, Domenico; Tramontano, Anna; Valencia, Alfonso
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1513258/
https://www.ncbi.nlm.nih.gov/pubmed/16623953
http://dx.doi.org/10.1186/1471-2105-7-213
author Tress, Michael L
Cozzetto, Domenico
Tramontano, Anna
Valencia, Alfonso
collection PubMed
description BACKGROUND: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. RESULTS: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments. CONCLUSION: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.
format Text
id pubmed-1513258
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1513258 2006-07-20 BMC Bioinformatics Research Article BioMed Central 2006-04-19 /pmc/articles/PMC1513258/ /pubmed/16623953 http://dx.doi.org/10.1186/1471-2105-7-213 Text en Copyright © 2006 Tress et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An analysis of the Sargasso Sea resource and the consequences for database composition
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1513258/
https://www.ncbi.nlm.nih.gov/pubmed/16623953
http://dx.doi.org/10.1186/1471-2105-7-213