MultiSeq: unifying sequence and structure data for evolutionary analysis

BACKGROUND: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlatio...

Bibliographic Details
Main authors:	Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1586216/ https://www.ncbi.nlm.nih.gov/pubmed/16914055 http://dx.doi.org/10.1186/1471-2105-7-382

author	Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida
collection	PubMed
description	BACKGROUND: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. RESULTS: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. 
CONCLUSION: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software:
format	Text
id	pubmed-1586216
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1586216 2006-10-05 MultiSeq: unifying sequence and structure data for evolutionary analysis Roberts, Elijah Eargle, John Wright, Dan Luthey-Schulten, Zaida BMC Bioinformatics Software BACKGROUND: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. RESULTS: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. 
The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. CONCLUSION: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software: BioMed Central 2006-08-16 /pmc/articles/PMC1586216/ /pubmed/16914055 http://dx.doi.org/10.1186/1471-2105-7-382 Text en Copyright © 2006 Roberts et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	MultiSeq: unifying sequence and structure data for evolutionary analysis
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1586216/ https://www.ncbi.nlm.nih.gov/pubmed/16914055 http://dx.doi.org/10.1186/1471-2105-7-382
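
The redundancy-removal step summarized in the abstract (ordering sequences by increasing linear dependence and keeping a minimal spanning basis) can be illustrated with a minimal sketch. It assumes a hypothetical encoding in which each aligned sequence is one-hot encoded (20 amino acids plus a gap symbol) into one column vector, and it substitutes ordinary QR factorization with column pivoting, in modified Gram-Schmidt form, on that single matrix. MultiSeq's multidimensional QR, its combined sequence and structure encoding, and its redundancy cutoff are not reproduced here. The names `ALPHABET`, `encode`, `pivoted_qr_order`, and the tolerance `tol` are illustrative only. At each step the sequence with the largest residual norm, i.e. the one least explained by the sequences already chosen, becomes the next pivot, so the first `rank` indices of the returned order form a basis for the whole set.

```python
import math

# Hypothetical alphabet (an assumption, not MultiSeq's encoding):
# the 20 standard amino acids plus '-' for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def encode(aligned_seq):
    """One-hot encode an aligned sequence into a flat vector of len(seq) * 21 floats.

    Symbols outside ALPHABET (e.g. 'X') become an all-zero block.
    """
    vec = []
    for residue in aligned_seq.upper():
        vec.extend(1.0 if residue == symbol else 0.0 for symbol in ALPHABET)
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pivoted_qr_order(alignment, tol=1e-8):
    """QR with column pivoting (modified Gram-Schmidt form) over encoded sequences.

    Returns (order, rank): sequence indices in pivot order, and how many of them
    had a residual norm above tol. order[:rank] spans all the encoded sequences;
    order[rank:] are (numerically) linear combinations of that basis.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    residuals = [encode(seq) for seq in alignment]
    remaining = list(range(len(alignment)))
    order = []
    rank = 0
    while remaining:
        # Pivot on the sequence least explained by those already chosen.
        best = max(remaining, key=lambda i: dot(residuals[i], residuals[i]))
        norm = math.sqrt(dot(residuals[best], residuals[best]))
        if norm <= tol:
            break  # every remaining sequence is already spanned by the basis
        remaining.remove(best)
        order.append(best)
        rank += 1
        q = [x / norm for x in residuals[best]]
        # Remove the new basis direction from every remaining sequence.
        for i in remaining:
            proj = dot(q, residuals[i])
            residuals[i] = [r - proj * qk for r, qk in zip(residuals[i], q)]
    # Redundant sequences go last, in their original order.
    order.extend(remaining)
    return order, rank


if __name__ == "__main__":
    # Toy alignment: sequence 1 duplicates sequence 0.
    toy = ["ACDE", "ACDE", "ACDF", "WCDE"]
    order, rank = pivoted_qr_order(toy)
    print(order, rank)  # [0, 2, 3, 1] 3
```

In the toy alignment the duplicated sequence contributes nothing new to the span, so it is pivoted last and falls outside the basis. A real analysis would more likely stop at a similarity-based threshold than at numerical rank; this sketch does not attempt that.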

Similar Items