eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity

Classifying proteins into families and superfamilies allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools that recognize similar patterns in novel sequences, and thus enable the prediction o...

Bibliographic Details
Main Authors: Su, Qiaojuan Jane; Lu, Lin; Saxonov, Serge; Brutlag, Douglas L.
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540014/
https://www.ncbi.nlm.nih.gov/pubmed/15608172
http://dx.doi.org/10.1093/nar/gki060
author Su, Qiaojuan Jane
Lu, Lin
Saxonov, Serge
Brutlag, Douglas L.
collection PubMed
description Classifying proteins into families and superfamilies allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools that recognize similar patterns in novel sequences, and thus enable the prediction of protein function for genomes. The eBLOCKs database enumerates a cascade of protein blocks with varied conservation levels for each functional domain. A biologically important region is most stringently conserved among a smaller family of highly similar proteins. The same region is often found in a larger group of more remotely related proteins with a reduced stringency. Through enumeration, highly specific signatures can be generated from blocks with more columns and fewer family members, while highly sensitive signatures can be derived from blocks with fewer columns and more members as in a superfamily. By applying PSI-BLAST and a modified K-means clustering algorithm, eBLOCKs automatically groups protein sequences according to different levels of similarity. Multiple sequence alignments are made and trimmed into a series of ungapped blocks. Motifs and position-specific scoring matrices were derived from eBLOCKs and made available for sequence search and annotation. The eBLOCKs database provides a tool for high-throughput genome annotation with maximal specificity and sensitivity. The eBLOCKs database is freely available on the World Wide Web at http://motif.stanford.edu/eblocks/ to all users for online usage. Academic and not-for-profit institutions wishing copies of the program may contact Douglas L. Brutlag (brutlag@stanford.edu). Commercial firms wishing copies of the program for internal installation may contact Jacqueline Tay at the Stanford Office of Technology Licensing (jacqueline.tay@stanford.edu; http://otl.stanford.edu/).
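The abstract describes trimming multiple sequence alignments into ungapped blocks and deriving position-specific scoring matrices from them. The following is a minimal sketch of that general idea, not the eBLOCKs implementation: the function names, the toy alignment, the uniform background frequency, and the pseudocount are all illustrative assumptions, and the record gives no details of the actual trimming or scoring procedure.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def ungapped_blocks(alignment, min_width=3):
    """Return (start, end) column ranges in which no sequence has a gap."""
    length = len(alignment[0])
    blocks, start = [], None
    # Iterate one column past the end so a trailing block is closed.
    for col in range(length + 1):
        gap_free = col < length and all(seq[col] != "-" for seq in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks

def block_pssm(rows, pseudocount=1.0):
    """Log-odds matrix (one dict per column); uniform background is an assumption."""
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for column in zip(*rows):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix

def score(pssm, segment):
    """Sum the per-column log-odds scores for a segment of block width."""
    return sum(col[aa] for col, aa in zip(pssm, segment))

# Hypothetical toy alignment; '-' marks a gap.
alignment = [
    "MKV-GHWLAD--TR",
    "MRV-GHWLSDQ-TR",
    "-KVAGHWLAD--SR",
]
blocks = ungapped_blocks(alignment, min_width=3)
start, end = blocks[0]
rows = [seq[start:end] for seq in alignment]
pssm = block_pssm(rows)
```

Scoring a segment with `score(pssm, "GHWLAD")` gives a higher value than an unrelated segment such as `"AAAAAA"`, which is the basic way a block-derived matrix is used to recognize a conserved region in a new sequence.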
format Text
id pubmed-540014
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-540014 2005-01-04 eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity Su, Qiaojuan Jane Lu, Lin Saxonov, Serge Brutlag, Douglas L. Nucleic Acids Res Articles Oxford University Press 2005-01-01 2004-12-17 /pmc/articles/PMC540014/ /pubmed/15608172 http://dx.doi.org/10.1093/nar/gki060 Text en Copyright © 2005 Oxford University Press
title eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540014/
https://www.ncbi.nlm.nih.gov/pubmed/15608172
http://dx.doi.org/10.1093/nar/gki060