Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments
PD-(D/E)XK nucleases, initially represented by only Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acids transactions including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK...
Main authors: | Laganeckas, Mindaugas; Margelevičius, Mindaugas; Venclovas, Česlovas |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2011 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045609/ https://www.ncbi.nlm.nih.gov/pubmed/20961958 http://dx.doi.org/10.1093/nar/gkq958 |
_version_ | 1782198855724433408 |
---|---|
author | Laganeckas, Mindaugas; Margelevičius, Mindaugas; Venclovas, Česlovas |
author_sort | Laganeckas, Mindaugas |
collection | PubMed |
description | PD-(D/E)XK nucleases, initially represented by only Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acids transactions including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes the identification of new superfamily members using standard homology search techniques challenging. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile–profile alignments using support vector machines (SVMs). Using a number of both superfamily-specific and general features, SVMs were trained to identify true positive alignments of PD-(D/E)XK representatives. With this method we identified several PFAM families of uncharacterized proteins as putative new members of the PD-(D/E)XK superfamily. In addition, we assigned several unclassified restriction enzymes to the PD-(D/E)XK type. Results show that the new method is able to make confident assignments even for alignments that have statistically insignificant scores. We also implemented the method as a freely accessible web server at http://www.ibt.lt/bioinformatics/software/pdexk/. |
format | Text |
id | pubmed-3045609 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3045609 2011-02-28 Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments Laganeckas, Mindaugas; Margelevičius, Mindaugas; Venclovas, Česlovas Nucleic Acids Res Computational Biology Oxford University Press 2011-03 2010-10-20 /pmc/articles/PMC3045609/ /pubmed/20961958 http://dx.doi.org/10.1093/nar/gkq958 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045609/ https://www.ncbi.nlm.nih.gov/pubmed/20961958 http://dx.doi.org/10.1093/nar/gkq958 |