Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments

Bibliographic Details
Main Authors: Laganeckas, Mindaugas; Margelevičius, Mindaugas; Venclovas, Česlovas
Format: Text
Language: English
Published: Oxford University Press 2011
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045609/
https://www.ncbi.nlm.nih.gov/pubmed/20961958
http://dx.doi.org/10.1093/nar/gkq958
author Laganeckas, Mindaugas
Margelevičius, Mindaugas
Venclovas, Česlovas
collection PubMed
description PD-(D/E)XK nucleases, initially represented by only Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acids transactions including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes the identification of new superfamily members using standard homology search techniques challenging. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile–profile alignments using support vector machines (SVMs). Using a number of both superfamily-specific and general features, SVMs were trained to identify true positive alignments of PD-(D/E)XK representatives. With this method we identified several PFAM families of uncharacterized proteins as putative new members of the PD-(D/E)XK superfamily. In addition, we assigned several unclassified restriction enzymes to the PD-(D/E)XK type. Results show that the new method is able to make confident assignments even for alignments that have statistically insignificant scores. We also implemented the method as a freely accessible web server at http://www.ibt.lt/bioinformatics/software/pdexk/.
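The abstract above describes binary classification of profile–profile alignments with SVMs. The following is a minimal sketch of that general idea, not the authors' implementation: the record does not say which features, kernel, or training data were used. The two features here (a normalized alignment score as a "general" feature and the fraction of active-site motif positions matched as a "superfamily-specific" one), all numeric values, and the function names are hypothetical. A plain linear SVM trained by batch subgradient descent stands in for whatever SVM setup the paper used.

```python
# Toy illustration only: the features and all numbers below are
# hypothetical and are not taken from the paper.

def decision_value(weights, bias, features):
    """Linear SVM score; a positive value favors 'true positive alignment'."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_linear_svm(samples, labels, lam=0.01, lr=0.05, epochs=5000):
    """Fit a linear SVM by batch subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    n = len(samples)
    dim = len(samples[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [lam * w for w in weights]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Only samples with margin below 1 contribute a hinge subgradient.
            if y * decision_value(weights, bias, x) < 1.0:
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def classify(weights, bias, features):
    """Return +1 (accept the alignment) or -1 (reject it)."""
    return 1 if decision_value(weights, bias, features) >= 0.0 else -1


# Hypothetical feature vectors:
# [normalized alignment score, fraction of motif positions matched]
TRAIN_X = [
    [0.8, 0.9], [0.7, 1.0], [0.9, 0.8], [0.6, 0.9],  # true positive alignments
    [0.2, 0.1], [0.3, 0.0], [0.1, 0.2], [0.4, 0.1],  # false positive alignments
]
TRAIN_Y = [1, 1, 1, 1, -1, -1, -1, -1]

WEIGHTS, BIAS = train_linear_svm(TRAIN_X, TRAIN_Y)
```

Calling `classify(WEIGHTS, BIAS, [0.85, 0.95])` then labels a new, equally hypothetical alignment. In a setting like the one the abstract describes, the benefit of such a classifier is that it can combine several weak signals, so an alignment with an unremarkable score may still be accepted when motif-related features support it.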
format Text
id pubmed-3045609
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3045609 2011-02-28 Nucleic Acids Res Computational Biology Oxford University Press 2011-03 2010-10-20 /pmc/articles/PMC3045609/ /pubmed/20961958 http://dx.doi.org/10.1093/nar/gkq958 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile–profile alignments
topic Computational Biology