GISMO—gene identification using a support vector machine for ORF classification

Bibliographic Details
Main authors: Krause, Lutz, McHardy, Alice C., Nattkemper, Tim W., Pühler, Alfred, Stoye, Jens, Meyer, Folker
Format: Text
Language: English
Published: Oxford University Press 2007
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1802617/
https://www.ncbi.nlm.nih.gov/pubmed/17175534
http://dx.doi.org/10.1093/nar/gkl1083
author Krause, Lutz
McHardy, Alice C.
Nattkemper, Tim W.
Pühler, Alfred
Stoye, Jens
Meyer, Folker
collection PubMed
description We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license.
format Text
id pubmed-1802617
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1802617 2007-03-01 GISMO—gene identification using a support vector machine for ORF classification Krause, Lutz McHardy, Alice C. Nattkemper, Tim W. Pühler, Alfred Stoye, Jens Meyer, Folker Nucleic Acids Res Genomics We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license. Oxford University Press 2007-01 2006-12-14 /pmc/articles/PMC1802617/ /pubmed/17175534 http://dx.doi.org/10.1093/nar/gkl1083 Text en © 2006 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title GISMO—gene identification using a support vector machine for ORF classification
topic Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1802617/
https://www.ncbi.nlm.nih.gov/pubmed/17175534
http://dx.doi.org/10.1093/nar/gkl1083