Ensemble approach combining multiple methods improves human transcription start site prediction
BACKGROUND: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different feature...
Main Authors: | Dineen, David G; Schröder, Markus; Higgins, Desmond G; Cunningham, Pádraig |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3053590/ https://www.ncbi.nlm.nih.gov/pubmed/21118509 http://dx.doi.org/10.1186/1471-2164-11-677 |
_version_ | 1782199764723433472 |
---|---|
author | Dineen, David G Schröder, Markus Higgins, Desmond G Cunningham, Pádraig |
author_sort | Dineen, David G |
collection | PubMed |
description | BACKGROUND: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques and result in different prediction sets. RESULTS: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool. CONCLUSIONS: Supervised learning methods are a useful way to combine predictions from diverse sources. |
format | Text |
id | pubmed-3053590 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-30535902011-03-12 Ensemble approach combining multiple methods improves human transcription start site prediction Dineen, David G Schröder, Markus Higgins, Desmond G Cunningham, Pádraig BMC Genomics Research Article BACKGROUND: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques and result in different prediction sets. RESULTS: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool. CONCLUSIONS: Supervised learning methods are a useful way to combine predictions from diverse sources. BioMed Central 2010-11-30 /pmc/articles/PMC3053590/ /pubmed/21118509 http://dx.doi.org/10.1186/1471-2164-11-677 Text en Copyright ©2010 Dineen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Ensemble approach combining multiple methods improves human transcription start site prediction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3053590/ https://www.ncbi.nlm.nih.gov/pubmed/21118509 http://dx.doi.org/10.1186/1471-2164-11-677 |
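The description above mentions a two-level classifier in which 'full' and 'reduced' models are combined in an either/or approach. The record does not say how that combination is defined, so the following is only a minimal sketch of one possible reading: a candidate position is called a transcription start site if either second-level model scores it as positive. A plain linear scoring function stands in for a trained support vector machine here, and every name, weight, and feature index is hypothetical and not taken from the article.

```python
from typing import Sequence, Tuple

# (weights, bias) of a linear scorer; a stand-in for a trained SVM.
LinearModel = Tuple[Sequence[float], float]


def linear_decision(model: LinearModel, features: Sequence[float]) -> float:
    """Return the signed score of a linear classifier for one feature vector."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias


def either_or_predict(
    features: Sequence[float],
    full_model: LinearModel,
    reduced_model: LinearModel,
    reduced_idx: Sequence[int],
) -> bool:
    """Call a TSS if either the 'full' or the 'reduced' model scores above zero.

    ASSUMPTION: 'either/or' is read here as a logical OR of the two
    second-level decisions; the source record does not spell this out.
    """
    full_score = linear_decision(full_model, features)
    reduced_features = [features[i] for i in reduced_idx]
    reduced_score = linear_decision(reduced_model, reduced_features)
    return full_score > 0 or reduced_score > 0


# Hypothetical first-level outputs for one candidate position:
# 7 program scores followed by 2 values from other data sources.
N_FEATURES = 9
FULL_MODEL: LinearModel = ([0.2] * N_FEATURES, -1.0)   # made-up weights
REDUCED_IDX = [0, 1, 2]                                # made-up feature subset
REDUCED_MODEL: LinearModel = ([0.5, 0.5, 0.5], -1.0)   # made-up weights

if __name__ == "__main__":
    candidate = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(either_or_predict(candidate, FULL_MODEL, REDUCED_MODEL, REDUCED_IDX))
```

In this sketch the 'reduced' model can rescue a candidate that the 'full' model rejects, which is one way a combination could exploit the heterogeneity between prediction sets; a real pipeline would learn both models from labelled data rather than use fixed weights.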