Automated Training for Algorithms That Learn from Genomic Data

Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not inc...

Bibliographic Details
Main authors:	Cilingir, Gokcen; Broschat, Shira L.
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4324891/ https://www.ncbi.nlm.nih.gov/pubmed/25695053 http://dx.doi.org/10.1155/2015/234236

collection	PubMed
description	Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. Although these data resources are continuously updated, the updates are generally not incorporated into published machine learning algorithms, which can therefore become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which both the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor from information available in public resources. The proposed model is illustrated with three case studies, on SignalP, MemLoci, and ApicoAP, in which existing machine learning models are used in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products, and to develop new machine learning algorithms that are similarly capable.
id	pubmed-4324891
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4324891; 2015-02-18; Automated Training for Algorithms That Learn from Genomic Data; Cilingir, Gokcen; Broschat, Shira L.; Biomed Res Int; Research Article; Hindawi Publishing Corporation 2015; 2015-01-28; /pmc/articles/PMC4324891/; /pubmed/25695053; http://dx.doi.org/10.1155/2015/234236; Text; en; Copyright © 2015 G. Cilingir and S. L. Broschat. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Similar items