A semi-automated methodology for finding lipid-related GO terms

Motivation: Although semantic similarity in Gene Ontology (GO) and other approaches may be used to find similar GO terms, there is not yet a method to systematically find a class of GO terms sharing a common property with high accuracy (e.g. involving human curation). Results: We have developed a method...

Bibliographic Details
Main authors: Fan, Mengyuan; Low, Hong Sang; Wenk, Markus R.; Wong, Limsoon
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4160098/
https://www.ncbi.nlm.nih.gov/pubmed/25209026
http://dx.doi.org/10.1093/database/bau089
author Fan, Mengyuan
Low, Hong Sang
Wenk, Markus R.
Wong, Limsoon
collection PubMed
description Motivation: Although semantic similarity in Gene Ontology (GO) and other approaches may be used to find similar GO terms, there is not yet a method to systematically find a class of GO terms sharing a common property with high accuracy (e.g. involving human curation). Results: We have developed a methodology to address this issue and applied it to identify lipid-related GO terms, owing to the important and varied roles of lipids in many biological processes. Our methodology finds lipid-related GO terms in a semi-automated manner, requiring only moderate manual curation. We first obtain a list of lipid-related gold-standard GO terms by keyword search and manual curation. Then, based on the hypothesis that co-annotated GO terms share similar properties, we develop a machine learning method that expands the list of lipid-related terms from the gold standard. The terms predicted most likely to be lipid related are examined by a human curator, who follows specific curation rules to confirm the class labels. The structure of GO is also exploited to help reduce the curation effort. The prediction and curation cycle is repeated until no further lipid-related term is found. Our approach has covered a high proportion, if not all, of lipid-related terms with relatively high efficiency. Database URL: http://compbio.ddns.comp.nus.edu.sg/~lipidgo
format Online
Article
Text
id pubmed-4160098
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4160098 2014-09-11 A semi-automated methodology for finding lipid-related GO terms Fan, Mengyuan Low, Hong Sang Wenk, Markus R. Wong, Limsoon Database (Oxford) Original Article Oxford University Press 2014-09-10 /pmc/articles/PMC4160098/ /pubmed/25209026 http://dx.doi.org/10.1093/database/bau089 Text en © The Author(s) 2014. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A semi-automated methodology for finding lipid-related GO terms
topic Original Article
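
The prediction-and-curation cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the machine learning method, so a simple co-annotation fraction stands in for it here. The function names, the `threshold` parameter, the toy gene and term names, and the `curate` callback (a stand-in for the human curator) are all assumptions made for illustration. The use of GO structure to reduce curation effort is omitted.

```python
from collections import defaultdict


def score_candidates(annotations, positives, rejected):
    """Score each unlabeled term by the fraction of its annotated genes
    that also carry at least one term already labeled as positive.

    This co-annotation fraction is an assumed stand-in for the paper's
    machine learning method, which the abstract does not specify.
    """
    genes_by_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            genes_by_term[term].add(gene)

    scores = {}
    for term, genes in genes_by_term.items():
        if term in positives or term in rejected:
            continue  # already labeled, nothing to predict
        hits = sum(1 for g in genes if annotations[g] & positives)
        scores[term] = hits / len(genes)
    return scores


def expand_terms(annotations, seed, curate, threshold=0.5):
    """Repeat prediction and curation until no further term is confirmed."""
    positives = set(seed)
    rejected = set()
    while True:
        scores = score_candidates(annotations, positives, rejected)
        candidates = {t for t, s in scores.items() if s >= threshold}
        confirmed = {t for t in candidates if curate(t)}
        rejected |= candidates - confirmed
        if not confirmed:
            return positives
        positives |= confirmed


# Toy data: gene -> set of annotated terms (hypothetical names, not real GO IDs).
annotations = {
    "geneA": {"lipid_seed", "cand1"},
    "geneB": {"lipid_seed", "cand1"},
    "geneC": {"cand1", "cand2"},
    "geneD": {"cand2", "other"},
    "geneE": {"other"},
}

# The curator is simulated by a fixed set of terms that would be accepted.
accepted_by_curator = {"cand1", "cand2"}
result = expand_terms(annotations, {"lipid_seed"},
                      lambda term: term in accepted_by_curator)
print(sorted(result))
```

In this toy run the seed term pulls in `cand1` in the first cycle, `cand1` then pulls in `cand2` in the second, and the loop stops once the curator rejects the only remaining candidate, mirroring the stopping rule stated in the abstract.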