
Identifying genotype-phenotype relationships in biomedical text



Bibliographic Details
Main Authors:	Khordad, Maryam; Mercer, Robert E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719522/ https://www.ncbi.nlm.nih.gov/pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8

author	Khordad, Maryam; Mercer, Robert E.
collection	PubMed
description	BACKGROUND: One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Such a system provides important and up-to-date data for database construction and updating, and even text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. No large human-annotated corpus of genotype-phenotype relationships currently exists. So, a semi-automatic approach has been used to annotate a small labelled training set, and a self-training method is proposed to annotate more sentences and enlarge the training set. RESULTS: The resulting machine-learned model was evaluated using a separate test set annotated by an expert. The results show that using only the small training set in a supervised learning method achieves good results (precision: 76.47, recall: 77.61, F-measure: 77.03), which are improved by applying a self-training method (precision: 77.70, recall: 77.84, F-measure: 77.77). CONCLUSIONS: Relationships between genotypes and phenotypes are biomedical information pivotal to the understanding of a patient’s situation. Our proposed method is the first attempt to make a specialized system to identify genotype-phenotype relationships in biomedical literature. We achieve good results using a small training set. To improve the results, other linguistic contexts need to be explored and an appropriately enlarged training set is required.
format	Online Article Text
id	pubmed-5719522
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5719522 2017-12-08 Identifying genotype-phenotype relationships in biomedical text Khordad, Maryam Mercer, Robert E. J Biomed Semantics Research
BioMed Central 2017-12-06 /pmc/articles/PMC5719522/ /pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8 Text en © The Author(s) 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Identifying genotype-phenotype relationships in biomedical text
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719522/ https://www.ncbi.nlm.nih.gov/pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8

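The abstract describes enlarging a small labelled training set by self-training: train on the labelled data, pseudo-label the unlabelled sentences the model is confident about, add them to the training set, and retrain. The sketch below illustrates that generic loop only; it is not the authors' implementation. The record does not say which classifier, features, or confidence criterion they used, so the toy nearest-centroid model, the numeric feature vectors standing in for sentences, the margin-based confidence score, the 0.5 threshold, and the names `NearestCentroid` and `self_train` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each "sentence" is reduced to a numeric
# feature vector; label 1 = states a genotype-phenotype relation, 0 = does not.
Example = Tuple[List[float], int]


class NearestCentroid:
    """Toy binary classifier standing in for the real sentence model (hypothetical)."""

    def fit(self, data: Sequence[Example]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in {y for _, y in data}:
            rows = [x for x, y in data if y == label]
            # Column-wise mean of all vectors carrying this label.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x: List[float]) -> Tuple[int, float]:
        # Euclidean distance to each class centroid, closest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
            for label, c in self.centroids.items()
        )
        (d_best, label), (d_next, _) = dists[0], dists[1]
        margin = d_next - d_best
        # Map the margin into [0, 1): a larger margin means more confidence.
        return label, margin / (margin + 1.0)


def self_train(
    labelled: Sequence[Example],
    unlabelled: Sequence[List[float]],
    threshold: float = 0.5,
    max_rounds: int = 10,
) -> Tuple[NearestCentroid, List[Example]]:
    train = list(labelled)
    pool = list(unlabelled)
    model = NearestCentroid().fit(train)
    for _ in range(max_rounds):
        confident: List[Example] = []
        remaining: List[List[float]] = []
        for x in pool:
            label, conf = model.predict_with_confidence(x)
            if conf >= threshold:
                confident.append((x, label))  # pseudo-label accepted
            else:
                remaining.append(x)  # left for a later round
        if not confident:
            break  # nothing new crossed the threshold; stop early
        train.extend(confident)
        pool = remaining
        model = NearestCentroid().fit(train)  # retrain on the enlarged set
    return model, train


labelled = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
unlabelled = [[0.1, 0.0], [1.1, 0.9], [0.5, 0.5], [0.05, 0.2]]
model, enlarged = self_train(labelled, unlabelled)
print(f"training set grew from {len(labelled)} to {len(enlarged)} examples")
```

On this toy data the first round accepts three pseudo-labelled points and leaves the ambiguous `[0.5, 0.5]` in the pool; the second round adds nothing, so the loop stops with seven training examples. The threshold trades training-set size against the risk of adding wrong pseudo-labels, which is the central tension in any self-training scheme.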

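The precision, recall, and F-measure figures in the abstract can be cross-checked. The record does not say which F-measure variant was used, so this assumes the balanced F1 score, the harmonic mean of precision and recall, F = 2PR / (P + R).

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported in the abstract (percentages).
supervised = f_measure(76.47, 77.61)
self_trained = f_measure(77.70, 77.84)

print(f"supervised:   F = {supervised:.2f}")
print(f"self-trained: F = {self_trained:.2f}")
```

Recomputing from the rounded precision and recall gives 77.04 and 77.77, which matches the reported 77.03 and 77.77 to within the rounding of the inputs. This is consistent with the balanced F1 assumption.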