
Identifying genotype-phenotype relationships in biomedical text

BACKGROUND: One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Su...


Bibliographic Details
Main authors: Khordad, Maryam; Mercer, Robert E.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719522/
https://www.ncbi.nlm.nih.gov/pubmed/29212530
http://dx.doi.org/10.1186/s13326-017-0163-8
_version_ 1783284506277445632
author Khordad, Maryam
Mercer, Robert E.
collection PubMed
description BACKGROUND: One important type of information contained in the biomedical research literature is newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Such a system provides important and up-to-date data for database construction and updating, and even for text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. No large human-annotated corpus of genotype-phenotype relationships currently exists. Therefore, a semi-automatic approach was used to annotate a small labelled training set, and a self-training method is proposed to annotate more sentences and enlarge the training set. RESULTS: The resulting machine-learned model was evaluated using a separate test set annotated by an expert. The results show that using only the small training set in a supervised learning method achieves good results (precision: 76.47, recall: 77.61, F-measure: 77.03), which are improved by applying a self-training method (precision: 77.70, recall: 77.84, F-measure: 77.77). CONCLUSIONS: Relationships between genotypes and phenotypes are biomedical information pivotal to the understanding of a patient’s situation. Our proposed method is the first attempt to make a specialized system to identify genotype-phenotype relationships in biomedical literature. We achieve good results using a small training set. To improve the results, other linguistic contexts need to be explored and an appropriately enlarged training set is required.
format Online
Article
Text
id pubmed-5719522
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5719522 2017-12-08 Identifying genotype-phenotype relationships in biomedical text Khordad, Maryam Mercer, Robert E. J Biomed Semantics Research
BioMed Central 2017-12-06 /pmc/articles/PMC5719522/ /pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identifying genotype-phenotype relationships in biomedical text
topic Research