Identifying genotype-phenotype relationships in biomedical text
BACKGROUND: One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Su...
Main authors: | Khordad, Maryam; Mercer, Robert E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719522/ https://www.ncbi.nlm.nih.gov/pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8 |
_version_ | 1783284506277445632 |
---|---|
author | Khordad, Maryam Mercer, Robert E. |
author_facet | Khordad, Maryam Mercer, Robert E. |
author_sort | Khordad, Maryam |
collection | PubMed |
description | BACKGROUND: One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Such a system provides important and up-to-date data for database construction and updating, and even text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. No large human-annotated corpus of genotype-phenotype relationships currently exists. So, a semi-automatic approach has been used to annotate a small labelled training set and a self-training method is proposed to annotate more sentences and enlarge the training set. RESULTS: The resulting machine-learned model was evaluated using a separate test set annotated by an expert. The results show that using only the small training set in a supervised learning method achieves good results (precision: 76.47, recall: 77.61, F-measure: 77.03), which are improved by applying a self-training method (precision: 77.70, recall: 77.84, F-measure: 77.77). CONCLUSIONS: Relationships between genotypes and phenotypes are biomedical information pivotal to the understanding of a patient’s situation. Our proposed method is the first attempt to make a specialized system to identify genotype-phenotype relationships in biomedical literature. We achieve good results using a small training set. To improve the results, other linguistic contexts need to be explored and an appropriately enlarged training set is required. |
format | Online Article Text |
id | pubmed-5719522 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5719522 2017-12-08 Identifying genotype-phenotype relationships in biomedical text Khordad, Maryam Mercer, Robert E. J Biomed Semantics Research BioMed Central 2017-12-06 /pmc/articles/PMC5719522/ /pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Khordad, Maryam Mercer, Robert E. Identifying genotype-phenotype relationships in biomedical text |
title | Identifying genotype-phenotype relationships in biomedical text |
title_full | Identifying genotype-phenotype relationships in biomedical text |
title_fullStr | Identifying genotype-phenotype relationships in biomedical text |
title_full_unstemmed | Identifying genotype-phenotype relationships in biomedical text |
title_short | Identifying genotype-phenotype relationships in biomedical text |
title_sort | identifying genotype-phenotype relationships in biomedical text |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719522/ https://www.ncbi.nlm.nih.gov/pubmed/29212530 http://dx.doi.org/10.1186/s13326-017-0163-8 |
work_keys_str_mv | AT khordadmaryam identifyinggenotypephenotyperelationshipsinbiomedicaltext AT mercerroberte identifyinggenotypephenotyperelationshipsinbiomedicaltext |
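
The abstract in this record reports precision, recall, and F-measure for both the supervised and the self-trained model. As a rough consistency check, the sketch below recomputes the F-measure from the reported precision and recall. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the record does not state which variant the authors used, and the function name `f_measure` is a hypothetical helper, not something taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the record does not say which F-measure variant was used;
    F1 is assumed here.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (percentages).
supervised = f_measure(76.47, 77.61)
self_trained = f_measure(77.70, 77.84)

print(f"supervised only:    {supervised:.2f}")
print(f"with self-training: {self_trained:.2f}")
```

Under the F1 assumption the recomputed values land within about 0.01 of the reported 77.03 and 77.77. A difference of that size is what one would expect when the inputs have already been rounded to two decimals.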