Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies
Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts...
Main authors: | Biziukova, Nadezhda, Tarasova, Olga, Ivanov, Sergey, Poroikov, Vladimir |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7783389/ https://www.ncbi.nlm.nih.gov/pubmed/33414815 http://dx.doi.org/10.3389/fgene.2020.618862 |
_version_ | 1783632103021215744 |
---|---|
author | Biziukova, Nadezhda Tarasova, Olga Ivanov, Sergey Poroikov, Vladimir |
author_facet | Biziukova, Nadezhda Tarasova, Olga Ivanov, Sergey Poroikov, Vladimir |
author_sort | Biziukova, Nadezhda |
collection | PubMed |
description | Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications represent a primary source of information, which is especially important for collecting data of the highest quality because information becomes available there sooner than in databases. In our study, we aimed to develop and test an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract the NEs strongly associated with the subject. To test the applicability of our approach, we applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments provide estimates of the accuracy of recognition of chemical NEs and of proteins (genes). The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion that it is possible to extract the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., HIV-positive individuals able to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment. |
format | Online Article Text |
id | pubmed-7783389 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7783389 2021-01-06 Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies Biziukova, Nadezhda Tarasova, Olga Ivanov, Sergey Poroikov, Vladimir Front Genet Genetics Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications represent a primary source of information, which is especially important for collecting data of the highest quality because information becomes available there sooner than in databases. In our study, we aimed to develop and test an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract the NEs strongly associated with the subject. To test the applicability of our approach, we applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments provide estimates of the accuracy of recognition of chemical NEs and of proteins (genes). The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion that it is possible to extract the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., HIV-positive individuals able to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment. Frontiers Media S.A. 2020-12-22 /pmc/articles/PMC7783389/ /pubmed/33414815 http://dx.doi.org/10.3389/fgene.2020.618862 Text en Copyright © 2020 Biziukova, Tarasova, Ivanov and Poroikov. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Biziukova, Nadezhda Tarasova, Olga Ivanov, Sergey Poroikov, Vladimir Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies |
title | Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies |
title_full | Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies |
title_fullStr | Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies |
title_full_unstemmed | Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies |
title_short | Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies |
title_sort | automated extraction of information from texts of scientific publications: insights into hiv treatment strategies |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7783389/ https://www.ncbi.nlm.nih.gov/pubmed/33414815 http://dx.doi.org/10.3389/fgene.2020.618862 |
work_keys_str_mv | AT biziukovanadezhda automatedextractionofinformationfromtextsofscientificpublicationsinsightsintohivtreatmentstrategies AT tarasovaolga automatedextractionofinformationfromtextsofscientificpublicationsinsightsintohivtreatmentstrategies AT ivanovsergey automatedextractionofinformationfromtextsofscientificpublicationsinsightsintohivtreatmentstrategies AT poroikovvladimir automatedextractionofinformationfromtextsofscientificpublicationsinsightsintohivtreatmentstrategies |
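
The description field defines the F1-score as the harmonic mean of precision and recall. As a rough illustration only, the sketch below recomputes it from the figures quoted there. The function name `f1_score` is hypothetical and not taken from the publication, and because the record gives precision only as a lower bound ("over 0.91", "over 0.86"), the computed values are approximate and need not match the reported F1-scores exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the lower-bound precision values from the description are used
# as if they were exact, so these are approximations of the reported scores.
chemical_f1 = f1_score(precision=0.91, recall=0.86)
protein_gene_f1 = f1_score(precision=0.86, recall=0.83)

print(f"chemical NEs: F1 ~ {chemical_f1:.3f}")
print(f"protein/gene NEs: F1 ~ {protein_gene_f1:.3f}")
```

The harmonic mean is pulled toward the smaller of the two inputs, so a recognizer cannot reach a high F1-score by maximizing precision alone at the expense of recall, or the reverse.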