
Algorithms and semantic infrastructure for mutation impact extraction and grounding


Bibliographic Details
Main authors: Laurila, Jonas B; Naderi, Nona; Witte, René; Riazanov, Alexandre; Kouznetsov, Alexandre; Baker, Christopher JO
Format: Text
Language: English
Published: BioMed Central, 2010
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3005927/
https://www.ncbi.nlm.nih.gov/pubmed/21143808
http://dx.doi.org/10.1186/1471-2164-11-S4-S24
Collection: PubMed
Description: BACKGROUND: Mutation impact extraction is a hitherto unaccomplished task in state-of-the-art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in the scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases. RESULTS: We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, are grounded to concepts found in the Gene Ontology. The extracted entities are populated into an OWL-DL Mutation Impact ontology, facilitating complex querying for mutation impacts using SPARQL. We illustrate the retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework. CONCLUSION: We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods, which are evaluated on a corpus of full-text articles on haloalkane dehalogenases tagged by domain experts. Our approaches show state-of-the-art levels of precision and recall for Mutation Grounding, and a respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
Institution: National Center for Biotechnology Information
Source: BMC Genomics (Proceedings), BioMed Central, published 2010-12-02.
Copyright ©2010 Laurila et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.