Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers

BACKGROUND: Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicat...

Bibliographic Details
Main authors:	Thomas, Philippe E, Klinger, Roman, Furlong, Laura I, Hofmann-Apitius, Martin, Friedrich, Christoph M
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194196/ https://www.ncbi.nlm.nih.gov/pubmed/21992066 http://dx.doi.org/10.1186/1471-2105-12-S4-S4

_version_	1782213928123629568
author	Thomas, Philippe E Klinger, Roman Furlong, Laura I Hofmann-Apitius, Martin Friedrich, Christoph M
author_facet	Thomas, Philippe E Klinger, Roman Furlong, Laura I Hofmann-Apitius, Martin Friedrich, Christoph M
author_sort	Thomas, Philippe E
collection	PubMed
description	BACKGROUND: Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction, e.g., the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. RESULTS: This article describes a workflow for normalization of variation mentions, i.e., their association with unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. CONCLUSIONS: Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described at the DNA level is more difficult than normalizing SNPs described at the protein level. The challenges associated with normalization are exemplified by ambiguities and errors that occur in this corpus.
format	Online Article Text
id	pubmed-3194196
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3194196 2011-10-17 Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers Thomas, Philippe E Klinger, Roman Furlong, Laura I Hofmann-Apitius, Martin Friedrich, Christoph M BMC Bioinformatics Research BACKGROUND: Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction, e.g., the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. RESULTS: This article describes a workflow for normalization of variation mentions, i.e., their association with unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. CONCLUSIONS: Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described at the DNA level is more difficult than normalizing SNPs described at the protein level. The challenges associated with normalization are exemplified by ambiguities and errors that occur in this corpus.
BioMed Central 2011-07-05 /pmc/articles/PMC3194196/ /pubmed/21992066 http://dx.doi.org/10.1186/1471-2105-12-S4-S4 Text en Copyright ©2011 Thomas et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Thomas, Philippe E Klinger, Roman Furlong, Laura I Hofmann-Apitius, Martin Friedrich, Christoph M Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
title	Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
title_full	Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
title_fullStr	Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
title_full_unstemmed	Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
title_short	Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
title_sort	challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194196/ https://www.ncbi.nlm.nih.gov/pubmed/21992066 http://dx.doi.org/10.1186/1471-2105-12-S4-S4
work_keys_str_mv	AT thomasphilippee challengesintheassociationofhumansinglenucleotidepolymorphismmentionswithuniquedatabaseidentifiers AT klingerroman challengesintheassociationofhumansinglenucleotidepolymorphismmentionswithuniquedatabaseidentifiers AT furlonglaurai challengesintheassociationofhumansinglenucleotidepolymorphismmentionswithuniquedatabaseidentifiers AT hofmannapitiusmartin challengesintheassociationofhumansinglenucleotidepolymorphismmentionswithuniquedatabaseidentifiers AT friedrichchristophm challengesintheassociationofhumansinglenucleotidepolymorphismmentionswithuniquedatabaseidentifiers
