
The Booly aliasing resource: a database of grouped biological identifiers

Redundancy among sequence identifiers is a recurring problem in bioinformatics. Here, we present a rapid and efficient method of fingerprinting identifiers to ascertain whether two or more aliases are identical. A number of tools and approaches have been developed to resolve differing names for the...


Bibliographic Details
Main Authors:	Do, Long Hoang, Bier, Ethan
Format:	Text
Language:	English
Published:	Biomedical Informatics 2011
Subjects:	Database
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3082858/ https://www.ncbi.nlm.nih.gov/pubmed/21544171

_version_	1782202340924719104
author	Do, Long Hoang Bier, Ethan
author_facet	Do, Long Hoang Bier, Ethan
author_sort	Do, Long Hoang
collection	PubMed
description	Redundancy among sequence identifiers is a recurring problem in bioinformatics. Here, we present a rapid and efficient method of fingerprinting identifiers to ascertain whether two or more aliases are identical. A number of tools and approaches have been developed to resolve differing names for the same genes and proteins; however, these methods each have their own limitations associated with their various goals. We have taken a different approach to the aliasing problem by simplifying the way aliases are stored and curated, with the objective of simultaneously achieving speed and flexibility. Our approach (Booly-hashing) is to link identifiers with their corresponding hash keys derived from unique fingerprints such as gene or protein sequences. This tool has proven invaluable for designing a new data integration platform known as Booly, and has wide applicability to situations in which a dedicated, efficient aliasing system is required. Compared with other aliasing techniques, the Booly-hashing methodology provides 1) reduced run time complexity, 2) increased flexibility (aliasing of other data types, e.g. pharmaceutical drugs), 3) no required assumptions regarding gene clusters or hierarchies, and 4) simplicity in data addition, updating, and maintenance. The new Booly-hashing aliasing model has been incorporated as a central component of the Booly data integration platform we have recently developed and should be broadly applicable to other situations in which an efficient, streamlined aliasing system is required. This aliasing tool and database, which allows users to quickly group the same genes and proteins together, can be accessed at: http://booly.ucsd.edu/alias. AVAILABILITY: The database is available for free at http://booly.ucsd.edu/alias
format	Text
id	pubmed-3082858
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Biomedical Informatics
record_format	MEDLINE/PubMed
spelling	pubmed-3082858 2011-05-04 The Booly aliasing resource: a database of grouped biological identifiers Do, Long Hoang Bier, Ethan Bioinformation Database Redundancy among sequence identifiers is a recurring problem in bioinformatics. Here, we present a rapid and efficient method of fingerprinting identifiers to ascertain whether two or more aliases are identical. A number of tools and approaches have been developed to resolve differing names for the same genes and proteins; however, these methods each have their own limitations associated with their various goals. We have taken a different approach to the aliasing problem by simplifying the way aliases are stored and curated, with the objective of simultaneously achieving speed and flexibility. Our approach (Booly-hashing) is to link identifiers with their corresponding hash keys derived from unique fingerprints such as gene or protein sequences. This tool has proven invaluable for designing a new data integration platform known as Booly, and has wide applicability to situations in which a dedicated, efficient aliasing system is required. Compared with other aliasing techniques, the Booly-hashing methodology provides 1) reduced run time complexity, 2) increased flexibility (aliasing of other data types, e.g. pharmaceutical drugs), 3) no required assumptions regarding gene clusters or hierarchies, and 4) simplicity in data addition, updating, and maintenance. The new Booly-hashing aliasing model has been incorporated as a central component of the Booly data integration platform we have recently developed and should be broadly applicable to other situations in which an efficient, streamlined aliasing system is required. This aliasing tool and database, which allows users to quickly group the same genes and proteins together, can be accessed at: http://booly.ucsd.edu/alias. AVAILABILITY: The database is available for free at http://booly.ucsd.edu/alias Biomedical Informatics 2011-03-26 /pmc/articles/PMC3082858/ /pubmed/21544171 Text en © 2011 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
spellingShingle	Database Do, Long Hoang Bier, Ethan The Booly aliasing resource: a database of grouped biological identifiers
title	The Booly aliasing resource: a database of grouped biological identifiers
title_full	The Booly aliasing resource: a database of grouped biological identifiers
title_fullStr	The Booly aliasing resource: a database of grouped biological identifiers
title_full_unstemmed	The Booly aliasing resource: a database of grouped biological identifiers
title_short	The Booly aliasing resource: a database of grouped biological identifiers
title_sort	booly aliasing resource: a database of grouped biological identifiers
topic	Database
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3082858/ https://www.ncbi.nlm.nih.gov/pubmed/21544171
work_keys_str_mv	AT dolonghoang theboolyaliasingresourceadatabaseofgroupedbiologicalidentifiers AT bierethan theboolyaliasingresourceadatabaseofgroupedbiologicalidentifiers AT dolonghoang boolyaliasingresourceadatabaseofgroupedbiologicalidentifiers AT bierethan boolyaliasingresourceadatabaseofgroupedbiologicalidentifiers
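
The description above outlines the idea of linking identifiers to hash keys derived from a unique fingerprint such as a gene or protein sequence. The following is only a minimal sketch of that general idea, not the authors' implementation: the record does not say which hash function, normalization, or storage layout Booly uses, so SHA-256, the whitespace/case normalization, the `AliasIndex` class, and the identifiers `ID_A`, `ID_B`, `ID_C` with their toy sequences are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(sequence: str) -> str:
    """Return a hash key for a sequence.

    Assumption: whitespace is stripped and case is ignored before hashing,
    and SHA-256 is used; the source record does not specify either choice.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


class AliasIndex:
    """Hypothetical in-memory index that groups identifiers by hash key."""

    def __init__(self):
        self.key_to_ids = defaultdict(set)  # hash key -> identifiers
        self.id_to_key = {}                 # identifier -> hash key

    def add(self, identifier: str, sequence: str) -> None:
        """Add or update an identifier; an update moves it to its new group."""
        key = fingerprint(sequence)
        old_key = self.id_to_key.get(identifier)
        if old_key is not None and old_key != key:
            self.key_to_ids[old_key].discard(identifier)
            if not self.key_to_ids[old_key]:
                del self.key_to_ids[old_key]
        self.key_to_ids[key].add(identifier)
        self.id_to_key[identifier] = key

    def same(self, a: str, b: str) -> bool:
        """Two identifiers are aliases if they map to the same hash key."""
        key_a = self.id_to_key.get(a)
        return key_a is not None and key_a == self.id_to_key.get(b)

    def aliases(self, identifier: str) -> set:
        """Return every identifier sharing this identifier's hash key."""
        key = self.id_to_key.get(identifier)
        return set(self.key_to_ids[key]) if key is not None else set()


# Toy data: the identifiers and sequences are made up for this example.
index = AliasIndex()
index.add("ID_A", "MKT AYI AKQ")
index.add("ID_B", "mktayiakq")
index.add("ID_C", "MSLLTEVETY")
```

In this sketch, checking whether two identifiers are aliases is a pair of dictionary lookups, and adding or updating a record touches only that identifier's entries, with no gene clusters or hierarchies assumed. The same structure would work for any data type that has a canonical string to hash.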

Similar Items