
Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases

There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low comp...


Bibliographic Details
Main Authors: Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3642179/
https://www.ncbi.nlm.nih.gov/pubmed/23658777
http://dx.doi.org/10.1371/journal.pone.0062803
author Assmus, Jens
Kleffe, Jürgen
Schmitt, Armin O.
Brockmann, Gudrun A.
collection PubMed
description There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two indels are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel based on sequence alignment alone. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different database entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions, such as exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications, of which only one is correct. We implemented an algorithm that determines, for each indel database entry, its complete set of equivalent indels, which is uniquely characterized by the indel itself and a given interval of the reference sequence.
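The idea of a contiguous region of equivalent deletions can be illustrated with a short sketch. This is not the authors' implementation, which the record does not reproduce; it is a minimal illustration, assuming 0-based coordinates and a reference given as a plain string. The function names `delete` and `equivalent_deletions` are hypothetical. Moving a deletion of length n one base to the right leaves the alternative sequence unchanged exactly when the base at the deletion start equals the base n positions downstream, so the equivalent start positions can be found by extending in both directions while that condition holds.

```python
from typing import List


def delete(ref: str, start: int, length: int) -> str:
    """Return the alternative sequence after deleting ref[start:start+length]."""
    return ref[:start] + ref[start + length:]


def equivalent_deletions(ref: str, start: int, length: int) -> List[int]:
    """Return every 0-based start position whose deletion of `length` bases
    produces the same alternative sequence as the given deletion.

    Hypothetical helper for illustration only; coordinates are 0-based.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Extend to the left: a shift from lo to lo - 1 is neutral when
    # ref[lo - 1] equals ref[lo - 1 + length].
    lo = start
    while lo > 0 and ref[lo - 1] == ref[lo - 1 + length]:
        lo -= 1

    # Extend to the right: a shift from hi to hi + 1 is neutral when
    # ref[hi] equals ref[hi + length].
    hi = start
    while hi + length < len(ref) and ref[hi] == ref[hi + length]:
        hi += 1

    return list(range(lo, hi + 1))


if __name__ == "__main__":
    # Deleting any single T from the run of four Ts gives the same sequence.
    print(equivalent_deletions("GATTTTCA", 3, 1))
    # Deleting "CA" inside an AC repeat can be annotated at several positions.
    print(equivalent_deletions("ACACACG", 1, 2))
```

In the first call every returned position lies inside the run of Ts; in the second, the returned positions span the dinucleotide repeat. If such a region crossed an exon boundary, the same event could be annotated with different functional classes, which is the ambiguity the abstract describes.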
format Online
Article
Text
id pubmed-3642179
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3642179 2013-05-08 Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases. Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A. PLoS One. Research Article.
Public Library of Science 2013-05-02 /pmc/articles/PMC3642179/ /pubmed/23658777 http://dx.doi.org/10.1371/journal.pone.0062803 Text en © 2013 Assmus et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases
topic Research Article