UPS-indel: a Universal Positioning System for Indels

Bibliographic Details
Main authors: Hasan, Mohammad Shabbir; Wu, Xiaowei; Watson, Layne T.; Zhang, Liqing
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658412/
https://www.ncbi.nlm.nih.gov/pubmed/29074871
http://dx.doi.org/10.1038/s41598-017-14400-1
collection PubMed
description Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. A unified system is also desirable for comparing the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which can also be used to compare different indel calling results. Across all human chromosomes, UPS-indel identifies 15% of the indels in dbSNP, 29% in the COSMIC coding dataset, and 13% in the COSMIC noncoding dataset as redundant, rates higher than previously reported. A comparison with the existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that, beyond the redundant indels these tools report jointly, UPS-indel identifies 456,352 more in dbSNP, 2,118 more in the COSMIC coding dataset, and 553 more in the COSMIC noncoding dataset. Moreover, a comparison of UPS-indel with state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels and is thus exhaustive.
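The abstract rests on the notion of equivalent indels: two indels are equivalent when applying each one to the same reference yields the same sequence. The Python sketch below illustrates that definition together with a simple left-alignment normalization, in the spirit of what normalization tools such as vt normalize perform. It is not the UPS-indel algorithm or its coordinate system. The `Indel` class and the functions `apply_indel`, `equivalent`, and `left_align` are hypothetical names, and the sketch assumes a single pure insertion or deletion on a plain reference string with 0-based positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """A pure insertion or deletion relative to a reference string (hypothetical model).

    pos:  0-based reference index where the event occurs; for an
          insertion, the inserted bases go immediately before ref[pos].
    kind: "ins" or "del".
    seq:  the inserted bases ("ins") or the deleted bases ("del").
    """
    pos: int
    kind: str
    seq: str


def apply_indel(ref: str, indel: Indel) -> str:
    """Return the sequence obtained by applying the indel to ref."""
    if indel.kind == "ins":
        return ref[:indel.pos] + indel.seq + ref[indel.pos:]
    if indel.kind == "del":
        end = indel.pos + len(indel.seq)
        # The deleted bases must actually be present in the reference.
        if ref[indel.pos:end] != indel.seq:
            raise ValueError("deleted bases do not match the reference")
        return ref[:indel.pos] + ref[end:]
    raise ValueError(f"unknown indel kind: {indel.kind!r}")


def equivalent(ref: str, a: Indel, b: Indel) -> bool:
    """Equivalent indels produce the same sequence from the same reference."""
    return apply_indel(ref, a) == apply_indel(ref, b)


def left_align(ref: str, indel: Indel) -> Indel:
    """Shift an indel one base left at a time while its effect is unchanged.

    For both kinds, a one-base left shift preserves the resulting sequence
    exactly when the reference base just before the event equals the last
    base of the event sequence; the sequence is then rotated right by one.
    """
    if not indel.seq:
        raise ValueError("an indel must insert or delete at least one base")
    pos, seq = indel.pos, indel.seq
    while pos > 0 and ref[pos - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        pos -= 1
    return Indel(pos, indel.kind, seq)


if __name__ == "__main__":
    ref = "ACAGAGT"
    calls = [Indel(4, "del", "AG"), Indel(2, "del", "AG"), Indel(6, "ins", "AG")]
    # The two deletions collapse to one key; the insertion stays distinct.
    unique = {left_align(ref, c) for c in calls}
    print(len(unique))  # 2
    # Comparing two call sets: intersect their normalized forms.
    caller_a = {left_align(ref, c) for c in [Indel(4, "del", "AG")]}
    caller_b = {left_align(ref, c) for c in [Indel(2, "del", "AG")]}
    print(caller_a & caller_b)  # {Indel(pos=2, kind='del', seq='AG')}
```

Because the frozen dataclass is hashable, normalized indels can serve as set or dictionary keys, so collapsing redundant records or finding indels common to two call sets reduces to set operations. Left-alignment of this kind is only one normalization strategy; the abstract reports that UPS-indel finds redundant indels beyond those reported jointly by left-aligning tools.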
id pubmed-5658412
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sci Rep
rights Nature Publishing Group UK, 2017-10-26. © The Author(s) 2017. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article