UPS-indel: a Universal Positioning System for Indels
Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel callin...
Main authors: | Hasan, Mohammad Shabbir; Wu, Xiaowei; Watson, Layne T.; Zhang, Liqing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658412/ https://www.ncbi.nlm.nih.gov/pubmed/29074871 http://dx.doi.org/10.1038/s41598-017-14400-1 |
_version_ | 1783273992094744576 |
---|---|
author | Hasan, Mohammad Shabbir Wu, Xiaowei Watson, Layne T. Zhang, Liqing |
author_facet | Hasan, Mohammad Shabbir Wu, Xiaowei Watson, Layne T. Zhang, Liqing |
author_sort | Hasan, Mohammad Shabbir |
collection | PubMed |
description | Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable for comparing the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which can also be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with the existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in the COSMIC noncoding indel dataset, in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and is thus exhaustive. |
format | Online Article Text |
id | pubmed-5658412 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-56584122017-10-31 UPS-indel: a Universal Positioning System for Indels Hasan, Mohammad Shabbir Wu, Xiaowei Watson, Layne T. Zhang, Liqing Sci Rep Article Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and thus exhaustive. 
Nature Publishing Group UK 2017-10-26 /pmc/articles/PMC5658412/ /pubmed/29074871 http://dx.doi.org/10.1038/s41598-017-14400-1 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Hasan, Mohammad Shabbir Wu, Xiaowei Watson, Layne T. Zhang, Liqing UPS-indel: a Universal Positioning System for Indels |
title | UPS-indel: a Universal Positioning System for Indels |
title_full | UPS-indel: a Universal Positioning System for Indels |
title_fullStr | UPS-indel: a Universal Positioning System for Indels |
title_full_unstemmed | UPS-indel: a Universal Positioning System for Indels |
title_short | UPS-indel: a Universal Positioning System for Indels |
title_sort | ups-indel: a universal positioning system for indels |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658412/ https://www.ncbi.nlm.nih.gov/pubmed/29074871 http://dx.doi.org/10.1038/s41598-017-14400-1 |
work_keys_str_mv | AT hasanmohammadshabbir upsindelauniversalpositioningsystemforindels AT wuxiaowei upsindelauniversalpositioningsystemforindels AT watsonlaynet upsindelauniversalpositioningsystemforindels AT zhangliqing upsindelauniversalpositioningsystemforindels |
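
The description in this record refers to "biologically equivalent indels" stored as distinct entries. As a rough illustration of that idea only, the Python sketch below treats two indels as equivalent when applying each to the same reference yields the same sequence. This is not the UPS-indel algorithm or its coordinate system; the `Indel` type, the function names, the 0-based positions, and the toy reference string are all assumptions made for this example.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    """Hypothetical record for one indel on a single reference sequence."""
    pos: int          # 0-based offset of the first base of ref_allele (assumption)
    ref_allele: str   # bases present in the reference at pos
    alt_allele: str   # bases that replace ref_allele in the sample


def apply_indel(reference: str, variant: Indel) -> str:
    """Return the sequence obtained by applying one indel to the reference."""
    end = variant.pos + len(variant.ref_allele)
    if reference[variant.pos:end] != variant.ref_allele:
        raise ValueError("ref_allele does not match the reference at pos")
    return reference[:variant.pos] + variant.alt_allele + reference[end:]


def are_equivalent(reference: str, a: Indel, b: Indel) -> bool:
    """Two indels are equivalent here if they produce the same sequence."""
    return apply_indel(reference, a) == apply_indel(reference, b)


# Toy reference with a CA repeat; deleting any one "CA" unit gives the same result.
REFERENCE = "TTCACACAGG"
DELETE_FIRST_CA = Indel(pos=1, ref_allele="TCA", alt_allele="T")
DELETE_LAST_CA = Indel(pos=5, ref_allele="ACA", alt_allele="A")
DELETE_CAC = Indel(pos=1, ref_allele="TCAC", alt_allele="T")
```

Here `DELETE_FIRST_CA` and `DELETE_LAST_CA` have different coordinates but describe the same change, which is the kind of redundancy the description mentions, while `DELETE_CAC` is a genuinely different variant. Comparing whole sequences is only practical for toy inputs; it is shown here because it makes the definition of equivalence explicit.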