
Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data

High-throughput sequencing (HTS) has revolutionized science by enabling super-fast detection of genomic variants at base-pair resolution. Consequently, it poses the challenging problem of identification of technical artifacts, i.e. hidden non-random error patterns. Understanding the properties of se...


Bibliographic Details
Main Authors: Das, Subrata; Biswas, Nidhan K; Basu, Analabha
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415152/
https://www.ncbi.nlm.nih.gov/pubmed/37378434
http://dx.doi.org/10.1093/nar/gkad539
author Das, Subrata
Biswas, Nidhan K
Basu, Analabha
collection PubMed
description High-throughput sequencing (HTS) has revolutionized science by enabling super-fast detection of genomic variants at base-pair resolution. Consequently, it poses the challenging problem of identification of technical artifacts, i.e. hidden non-random error patterns. Understanding the properties of sequencing artifacts holds the key in separating true variants from false positives. Here, we develop Mapinsights, a toolkit that performs quality control (QC) analysis of sequence alignment files, capable of detecting outliers based on sequencing artifacts of HTS data at a deeper resolution compared with existing methods. Mapinsights performs a cluster analysis based on novel and existing QC features derived from the sequence alignment for outlier detection. We applied Mapinsights on community standard open-source datasets and identified various quality issues including technical errors related to sequencing cycles, sequencing chemistry, sequencing libraries and across various orthogonal sequencing platforms. Mapinsights also enables identification of anomalies related to sequencing depth. A logistic regression-based model built on the features of Mapinsights shows high accuracy in detecting ‘low-confidence’ variant sites. Quantitative estimates and probabilistic arguments provided by Mapinsights can be utilized in identifying errors, bias and outlier samples, and also aid in improving the authenticity of variant calls.
format Online
Article
Text
id pubmed-10415152
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10415152 2023-08-12 Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data. Das, Subrata; Biswas, Nidhan K; Basu, Analabha. Nucleic Acids Res. Methods. Oxford University Press 2023-06-28. /pmc/articles/PMC10415152/ /pubmed/37378434 http://dx.doi.org/10.1093/nar/gkad539 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data
topic Methods