
Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome


Bibliographic Details
Main authors: Naumenko, Fedor M., Abnizova, Irina I., Beka, Nathan, Genaev, Mikhail A., Orlov, Yuriy L.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836841/
https://www.ncbi.nlm.nih.gov/pubmed/29504893
http://dx.doi.org/10.1186/s12864-018-4475-6
author Naumenko, Fedor M.
Abnizova, Irina I.
Beka, Nathan
Genaev, Mikhail A.
Orlov, Yuriy L.
collection PubMed
description BACKGROUND: The use of artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce the computational time. One of the natural ways to achieve such time reduction is by mapping a single chromosome. RESULTS: We investigated whether single-chromosome mapping causes any artefacts in the aligners' performance. In this paper, we compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and also from a whole genome. We found that commonly used statistical methods are insufficient to evaluate an aligner's performance, and applied a novel measure of read density distribution similarity, which allowed us to reveal artefacts in the aligners' performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read. CONCLUSIONS: The generation of artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified from the point of view of reducing the benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data.
format Online
Article
Text
id pubmed-5836841
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5836841 2018-03-07 BMC Genomics Research BioMed Central 2018-02-09 /pmc/articles/PMC5836841/ /pubmed/29504893 http://dx.doi.org/10.1186/s12864-018-4475-6 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836841/
https://www.ncbi.nlm.nih.gov/pubmed/29504893
http://dx.doi.org/10.1186/s12864-018-4475-6