Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome
BACKGROUND: The use of artificial data to evaluate the performance of aligners and peak callers not only improves its accuracy and reliability, but also makes it possible to reduce the computational time. One of the natural ways to achieve such time reduction is by mapping a single chromosome. RESUL...
Main authors: | Naumenko, Fedor M., Abnizova, Irina I., Beka, Nathan, Genaev, Mikhail A., Orlov, Yuriy L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836841/ https://www.ncbi.nlm.nih.gov/pubmed/29504893 http://dx.doi.org/10.1186/s12864-018-4475-6 |
_version_ | 1783304015688237056 |
---|---|
author | Naumenko, Fedor M. Abnizova, Irina I. Beka, Nathan Genaev, Mikhail A. Orlov, Yuriy L. |
author_facet | Naumenko, Fedor M. Abnizova, Irina I. Beka, Nathan Genaev, Mikhail A. Orlov, Yuriy L. |
author_sort | Naumenko, Fedor M. |
collection | PubMed |
description | BACKGROUND: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce computational time. One natural way to achieve this reduction is to map a single chromosome. RESULTS: We investigated whether mapping a single chromosome introduces artefacts into aligner performance. We compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which revealed artefacts in the aligners’ performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read. CONCLUSIONS: Generating artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified as a way to reduce benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data. |
format | Online Article Text |
id | pubmed-5836841 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5836841 2018-03-07 Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome Naumenko, Fedor M. Abnizova, Irina I. Beka, Nathan Genaev, Mikhail A. Orlov, Yuriy L. BMC Genomics Research BACKGROUND: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce computational time. One natural way to achieve this reduction is to map a single chromosome. RESULTS: We investigated whether mapping a single chromosome introduces artefacts into aligner performance. We compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which revealed artefacts in the aligners’ performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read. CONCLUSIONS: Generating artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified as a way to reduce benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data. BioMed Central 2018-02-09 /pmc/articles/PMC5836841/ /pubmed/29504893 http://dx.doi.org/10.1186/s12864-018-4475-6 Text en © The Author(s). 
2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Naumenko, Fedor M. Abnizova, Irina I. Beka, Nathan Genaev, Mikhail A. Orlov, Yuriy L. Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome |
title | Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836841/ https://www.ncbi.nlm.nih.gov/pubmed/29504893 http://dx.doi.org/10.1186/s12864-018-4475-6 |