
New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies

During the last (15) years, improved omics sequencing technologies have expanded the scale and resolution of various biological applications, generating high-throughput datasets that require carefully chosen software tools to be processed. Therefore, following the sequencing development, bioinformat...


Bibliographic Details
Main Authors: Donato, Luigi; Scimone, Concetta; Rinaldi, Carmela; D’Angelo, Rosalia; Sidoti, Antonina
Format: Online Article Text
Language: English
Published: Springer London 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208613/
https://www.ncbi.nlm.nih.gov/pubmed/34155424
http://dx.doi.org/10.1007/s00521-021-06188-z
_version_ 1783708959640649728
author Donato, Luigi
Scimone, Concetta
Rinaldi, Carmela
D’Angelo, Rosalia
Sidoti, Antonina
collection PubMed
description During the last (15) years, improved omics sequencing technologies have expanded the scale and resolution of various biological applications, generating high-throughput datasets that require carefully chosen software tools to be processed. Therefore, following the sequencing development, bioinformatics researchers have been challenged to implement alignment algorithms for next-generation sequencing reads. However, the selection of aligners based on genome characteristics remains poorly studied, so our benchmarking study extended the state of the art by comparing 17 different aligners. The chosen tools were assessed on empirical human DNA- and RNA-Seq data, as well as on simulated human and mouse datasets, evaluating a set of parameters not previously considered in this kind of benchmark. As expected, we found that each tool performed best under specific conditions. For Ion Torrent single-end RNA-Seq samples, the most suitable aligners were CLC and BWA-MEM, which achieved the best results in terms of efficiency, accuracy, duplication rate, saturation profile and running time. For Illumina paired-end osteomyelitis transcriptomics data, by contrast, the best-performing algorithm, together with the aforementioned CLC, was Novoalign, which excelled in the accuracy and saturation analyses. Segemehl and DNASTAR performed best on both DNA-Seq datasets, with Segemehl particularly suitable for exome data. In conclusion, our study could guide users in selecting a suitable aligner based on genome and transcriptome characteristics. However, several other aspects that emerged from our work should be considered as alignment research evolves, such as the use of artificial intelligence to support cloud computing and mapping to multiple genomes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00521-021-06188-z.
format Online
Article
Text
id pubmed-8208613
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer London
record_format MEDLINE/PubMed
spelling pubmed-8208613 2021-06-17 Neural Comput Appl Original Article Springer London 2021-06-16 2021 /pmc/articles/PMC8208613/ /pubmed/34155424 http://dx.doi.org/10.1007/s00521-021-06188-z Text en © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208613/
https://www.ncbi.nlm.nih.gov/pubmed/34155424
http://dx.doi.org/10.1007/s00521-021-06188-z