One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads

Bibliographic Details
Main authors: Valiente-Mullor, Carlos, Beamud, Beatriz, Ansari, Iván, Francés-Cuesta, Carlos, García-González, Neris, Mejía, Lorena, Ruiz-Hueso, Paula, González-Candelas, Fernando
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7870062/
https://www.ncbi.nlm.nih.gov/pubmed/33503026
http://dx.doi.org/10.1371/journal.pcbi.1008678
_version_ 1783648736691355648
author Valiente-Mullor, Carlos
Beamud, Beatriz
Ansari, Iván
Francés-Cuesta, Carlos
García-González, Neris
Mejía, Lorena
Ruiz-Hueso, Paula
González-Candelas, Fernando
collection PubMed
description Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.
format Online
Article
Text
id pubmed-7870062
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7870062 2021-02-11 One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads Valiente-Mullor, Carlos Beamud, Beatriz Ansari, Iván Francés-Cuesta, Carlos García-González, Neris Mejía, Lorena Ruiz-Hueso, Paula González-Candelas, Fernando PLoS Comput Biol Research Article
Public Library of Science 2021-01-27 /pmc/articles/PMC7870062/ /pubmed/33503026 http://dx.doi.org/10.1371/journal.pcbi.1008678 Text en © 2021 Valiente-Mullor et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads
topic Research Article