
The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies

Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map...


Bibliographic Details
Main authors: Price, Adam; Gibas, Cynthia
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5507458/
https://www.ncbi.nlm.nih.gov/pubmed/28700635
http://dx.doi.org/10.1371/journal.pone.0180904
author Price, Adam
Gibas, Cynthia
collection PubMed
description Sequence read alignment to a reference genome is a fundamental step in many genomics studies. Accuracy in this fundamental step is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map reads from all strains to a common reference genome, whether because there is no closed reference for some strains or for ease of comparison. The assumption is that the differences between bacterial strains are insignificant enough that the results of differential expression analysis will not be influenced by choice of reference. Genes that are common among the strains under study are used for differential expression analysis, while the remaining genes, which may fail to express in one sample or the other because they are simply absent, are analyzed separately. In this study, we investigate the practice of using a common reference in transcriptomic analysis. We analyze two multi-strain transcriptomic data sets that were initially presented in the literature as comparisons based on a common reference, but which have available closed genomic sequence for all strains, allowing a detailed examination of the impact of reference choice. We provide a method for identifying regions that are most affected by non-native alignments, leading to false positives in differential expression analysis, and perform an in-depth analysis identifying the extent of expression loss. We also simulate several data sets to identify best practices for non-native reference use.
format Online
Article
Text
id pubmed-5507458
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5507458 2017-07-25 The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies Price, Adam Gibas, Cynthia PLoS One Research Article
Public Library of Science 2017-07-11 /pmc/articles/PMC5507458/ /pubmed/28700635 http://dx.doi.org/10.1371/journal.pone.0180904 Text en © 2017 Price, Gibas http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5507458/
https://www.ncbi.nlm.nih.gov/pubmed/28700635
http://dx.doi.org/10.1371/journal.pone.0180904