The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies
Main authors: | Price, Adam; Gibas, Cynthia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2017 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5507458/ https://www.ncbi.nlm.nih.gov/pubmed/28700635 http://dx.doi.org/10.1371/journal.pone.0180904 |
author | Price, Adam; Gibas, Cynthia |
---|---|
collection | PubMed |
description | Sequence read alignment to a reference genome is a fundamental step in many genomics studies, and its accuracy is crucial for correct interpretation of biological data. In cases where two or more closely related bacterial strains are being studied, a common approach is to simply map reads from all strains to a common reference genome, whether because there is no closed reference for some strains or for ease of comparison. The assumption is that the differences between bacterial strains are small enough that the results of differential expression analysis will not be influenced by the choice of reference. Genes that are common among the strains under study are used for differential expression analysis, while the remaining genes, which may fail to express in one sample or the other simply because they are absent, are analyzed separately. In this study, we investigate the practice of using a common reference in transcriptomic analysis. We analyze two multi-strain transcriptomic data sets that were initially presented in the literature as comparisons based on a common reference, but which have closed genomic sequences available for all strains, allowing a detailed examination of the impact of reference choice. We provide a method for identifying the regions most affected by non-native alignments, which lead to false positives in differential expression analysis, and perform an in-depth analysis of the extent of expression loss. We also simulate several data sets to identify best practices for non-native reference use. |
format | Online Article Text |
id | pubmed-5507458 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One (published online 2017-07-11) |
rights | © 2017 Price, Gibas. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | The quantitative impact of read mapping to non-native reference genomes in comparative RNA-Seq studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5507458/ https://www.ncbi.nlm.nih.gov/pubmed/28700635 http://dx.doi.org/10.1371/journal.pone.0180904 |