
Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses

The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome...


Bibliographic Details
Main Authors: Kaminow, Benjamin; Ballouz, Sara; Gillis, Jesse; Dobin, Alexander
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8997357/
https://www.ncbi.nlm.nih.gov/pubmed/35256454
http://dx.doi.org/10.1101/gr.275613.121
author Kaminow, Benjamin
Ballouz, Sara
Gillis, Jesse
Dobin, Alexander
collection PubMed
description The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no further improvement over using the pan-human consensus, suggesting a limit to the utility of incorporating more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expression of isoforms, genes, and splice junctions.
format Online
Article
Text
id pubmed-8997357
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8997357 2022-04-22 Genome Res Method Cold Spring Harbor Laboratory Press 2022-04 /pmc/articles/PMC8997357/ /pubmed/35256454 http://dx.doi.org/10.1101/gr.275613.121 Text en © 2022 Kaminow et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
title Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses
topic Method
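
The description in this record says that consensus genomes were built from variant information and that reads overlapping homozygous variants mapped with fewer errors against the consensus. As a rough illustration only, and not the authors' pipeline, the Python sketch below assumes a consensus is formed by substituting the alternate allele wherever its frequency in the chosen population exceeds 0.5. It handles single-nucleotide variants only (indels would shift coordinates and are ignored here). The names Snv, build_consensus, and count_mismatches, the 0.5 threshold, and the toy sequences are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Snv(NamedTuple):
    """A single-nucleotide variant (hypothetical record layout)."""
    pos: int          # 0-based position in the reference sequence
    ref: str          # reference allele, one base
    alt: str          # alternate allele, one base
    alt_freq: float   # alternate allele frequency in the chosen population


def build_consensus(reference: str, variants: Iterable[Snv],
                    threshold: float = 0.5) -> str:
    """Return a copy of `reference` carrying the major allele at each SNV.

    Assumption: the alternate allele is "major" when alt_freq > threshold.
    """
    seq = list(reference)
    for v in variants:
        # Guard against variants that do not match the reference sequence.
        if reference[v.pos] != v.ref:
            raise ValueError(f"reference base at {v.pos} is not {v.ref!r}")
        if v.alt_freq > threshold:
            seq[v.pos] = v.alt
    return "".join(seq)


def count_mismatches(read: str, genome: str, start: int) -> int:
    """Count mismatching bases when `read` is placed at `start` in `genome`."""
    window = genome[start:start + len(read)]
    return sum(a != b for a, b in zip(read, window))


if __name__ == "__main__":
    reference = "ACGTACGT"
    variants = [
        Snv(pos=1, ref="C", alt="T", alt_freq=0.8),  # common alternate allele
        Snv(pos=4, ref="A", alt="G", alt_freq=0.2),  # rare alternate allele
    ]
    consensus = build_consensus(reference, variants)

    # A toy read from an individual homozygous for the common T allele.
    read = "ATGT"
    print("consensus:", consensus)
    print("mismatches vs reference:", count_mismatches(read, reference, 0))
    print("mismatches vs consensus:", count_mismatches(read, consensus, 0))
```

In this toy case the read carrying the common allele mismatches the reference once and the consensus not at all, which is the kind of effect the description reports at scale for reads overlapping homozygous variants.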