
One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies

With the rapid decline in sequencing cost, researchers are rushing to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping s...


Bibliographic Details
Main authors: Yuan, Shuai, Johnston, H. Richard, Zhang, Guosheng, Li, Yun, Hu, Yi-Juan, Qin, Zhaohui S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4534450/
https://www.ncbi.nlm.nih.gov/pubmed/26267278
http://dx.doi.org/10.1371/journal.pcbi.1004448
author Yuan, Shuai
Johnston, H. Richard
Zhang, Guosheng
Li, Yun
Hu, Yi-Juan
Qin, Zhaohui S.
collection PubMed
description With the rapid decline in sequencing cost, researchers are rushing to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This step matters because incorrectly mapped reads affect downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, most of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that uses genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped on array-based platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
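The core idea in the description, turning one haploid reference into two personalized haplotype sequences from previously known genotypes, can be illustrated with a minimal Python sketch. This is not RefEditor's implementation: the function name `build_diploid_reference`, the 0-based positions, and the input layout are all hypothetical, and the sketch assumes SNP-only genotypes whose two alleles have already been assigned to haplotypes (array genotypes are normally unphased, so a real tool would need a phasing or assignment step).

```python
from typing import Dict, Tuple


def build_diploid_reference(
    reference: str,
    genotypes: Dict[int, Tuple[str, str]],
) -> Tuple[str, str]:
    """Return two haplotype sequences derived from a haploid reference.

    `genotypes` maps a 0-based position to the pair of single-base alleles
    carried on haplotype A and haplotype B (hypothetical input layout).
    """
    hap_a = list(reference)
    hap_b = list(reference)
    for pos, (allele_a, allele_b) in genotypes.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        if len(allele_a) != 1 or len(allele_b) != 1:
            # Indels would shift coordinates; this sketch handles SNPs only.
            raise ValueError("only single-base alleles are supported")
        hap_a[pos] = allele_a
        hap_b[pos] = allele_b
    return "".join(hap_a), "".join(hap_b)


# Toy example: a heterozygous site at position 2 (G/T) and a
# homozygous non-reference site at position 7 (C/C).
reference = "ACGTACGTAC"
genotypes = {2: ("G", "T"), 7: ("C", "C")}
hap_a, hap_b = build_diploid_reference(reference, genotypes)
print(hap_a)  # ACGTACGCAC
print(hap_b)  # ACTTACGCAC
```

Reads could then be mapped against both haplotype sequences instead of the single universal reference, so a read carrying the sample's own allele is no longer penalized as a mismatch.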
format Online
Article
Text
id pubmed-4534450
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4534450 2015-08-24 PLoS Comput Biol Research Article Public Library of Science 2015-08-12 /pmc/articles/PMC4534450/ /pubmed/26267278 http://dx.doi.org/10.1371/journal.pcbi.1004448 Text en © 2015 Yuan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4534450/
https://www.ncbi.nlm.nih.gov/pubmed/26267278
http://dx.doi.org/10.1371/journal.pcbi.1004448