One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies
With the rapid decline in sequencing cost, researchers are rushing to embrace whole genome sequencing (WGS) and whole exome sequencing (WES) as the next powerful tools for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping s...
Main authors: | Yuan, Shuai; Johnston, H. Richard; Zhang, Guosheng; Li, Yun; Hu, Yi-Juan; Qin, Zhaohui S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4534450/ https://www.ncbi.nlm.nih.gov/pubmed/26267278 http://dx.doi.org/10.1371/journal.pcbi.1004448 |
_version_ | 1782385457498161152 |
---|---|
author | Yuan, Shuai Johnston, H. Richard Zhang, Guosheng Li, Yun Hu, Yi-Juan Qin, Zhaohui S. |
author_facet | Yuan, Shuai Johnston, H. Richard Zhang, Guosheng Li, Yun Hu, Yi-Juan Qin, Zhaohui S. |
author_sort | Yuan, Shuai |
collection | PubMed |
description | With the rapid decline in sequencing cost, researchers are rushing to embrace whole genome sequencing (WGS) and whole exome sequencing (WES) as the next powerful tools for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This step matters because incorrectly mapped reads affect downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, most of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that uses genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped with array-based platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor. |
format | Online Article Text |
id | pubmed-4534450 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4534450 2015-08-24 One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies Yuan, Shuai Johnston, H. Richard Zhang, Guosheng Li, Yun Hu, Yi-Juan Qin, Zhaohui S. PLoS Comput Biol Research Article Public Library of Science 2015-08-12 /pmc/articles/PMC4534450/ /pubmed/26267278 http://dx.doi.org/10.1371/journal.pcbi.1004448 Text en © 2015 Yuan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Yuan, Shuai Johnston, H. Richard Zhang, Guosheng Li, Yun Hu, Yi-Juan Qin, Zhaohui S. One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies |
title | One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies |
title_sort | one size doesn't fit all - refeditor: building personalized diploid reference genome to improve read mapping and genotype calling in next generation sequencing studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4534450/ https://www.ncbi.nlm.nih.gov/pubmed/26267278 http://dx.doi.org/10.1371/journal.pcbi.1004448 |
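
The description in the record above outlines the core idea: known genotypes are used to turn a single haploid reference into two personalized haplotype sequences. The following is a minimal sketch of that idea only, not RefEditor's actual implementation. The function names, the in-memory genotype format, and the toy sequences are all hypothetical. It assumes single-nucleotide variants, 0-based positions, and that each allele has already been assigned to a haplotype; the record does not say how RefEditor handles phasing or indels.

```python
from typing import Dict, Tuple

# Hypothetical input format (an assumption, not RefEditor's):
# 0-based position -> (allele on haplotype 1, allele on haplotype 2).
Genotypes = Dict[int, Tuple[str, str]]


def personalize_reference(reference: str, genotypes: Genotypes) -> Tuple[str, str]:
    """Return two haplotype sequences with the known SNP alleles substituted in."""
    hap1 = list(reference)
    hap2 = list(reference)
    for pos, (allele1, allele2) in genotypes.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        if len(allele1) != 1 or len(allele2) != 1:
            raise ValueError("this sketch handles single-nucleotide alleles only")
        hap1[pos] = allele1
        hap2[pos] = allele2
    return "".join(hap1), "".join(hap2)


def mismatches(read: str, sequence: str, start: int) -> int:
    """Count mismatching bases when the read is placed at `start` on `sequence`."""
    window = sequence[start:start + len(read)]
    return sum(1 for r, s in zip(read, window) if r != s)


# Toy data: a heterozygous site at position 2 and a homozygous
# non-reference site at position 7.
reference = "ACGTACGTAC"
genotypes = {2: ("G", "T"), 7: ("A", "A")}
hap1, hap2 = personalize_reference(reference, genotypes)

# A read sampled from the second haplotype.
read = "ACTTACGA"
print(mismatches(read, reference, 0))  # mismatches against the universal reference
print(mismatches(read, hap2, 0))       # mismatches against the personalized haplotype
```

In this toy case the read carries both non-reference alleles, so it mismatches the universal reference at two positions but matches the second personalized haplotype exactly, which is the kind of effect on read mapping that the description attributes to a personalized diploid reference.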