Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

The recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models, bu...

Bibliographic Details
Main Authors: Kobayashi, Masaaki, Ohyanagi, Hajime, Takanashi, Hideki, Asano, Satomi, Kudo, Toru, Kajiya-Kanegae, Hiromi, Nagano, Atsushi J., Tainaka, Hitoshi, Tokunaga, Tsuyoshi, Sazuka, Takashi, Iwata, Hiroyoshi, Tsutsumi, Nobuhiro, Yano, Kentaro
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737671/
https://www.ncbi.nlm.nih.gov/pubmed/28498906
http://dx.doi.org/10.1093/dnares/dsx012
author Kobayashi, Masaaki
Ohyanagi, Hajime
Takanashi, Hideki
Asano, Satomi
Kudo, Toru
Kajiya-Kanegae, Hiromi
Nagano, Atsushi J.
Tainaka, Hitoshi
Tokunaga, Tsuyoshi
Sazuka, Takashi
Iwata, Hiroyoshi
Tsutsumi, Nobuhiro
Yano, Kentaro
collection PubMed
description The recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models but also on the quality and quantity of the variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly from low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false-positive SNPs, Heap determines genotypes and calls SNPs at each site, except for sites at both ends of reads or sites containing a minor allele supported by only one read. A performance comparison with existing tools showed that Heap achieved the highest F-scores with low-coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).
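The two exclusion rules and the F-score named in the description can be illustrated with a minimal Python sketch. This is not Heap's implementation: the pileup layout, the `end_margin` parameter, the reading of "minor allele" as the second most frequent base, and the choice of F1 as the F-score are all assumptions made for illustration.

```python
from collections import Counter


def usable_bases(pileup, end_margin=1):
    """Drop observations that fall within `end_margin` bases of either read end.

    `pileup` is a list of (base, offset_in_read, read_length) tuples. Both this
    layout and `end_margin` are hypothetical, not taken from Heap.
    """
    return [
        base
        for base, offset, length in pileup
        if end_margin <= offset < length - end_margin
    ]


def minor_allele_is_singleton(bases):
    """True when a second allele is present but only one read supports it.

    "Minor allele" is taken here to mean the second most frequent base
    (an assumption).
    """
    counts = Counter(bases).most_common()
    return len(counts) > 1 and counts[1][1] == 1


def f_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall (F1, assumed)."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# One site covered by five reads of length 50; two observations sit at read ends.
site = [("A", 0, 50), ("A", 10, 50), ("A", 20, 50), ("G", 25, 50), ("A", 49, 50)]
kept = usable_bases(site)
skip_site = minor_allele_is_singleton(kept)
```

Under these assumptions, `skip_site` is true for the example site, because only one read supports `G` once the read-end observations are dropped, so no SNP would be called there.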
format Online
Article
Text
id pubmed-5737671
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5737671 2018-01-04 DNA Res Full Papers Oxford University Press 2017-08 2017-05-11 /pmc/articles/PMC5737671/ /pubmed/28498906 http://dx.doi.org/10.1093/dnares/dsx012 Text en © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data
topic Full Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737671/
https://www.ncbi.nlm.nih.gov/pubmed/28498906
http://dx.doi.org/10.1093/dnares/dsx012