
GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets



Bibliographic Details
Main authors:	Jeong, Seongmun; Kim, Jae-Yoon; Jeong, Soon-Chun; Kang, Sung-Taeg; Moon, Jung-Kyung; Kim, Namshin
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5519076/ https://www.ncbi.nlm.nih.gov/pubmed/28727806 http://dx.doi.org/10.1371/journal.pone.0181420

author	Jeong, Seongmun; Kim, Jae-Yoon; Jeong, Soon-Chun; Kang, Sung-Taeg; Moon, Jung-Kyung; Kim, Namshin
collection	PubMed
description	Selecting core subsets from plant genotype datasets is important for enhancing cost-effectiveness and shortening the time required for genome-wide association study (GWAS) analyses, genomics-assisted breeding of crop species, and similar work. Recently, a large number of genetic markers (>100,000 single nucleotide polymorphisms) have been identified from high-density single nucleotide polymorphism (SNP) arrays and next-generation sequencing (NGS) data. However, there is no software available for picking out an efficient and consistent core subset from such a huge dataset. It is necessary to develop software that can consistently extract genetically important samples from a population. We here present a new program, GenoCore, which quickly and efficiently finds the core subset representing the entire population. We introduce simple measures of coverage and diversity scores, which reflect genotype errors and genetic variations, and can help to select samples rapidly and accurately from crop genotype datasets. Our method is compared with other core collection software on example datasets to validate its performance in terms of genetic distance, diversity, coverage, required system resources, and the number of selected samples. GenoCore selects the smallest, most consistent, and most representative core collection from all samples, using less memory with more efficient scores, and shows greater genetic coverage than the other software tested. GenoCore was written in R and can be accessed online, with an example dataset and test results, at https://github.com/lovemun/Genocore.
format	Online Article Text
id	pubmed-5519076
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
journal	PLoS One (Public Library of Science), published 2017-07-20
license	© 2017 Jeong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5519076/ https://www.ncbi.nlm.nih.gov/pubmed/28727806 http://dx.doi.org/10.1371/journal.pone.0181420

