LinkImputeR: user-guided genotype calling and imputation for non-model organisms
BACKGROUND: Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods...
Main Authors: | Money, Daniel; Migicovsky, Zoë; Gardner, Kyle; Myles, Sean |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504746/ https://www.ncbi.nlm.nih.gov/pubmed/28693460 http://dx.doi.org/10.1186/s12864-017-3873-5 |
_version_ | 1783249337755631616 |
---|---|
author | Money, Daniel Migicovsky, Zoë Gardner, Kyle Myles, Sean |
author_facet | Money, Daniel Migicovsky, Zoë Gardner, Kyle Myles, Sean |
author_sort | Money, Daniel |
collection | PubMed |
description | BACKGROUND: Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms. RESULTS: Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis. CONCLUSIONS: By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from NGS technologies. It enables the user to quickly and easily examine the effects of varying thresholds and filters on the number and quality of the resulting genotype calls. In this manner, users can decide on thresholds that are most suitable for their purposes. We show that LinkImputeR can significantly augment the value and utility of NGS data sets, especially in non-model organisms with poor genomic resources. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3873-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5504746 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5504746 2017-07-12 LinkImputeR: user-guided genotype calling and imputation for non-model organisms Money, Daniel Migicovsky, Zoë Gardner, Kyle Myles, Sean BMC Genomics Software BioMed Central 2017-07-10 /pmc/articles/PMC5504746/ /pubmed/28693460 http://dx.doi.org/10.1186/s12864-017-3873-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Money, Daniel Migicovsky, Zoë Gardner, Kyle Myles, Sean LinkImputeR: user-guided genotype calling and imputation for non-model organisms |
title | LinkImputeR: user-guided genotype calling and imputation for non-model organisms |
title_full | LinkImputeR: user-guided genotype calling and imputation for non-model organisms |
title_fullStr | LinkImputeR: user-guided genotype calling and imputation for non-model organisms |
title_full_unstemmed | LinkImputeR: user-guided genotype calling and imputation for non-model organisms |
title_short | LinkImputeR: user-guided genotype calling and imputation for non-model organisms |
title_sort | linkimputer: user-guided genotype calling and imputation for non-model organisms |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504746/ https://www.ncbi.nlm.nih.gov/pubmed/28693460 http://dx.doi.org/10.1186/s12864-017-3873-5 |
work_keys_str_mv | AT moneydaniel linkimputeruserguidedgenotypecallingandimputationfornonmodelorganisms AT migicovskyzoe linkimputeruserguidedgenotypecallingandimputationfornonmodelorganisms AT gardnerkyle linkimputeruserguidedgenotypecallingandimputationfornonmodelorganisms AT mylessean linkimputeruserguidedgenotypecallingandimputationfornonmodelorganisms |
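
The abstract notes that genotypes below a read depth threshold are set to missing, and that read depth and missingness thresholds change how many genotype calls survive. The following is a minimal Python sketch of that general idea only. It is not LinkImputeR's algorithm or interface: the `(ref, alt)` read-count layout, the naive calling rule, and every function name here are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (reference reads, alternate reads) for one sample at one site.
ReadCounts = Tuple[int, int]


def naive_call(counts: ReadCounts, min_depth: int) -> Optional[int]:
    """Return 0, 1 or 2 alternate alleles, or None when depth is below the threshold.

    This is a deliberately simplistic caller used only to show thresholding.
    """
    ref, alt = counts
    if ref + alt < min_depth:
        return None  # too few reads: the genotype is set to missing
    if alt == 0:
        return 0
    if ref == 0:
        return 2
    return 1


def call_matrix(reads: List[List[ReadCounts]], min_depth: int) -> List[List[Optional[int]]]:
    """Apply the naive caller to a samples-by-sites table of read counts."""
    return [[naive_call(c, min_depth) for c in sample] for sample in reads]


def sites_passing_missingness(calls: List[List[Optional[int]]], max_missing: float) -> List[int]:
    """Return indices of sites whose fraction of missing calls is at most max_missing."""
    if not calls:
        return []
    n_samples = len(calls)
    kept = []
    for site in range(len(calls[0])):
        missing = sum(1 for sample in calls if sample[site] is None)
        if missing / n_samples <= max_missing:
            kept.append(site)
    return kept


def count_calls(calls: List[List[Optional[int]]], sites: List[int]) -> int:
    """Count non-missing genotype calls over the retained sites."""
    return sum(1 for sample in calls for s in sites if sample[s] is not None)


# Toy data (invented for illustration): 3 samples x 3 sites.
reads = [
    [(5, 0), (1, 1), (0, 0)],
    [(0, 8), (2, 0), (0, 1)],
    [(3, 4), (0, 0), (6, 0)],
]

if __name__ == "__main__":
    for min_depth in (1, 4):
        calls = call_matrix(reads, min_depth)
        kept = sites_passing_missingness(calls, max_missing=0.5)
        print(min_depth, kept, count_calls(calls, kept))
```

Running the loop over several thresholds gives a small table of retained sites and call counts, which is the kind of per-data-set comparison the abstract recommends before settling on filters.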