PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data
BACKGROUND: Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affec...
Main authors: | Solovieva, Elena; Sakai, Hiroaki |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10074814/ https://www.ncbi.nlm.nih.gov/pubmed/37020193 http://dx.doi.org/10.1186/s12859-023-05169-4 |
_version_ | 1785019814296682496 |
---|---|
author | Solovieva, Elena Sakai, Hiroaki |
author_facet | Solovieva, Elena Sakai, Hiroaki |
author_sort | Solovieva, Elena |
collection | PubMed |
description | BACKGROUND: Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many tools and software are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines perform such analyses in a single workflow and visualize all the various results in a single interactive web application. RESULTS: We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP is responsible for executing all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, an R-based interactive web application. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. CONCLUSIONS: The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. 
The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05169-4. |
format | Online Article Text |
id | pubmed-10074814 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-100748142023-04-06 PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data Solovieva, Elena Sakai, Hiroaki BMC Bioinformatics Research BACKGROUND: Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many tools and software are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines perform such analyses in a single workflow and visualize all the various results in a single interactive web application. RESULTS: We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP is responsible for executing all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, an R-based interactive web application. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. 
CONCLUSIONS: The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05169-4. BioMed Central 2023-04-05 /pmc/articles/PMC10074814/ /pubmed/37020193 http://dx.doi.org/10.1186/s12859-023-05169-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Solovieva, Elena Sakai, Hiroaki PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data |
title | PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10074814/ https://www.ncbi.nlm.nih.gov/pubmed/37020193 http://dx.doi.org/10.1186/s12859-023-05169-4 |