COMPILE: a GWAS computational pipeline for gene discovery in complex genomes

BACKGROUND: Genome-Wide Association Studies (GWAS) are used to identify genes and alleles that contribute to quantitative traits in large and genetically diverse populations. However, traits with complex genetic architectures create an enormous computational load for discovery of candidate genes wit...

Bibliographic Details
Main Authors:	Hill, Matthew J.; Penning, Bryan W.; McCann, Maureen C.; Carpita, Nicholas C.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9250234/ https://www.ncbi.nlm.nih.gov/pubmed/35778686 http://dx.doi.org/10.1186/s12870-022-03668-9

author	Hill, Matthew J.; Penning, Bryan W.; McCann, Maureen C.; Carpita, Nicholas C.
collection	PubMed
description	BACKGROUND: Genome-Wide Association Studies (GWAS) are used to identify genes and alleles that contribute to quantitative traits in large and genetically diverse populations. However, traits with complex genetic architectures create an enormous computational load for discovery of candidate genes with acceptable statistical certainty. We developed a streamlined computational pipeline for GWAS (COMPILE) to accelerate identification and annotation of candidate maize genes associated with a quantitative trait, and then match maize genes to their closest rice and Arabidopsis homologs by sequence similarity. RESULTS: COMPILE executed GWAS using a Mixed Linear Model that incorporated, without compression, recent advancements in population structure control, then linked significant Quantitative Trait Loci (QTL) to candidate genes and RNA regulatory elements contained in any genome. COMPILE was validated using published data to identify QTL associated with the traits of α-tocopherol biosynthesis and flowering time, and identified published candidate genes as well as additional genes and non-coding RNAs. We then applied COMPILE to 274 genotypes of the maize Goodman Association Panel to identify candidate loci contributing to resistance of maize stems to penetration by larvae of the European Corn Borer (Ostrinia nubilalis). Candidate genes included those that encode a gene of unknown function, WRKY and MYB-like transcriptional factors, receptor-kinase signaling, riboflavin synthesis, nucleotide-sugar interconversion, and prolyl hydroxylation. Expression of the gene of unknown function has been associated with pathogen stress in maize and in rice homologs closest in sequence identity. CONCLUSIONS: The relative speed of data analysis using COMPILE allowed comparison of population size and compression. Limitations in population size and diversity are major constraints for a trait and are not overcome by increasing marker density. COMPILE is customizable and is readily adaptable for application to species with robust genomic and proteome databases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12870-022-03668-9.
format	Online Article Text
id	pubmed-9250234
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9250234 2022-07-03 COMPILE: a GWAS computational pipeline for gene discovery in complex genomes. Hill, Matthew J.; Penning, Bryan W.; McCann, Maureen C.; Carpita, Nicholas C. BMC Plant Biol. Research. BioMed Central 2022-07-02. /pmc/articles/PMC9250234/ /pubmed/35778686 http://dx.doi.org/10.1186/s12870-022-03668-9 Text en. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	COMPILE: a GWAS computational pipeline for gene discovery in complex genomes
topic	Research
