
LD-annot: A Bioinformatics Tool to Automatically Provide Candidate SNPs With Annotations for Genetically Linked Genes


Bibliographic Details
Main authors: Prunier, Julien; Lemaçon, Audrey; Bastien, Alexandre; Jafarikia, Mohsen; Porth, Ilga; Robert, Claude; Droit, Arnaud
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889475/
https://www.ncbi.nlm.nih.gov/pubmed/31850063
http://dx.doi.org/10.3389/fgene.2019.01192
author Prunier, Julien
Lemaçon, Audrey
Bastien, Alexandre
Jafarikia, Mohsen
Porth, Ilga
Robert, Claude
Droit, Arnaud
author_sort Prunier, Julien
collection PubMed
description A multitude of studies in model and non-model species have taken full advantage of high-throughput genotyping advances such as SNP arrays and genotyping-by-sequencing (GBS) to investigate the genetic basis of trait variation. However, because these technologies cover the genome incompletely, the identified SNPs are likely to be in linkage disequilibrium (LD) with the causal polymorphisms rather than being causal themselves. In addition, researchers could benefit from annotations for the identified candidate SNPs and, simultaneously, for all neighboring genes in genetic linkage. In such cases, the extent of LD surrounding the candidate SNPs must be estimated to determine the regions encompassing genes of interest. We describe here an automated pipeline, “LD-annot,” designed to delineate specific regions of interest for a given experiment and set of candidate polymorphisms on the basis of LD extent and, furthermore, to provide annotations for all genes within such regions. LD-annot uses standard file formats, bioinformatics tools, and languages to provide identifiers, coordinates, and annotations for genes in genetic linkage with each candidate polymorphism. Although the focus lies on SNP array and GBS data, as these are routinely deployed, the pipeline can be applied to a variety of datasets as long as genotypic data are available for a high number of polymorphisms and formatted as a VCF file. A checkpoint procedure in the pipeline allows several linkage threshold values to be tested without rerunning the entire pipeline, thus saving the user computational time and resources. We applied this new pipeline to four different sample sets: two GBS datasets from breeding populations, one within-pedigree SNP set from whole-genome sequencing (WGS), and one very large multi-variety SNP dataset obtained from WGS, representing a range of sample sizes and numbers of polymorphisms. LD-annot ran within minutes, even when very high numbers of polymorphisms were investigated, and thus will efficiently assist research efforts aimed at identifying biologically meaningful genetic polymorphisms underlying phenotypic variation. The LD-annot tool is available under a GPL license from https://github.com/ArnaudDroitLab/LD-annot.
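The region-delineation idea in the description can be illustrated with a small sketch. This is not LD-annot's code, and it does not reproduce the tools the pipeline actually calls. It assumes genotypes are coded as 0/1/2 allele dosages, uses the squared Pearson correlation as the LD measure, and mimics the checkpoint by computing the LD values once and reusing them for each threshold. All function names, field names, and data below are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:  # monomorphic marker: LD is undefined, treat as 0
        return 0.0
    return cov * cov / (vx * vy)


def pairwise_ld(candidate, snps):
    """Compute (position, r2) once for every SNP on the candidate's chromosome."""
    return [
        (snp["pos"], r_squared(candidate["geno"], snp["geno"]))
        for snp in snps
        if snp["chrom"] == candidate["chrom"]
    ]


def region(candidate_pos, ld_table, threshold):
    """Span from the leftmost to the rightmost SNP with r2 >= threshold."""
    linked = [pos for pos, r2 in ld_table if r2 >= threshold]
    linked.append(candidate_pos)
    return min(linked), max(linked)


def genes_in_region(chrom, start, end, genes):
    """Identifiers of genes overlapping the closed interval [start, end]."""
    return [
        g["id"]
        for g in genes
        if g["chrom"] == chrom and g["start"] <= end and g["end"] >= start
    ]


# Hypothetical toy data: one candidate SNP, four other SNPs, four genes.
candidate = {"chrom": "1", "pos": 1000, "geno": [0, 1, 2, 0, 1, 2]}
snps = [
    {"chrom": "1", "pos": 900, "geno": [0, 1, 2, 0, 1, 2]},
    {"chrom": "1", "pos": 1500, "geno": [0, 1, 2, 0, 1, 1]},
    {"chrom": "1", "pos": 3000, "geno": [2, 0, 1, 1, 2, 0]},
    {"chrom": "2", "pos": 950, "geno": [0, 1, 2, 0, 1, 2]},
]
genes = [
    {"id": "g1", "chrom": "1", "start": 800, "end": 950},
    {"id": "g2", "chrom": "1", "start": 1200, "end": 1400},
    {"id": "g3", "chrom": "1", "start": 5000, "end": 6000},
    {"id": "g4", "chrom": "2", "start": 900, "end": 1000},
]

# LD is computed once; each threshold then reuses the same table.
ld_table = pairwise_ld(candidate, snps)
for threshold in (0.9, 0.7):
    start, end = region(candidate["pos"], ld_table, threshold)
    print(threshold, (start, end), genes_in_region(candidate["chrom"], start, end, genes))
```

Lowering the threshold from 0.9 to 0.7 widens the region from 900-1000 to 900-1500 and brings a second gene into it. This is the kind of comparison that becomes cheap when the LD values are kept between runs.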
format Online
Article
Text
id pubmed-6889475
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6889475 2019-12-17 LD-annot: A Bioinformatics Tool to Automatically Provide Candidate SNPs With Annotations for Genetically Linked Genes Prunier, Julien Lemaçon, Audrey Bastien, Alexandre Jafarikia, Mohsen Porth, Ilga Robert, Claude Droit, Arnaud Front Genet Genetics Frontiers Media S.A. 2019-11-26 /pmc/articles/PMC6889475/ /pubmed/31850063 http://dx.doi.org/10.3389/fgene.2019.01192 Text en Copyright © 2019 Prunier, Lemaçon, Bastien, Jafarikia, Porth, Robert and Droit http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title LD-annot: A Bioinformatics Tool to Automatically Provide Candidate SNPs With Annotations for Genetically Linked Genes
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889475/
https://www.ncbi.nlm.nih.gov/pubmed/31850063
http://dx.doi.org/10.3389/fgene.2019.01192