RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species

In addition to their common usages to study gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequen...

Bibliographic Details
Main authors: Jehl, Frédéric, Degalez, Fabien, Bernard, Maria, Lecerf, Frédéric, Lagoutte, Laetitia, Désert, Colette, Coulée, Manon, Bouchez, Olivier, Leroux, Sophie, Abasht, Behnam, Tixier-Boichard, Michèle, Bed’hom, Bertrand, Burlot, Thierry, Gourichon, David, Bardou, Philippe, Acloque, Hervé, Foissac, Sylvain, Djebali, Sarah, Giuffra, Elisabetta, Zerjal, Tatiana, Pitel, Frédérique, Klopp, Christophe, Lagarrigue, Sandrine
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8273700/
https://www.ncbi.nlm.nih.gov/pubmed/34262593
http://dx.doi.org/10.3389/fgene.2021.655707
author Jehl, Frédéric
Degalez, Fabien
Bernard, Maria
Lecerf, Frédéric
Lagoutte, Laetitia
Désert, Colette
Coulée, Manon
Bouchez, Olivier
Leroux, Sophie
Abasht, Behnam
Tixier-Boichard, Michèle
Bed’hom, Bertrand
Burlot, Thierry
Gourichon, David
Bardou, Philippe
Acloque, Hervé
Foissac, Sylvain
Djebali, Sarah
Giuffra, Elisabetta
Zerjal, Tatiana
Pitel, Frédérique
Klopp, Christophe
Lagarrigue, Sandrine
author_sort Jehl, Frédéric
collection PubMed
description In addition to their common use for studying gene expression, RNA-seq data accumulated over the last 10 years are an as-yet unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK-suggested filters on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq), and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population had a reliable GT (call rate ≥ 50%) and, among them, ∼340,000 had a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missense and 590 stop-gained variants), (ii) study, on a large scale, cis-regulation of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, ∼29% of which were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
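The quantities named in the description (precision against DNA-seq, GT call rate, and MAF) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes biallelic SNPs in diploid individuals, genotypes coded as the number of alternate alleles (0, 1, 2) with None for a missing call, and SNPs identified by hypothetical (chromosome, position) tuples. The function names are illustrative, and only the 50% call-rate and 10% MAF thresholds come from the description; the read-number-per-GT threshold is not stated there and is left out.

```python
from typing import Optional, Sequence, Set, Tuple

# Assumption: a genotype is the count of alternate alleles (0, 1 or 2);
# None marks an individual without a genotype call.
Genotype = Optional[int]
Snp = Tuple[str, int]  # hypothetical SNP key: (chromosome, position)


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with a called genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF computed over called genotypes only (diploid, biallelic)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_filters(genotypes: Sequence[Genotype],
                   min_call_rate: float = 0.5,
                   min_maf: float = 0.10) -> bool:
    """Keep a SNP when call rate and MAF both reach the thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


def precision(rna_snps: Set[Snp], dna_snps: Set[Snp]) -> float:
    """Share of RNA-seq SNPs that are also detected by DNA-seq."""
    if not rna_snps:
        return 0.0
    return len(rna_snps & dna_snps) / len(rna_snps)
```

For example, `passes_filters([0, 1, None, None])` evaluates a SNP called in two of four birds: the call rate is 0.5 and the MAF over the called genotypes is 0.25, so the SNP is kept under the default thresholds.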
format Online
Article
Text
id pubmed-8273700
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8273700 2021-07-13 Front Genet Genetics Frontiers Media S.A. 2021-06-28 /pmc/articles/PMC8273700/ /pubmed/34262593 http://dx.doi.org/10.3389/fgene.2021.655707 Text en Copyright © 2021 Jehl, Degalez, Bernard, Lecerf, Lagoutte, Désert, Coulée, Bouchez, Leroux, Abasht, Tixier-Boichard, Bed’hom, Burlot, Gourichon, Bardou, Acloque, Foissac, Djebali, Giuffra, Zerjal, Pitel, Klopp and Lagarrigue. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8273700/
https://www.ncbi.nlm.nih.gov/pubmed/34262593
http://dx.doi.org/10.3389/fgene.2021.655707