
SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to no...


Bibliographic Details
Main authors: Lopez-Maestre, Hélène; Brinza, Lilia; Marchet, Camille; Kielbassa, Janice; Bastien, Sylvère; Boutigny, Mathilde; Monnin, David; Filali, Adil El; Carareto, Claudia Marcia; Vieira, Cristina; Picard, Franck; Kremer, Natacha; Vavre, Fabrice; Sagot, Marie-France; Lacroix, Vincent
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100560/
https://www.ncbi.nlm.nih.gov/pubmed/27458203
http://dx.doi.org/10.1093/nar/gkw655
author Lopez-Maestre, Hélène
Brinza, Lilia
Marchet, Camille
Kielbassa, Janice
Bastien, Sylvère
Boutigny, Mathilde
Monnin, David
Filali, Adil El
Carareto, Claudia Marcia
Vieira, Cristina
Picard, Franck
Kremer, Natacha
Vavre, Fabrice
Sagot, Marie-France
Lacroix, Vincent
author_sort Lopez-Maestre, Hélène
collection PubMed
description SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative that already makes it possible to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods that use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further make it possible to test for the association of the identified SNPs with a phenotype of interest.
format Online
Article
Text
id pubmed-5100560
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-51005602016-11-10 SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence Lopez-Maestre, Hélène Brinza, Lilia Marchet, Camille Kielbassa, Janice Bastien, Sylvère Boutigny, Mathilde Monnin, David Filali, Adil El Carareto, Claudia Marcia Vieira, Cristina Picard, Franck Kremer, Natacha Vavre, Fabrice Sagot, Marie-France Lacroix, Vincent Nucleic Acids Res Methods Online SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already enables to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further enable to test for the association of the identified SNPs with a phenotype of interest. Oxford University Press 2016-11-02 2016-07-25 /pmc/articles/PMC5100560/ /pubmed/27458203 http://dx.doi.org/10.1093/nar/gkw655 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100560/
https://www.ncbi.nlm.nih.gov/pubmed/27458203
http://dx.doi.org/10.1093/nar/gkw655