SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence
SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to no...
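Main authors: | Lopez-Maestre, Hélène; Brinza, Lilia; Marchet, Camille; Kielbassa, Janice; Bastien, Sylvère; Boutigny, Mathilde; Monnin, David; Filali, Adil El; Carareto, Claudia Marcia; Vieira, Cristina; Picard, Franck; Kremer, Natacha; Vavre, Fabrice; Sagot, Marie-France; Lacroix, Vincent |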
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100560/ https://www.ncbi.nlm.nih.gov/pubmed/27458203 http://dx.doi.org/10.1093/nar/gkw655 |
_version_ | 1782466163403390976 |
---|---|
author | Lopez-Maestre, Hélène Brinza, Lilia Marchet, Camille Kielbassa, Janice Bastien, Sylvère Boutigny, Mathilde Monnin, David Filali, Adil El Carareto, Claudia Marcia Vieira, Cristina Picard, Franck Kremer, Natacha Vavre, Fabrice Sagot, Marie-France Lacroix, Vincent |
author_facet | Lopez-Maestre, Hélène Brinza, Lilia Marchet, Camille Kielbassa, Janice Bastien, Sylvère Boutigny, Mathilde Monnin, David Filali, Adil El Carareto, Claudia Marcia Vieira, Cristina Picard, Franck Kremer, Natacha Vavre, Fabrice Sagot, Marie-France Lacroix, Vincent |
author_sort | Lopez-Maestre, Hélène |
collection | PubMed |
description | SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative that already enables the identification of SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods that use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We also make it possible to test for the association of the identified SNPs with a phenotype of interest. |
format | Online Article Text |
id | pubmed-5100560 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-51005602016-11-10 SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence Lopez-Maestre, Hélène Brinza, Lilia Marchet, Camille Kielbassa, Janice Bastien, Sylvère Boutigny, Mathilde Monnin, David Filali, Adil El Carareto, Claudia Marcia Vieira, Cristina Picard, Franck Kremer, Natacha Vavre, Fabrice Sagot, Marie-France Lacroix, Vincent Nucleic Acids Res Methods Online SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already enables to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further enable to test for the association of the identified SNPs with a phenotype of interest. Oxford University Press 2016-11-02 2016-07-25 /pmc/articles/PMC5100560/ /pubmed/27458203 http://dx.doi.org/10.1093/nar/gkw655 Text en © The Author(s) 2016. 
Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Lopez-Maestre, Hélène Brinza, Lilia Marchet, Camille Kielbassa, Janice Bastien, Sylvère Boutigny, Mathilde Monnin, David Filali, Adil El Carareto, Claudia Marcia Vieira, Cristina Picard, Franck Kremer, Natacha Vavre, Fabrice Sagot, Marie-France Lacroix, Vincent SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
title | SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
title_full | SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
title_fullStr | SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
title_full_unstemmed | SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
title_short | SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
title_sort | snp calling from rna-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100560/ https://www.ncbi.nlm.nih.gov/pubmed/27458203 http://dx.doi.org/10.1093/nar/gkw655 |