CAARS: comparative assembly and annotation of RNA-Seq data

Bibliographic details
Main authors: Rey, Carine, Veber, Philippe, Boussau, Bastien, Sémon, Marie
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6596894/
https://www.ncbi.nlm.nih.gov/pubmed/30452539
http://dx.doi.org/10.1093/bioinformatics/bty903
collection PubMed
description MOTIVATION: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species to improve RNA-Seq assembly, annotation and gene family reconstruction.
RESULTS: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. CAARS then incorporates the transcripts into gene families, builds gene alignments and trees, and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as references, a difficult case for this kind of analysis. We showed that CAARS assemblies are more complete and accurate than those produced by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity to a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, that are directly usable for downstream comparative analyses.
AVAILABILITY AND IMPLEMENTATION: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id pubmed-6596894
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics
dates 2019-07-01 (issue), 2018-11-19 (online)
rights © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers