De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources

BACKGROUND: The continuing increase in size and quality of the “short reads” raw data is a significant help for the quality of the assembly obtained through various bioinformatics tools. However, building a reference genome sequence for most plant species remains a significant challenge due to the l...

Bibliographic Details
Main authors: Aluome, Christelle, Aubert, Grégoire, Alves Carvalho, Susete, Le Paslier, Marie-Christine, Burstin, Judith, Brunel, Dominique
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750290/
https://www.ncbi.nlm.nih.gov/pubmed/26864345
http://dx.doi.org/10.1186/s13104-016-1903-z
_version_ 1782415413591670784
author Aluome, Christelle
Aubert, Grégoire
Alves Carvalho, Susete
Le Paslier, Marie-Christine
Burstin, Judith
Brunel, Dominique
author_facet Aluome, Christelle
Aubert, Grégoire
Alves Carvalho, Susete
Le Paslier, Marie-Christine
Burstin, Judith
Brunel, Dominique
author_sort Aluome, Christelle
collection PubMed
description BACKGROUND: The continuing increase in size and quality of the “short reads” raw data is a significant help for the quality of the assembly obtained through various bioinformatics tools. However, building a reference genome sequence for most plant species remains a significant challenge due to the large number of repeated sequences which are problematic for a whole-genome quality de novo assembly. Furthermore, for most SNP identification approaches in plant genetics and breeding, only the “Gene-space” regions including the promoter, exon and intron sequences are considered. RESULTS: We developed the iPEA protocol to produce a de novo Gene-space assembly by reconstructing, in an iterative way, the non-coding sequence flanking the Unigene cDNA sequence through addition of next-generation DNA-seq data. The approach was elaborated with the large diploid genome of pea (Pisum sativum L.), rich in repetitive sequences. The final Gene-space assembly included 35,400 contigs (97 Mb), covering 88 % of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated by the results of the built GenoPea 13.2 K SNP Array. CONCLUSION: The iPEA protocol allows the reconstruction of a Gene-space based on RNA-Seq and DNA-seq data with limited computing resources. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-1903-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4750290
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4750290 2016-02-12 De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources Aluome, Christelle Aubert, Grégoire Alves Carvalho, Susete Le Paslier, Marie-Christine Burstin, Judith Brunel, Dominique BMC Res Notes Technical Note BACKGROUND: The continuing increase in size and quality of the “short reads” raw data is a significant help for the quality of the assembly obtained through various bioinformatics tools. However, building a reference genome sequence for most plant species remains a significant challenge due to the large number of repeated sequences which are problematic for a whole-genome quality de novo assembly. Furthermore, for most SNP identification approaches in plant genetics and breeding, only the “Gene-space” regions including the promoter, exon and intron sequences are considered. RESULTS: We developed the iPEA protocol to produce a de novo Gene-space assembly by reconstructing, in an iterative way, the non-coding sequence flanking the Unigene cDNA sequence through addition of next-generation DNA-seq data. The approach was elaborated with the large diploid genome of pea (Pisum sativum L.), rich in repetitive sequences. The final Gene-space assembly included 35,400 contigs (97 Mb), covering 88 % of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated by the results of the built GenoPea 13.2 K SNP Array. CONCLUSION: The iPEA protocol allows the reconstruction of a Gene-space based on RNA-Seq and DNA-seq data with limited computing resources. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-1903-z) contains supplementary material, which is available to authorized users. BioMed Central 2016-02-11 /pmc/articles/PMC4750290/ /pubmed/26864345 http://dx.doi.org/10.1186/s13104-016-1903-z Text en © Aluome et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Note
Aluome, Christelle
Aubert, Grégoire
Alves Carvalho, Susete
Le Paslier, Marie-Christine
Burstin, Judith
Brunel, Dominique
De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
title De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
title_full De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
title_fullStr De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
title_full_unstemmed De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
title_short De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
title_sort de novo construction of a “gene-space” for diploid plant genome rich in repetitive sequences by an iterative process of extraction and assembly of ngs reads (ipea protocol) with limited computing resources
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750290/
https://www.ncbi.nlm.nih.gov/pubmed/26864345
http://dx.doi.org/10.1186/s13104-016-1903-z
work_keys_str_mv AT aluomechristelle denovoconstructionofagenespacefordiploidplantgenomerichinrepetitivesequencesbyaniterativeprocessofextractionandassemblyofngsreadsipeaprotocolwithlimitedcomputingresources
AT aubertgregoire denovoconstructionofagenespacefordiploidplantgenomerichinrepetitivesequencesbyaniterativeprocessofextractionandassemblyofngsreadsipeaprotocolwithlimitedcomputingresources
AT alvescarvalhosusete denovoconstructionofagenespacefordiploidplantgenomerichinrepetitivesequencesbyaniterativeprocessofextractionandassemblyofngsreadsipeaprotocolwithlimitedcomputingresources
AT lepasliermariechristine denovoconstructionofagenespacefordiploidplantgenomerichinrepetitivesequencesbyaniterativeprocessofextractionandassemblyofngsreadsipeaprotocolwithlimitedcomputingresources
AT burstinjudith denovoconstructionofagenespacefordiploidplantgenomerichinrepetitivesequencesbyaniterativeprocessofextractionandassemblyofngsreadsipeaprotocolwithlimitedcomputingresources
AT bruneldominique denovoconstructionofagenespacefordiploidplantgenomerichinrepetitivesequencesbyaniterativeprocessofextractionandassemblyofngsreadsipeaprotocolwithlimitedcomputingresources