De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources


Bibliographic Details
Main authors: Aluome, Christelle; Aubert, Grégoire; Alves Carvalho, Susete; Le Paslier, Marie-Christine; Burstin, Judith; Brunel, Dominique
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4750290/
https://www.ncbi.nlm.nih.gov/pubmed/26864345
http://dx.doi.org/10.1186/s13104-016-1903-z
Description
Summary: BACKGROUND: The continuing increase in the size and quality of “short reads” raw data significantly improves the quality of the assemblies obtained with various bioinformatics tools. However, building a reference genome sequence for most plant species remains a significant challenge because of the large number of repeated sequences, which are problematic for a high-quality whole-genome de novo assembly. Furthermore, for most SNP identification approaches in plant genetics and breeding, only the “Gene-space” regions, including the promoter, exon and intron sequences, are considered. RESULTS: We developed the iPEA protocol to produce a de novo Gene-space assembly by iteratively reconstructing the non-coding sequence flanking the Unigene cDNA sequence through the addition of next-generation DNA-seq data. The approach was developed on the large diploid genome of pea (Pisum sativum L.), which is rich in repetitive sequences. The final Gene-space assembly included 35,400 contigs (97 Mb), covering 88 % of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated by the results of the GenoPea 13.2 K SNP Array built from it. CONCLUSION: The iPEA protocol allows the reconstruction of a Gene-space from RNA-Seq and DNA-seq data with limited computing resources. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-1903-z) contains supplementary material, which is available to authorized users.