
Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods

Bibliographic Details
Main authors: Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D’Hont, Angélique
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4793746/
https://www.ncbi.nlm.nih.gov/pubmed/26984673
http://dx.doi.org/10.1186/s12864-016-2579-4

Abstract

BACKGROUND: Recent advances in genomics indicate that a majority of genome sequences, and their long-range interactions, are functionally significant. Because a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata).

RESULTS: We have developed a modular bioinformatics pipeline, able to handle various types of data, to improve genome sequence assemblies. The pipeline comprises several semi-automated tools. Unlike classical automated tools that rely on global parameters, these semi-automated tools offer an expert mode in which the user can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation built from two pre-existing annotations was projected onto the new assembly. This approach reduced the total number of Musa scaffolds from 7513 to 1532 (i.e. by 80 %) and increased the N50 from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5 % of the assembly was anchored to the 11 Musa chromosomes, compared with 70 % previously. Unknown sites (N) were reduced from 17.3 % to 10.0 %.

CONCLUSION: The release of version 2 of the Musa acuminata reference genome provides a platform for detailed analysis of banana genome variation, function and evolution. The bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2579-4) contains supplementary material, which is available to authorized users.
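
The scaffold statistics quoted in the abstract can be reproduced from a list of scaffold lengths. The sketch below assumes the usual definition of N50 (sort lengths in descending order and stop once the running total reaches half the assembly size); the scaffold counts given in parentheses appear to be the corresponding L50 values. It is an illustration only, not code from the authors' pipeline, and the function names `n50_l50` and `percent_reduction` are hypothetical. It also checks the reported reduction from 7513 to 1532 scaffolds.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50: length of the shortest sequence among the longest sequences that
    together cover at least half of the total length.
    L50: number of sequences in that set.
    (Illustrative helper; tools differ slightly in their exact definitions.)
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the halfway point.
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable: the full set always covers the total")


def percent_reduction(before: int, after: int) -> float:
    """Percentage decrease from `before` to `after` (hypothetical helper)."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical, not banana data).
    print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
    # Scaffold counts reported in the abstract: 7513 before, 1532 after.
    print(f"{percent_reduction(7513, 1532):.1f} %")  # 79.6 %
```

With the abstract's figures the exact reduction is about 79.6 %, which the abstract rounds to 80 %. Values computed by different assemblers or QC tools may differ slightly, for example depending on whether gap characters (N) are counted in the scaffold lengths.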

Journal: BMC Genomics (Methodology Article), BioMed Central, published 2016-03-16
License: © Martin et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.