Improvement of the banana Musa acuminata reference sequence using NGS data and semi-automated bioinformatics methods

Martin Guillaume, Baurens Franc-Christophe, Droc Gaëtan, Rouard Mathieu, Cenci Alberto, Kilian Andrzej, Hastie Alex, Dolezelova M., Aury Jean-Marc, Alberti Adriana, Carreel Françoise, D'Hont Angélique. 2016. Improvement of the banana Musa acuminata reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics, 17 (243), 12 p.

Journal article ; Research article ; Article in an impact-factor journal ; Fully open-access journal
Published version - English
Use under authorization by the author or CIRAD.




Additional Information : Datasets : "Availability of supporting data: Datasets (contigs, scaffold assembly, pseudo-molecules, markers matrix and raw data of the genome map) are available through the banana genome hub, and the 5 kb library is deposited on the ENA read archive (ID number: ERP013665)."

Abstract : Background: Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). Results: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80 %), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5 % of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70 %. Unknown sites (N) were reduced from 17.3 to 10.0 %. Conclusion: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species. (Author's abstract)
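
Note on the assembly metrics quoted in the abstract: the N50 figures and the scaffold reduction can be illustrated with a minimal sketch. It is not taken from the authors' pipeline. The function names `n50` and `percent_reduction` and the toy scaffold lengths are hypothetical, and the scaffold counts given in parentheses next to each N50 are read here as the number of scaffolds needed to reach half of the assembly length, which is an interpretation of the abstract.

```python
def n50(lengths):
    """Return (N50, count): the length of the scaffold at which the running
    sum of lengths, sorted longest first, reaches half the total assembly
    length, and how many scaffolds were needed to get there."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def percent_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100 * (before - after) / before


# Hypothetical toy assembly (arbitrary units), not data from the article.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # (40, 2): 50 + 40 = 90 reaches half of 150

# Scaffold counts reported in the abstract: 7513 before, 1532 after.
print(round(percent_reduction(7513, 1532), 1))  # 79.6, reported as 80 %
```

A higher N50 reached with fewer scaffolds, as reported for version 2 (3.0 Mb versus 1.3 Mb), indicates a more contiguous assembly.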

Agrovoc keywords : Musa acuminata, Bioinformatics, Genome, Nucleotide sequence, DNA, Genetic map, Population genetics, Genetic engineering, Genetic marker

Agrovoc geographic keywords : Guadeloupe

Additional keywords : Sequencing

Free keywords : Musa acuminata, Genome assembly, Bioinformatics tool, Paired-end sequences, GBS, Genome map

Agris classification : F30 - Plant genetics and breeding
U10 - Computer science, mathematics and statistics

Cirad strategic field : Axis 1 (2014-2018) - Ecologically intensive agriculture

Authors and affiliations

  • Martin Guillaume, CIRAD-BIOS-UMR AGAP (FRA) ORCID: 0000-0002-1801-7500
  • Baurens Franc-Christophe, CIRAD-BIOS-UMR AGAP (FRA) ORCID: 0000-0002-5219-8771
  • Droc Gaëtan, CIRAD-BIOS-UMR AGAP (FRA)
  • Rouard Mathieu, Bioversity International (FRA)
  • Cenci Alberto, Bioversity International (ITA)
  • Kilian Andrzej, Diversity Arrays Technology (AUS)
  • Hastie Alex, BioNano Genomics (USA)
  • Dolezelova M., Institute of Experimental Botany (CZE)
  • Aury Jean-Marc, CEA (FRA)
  • Alberti Adriana, CEA (FRA)
  • Carreel Françoise, CIRAD-BIOS-UMR AGAP (FRA)
  • D'Hont Angélique, CIRAD-BIOS-UMR AGAP (FRA)

Source : Cirad-Agritrop

