Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Verteb...

Bibliographic Details
Main Authors: Larivière, Delphine, Abueg, Linelle, Brajuka, Nadolina, Gallardo-Alba, Cristóbal, Grüning, Bjorn, Ko, Byung June, Ostrovsky, Alex, Palmada-Flores, Marc, Pickett, Brandon D., Rabbani, Keon, Balacco, Jennifer R., Chaisson, Mark, Cheng, Haoyu, Collins, Joanna, Denisova, Alexandra, Fedrigo, Olivier, Gallo, Guido Roberto, Giani, Alice Maria, Gooder, Grenville MacDonald, Jain, Nivesh, Johnson, Cassidy, Kim, Heebal, Lee, Chul, Marques-Bonet, Tomas, O'Toole, Brian, Rhie, Arang, Secomandi, Simona, Sozzoni, Marcella, Tilley, Tatiana, Uliano-Silva, Marcela, van den Beek, Marius, Waterhouse, Robert M., Phillippy, Adam M., Jarvis, Erich D., Schatz, Michael C., Nekrutenko, Anton, Formenti, Giulio
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327048/
https://www.ncbi.nlm.nih.gov/pubmed/37425881
http://dx.doi.org/10.1101/2023.06.28.546576
_version_ 1785069548089638912
author Larivière, Delphine
Abueg, Linelle
Brajuka, Nadolina
Gallardo-Alba, Cristóbal
Grüning, Bjorn
Ko, Byung June
Ostrovsky, Alex
Palmada-Flores, Marc
Pickett, Brandon D.
Rabbani, Keon
Balacco, Jennifer R.
Chaisson, Mark
Cheng, Haoyu
Collins, Joanna
Denisova, Alexandra
Fedrigo, Olivier
Gallo, Guido Roberto
Giani, Alice Maria
Gooder, Grenville MacDonald
Jain, Nivesh
Johnson, Cassidy
Kim, Heebal
Lee, Chul
Marques-Bonet, Tomas
O'Toole, Brian
Rhie, Arang
Secomandi, Simona
Sozzoni, Marcella
Tilley, Tatiana
Uliano-Silva, Marcela
van den Beek, Marius
Waterhouse, Robert M.
Phillippy, Adam M.
Jarvis, Erich D.
Schatz, Michael C.
Nekrutenko, Anton
Formenti, Giulio
author_sort Larivière, Delphine
collection PubMed
description Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).
format Online
Article
Text
id pubmed-10327048
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-103270482023-07-08 Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy Larivière, Delphine Abueg, Linelle Brajuka, Nadolina Gallardo-Alba, Cristóbal Grüning, Bjorn Ko, Byung June Ostrovsky, Alex Palmada-Flores, Marc Pickett, Brandon D. Rabbani, Keon Balacco, Jennifer R. Chaisson, Mark Cheng, Haoyu Collins, Joanna Denisova, Alexandra Fedrigo, Olivier Gallo, Guido Roberto Giani, Alice Maria Gooder, Grenville MacDonald Jain, Nivesh Johnson, Cassidy Kim, Heebal Lee, Chul Marques-Bonet, Tomas O'Toole, Brian Rhie, Arang Secomandi, Simona Sozzoni, Marcella Tilley, Tatiana Uliano-Silva, Marcela van den Beek, Marius Waterhouse, Robert M. Phillippy, Adam M. Jarvis, Erich D. Schatz, Michael C. Nekrutenko, Anton Formenti, Giulio bioRxiv Article Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long-reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhanced reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals). 
Cold Spring Harbor Laboratory 2023-06-30 /pmc/articles/PMC10327048/ /pubmed/37425881 http://dx.doi.org/10.1101/2023.06.28.546576 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327048/
https://www.ncbi.nlm.nih.gov/pubmed/37425881
http://dx.doi.org/10.1101/2023.06.28.546576