Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy
Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Verteb...
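Main authors: | Larivière, Delphine; Abueg, Linelle; Brajuka, Nadolina; Gallardo-Alba, Cristóbal; Grüning, Bjorn; Ko, Byung June; Ostrovsky, Alex; Palmada-Flores, Marc; Pickett, Brandon D.; Rabbani, Keon; Balacco, Jennifer R.; Chaisson, Mark; Cheng, Haoyu; Collins, Joanna; Denisova, Alexandra; Fedrigo, Olivier; Gallo, Guido Roberto; Giani, Alice Maria; Gooder, Grenville MacDonald; Jain, Nivesh; Johnson, Cassidy; Kim, Heebal; Lee, Chul; Marques-Bonet, Tomas; O'Toole, Brian; Rhie, Arang; Secomandi, Simona; Sozzoni, Marcella; Tilley, Tatiana; Uliano-Silva, Marcela; van den Beek, Marius; Waterhouse, Robert M.; Phillippy, Adam M.; Jarvis, Erich D.; Schatz, Michael C.; Nekrutenko, Anton; Formenti, Giulio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327048/ https://www.ncbi.nlm.nih.gov/pubmed/37425881 http://dx.doi.org/10.1101/2023.06.28.546576 |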
_version_ | 1785069548089638912 |
---|---|
author | Larivière, Delphine; Abueg, Linelle; Brajuka, Nadolina; Gallardo-Alba, Cristóbal; Grüning, Bjorn; Ko, Byung June; Ostrovsky, Alex; Palmada-Flores, Marc; Pickett, Brandon D.; Rabbani, Keon; Balacco, Jennifer R.; Chaisson, Mark; Cheng, Haoyu; Collins, Joanna; Denisova, Alexandra; Fedrigo, Olivier; Gallo, Guido Roberto; Giani, Alice Maria; Gooder, Grenville MacDonald; Jain, Nivesh; Johnson, Cassidy; Kim, Heebal; Lee, Chul; Marques-Bonet, Tomas; O'Toole, Brian; Rhie, Arang; Secomandi, Simona; Sozzoni, Marcella; Tilley, Tatiana; Uliano-Silva, Marcela; van den Beek, Marius; Waterhouse, Robert M.; Phillippy, Adam M.; Jarvis, Erich D.; Schatz, Michael C.; Nekrutenko, Anton; Formenti, Giulio |
author_sort | Larivière, Delphine |
collection | PubMed |
description | Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals). |
format | Online Article Text |
id | pubmed-10327048 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-103270482023-07-08 Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy Larivière, Delphine Abueg, Linelle Brajuka, Nadolina Gallardo-Alba, Cristóbal Grüning, Bjorn Ko, Byung June Ostrovsky, Alex Palmada-Flores, Marc Pickett, Brandon D. Rabbani, Keon Balacco, Jennifer R. Chaisson, Mark Cheng, Haoyu Collins, Joanna Denisova, Alexandra Fedrigo, Olivier Gallo, Guido Roberto Giani, Alice Maria Gooder, Grenville MacDonald Jain, Nivesh Johnson, Cassidy Kim, Heebal Lee, Chul Marques-Bonet, Tomas O'Toole, Brian Rhie, Arang Secomandi, Simona Sozzoni, Marcella Tilley, Tatiana Uliano-Silva, Marcela van den Beek, Marius Waterhouse, Robert M. Phillippy, Adam M. Jarvis, Erich D. Schatz, Michael C. Nekrutenko, Anton Formenti, Giulio bioRxiv Article Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long-reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhanced reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals). 
Cold Spring Harbor Laboratory 2023-06-30 /pmc/articles/PMC10327048/ /pubmed/37425881 http://dx.doi.org/10.1101/2023.06.28.546576 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327048/ https://www.ncbi.nlm.nih.gov/pubmed/37425881 http://dx.doi.org/10.1101/2023.06.28.546576 |