
Progressive Cactus is a multiple-genome aligner for the thousand-genome era


Bibliographic Details
Main authors: Armstrong, Joel; Hickey, Glenn; Diekhans, Mark; Fiddes, Ian T.; Novak, Adam M.; Deran, Alden; Fang, Qi; Xie, Duo; Feng, Shaohong; Stiller, Josefin; Genereux, Diane; Johnson, Jeremy; Marinescu, Voichita Dana; Alföldi, Jessica; Harris, Robert S.; Lindblad-Toh, Kerstin; Haussler, David; Karlsson, Elinor; Jarvis, Erich D.; Zhang, Guojie; Paten, Benedict
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673649/
https://www.ncbi.nlm.nih.gov/pubmed/33177663
http://dx.doi.org/10.1038/s41586-020-2871-y
author Armstrong, Joel
Hickey, Glenn
Diekhans, Mark
Fiddes, Ian T.
Novak, Adam M.
Deran, Alden
Fang, Qi
Xie, Duo
Feng, Shaohong
Stiller, Josefin
Genereux, Diane
Johnson, Jeremy
Marinescu, Voichita Dana
Alföldi, Jessica
Harris, Robert S.
Lindblad-Toh, Kerstin
Haussler, David
Karlsson, Elinor
Jarvis, Erich D.
Zhang, Guojie
Paten, Benedict
collection PubMed
description New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies(1–3). For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database(4) increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies(5) are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus(6), a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
format Online
Article
Text
id pubmed-7673649
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7673649 2021-05-06 Nature Article
Nature Publishing Group UK, 2020-11-11. © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title Progressive Cactus is a multiple-genome aligner for the thousand-genome era
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673649/
https://www.ncbi.nlm.nih.gov/pubmed/33177663
http://dx.doi.org/10.1038/s41586-020-2871-y