Hybrid assembly with long and short reads improves discovery of gene family expansions

BACKGROUND: Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. METHODS: We developed a hybrid assembly pipeline called “Alpaca” that can operat...

Bibliographic Details
Main authors: Miller, Jason R., Zhou, Peng, Mudge, Joann, Gurtowski, James, Lee, Hayan, Ramaraj, Thiruvarangan, Walenz, Brian P., Liu, Junqi, Stupar, Robert M., Denny, Roxanne, Song, Li, Singh, Namrata, Maron, Lyza G., McCouch, Susan R., McCombie, W. Richard, Schatz, Michael C., Tiffin, Peter, Young, Nevin D., Silverstein, Kevin A. T.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5518131/
https://www.ncbi.nlm.nih.gov/pubmed/28724409
http://dx.doi.org/10.1186/s12864-017-3927-8
_version_ 1783251431953793024
author Miller, Jason R.
Zhou, Peng
Mudge, Joann
Gurtowski, James
Lee, Hayan
Ramaraj, Thiruvarangan
Walenz, Brian P.
Liu, Junqi
Stupar, Robert M.
Denny, Roxanne
Song, Li
Singh, Namrata
Maron, Lyza G.
McCouch, Susan R.
McCombie, W. Richard
Schatz, Michael C.
Tiffin, Peter
Young, Nevin D.
Silverstein, Kevin A. T.
author_sort Miller, Jason R.
collection PubMed
description BACKGROUND: Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. METHODS: We developed a hybrid assembly pipeline called “Alpaca” that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. RESULTS: Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies. CONCLUSION: Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3927-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5518131
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5518131 2017-08-16. BMC Genomics, Methodology Article. BioMed Central, 2017-07-19. /pmc/articles/PMC5518131/ /pubmed/28724409 http://dx.doi.org/10.1186/s12864-017-3927-8 Text en. © The Author(s). 2017.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Hybrid assembly with long and short reads improves discovery of gene family expansions
topic Methodology Article